affect-lab · cognitive science, neural networks, emotion & personality for AI

1. The artificial neuron

A model neuron computes a weighted sum of its inputs plus a bias, then squashes the result through an activation function: $y=\sigma(z)$ with $z=\sum_i w_i x_i + b$. Drag the weights and bias and watch the pre-activation $z$ and the output $y$ (the model's analogue of a firing rate) respond. The mapping to biology is loose: the weighted inputs play the role of the dendrites, the summation that of the soma, and the output that of the axon.

Controls: $x_1$ = 1.0, $x_2$ = 0.5, $w_1$ = 0.8, $w_2$ = -0.6, bias $b$ = 0.0

Readouts: $z=\sum_i w_i x_i+b$, $y=\sigma(z)$
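The forward pass of this neuron can be sketched in a few lines. The sketch assumes $\sigma$ is the logistic sigmoid (the page does not say which activation the panel uses) and plugs in the default control values listed above; the function names are illustrative.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic squashing function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Return (z, y): the pre-activation and the activated output."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return z, sigmoid(z)

# Default values from the panel: x = (1.0, 0.5), w = (0.8, -0.6), b = 0.0
z, y = neuron(inputs=[1.0, 0.5], weights=[0.8, -0.6], bias=0.0)
print(f"z = {z:.3f}, y = {y:.3f}")  # prints: z = 0.500, y = 0.622
```

Raising the bias shifts $z$ by the same amount for every input, which slides the sigmoid's operating point without changing its shape.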

2. The perceptron & its decision boundary

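A single perceptron splits the plane with the line $w_1x_1+w_2x_2+b=0$. Click to drop points of the active class, then run the perceptron learning rule: each misclassified point nudges the weights by $w \mathrel{+}= \eta\,(t-y)\,x$ and the bias by $b \mathrel{+}= \eta\,(t-y)$, where $t$ is the target label and $y$ the predicted one. If the classes are linearly separable, the updates stop after finitely many steps with every point classified correctly; otherwise the boundary keeps moving and never settles.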

Controls: class to add, learning rate $\eta$ = 0.10

Readouts: points = 0, epoch = 0, accuracy

Click the canvas to add a point of the selected class.
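A minimal sketch of the perceptron learning rule follows, assuming a step activation with labels 0 and 1 and weights initialised to zero. The six points are hypothetical stand-ins for whatever a user clicks onto the canvas; they were chosen to be linearly separable.

```python
def predict(w, b, x):
    """Step activation: class 1 on the positive side of the line, else 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def train_perceptron(points, targets, eta=0.10, max_epochs=1000):
    """Apply w += eta*(t - y)*x and b += eta*(t - y) until an epoch has no errors."""
    w, b = [0.0, 0.0], 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, t in zip(points, targets):
            y = predict(w, b, x)
            if y != t:  # only misclassified points move the boundary
                w[0] += eta * (t - y) * x[0]
                w[1] += eta * (t - y) * x[1]
                b += eta * (t - y)
                errors += 1
        if errors == 0:
            return w, b, epoch
    return w, b, max_epochs

# Hypothetical, linearly separable sample (class 0 lower-left, class 1 upper-right).
points = [(0.2, 0.3), (0.4, 0.1), (0.1, 0.6), (0.8, 0.9), (0.9, 0.6), (0.6, 0.8)]
targets = [0, 0, 0, 1, 1, 1]

w, b, epochs = train_perceptron(points, targets)
accuracy = sum(predict(w, b, x) == t for x, t in zip(points, targets)) / len(points)
print(f"w = {w}, b = {b:.2f}, epochs = {epochs}, accuracy = {accuracy:.2f}")
```

The loop stops at the first epoch with no mistakes, so the returned boundary separates the sample but is not necessarily the widest-margin one.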

3. Activation functions

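Non-linearities are what let deep networks model curved decision surfaces; without them, a stack of layers collapses to a single linear map. Compare the classic squashing functions with the ReLU family, and inspect the derivative: gradients vanish wherever $\sigma'(z)\approx 0$, which is why saturating activations such as the logistic sigmoid (whose derivative never exceeds 0.25) can train slowly in deep networks.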

Controls: function, evaluate at $z$ = 1.0, show derivative $\sigma'(z)$

Readouts: $\sigma(z)$, $\sigma'(z)$
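To make the saturation argument concrete, the sketch below evaluates four common activations and their derivatives at $z = 1$ and $z = 10$. Which functions the page's selector actually offers is not shown in the source, so this set is an assumption, as is the leaky-ReLU slope of 0.01.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # peaks at 0.25 when z = 0

def d_tanh(z):
    return 1.0 - math.tanh(z) ** 2

def relu(z):
    return max(0.0, z)

def d_relu(z):
    # Convention: the derivative at exactly z = 0 is taken to be 0.
    return 1.0 if z > 0 else 0.0

def leaky_relu(z, alpha=0.01):
    return z if z > 0 else alpha * z

def d_leaky_relu(z, alpha=0.01):
    return 1.0 if z > 0 else alpha

ACTIVATIONS = {
    "sigmoid": (sigmoid, d_sigmoid),
    "tanh": (math.tanh, d_tanh),
    "relu": (relu, d_relu),
    "leaky_relu": (leaky_relu, d_leaky_relu),
}

for z in (1.0, 10.0):
    for name, (f, df) in ACTIVATIONS.items():
        print(f"z={z:5.1f}  {name:10s}  f={f(z):.5f}  f'={df(z):.2e}")
```

At $z = 10$ the sigmoid and tanh derivatives are close to zero while the ReLU derivative is still 1, which is the contrast the derivative toggle is meant to show.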

4. Gradient descent on a loss surface

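Training minimises a loss. Watch the optimiser roll downhill on a 1-D loss curve via $\theta \mathrel{-}= \eta\,\nabla L(\theta)$. Too small a learning rate crawls; too large a rate overshoots the minimum or diverges. Choosing $\eta$ is the central tuning problem of gradient descent, including its stochastic variant (SGD), which applies the same update to gradients estimated from mini-batches.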

Controls: loss surface, learning rate $\eta$ = 0.12, start $\theta_0$ = -2.4

Readouts: step = 0, $\theta$, loss $L(\theta)$
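The three learning-rate regimes can be reproduced with a short sketch. The page's loss surfaces are not specified in the source, so this example assumes the simplest convex bowl, $L(\theta)=\theta^2$ with gradient $2\theta$, and starts from the default $\theta_0 = -2.4$.

```python
def loss(theta):
    """Assumed loss surface: a simple quadratic bowl with its minimum at 0."""
    return theta ** 2

def grad(theta):
    return 2.0 * theta

def descend(theta0, eta, steps):
    """Run `steps` updates of theta -= eta * grad(theta) and return the final theta."""
    theta = theta0
    for _ in range(steps):
        theta -= eta * grad(theta)
    return theta

theta0 = -2.4
for eta in (0.001, 0.12, 1.1):
    theta = descend(theta0, eta, steps=50)
    print(f"eta={eta:<6} theta after 50 steps = {theta:.4g}, loss = {loss(theta):.4g}")
```

On this particular surface each step multiplies $\theta$ by $(1 - 2\eta)$, so the iterate shrinks towards 0 when $0 < \eta < 1$ and grows without bound when $\eta > 1$. Other loss surfaces have different thresholds.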

5. Multi-layer perceptron & the XOR problem

A single perceptron cannot learn XOR, because the XOR pattern is not linearly separable: no single line puts $(0,1)$ and $(1,0)$ on one side and $(0,0)$ and $(1,1)$ on the other. A small feedforward network with one hidden layer can. This MLP trains by backpropagation; watch the loss fall and the learned decision surface bend to carve out the XOR pattern.

Controls: target pattern, hidden units = 4, learning rate $\eta$ = 0.50

Readouts: epoch = 0, loss (MSE)
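A compact sketch of backpropagation on XOR follows, using the default 4 hidden units and $\eta = 0.50$. Several details are assumptions rather than the page's implementation: sigmoid units in both layers, per-sample updates, uniform random initial weights in $[-1, 1]$, and the factor 2 from the squared-error derivative folded into $\eta$. Whether a given run reaches a near-zero loss depends on the random seed.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(hidden=4, eta=0.50, epochs=5000, seed=0):
    rng = random.Random(seed)
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    # Hidden layer: one weight pair and one bias per hidden unit.
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    # Output layer: one weight per hidden unit, plus a bias.
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = rng.uniform(-1, 1)

    def forward(x):
        h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(hidden)]
        y = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)
        return h, y

    history = []  # mean squared error per epoch
    for _ in range(epochs):
        total = 0.0
        for x, t in data:
            h, y = forward(x)
            total += (y - t) ** 2
            # Backward pass: error signal at the output, then at each hidden unit.
            d_out = (y - t) * y * (1 - y)
            d_hid = [d_out * w2[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            for j in range(hidden):
                w2[j] -= eta * d_out * h[j]
                w1[j][0] -= eta * d_hid[j] * x[0]
                w1[j][1] -= eta * d_hid[j] * x[1]
                b1[j] -= eta * d_hid[j]
            b2 -= eta * d_out
        history.append(total / len(data))
    return history, (lambda x: forward(x)[1])

history, net = train_xor()
print(f"MSE: first epoch {history[0]:.4f}, last epoch {history[-1]:.4f}")
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(net(x), 3))
```

The hidden-layer error signals are computed before any weight is changed, so each update uses the gradients of the same forward pass.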

6. Hebbian learning & dimensionality reduction

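"Cells that fire together, wire together." A linear Hebbian unit $y = w\cdot x$ trained with Oja's stabilising rule $\Delta w = \eta\,y\,(x - y\,w)$ converges, for zero-mean inputs and a small enough $\eta$, to a unit-length vector along the first principal component, the direction of greatest variance (the sign is arbitrary). The $-\eta\,y^2 w$ term is what keeps $\|w\|$ from growing without bound, as it would under the plain Hebbian update $\Delta w = \eta\,y\,x$. This is unsupervised learning finding the dominant latent axis of a cloud of data.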

Controls: cloud correlation = 0.80, learning rate $\eta$ = 0.02

Readouts: weight $w$, angle of $w$, PC1 angle
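Oja's rule can be sketched on a synthetic cloud with the default correlation 0.80 and $\eta = 0.02$. The data generator is an assumption: zero-mean, unit-variance Gaussian pairs with correlation $\rho$, for which the first principal component lies along the 45° diagonal when $\rho > 0$.

```python
import math
import random

def oja(rho=0.80, eta=0.02, steps=5000, seed=0):
    """Train a linear unit y = w.x with Oja's rule on correlated 2-D Gaussian data."""
    rng = random.Random(seed)
    w = [1.0, 0.0]  # arbitrary starting weight
    for _ in range(steps):
        g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Zero-mean pair with unit variances and correlation rho.
        x = (g1, rho * g1 + math.sqrt(1 - rho ** 2) * g2)
        y = w[0] * x[0] + w[1] * x[1]
        # Oja's update: Hebbian term eta*y*x minus the decay eta*y^2*w.
        w[0] += eta * y * (x[0] - y * w[0])
        w[1] += eta * y * (x[1] - y * w[1])
    return w

w = oja()
angle = math.degrees(math.atan2(w[1], w[0])) % 180.0  # sign of w is arbitrary
norm = math.hypot(w[0], w[1])
print(f"w = ({w[0]:.3f}, {w[1]:.3f}), angle = {angle:.1f} deg, |w| = {norm:.3f}")
```

Because each update uses a single random sample, the learned angle jitters around the PC1 direction rather than settling exactly on it; a smaller $\eta$ reduces the jitter at the cost of slower convergence.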

7. Convolution — the building block of CNNs

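Convolutional nets see by sliding small kernels over an image and, at each position, summing the element-wise products of the kernel and the patch beneath it. Pick an edge, blur or sharpen kernel and watch the feature map it produces. With no padding and stride 1, a 3×3 kernel on a 9×9 input yields a 7×7 output ($9-3+1=7$). The operation is loosely analogous to the oriented receptive fields found in early visual cortex.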

Controls: input, kernel

Readouts: input size = 9 × 9, output size = 7 × 7

Canvas layout: left = input · centre = 3×3 kernel · right = feature map
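The sliding-window sum can be written directly as nested loops. This sketch assumes no padding and stride 1, which matches the 9 × 9 to 7 × 7 sizes shown above, and it does not flip the kernel (strictly a cross-correlation, which is what CNN layers usually compute). The step-edge image and the vertical-edge kernel are illustrative choices, not necessarily the page's presets.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum local products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# 9x9 input: dark left half, bright right half (a vertical step edge at column 5).
image = [[1.0 if c >= 5 else 0.0 for c in range(9)] for _ in range(9)]

# 3x3 vertical-edge kernel: responds where brightness rises from left to right.
edge_kernel = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]

feature_map = correlate2d_valid(image, edge_kernel)
print(len(feature_map), "x", len(feature_map[0]))  # prints: 7 x 7
print(feature_map[0])  # prints: [0.0, 0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
```

The feature map is zero over the flat regions and non-zero only in the two columns whose 3×3 window straddles the edge.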

8. The valence–arousal circumplex

Russell's circumplex models emotion as a point in a 2-D plane: horizontal valence (unpleasant → pleasant) and vertical arousal (calm → activated). Drag the marker and read off its angle, its intensity and the nearest discrete emotion label. That label lookup is the bridge between dimensional models and categorical ones such as Ekman's basic emotions.

Controls: valence = 0.50, arousal = 0.50

Readouts: angle, intensity, nearest emotion

The marker can also be dragged directly on the canvas.
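One way to compute the three readouts is to convert the point to polar coordinates and pick the label whose angle is closest. The sketch assumes both axes run from -1 to 1 with neutral at the origin, takes intensity to be the distance from the origin, and places eight labels at 45° intervals in the spirit of Russell's circular ordering. The label names and their exact angles are illustrative assumptions, not the page's table.

```python
import math

# Hypothetical label placement: angle in degrees, counter-clockwise from +valence.
LABELS = {
    0: "pleased", 45: "excited", 90: "aroused", 135: "distressed",
    180: "miserable", 225: "depressed", 270: "sleepy", 315: "relaxed",
}

def circumplex(valence, arousal):
    """Return (angle in degrees, intensity, nearest label) for a point in the plane."""
    angle = math.degrees(math.atan2(arousal, valence)) % 360.0
    intensity = math.hypot(valence, arousal)

    def gap(label_angle):
        # Shortest angular distance, allowing for wrap-around at 360 degrees.
        d = abs(angle - label_angle) % 360.0
        return min(d, 360.0 - d)

    nearest = LABELS[min(LABELS, key=gap)]
    return angle, intensity, nearest

angle, intensity, label = circumplex(0.50, 0.50)
print(f"angle = {angle:.1f} deg, intensity = {intensity:.3f}, nearest = {label}")
# prints: angle = 45.0 deg, intensity = 0.707, nearest = excited
```

At the exact origin the angle is undefined, so a caller would probably treat very low intensity as "neutral" rather than trust the label.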

9. PAD — the three-dimensional emotion space

The Pleasure–Arousal–Dominance (PAD) model adds a third axis, dominance (feeling in control vs. controlled), which separates emotions the 2-D circumplex confuses. Anger and fear, for example, share low pleasure and high arousal but differ sharply in dominance: anger is high, fear is low.

Controls: pleasure $P$ = 0.4, arousal $A$ = 0.3, dominance $D$ = 0.2

Readouts: octant, nearest prototype
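The octant readout depends only on the signs of the three coordinates. The sketch below names the octants with the temperament labels usually attributed to Mehrabian's PAD work; whether the page uses the same names is an assumption, and the anger and fear coordinates are hypothetical points chosen to differ only in dominance.

```python
import math

# Octant names keyed by the signs of (pleasure, arousal, dominance).
OCTANTS = {
    (+1, +1, +1): "exuberant", (-1, -1, -1): "bored",
    (+1, +1, -1): "dependent", (-1, -1, +1): "disdainful",
    (+1, -1, +1): "relaxed",   (-1, +1, -1): "anxious",
    (+1, -1, -1): "docile",    (-1, +1, +1): "hostile",
}

def octant(p, a, d):
    """Map a PAD point to its octant name (zero counts as positive)."""
    signs = tuple(1 if v >= 0 else -1 for v in (p, a, d))
    return OCTANTS[signs]

print(octant(0.4, 0.3, 0.2))  # default sliders; prints: exuberant

# Hypothetical anger and fear points: same pleasure and arousal, opposite dominance.
anger = (-0.5, 0.6, 0.4)
fear = (-0.5, 0.6, -0.4)
print(octant(*anger), octant(*fear))            # prints: hostile anxious
print(math.dist(anger[:2], fear[:2]))           # distance in the 2-D plane: 0.0
print(round(math.dist(anger, fear), 3))         # distance in PAD space: 0.8
```

The two points coincide in the valence and arousal plane and only separate once dominance is included, which is the confusion the third axis resolves.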

10. Big Five (OCEAN) personality profile

The Big Five, the dominant trait model in psychology, describes personality along five continuous dimensions: openness, conscientiousness, extraversion, agreeableness and neuroticism (OCEAN). Set the sliders to build a profile; the radar chart and the nearest archetype update live. An AI persona designer could tune the same five-number vector to give an agent a consistent character.

Controls: openness = 70, conscientiousness = 60, extraversion = 45, agreeableness = 75, neuroticism = 30

Readouts: nearest archetype
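A nearest-archetype lookup can be sketched as a Euclidean nearest-neighbour search over five-number profiles. The three archetypes and their scores below are invented purely for illustration; the page's actual archetype table and distance measure are not given in the source.

```python
import math

TRAITS = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")

# Hypothetical archetypes on the same 0-100 scale as the sliders (O, C, E, A, N).
ARCHETYPES = {
    "helper":   (60, 65, 50, 85, 30),
    "explorer": (90, 40, 70, 55, 35),
    "analyst":  (65, 85, 25, 45, 40),
}

def nearest_archetype(profile):
    """Return the archetype name with the smallest Euclidean distance to `profile`."""
    return min(ARCHETYPES, key=lambda name: math.dist(profile, ARCHETYPES[name]))

profile = (70, 60, 45, 75, 30)  # default slider values
nearest = nearest_archetype(profile)
print(nearest)  # prints: helper
```

Euclidean distance treats all five traits as equally important; a designer who cares more about, say, agreeableness could weight that dimension before comparing.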

11. Attention weights

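Attention, the core mechanism of the transformer architecture behind large language models, lets a query token decide how much to read from every other token. The scores $q\cdot k_i$ are turned into a probability distribution by a softmax with temperature, $\alpha_i = \exp(q\cdot k_i/\tau)\,/\sum_j \exp(q\cdot k_j/\tau)$. Raise $\tau$ to spread attention more evenly across tokens; lower it to make the distribution sharp and peaked.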

Controls: query token, temperature $\tau$ = 1.0

Readouts: peak token, entropy, max weight

Bars = softmax attention from the query over all keys.
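The effect of temperature on the three readouts can be sketched with a small softmax. The query and key vectors are hypothetical 2-D embeddings; the sketch omits the $1/\sqrt{d_k}$ scaling used in transformer attention and divides the raw dot products by $\tau$ only, as the description above does.

```python
import math

def attention_weights(query, keys, tau=1.0):
    """Softmax over dot-product scores q.k_i, divided by temperature tau."""
    scores = [sum(q * k for q, k in zip(query, key)) / tau for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(weights):
    """Shannon entropy in nats; higher means more evenly spread attention."""
    return -sum(p * math.log(p) for p in weights if p > 0)

# Hypothetical embeddings: the first key matches the query best.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [-1.0, 0.0]]

for tau in (0.5, 1.0, 2.0):
    w = attention_weights(query, keys, tau)
    peak = max(range(len(w)), key=lambda i: w[i])
    print(f"tau={tau}: peak token={peak}, entropy={entropy(w):.3f}, max weight={max(w):.3f}")
```

The peak token stays the same at every temperature because dividing by $\tau$ preserves the ordering of the scores; only the entropy and the maximum weight change.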