affect-lab cognition · neural nets · emotion · personality

1. The artificial neuron

A model neuron computes a weighted sum of its inputs plus a bias, then squashes it through an activation: $y=\sigma\!\left(\sum_i w_i x_i + b\right)$. Drag the weights and bias and watch the pre-activation $z$ and the firing rate $y$ respond — the artificial echo of a biological neuron's dendrites, soma and axon.

$z=\sum_i w_i x_i+b,\qquad y=\sigma(z)$
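
A minimal Python sketch of this forward pass. It assumes the logistic sigmoid for $\sigma$, since the page leaves the activation generic. `neuron` and `sigmoid` are illustrative names, not part of any library.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation, written so that large |z| cannot overflow exp()."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0 here, so exp(z) stays small
    return e / (1.0 + e)


def neuron(x: list[float], w: list[float], b: float) -> tuple[float, float]:
    """Return the pre-activation z and the firing rate y of one model neuron."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum plus bias
    return z, sigmoid(z)


z, y = neuron(x=[1.0, 0.5], w=[0.8, -0.4], b=0.1)
print(f"z = {z:.2f}, y = {y:.3f}")  # z = 0.70, y = 0.668
```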

2. The perceptron & its decision boundary

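A single perceptron splits the plane with a line $w_1x_1+w_2x_2+b=0$. Click to drop points of the active class, then run the perceptron learning rule: each misclassified point nudges the weights by $w \mathrel{+}= \eta\,(t-y)\,x$ and the bias by $b \mathrel{+}= \eta\,(t-y)$, where $t$ is the target class and $y$ the perceptron's current output. If the classes are linearly separable, the perceptron convergence theorem guarantees that this stops after finitely many updates; if they are not, the weights never settle.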
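
A minimal sketch of the learning rule on a toy, linearly separable dataset (logical AND, chosen here purely for illustration), using a step activation that outputs 1 when $w\cdot x+b \ge 0$. With zero initial weights, $\eta$ only rescales $w$ and $b$ without changing any prediction, so it defaults to 1 to keep the arithmetic exact. `predict` and `train_perceptron` are hypothetical names.

```python
def predict(w: list[float], b: float, x: tuple[float, float]) -> int:
    """Step activation: class 1 on or above the line w1*x1 + w2*x2 + b = 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0


def train_perceptron(points, targets, eta=1.0, max_epochs=100):
    """Perceptron rule: w += eta*(t - y)*x and b += eta*(t - y) on each mistake."""
    w, b = [0.0, 0.0], 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, t in zip(points, targets):
            y = predict(w, b, x)
            if y != t:
                w[0] += eta * (t - y) * x[0]
                w[1] += eta * (t - y) * x[1]
                b += eta * (t - y)
                mistakes += 1
        if mistakes == 0:  # a full clean pass: the classes are separated
            return w, b, epoch
    return w, b, max_epochs  # gave up, e.g. because the data are not separable


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
targets = [0, 0, 0, 1]  # logical AND is linearly separable
w, b, epochs = train_perceptron(points, targets)
print(w, b, epochs)
```

Running the same function on XOR targets `[0, 1, 1, 0]` never reaches a clean pass, which is the limitation the multi-layer network in section 5 addresses.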

3. Activation functions

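Non-linearities are what let deep networks model curved decision surfaces; a stack of purely linear layers collapses to a single linear map. Compare the classic squashing functions (logistic sigmoid and tanh) with the ReLU family, and inspect each derivative: where it is close to zero, as for a saturated sigmoid with $\sigma'(z)\approx 0$ at large $|z|$, gradients vanish as they are backpropagated through many layers, which is why saturating activations train deep networks poorly.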
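
A small table of activations and their derivatives makes the vanishing-gradient point concrete. This sketch assumes a leaky-ReLU slope of 0.01 and takes the ReLU derivative at exactly 0 to be 0; both are common conventions rather than anything fixed by the page.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# (activation, derivative) pairs; each derivative is written in closed form
ACTIVATIONS = {
    "sigmoid": (sigmoid, lambda z: sigmoid(z) * (1.0 - sigmoid(z))),
    "tanh": (math.tanh, lambda z: 1.0 - math.tanh(z) ** 2),
    "relu": (lambda z: max(0.0, z), lambda z: 1.0 if z > 0 else 0.0),
    "leaky_relu": (lambda z: z if z > 0 else 0.01 * z,
                   lambda z: 1.0 if z > 0 else 0.01),
}

for name, (f, df) in ACTIVATIONS.items():
    row = ", ".join(f"f'({z:+.0f})={df(z):.2e}" for z in (-10.0, 0.0, 10.0))
    print(f"{name:>10}: {row}")
```

The sigmoid derivative peaks at 0.25 and is already around $10^{-5}$ at $|z|=10$, while the ReLU derivative stays at 1 for any positive input.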

4. Gradient descent on a loss surface

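Training minimises a loss. Watch the optimiser roll downhill on a 1-D loss curve via $\theta \mathrel{-}= \eta\,\nabla L(\theta)$; in one dimension the gradient is simply the derivative $L'(\theta)$. Too small a learning rate $\eta$ crawls; too large overshoots or diverges. Choosing $\eta$ is the central tuning problem of gradient descent and of its stochastic variant, SGD, which estimates the gradient from mini-batches of data.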
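
A minimal sketch, assuming the illustrative quadratic loss $L(\theta)=(\theta-3)^2$ (the page does not specify its curve). For this loss each update multiplies the error $\theta-3$ by $(1-2\eta)$, so the three regimes described above can be read straight off the learning rate.

```python
def loss(theta: float) -> float:
    """Illustrative 1-D loss L(theta) = (theta - 3)^2, minimised at theta = 3."""
    return (theta - 3.0) ** 2


def grad(theta: float) -> float:
    """Analytic derivative L'(theta) = 2 (theta - 3)."""
    return 2.0 * (theta - 3.0)


def gradient_descent(theta0: float, eta: float, steps: int) -> list[float]:
    """Return the trajectory theta_0, ..., theta_steps of theta -= eta * L'(theta)."""
    path = [theta0]
    for _ in range(steps):
        path.append(path[-1] - eta * grad(path[-1]))
    return path


# Each step multiplies (theta - 3) by (1 - 2*eta): small eta crawls,
# 0.5 < eta < 1 oscillates but still converges, eta > 1 diverges.
for eta in (0.01, 0.1, 0.9, 1.1):
    path = gradient_descent(theta0=0.0, eta=eta, steps=50)
    print(f"eta={eta:<5} theta_50={path[-1]:10.4f}  L={loss(path[-1]):.3g}")
```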

5. Multi-layer perceptron & the XOR problem

A single perceptron cannot learn XOR: no single line separates $(0,1)$ and $(1,0)$ from $(0,0)$ and $(1,1)$, so the classes are not linearly separable. A small feedforward network with one hidden layer can. This MLP trains by backpropagation; watch the loss fall and the learned decision surface bend to carve out the XOR pattern.
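
A from-scratch sketch in plain Python. It assumes 4 sigmoid hidden units, a sigmoid output, squared-error loss and per-example (online) updates; the page does not state its architecture or hyperparameters, and with an unlucky initialisation a network this small can occasionally stall in a poor local minimum. `train_xor` and `XOR` are illustrative names.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def train_xor(hidden=4, eta=0.5, epochs=10000, seed=0):
    """2-input, `hidden`-unit, 1-output sigmoid MLP trained by online backprop."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = rng.uniform(-1, 1)

    def forward(x):
        h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(hidden)]
        return h, sigmoid(sum(W2[j] * h[j] for j in range(hidden)) + b2)

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in XOR) / len(XOR)

    history = [mse()]
    for _ in range(epochs):
        for x, t in XOR:
            h, y = forward(x)
            # Gradient of 0.5*(y - t)^2 with respect to the output pre-activation
            d_out = (y - t) * y * (1.0 - y)
            # Backpropagate through W2 (computed before W2 is updated below)
            d_hid = [d_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            for j in range(hidden):
                W2[j] -= eta * d_out * h[j]
                W1[j][0] -= eta * d_hid[j] * x[0]
                W1[j][1] -= eta * d_hid[j] * x[1]
                b1[j] -= eta * d_hid[j]
            b2 -= eta * d_out
        history.append(mse())  # mean squared error after each epoch
    return forward, history


predict, history = train_xor()
for x, t in XOR:
    print(x, "target", t, "output", round(predict(x)[1], 3))
print(f"MSE {history[0]:.3f} -> {history[-1]:.4f}")
```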

6. Hebbian learning & dimensionality reduction

"Cells that fire together, wire together." A linear Hebbian unit $y = w\cdot x$ with Oja's stabilising rule $\Delta w = \eta\,y\,(x - y\,w)$ converges, for zero-mean inputs and a small enough learning rate, to a unit-length weight vector along the first principal component: the direction of greatest variance. This is unsupervised learning finding the latent axis of a cloud of data.
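
A minimal sketch of Oja's rule on synthetic data. The zero-mean Gaussian cloud stretched along 30°, the learning rate and the seeds are all assumptions made for illustration. The learned direction is compared with PC1 computed from the sample covariance; angles are taken mod 180° because $w$ and $-w$ describe the same axis.

```python
import math
import random


def make_cloud(n=20000, angle_deg=30.0, sd_major=3.0, sd_minor=0.5, seed=1):
    """Zero-mean 2-D Gaussian cloud stretched along angle_deg (toy data)."""
    rng = random.Random(seed)
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    cloud = []
    for _ in range(n):
        a, b = rng.gauss(0.0, sd_major), rng.gauss(0.0, sd_minor)
        cloud.append((a * c - b * s, a * s + b * c))  # rotate (a, b) by the angle
    return cloud


def oja(points, eta=0.002, seed=2):
    """One pass of Oja's rule: dw = eta * y * (x - y * w), with y = w . x."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    for x1, x2 in points:
        y = w[0] * x1 + w[1] * x2            # linear unit's output
        w[0] += eta * y * (x1 - y * w[0])    # Hebbian growth y*x minus decay y^2*w
        w[1] += eta * y * (x2 - y * w[1])
    return w


def pc1_angle(points):
    """Angle (degrees, mod 180) of the top eigenvector of the sample covariance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy)) % 180


cloud = make_cloud()
w = oja(cloud)
w_angle = math.degrees(math.atan2(w[1], w[0])) % 180  # the sign of w is arbitrary
print(f"|w| = {math.hypot(*w):.3f}, angle of w = {w_angle:.1f}, "
      f"PC1 angle = {pc1_angle(cloud):.1f}")
```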

7. Convolution — the building block of CNNs

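Convolutional nets see by sliding small kernels over an image and summing the element-wise products at each position (most deep-learning libraries actually compute cross-correlation, i.e. they do not flip the kernel). Pick an edge, blur or sharpen kernel and watch the feature map it produces. Oriented edge kernels loosely resemble the receptive fields of simple cells in early visual cortex.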

Figure: a 9 × 9 input (left), a 3 × 3 kernel (centre) and the resulting 7 × 7 feature map (right).
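
A minimal sketch of "valid" convolution (stride 1, no padding), which is why a 9 × 9 input and a 3 × 3 kernel give a $(9-3+1) \times (9-3+1) = 7 \times 7$ map. The step-edge image and the Prewitt-style kernel are illustrative choices; `conv2d_valid` is a hypothetical helper that assumes a square kernel.

```python
def conv2d_valid(image, kernel):
    """Slide a square kernel over image with stride 1 and no padding.

    Like most deep-learning libraries, this computes cross-correlation:
    the kernel is not flipped.
    """
    n_rows, n_cols = len(image), len(image[0])
    k = len(kernel)
    out_rows, out_cols = n_rows - k + 1, n_cols - k + 1  # 9 - 3 + 1 = 7
    out = [[0] * out_cols for _ in range(out_rows)]
    for i in range(out_rows):
        for j in range(out_cols):
            out[i][j] = sum(image[i + u][j + v] * kernel[u][v]
                            for u in range(k) for v in range(k))
    return out


# 9 x 9 image: dark (0) on the left, bright (1) from column 4 onwards
image = [[1 if c >= 4 else 0 for c in range(9)] for _ in range(9)]
# Prewitt-style vertical-edge kernel: right column minus left column
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d_valid(image, kernel)
print(len(feature_map), "x", len(feature_map[0]))  # 7 x 7
print(feature_map[0])  # [0, 0, 3, 3, 0, 0, 0]: response only near the edge
```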

8. The valence–arousal circumplex

Russell's circumplex models emotion as a point in a 2-D plane: horizontal valence (unpleasant → pleasant) and vertical arousal (calm → activated). Drag the marker and read off the nearest discrete emotion label — the bridge between dimensional and categorical (Ekman) models.
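
A minimal sketch of the readout: the marker's polar angle and distance from the centre, plus the nearest label by angular distance. The eight labels follow the arrangement usually drawn for Russell's circumplex, but the idealised 45° spacing, the $[-1,1]$ axis range and the neutral radius of 0.1 are assumptions; the page's own label set is not listed.

```python
import math

# Eight affect labels at an idealised 45-degree spacing (an assumption;
# measured placements vary between studies). 0 degrees = pleasant, 90 = activated.
LABELS = {
    0: "pleasure", 45: "excitement", 90: "arousal", 135: "distress",
    180: "misery", 225: "depression", 270: "sleepiness", 315: "contentment",
}


def circumplex(valence: float, arousal: float) -> tuple[float, float, str]:
    """Return (angle in degrees, intensity, nearest label) for a point."""
    angle = math.degrees(math.atan2(arousal, valence)) % 360
    intensity = math.hypot(valence, arousal)  # distance from the neutral centre
    if intensity < 0.1:  # hypothetical neutral radius: angle is meaningless here
        return angle, intensity, "neutral"

    def gap(a: float, b: float) -> float:
        return abs((a - b + 180) % 360 - 180)  # circular angular distance

    nearest = min(LABELS, key=lambda a: gap(angle, a))
    return angle, intensity, LABELS[nearest]


print(circumplex(0.7, 0.7)[2])    # excitement
print(circumplex(-0.5, 0.5)[2])   # distress
print(circumplex(0.0, -1.0)[2])   # sleepiness
```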

9. PAD — the three-dimensional emotion space

The Pleasure–Arousal–Dominance model adds a third axis, dominance (feeling in control vs. controlled), which separates emotions the 2-D circumplex confuses — e.g. anger and fear share low pleasure and high arousal but differ sharply in dominance.
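
A minimal sketch of the octant and nearest-prototype readouts. The prototype coordinates are hypothetical values on a $-1..1$ scale, chosen only to respect the signs stated above (anger and fear: low pleasure, high arousal, opposite dominance); they are not published PAD ratings.

```python
import math

# Hypothetical PAD prototypes (pleasure, arousal, dominance) on a -1..1 scale
PROTOTYPES = {
    "anger":   (-0.5, 0.6, 0.3),
    "fear":    (-0.6, 0.6, -0.4),
    "joy":     (0.8, 0.5, 0.4),
    "sadness": (-0.6, -0.3, -0.3),
    "calm":    (0.6, -0.5, 0.2),
}


def octant(p: float, a: float, d: float) -> str:
    """Sign pattern of the point, e.g. '-P +A +D' (zero counts as +)."""
    return " ".join(("+" if v >= 0 else "-") + axis
                    for v, axis in ((p, "P"), (a, "A"), (d, "D")))


def nearest_prototype(point: tuple[float, float, float]) -> str:
    """Label of the prototype closest to point in Euclidean distance."""
    return min(PROTOTYPES, key=lambda k: math.dist(point, PROTOTYPES[k]))


# Same pleasure and arousal, opposite dominance: only the third axis separates them
for point in [(-0.5, 0.6, 0.5), (-0.5, 0.6, -0.5)]:
    print(octant(*point), nearest_prototype(point))
# -P +A +D anger
# -P +A -D fear
```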

10. Big Five (OCEAN) personality profile

The Big Five, the dominant trait model in personality psychology, describes personality along five continuous dimensions: openness, conscientiousness, extraversion, agreeableness and neuroticism (OCEAN). Set the sliders to build a profile; the radar chart and the nearest archetype update live. This five-number vector is the same one an AI persona designer would tune to give an agent a consistent character.
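
A minimal sketch of the nearest-archetype lookup as a Euclidean nearest-neighbour search over the five trait scores. The 0–100 scale and every archetype name and profile below are invented for illustration; they are not taken from the page or from any published typology.

```python
import math

TRAITS = ("openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism")

# Hypothetical archetype profiles in OCEAN order on a 0-100 scale
ARCHETYPES = {
    "curious explorer":      (90, 40, 70, 55, 40),
    "steady organiser":      (45, 90, 45, 60, 25),
    "warm caregiver":        (55, 60, 60, 90, 40),
    "anxious perfectionist": (50, 85, 30, 50, 85),
}


def nearest_archetype(profile: dict[str, float]) -> tuple[str, float]:
    """Return the closest archetype and its Euclidean distance to the profile."""
    vec = tuple(profile[t] for t in TRAITS)  # fixed OCEAN order
    if not all(0 <= v <= 100 for v in vec):
        raise ValueError("trait scores must lie in 0..100")
    name = min(ARCHETYPES, key=lambda k: math.dist(vec, ARCHETYPES[k]))
    return name, math.dist(vec, ARCHETYPES[name])


agent = {"openness": 85, "conscientiousness": 45, "extraversion": 75,
         "agreeableness": 50, "neuroticism": 35}
print(nearest_archetype(agent))  # nearest: 'curious explorer'
```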

11. Attention weights

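Attention, the mechanism at the core of the transformer models behind large language models, lets a query token decide how much to read from every token in the context. Scores $s_i = q\cdot k_i$ (divided by $\sqrt{d_k}$ in transformers, where $d_k$ is the key dimension) are turned into a probability distribution by a softmax with temperature $T$, $a_i = e^{s_i/T}/\sum_j e^{s_j/T}$. Raise the temperature to spread attention; lower it to make it sharp and peaked.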

Figure: bars show the softmax attention weights from the query over all keys; readouts give the peak token, the entropy and the maximum weight.
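
A minimal sketch of the three readouts: scaled dot-product scores, a temperature softmax (with the maximum subtracted for numerical stability) and the Shannon entropy of the result. The five tokens and their 2-D query and key vectors are hypothetical toy values, not embeddings from any real model.

```python
import math


def attention_weights(q, keys, temperature=1.0):
    """Softmax over scaled dot-product scores (q . k_i / sqrt(d)) / T."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    logits = [s / temperature for s in scores]
    m = max(logits)  # subtracting the max leaves the softmax unchanged
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy in nats; equals log(n) for uniform attention over n keys."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


tokens = ["the", "cat", "sat", "on", "mat"]
q = [1.0, 0.5]  # hypothetical 2-D query and key vectors
keys = [[0.1, 0.0], [1.2, 0.2], [0.5, 1.0], [0.0, 0.1], [0.8, 0.4]]

for T in (0.25, 1.0, 4.0):
    w = attention_weights(q, keys, temperature=T)
    peak = max(range(len(w)), key=w.__getitem__)
    print(f"T={T}: peak={tokens[peak]!r}, max weight={w[peak]:.2f}, "
          f"entropy={entropy(w):.2f} nats")
```

Temperature never changes which token is the peak; it only sharpens or flattens the distribution, moving the entropy between 0 and $\log 5$.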