Vectors and Distance Metrics
Data as Vectors
- Each sample is a vector in feature space
- Dimensionality = number of features
- Relationships measured by distance/similarity
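A minimal sketch of this idea, assuming NumPy and a toy dataset (values are illustrative only): each row is one sample, each column one feature.

```python
import numpy as np

# Toy dataset: 3 samples, each a vector in a 2-D feature space
# (columns: height in cm, weight in kg -- illustrative values only)
X = np.array([
    [170.0, 65.0],
    [160.0, 55.0],
    [180.0, 80.0],
])

print(X.shape)  # (3, 2): 3 samples, dimensionality = 2
```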
Euclidean Distance (L2)
- d = √(Σᵢ (aᵢ - bᵢ)²)
- Straight-line distance
- Sensitive to scale
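A minimal NumPy sketch of the formula above (the function name `euclidean` is illustrative):

```python
import numpy as np

def euclidean(a, b):
    """L2 distance: square root of the sum of squared differences."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))

print(euclidean([0, 0], [3, 4]))  # 5.0
```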
Manhattan Distance (L1)
- d = Σᵢ |aᵢ - bᵢ|
- Grid/taxicab distance
- More robust to outliers
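The corresponding sketch for the L1 formula (again, `manhattan` is an illustrative name):

```python
import numpy as np

def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b))

print(manhattan([0, 0], [3, 4]))  # 7.0
```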
Cosine Similarity
- cos(θ) = (a·b) / (||a|| ||b||)
- Range: [-1, 1]
- Ignores magnitude, only direction
- Best for text and embeddings
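A small NumPy sketch of the cosine formula; it assumes neither vector is all zeros (otherwise the denominator is 0):

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| ||b||); assumes non-zero vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity([1, 0], [0, 1]))  # 0.0 (orthogonal)
print(cosine_similarity([1, 2], [2, 4]))  # 1.0 (same direction, different magnitude)
```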
When to Use Which
- Euclidean: continuous features on the same scale
- Manhattan: high-dimensional data; more robust to outliers
- Cosine: text and embeddings, where direction matters more than magnitude (see the comparison below)
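A quick comparison of why the choice matters (illustrative vectors): scaling a vector changes its Euclidean and Manhattan distances, but not its cosine similarity.

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([10.0, 20.0])  # same direction as a, 10x the magnitude

print(np.linalg.norm(a - b))   # Euclidean: ~20.12 -- magnitude dominates
print(np.sum(np.abs(a - b)))   # Manhattan: 27.0
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # Cosine: 1.0 -- same direction
```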
K-Nearest Neighbors
- Compute distances to all points
- Find k closest neighbors
- Predict: majority vote (classification)
- No training phase
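A minimal brute-force k-NN classifier following the steps above (Euclidean distances, majority vote); the function and variable names are illustrative:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training points."""
    # 1. Compute distances from the query to every training sample
    dists = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    # 2. Take the indices of the k closest neighbors
    nearest = np.argsort(dists)[:k]
    # 3. Majority vote over their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9])))  # 0
print(knn_predict(X_train, y_train, np.array([5.1, 5.1])))  # 1
```

Note there is no fitting step: the "model" is just the stored training data, and all work happens at prediction time.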
Feature Scaling
- Features with large ranges dominate distance computations
- Standardize: z = (x - μ) / σ
- Always scale features before computing distances
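A standardization sketch, computing z-scores per feature (column-wise) with NumPy; the feature values are illustrative only:

```python
import numpy as np

X = np.array([
    [170.0, 65000.0],   # e.g. height (cm) vs. salary -- wildly different scales
    [160.0, 48000.0],
    [180.0, 52000.0],
])

# z = (x - mean) / std, computed per feature (column)
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # ~[1, 1]
```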