Vectors and Distance Metrics

Data as Vectors

  • Each sample is a vector in feature space
  • Dimensionality = number of features
  • Relationships measured by distance/similarity
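
  A minimal sketch of the idea, assuming NumPy and made-up feature values: each
  sample becomes a point in a feature space whose dimensionality equals the
  number of features.

    import numpy as np

    # Two samples described by the same three features ->
    # two points in 3-dimensional feature space (values are illustrative).
    a = np.array([5.1, 3.5, 1.4])
    b = np.array([6.2, 2.9, 4.3])

    print(a.shape)  # (3,) -> dimensionality = number of features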

Euclidean Distance (L2)

  • d = √( Σ (aᵢ - bᵢ)² )
  • Straight-line distance
  • Sensitive to scale
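
  A small sketch of the formula above in NumPy (the function name
  euclidean_distance is just illustrative):

    import numpy as np

    def euclidean_distance(a, b):
        # Straight-line (L2) distance: square root of summed squared differences.
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        return np.sqrt(np.sum((a - b) ** 2))

    print(euclidean_distance([0, 0], [3, 4]))  # 5.0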

Manhattan Distance (L1)

  • d = Σ|aᵢ - bᵢ|
  • Grid/taxicab distance
  • More robust to outliers
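
  The same kind of sketch for the L1 formula (manhattan_distance is an
  illustrative name):

    import numpy as np

    def manhattan_distance(a, b):
        # Grid/taxicab (L1) distance: sum of absolute coordinate differences.
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        return np.sum(np.abs(a - b))

    print(manhattan_distance([0, 0], [3, 4]))  # 7.0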

Cosine Similarity

  • cos(θ) = (a·b) / (||a|| ||b||)
  • Range: [-1, 1]
  • Ignores magnitude, only direction
  • Best for text and embeddings
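
  A minimal sketch of the formula (cosine_similarity is an illustrative name);
  note that rescaling a vector leaves the result unchanged, since only direction
  matters:

    import numpy as np

    def cosine_similarity(a, b):
        # cos(θ) = (a·b) / (||a|| ||b||), with values in [-1, 1].
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cosine_similarity([1, 0], [2, 0]))  # 1.0 (same direction, magnitude ignored)
    print(cosine_similarity([1, 0], [0, 1]))  # 0.0 (orthogonal)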

When to Use Which

  • Euclidean: continuous features on the same scale
  • Manhattan: high-dimensional data or features on different scales
  • Cosine: text and embeddings, where direction matters more than magnitude

K-Nearest Neighbors

  • Compute distances to all points
  • Find k closest neighbors
  • Predict: majority vote (classification)
  • No training phase
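
  A compact sketch of the steps listed above, using Euclidean distance and
  majority vote (the toy data and the helper name knn_predict are illustrative):

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_query, k=3):
        X_train = np.asarray(X_train, dtype=float)
        x_query = np.asarray(x_query, dtype=float)
        # 1. Distances from the query to every stored point (no training phase).
        dists = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
        # 2. Indices of the k closest neighbors.
        nearest = np.argsort(dists)[:k]
        # 3. Majority vote among their labels.
        votes = [y_train[i] for i in nearest]
        return Counter(votes).most_common(1)[0][0]

    X = [[1.0, 1.0], [1.2, 0.8], [8.0, 9.0], [7.5, 8.5]]
    y = ["red", "red", "blue", "blue"]
    print(knn_predict(X, y, [1.1, 0.9], k=3))  # "red"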

Feature Scaling

  • Features with large value ranges dominate distance calculations
  • Standardize: z = (x - μ) / σ
  • Always scale features before computing distances
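
  A small sketch of standardization with toy numbers (in practice a library
  utility such as scikit-learn's StandardScaler is often used instead):

    import numpy as np

    X = np.array([[1.0, 2000.0],
                  [2.0, 3000.0],
                  [3.0, 1000.0]])  # second feature would dominate raw distances

    # z = (x - μ) / σ, computed per feature (column)
    X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
    print(X_scaled.mean(axis=0))  # ~0 for each feature
    print(X_scaled.std(axis=0))   # 1 for each feature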