Vectors and Distance Metrics

Lesson, slides, and applied problem sets.


Lesson


Why this module exists

In machine learning, data points are vectors, and the relationships between them are measured by distances and similarities. Whether you're clustering, classifying, or finding nearest neighbors, you need to know how to measure "closeness."

Different distance metrics capture different notions of similarity. Choosing the right one matters.


1) Vectors as data points

Every data point in ML is a vector in some feature space:

# A person represented as a vector
person = [age, height, weight, income]

# An image as a flattened vector
image = [pixel_0, pixel_1, ..., pixel_n]

# A word as an embedding vector
word = [0.2, -0.5, 0.8, ...]  # learned representation

The dimensionality is the number of features. Real-world data is often high-dimensional.


2) Euclidean distance (L2)

The straight-line distance between two points:

from math import sqrt

def euclidean_distance(a, b):
    return sqrt(sum((a[i] - b[i])**2 for i in range(len(a))))

Formula: d(a, b) = √Σ(aᵢ - bᵢ)²

Properties:

  • Most intuitive "distance"
  • Sensitive to scale (large features dominate)
  • Works well when features are comparable in scale
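
A quick usage sketch with illustrative values, reusing euclidean_distance from above:

a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]
print(euclidean_distance(a, b))  # sqrt(3**2 + 4**2 + 0**2) = 5.0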

3) Manhattan distance (L1)

Sum of absolute differences along each dimension:

def manhattan_distance(a, b):
    return sum(abs(a[i] - b[i]) for i in range(len(a)))

Formula: d(a, b) = Σ|aᵢ - bᵢ|

Also called "city block" or "taxicab" distance (like walking in a grid city).

Properties:

  • More robust to outliers than Euclidean
  • Good for sparse, high-dimensional data
  • Useful when features are on different scales
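
For contrast, the same illustrative pair from the Euclidean sketch above gives a different value under L1:

a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]
print(manhattan_distance(a, b))  # |1-4| + |2-6| + |3-3| = 7.0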

4) Cosine similarity

Measures the angle between vectors, ignoring magnitude:

from math import sqrt

def cosine_similarity(a, b):
    dot = sum(a[i] * b[i] for i in range(len(a)))
    mag_a = sqrt(sum(x**2 for x in a))
    mag_b = sqrt(sum(x**2 for x in b))
    return dot / (mag_a * mag_b)

Formula: cos(θ) = (a · b) / (||a|| ||b||)

Properties:

  • Range: [-1, 1] (or [0, 1] for non-negative vectors)
  • 1 = same direction, 0 = perpendicular, -1 = opposite
  • Ignores magnitude: only cares about direction
  • Perfect for text (TF-IDF), recommendations, embeddings

Cosine distance: 1 - cosine_similarity
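
A small sketch (illustrative values) showing that scaling a vector leaves cosine similarity unchanged:

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]                  # same direction, twice the magnitude
print(cosine_similarity(a, b))       # ≈ 1.0: identical direction
print(1 - cosine_similarity(a, b))   # cosine distance ≈ 0.0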


5) When to use which metric

Metric      | Best for                             | Sensitive to
------------|--------------------------------------|------------------------
Euclidean   | Continuous, same-scale features      | Magnitude, scale
Manhattan   | High-dimensional, different scales   | Less outlier-sensitive
Cosine      | Text, embeddings, direction matters  | Only angle

General guidance:

  • Normalize data when using Euclidean
  • Use cosine for text/document similarity
  • Experiment! The best metric is data-dependent
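
One way to see the difference: compare the metrics on the same illustrative pair, reusing the functions defined above:

a = [1.0, 10.0]
b = [2.0, 20.0]
print(euclidean_distance(a, b))  # ≈ 10.05, dominated by the larger feature
print(manhattan_distance(a, b))  # 11.0
print(cosine_similarity(a, b))   # ≈ 1.0: same direction despite different magnitudes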

6) K-Nearest Neighbors (KNN)

A simple but powerful algorithm that uses distances:

from collections import Counter

def knn_predict(query, data, labels, k, distance):
    # 1. Compute distances to all training points
    distances = [(distance(query, x), label)
                 for x, label in zip(data, labels)]

    # 2. Find the k nearest neighbors
    nearest = sorted(distances)[:k]

    # 3. Vote: the most common label wins
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

Properties:

  • No training phase (lazy learner)
  • Works for classification and regression
  • Choice of k matters: too small = noise, too large = blur
  • Choice of distance metric matters!
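
A minimal usage sketch with a made-up two-class dataset, passing euclidean_distance as the metric:

data = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]]
labels = ["A", "A", "B", "B"]
print(knn_predict([1.1, 0.9], data, labels, k=3, distance=euclidean_distance))  # "A"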

7) The curse of dimensionality

In high dimensions, distances become less meaningful:

  • All points become "approximately equidistant"
  • Most of a hypercube's volume concentrates in its corners
  • More data needed to cover the space

Implications:

  • KNN struggles in very high dimensions
  • Need dimensionality reduction (PCA, embeddings)
  • Feature selection becomes important

Rule of thumb: If dimensions > samples, be careful.
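
A minimal simulation of the "approximately equidistant" effect, assuming NumPy is available (uniform random points; the min/max distance ratio creeps toward 1 as dimensions grow):

import numpy as np

rng = np.random.default_rng(0)
for dim in [2, 10, 100, 1000]:
    points = rng.random((1000, dim))                        # 1000 random points in [0, 1]^dim
    dists = np.linalg.norm(points[1:] - points[0], axis=1)  # distances from the first point
    print(dim, round(dists.min() / dists.max(), 3))         # ratio approaches 1 as dim grows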


8) Feature scaling for distances

Euclidean distance is dominated by large-scale features:

# Height in cm, age in years
person1 = [170, 25]
person2 = [180, 26]
# Distance ≈ 10 (height dominates!)

Solutions:

  1. Standardization: z = (x - mean) / std
  2. Min-max normalization: x' = (x - min) / (max - min)
  3. Use scale-invariant metrics: cosine similarity

Always normalize before computing distances (unless you have a reason not to).
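
A sketch of standardization before computing distances, assuming NumPy (values are illustrative):

import numpy as np

X = np.array([[170.0, 25.0],   # height in cm, age in years
              [180.0, 26.0],
              [160.0, 40.0]])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # z-score each column
# After standardization, height and age contribute on comparable scales
print(np.linalg.norm(X_std[0] - X_std[1]))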


9) Similarity vs distance

They're related but inverted:

  • High similarity = low distance
  • Distance is always ≥ 0; similarity ranges vary by measure (e.g., [-1, 1] for cosine)

Conversions:

  • similarity = 1 / (1 + distance)
  • similarity = exp(-distance)
  • cosine_distance = 1 - cosine_similarity
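
A quick sketch of the two distance-to-similarity conversions (the distance value is illustrative):

from math import exp

d = 2.0
print(1 / (1 + d))  # 0.333..., stays in (0, 1]
print(exp(-d))      # 0.135..., decays faster as distance grows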

10) Applications

  • Information retrieval: find documents similar to a query
  • Recommendation systems: find users/items similar to the current one
  • Anomaly detection: flag points far from normal data
  • Clustering: group similar points together
  • Classification: KNN, kernel methods


Key takeaways

  • Data points are vectors; similarity is measured by distances
  • Euclidean: straight-line, sensitive to scale
  • Manhattan: grid-based, robust to outliers
  • Cosine: angle-based, ignores magnitude (great for text)
  • Always normalize features before computing distances
  • KNN is simple: find nearest points, vote on label
  • High dimensions break intuition (curse of dimensionality)
