Classification: Decision Boundaries

Lesson, slides, and applied problem sets.

View Slides

Lesson

Classification: Decision Boundaries

Why this module exists

Classification is predicting categories: spam or not spam, which digit, what disease. It's one of the most common ML tasks. Understanding classification means understanding decision boundaries, probability outputs, and how models separate classes.


1) Classification vs regression

  • Regression: Predict continuous values (price, temperature)
  • Classification: Predict discrete categories (spam or not, cat vs. dog, digits 0-9)

Classification can be:

  • Binary: Two classes (yes/no, positive/negative)
  • Multi-class: Many classes (digit recognition: 0-9)
  • Multi-label: Multiple labels per sample (tags on a photo)
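
As an illustrative sketch of how the targets differ between these setups, the label arrays might look like this (the values and tag names are made up):

# Binary: one label per sample, 0 or 1
y_binary = [0, 1, 1, 0]

# Multi-class: one label per sample, drawn from K classes (digits 0-9 here)
y_multiclass = [3, 7, 0, 9]

# Multi-label: a set of tags per sample (can be empty or have several)
y_multilabel = [{"beach", "sunset"}, {"dog"}, set(), {"dog", "beach"}]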

2) The decision boundary

A classifier learns a boundary that separates classes in feature space.

   Class A  |  Class B
            |
     x x    |  o o o
    x x x   | o o
   x x      |   o o
            |
      decision boundary

The boundary can be:

  • Linear: A straight line (or hyperplane in higher dimensions)
  • Non-linear: Curves, complex shapes

3) Linear classifiers

A linear classifier uses a weighted sum of features:

score = w1*x1 + w2*x2 + ... + wn*xn + b
      = w · x + b  # dot product + bias

Decision rule:

  • If score > 0: predict class 1
  • If score ≤ 0: predict class 0

The weights define the decision boundary orientation.
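
As a minimal sketch of this decision rule in plain Python (the function name is illustrative):

def linear_classifier_predict(x, w, b):
    # Score: weighted sum of features (dot product) plus bias
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Decision rule: positive score -> class 1, otherwise class 0
    return 1 if score > 0 else 0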


4) From scores to probabilities

Raw scores are hard to interpret. We convert to probabilities.

Sigmoid function (for binary classification):

from math import exp

def sigmoid(z):
    # Squash any real-valued score into the range (0, 1)
    return 1 / (1 + exp(-z))

# Properties:
# sigmoid(0) = 0.5
# sigmoid(large positive) → 1
# sigmoid(large negative) → 0

Output is P(class=1 | features).


5) Logistic regression

Despite the name, it's a classification algorithm:

def logistic_regression_predict(x, w, b):
    # Weighted sum of features plus bias, then sigmoid
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    prob = sigmoid(z)
    return prob  # P(class=1 | x)

Training minimizes binary cross-entropy:

L = -[y log(p) + (1-y) log(1-p)]

Gradient update:

∂L/∂w = (p - y) × x
w = w - lr × (p - y) × x
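
Putting the loss and the gradient together, one stochastic-gradient step on a single example can be sketched as follows (this reuses the sigmoid defined above; the function name is illustrative):

from math import log

def logistic_step(x, y, w, b, lr=0.1):
    # Forward pass
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)

    # Binary cross-entropy for this example
    loss = -(y * log(p) + (1 - y) * log(1 - p))

    # dL/dz = p - y, so dL/dw_j = (p - y) * x_j and dL/db = (p - y)
    error = p - y
    w = [wj - lr * error * xj for wj, xj in zip(w, x)]
    b = b - lr * error
    return w, b, loss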

6) Multi-class classification

For K classes, predict a probability distribution:

scores = [w0·x, w1·x, ..., w(K-1)·x]  # one score per class (K classes)
probs = softmax(scores)           # sum to 1
prediction = argmax(probs)        # class with highest prob

Softmax:

from math import exp

def softmax(scores):
    # Subtract the max score for numerical stability (the result is unchanged)
    exp_scores = [exp(s - max(scores)) for s in scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]
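
Putting the pieces together, the scores → probabilities → prediction pipeline can be sketched like this (it reuses the softmax above; W as a list of K weight vectors and b as a list of K biases are illustrative choices):

def multiclass_predict(x, W, b):
    # One linear score per class
    scores = [sum(wi * xi for wi, xi in zip(w_k, x)) + b_k
              for w_k, b_k in zip(W, b)]
    probs = softmax(scores)   # probabilities summing to 1
    # argmax: index of the class with the highest probability
    return max(range(len(probs)), key=lambda k: probs[k])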

7) Cross-entropy loss for multi-class

from math import log

def cross_entropy(y_true, probs):
    # y_true is the index of the correct class
    return -log(probs[y_true])

This encourages high probability on the correct class.

Over a batch:

L = -(1/n) Σ log(p[y_true_i])
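
The batch version is a direct translation of that formula (a sketch, reusing the cross_entropy function above):

def batch_cross_entropy(y_true_batch, probs_batch):
    # Average the per-example losses over the batch
    losses = [cross_entropy(y, p) for y, p in zip(y_true_batch, probs_batch)]
    return sum(losses) / len(losses)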

8) One-vs-all (OvA) classification

Another approach for multi-class: train K binary classifiers.

  • Classifier 1: class 0 vs rest
  • Classifier 2: class 1 vs rest
  • ...
  • Classifier K: class K-1 vs rest

At prediction, use the classifier with highest confidence.
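
A minimal sketch of OvA prediction, assuming each binary classifier is stored as a (w, b) pair and scored linearly as in section 3 (names are illustrative):

def ova_predict(x, classifiers):
    # classifiers: list of (w, b) pairs, one per class; list index = class label
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in classifiers]
    # Pick the class whose binary classifier is most confident
    return max(range(len(scores)), key=lambda k: scores[k])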


9) Training a logistic regression model

def train_logistic_regression(X, y, lr=0.01, epochs=1000):
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0

    for epoch in range(epochs):
        for i in range(len(X)):
            # Forward pass: linear score, then sigmoid (defined above)
            z = sum(w[j] * X[i][j] for j in range(n_features)) + b
            p = sigmoid(z)

            # Gradient of binary cross-entropy w.r.t. the score: (p - y)
            error = p - y[i]

            # Stochastic gradient descent update
            for j in range(n_features):
                w[j] -= lr * error * X[i][j]
            b -= lr * error

    return w, b
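
A quick usage sketch on a tiny, made-up dataset (values chosen only for illustration):

# Two features; class 1 roughly when x1 + x2 is large
X = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
y = [0, 0, 1, 1]

w, b = train_logistic_regression(X, y, lr=0.1, epochs=2000)
print(logistic_regression_predict([0.1, 0.1], w, b))  # low P(class=1)
print(logistic_regression_predict([0.9, 0.9], w, b))  # high P(class=1)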

10) Non-linear decision boundaries

Linear models can't separate non-linear patterns (like XOR).

Solutions:

  • Feature engineering: Add derived features such as x², x×y (see the sketch after this list)
  • Kernel methods: Implicit high-dimensional mapping
  • Neural networks: Learn non-linear transformations
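
For instance, XOR becomes linearly separable once a product feature is added (a sketch; the boundary weights below are just one choice that happens to work):

def add_product_feature(x):
    # Original features plus the interaction term x1 * x2
    x1, x2 = x
    return [x1, x2, x1 * x2]

# XOR: label is 1 when exactly one input is 1 (not linearly separable in 2-D)
X_xor = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_xor = [0, 1, 1, 0]

# In the 3-D space [x1, x2, x1*x2] a linear boundary exists, e.g.
# score = x1 + x2 - 2*(x1*x2) - 0.5 is positive exactly for the XOR=1 points.
for x, label in zip(X_xor, y_xor):
    x1, x2, x1x2 = add_product_feature(x)
    score = x1 + x2 - 2 * x1x2 - 0.5
    print(label, score > 0)   # the predicted side matches the label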

11) Margin and confidence

The margin is the distance from a point to the decision boundary.

  • Large margin: confident prediction
  • Small margin: uncertain (near the boundary)

Support Vector Machines (SVMs) maximize the margin—they find the boundary with maximum separation.
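
For a linear classifier, the geometric distance from a point to the boundary w · x + b = 0 has a closed form, |w · x + b| / ||w||, sketched below:

from math import sqrt

def distance_to_boundary(x, w, b):
    # Geometric margin of point x: |w · x + b| / ||w||
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = sqrt(sum(wi * wi for wi in w))
    return abs(score) / norm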


12) Probabilistic interpretation

Logistic regression tends to give well-calibrated probabilities:

  • P(spam | email) = 0.95 means the model estimates a 95% chance the email is spam

This is useful for:

  • Ranking (sort by probability)
  • Threshold adjustment (flag if p > 0.9)
  • Combining with other information

Not all classifiers produce calibrated probabilities; logistic regression usually comes close.
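
A small sketch of the ranking and thresholding uses (the probabilities and the 0.9 threshold are illustrative):

emails = ["offer.txt", "invoice.txt", "newsletter.txt"]
probs = [0.95, 0.40, 0.72]   # P(spam | email) from the classifier

# Ranking: sort by probability, most likely spam first
ranked = sorted(zip(emails, probs), key=lambda pair: pair[1], reverse=True)

# Threshold adjustment: only flag when the model is very confident
flagged = [e for e, p in zip(emails, probs) if p > 0.9]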


13) Class imbalance

When one class is much more common:

  • The model may learn to always predict the majority class
  • Accuracy can be misleading (99% accuracy if 99% are class 0)

Solutions:

  • Resampling: Oversample minority or undersample majority
  • Class weights: Weight minority-class mistakes more heavily in the loss (see the sketch after this list)
  • Different metrics: Use F1, AUC instead of accuracy
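
A sketch of class weights applied to the binary cross-entropy loss (the weight values are illustrative; one common heuristic is inverse class frequency):

from math import log

def weighted_bce(y, p, w_pos=10.0, w_neg=1.0):
    # Up-weight the rare positive class so its mistakes cost more
    return -(w_pos * y * log(p) + w_neg * (1 - y) * log(1 - p))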

Key takeaways

  • Classification predicts discrete categories
  • Decision boundaries separate classes in feature space
  • Sigmoid converts scores to binary probabilities
  • Softmax converts scores to multi-class probabilities
  • Logistic regression: simple, interpretable, well-calibrated
  • Cross-entropy loss trains classifiers
  • Handle class imbalance carefully

Module Items