One-Hot Encoding

easy · features, encoding, categorical

Implement one-hot encoding for categorical features: each value is represented as a vector of 0s with a single 1 at the position of its category.

Functions to implement

1. get_categories(column)

Extract unique categories from a column.

  • Input: List of category values
  • Output: Sorted list of unique categories
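
One possible sketch of this step, assuming the values are hashable and mutually comparable (for example, all strings); the problem statement does not say how mixed types should be handled:

```python
def get_categories(column):
    # set() drops duplicates; sorted() fixes a deterministic order
    # Assumption: all values are hashable and can be compared to each other
    return sorted(set(column))
```

For the column ["red", "blue", "red", "green"] this yields ["blue", "green", "red"], matching the example in this problem.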

2. encode_value(value, categories)

Encode a single value as a one-hot vector.

  • Input: A value and list of categories
  • Output: One-hot vector (list of 0s with one 1)
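
A minimal sketch of the encoding step. How to treat a value that is missing from categories is not specified here; this version simply lets list.index raise ValueError, which is an assumption rather than a stated requirement:

```python
def encode_value(value, categories):
    # Position of the value in the category list
    # Assumption: an unknown value raises ValueError (via list.index)
    index = categories.index(value)
    # 1 at the matching position, 0 everywhere else
    return [1 if i == index else 0 for i in range(len(categories))]
```

The resulting vector always has the same length as categories.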

3. one_hot_encode(column)

Encode an entire column.

  • Input: List of category values
  • Output: Tuple of (encoded matrix, categories), with one one-hot row per input value
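
The whole-column step could look roughly like the sketch below. It is written to stand alone, so it rebuilds the sorted categories inline instead of calling the other functions; the index dictionary is an illustrative choice, not something the problem requires:

```python
def one_hot_encode(column):
    # Sorted unique categories, as described for get_categories
    categories = sorted(set(column))
    # Map each category to its column position for constant-time lookup
    position = {cat: i for i, cat in enumerate(categories)}
    matrix = []
    for value in column:
        row = [0] * len(categories)
        row[position[value]] = 1
        matrix.append(row)
    return matrix, categories
```

Each row of the matrix lines up with the input value at the same position, and each column lines up with one entry of categories.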

4. decode(one_hot_row, categories)

Decode a one-hot vector back to its original value.

  • Input: One-hot vector and categories
  • Output: Original category value
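
Decoding can be sketched as finding the position of the 1 and looking it up in categories. This assumes the row is a valid one-hot vector of the same length as categories; behavior for malformed rows is not defined by the problem:

```python
def decode(one_hot_row, categories):
    # The index of the single 1 identifies the category
    # Assumption: the row is valid; a row without a 1 raises ValueError
    return categories[one_hot_row.index(1)]
```

Under that assumption, decoding an encoded value returns the value that was encoded.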

Examples

column = ["red", "blue", "red", "green"]
categories = get_categories(column)
# categories: ["blue", "green", "red"]

encode_value("blue", categories)
# [1, 0, 0]

matrix, cats = one_hot_encode(column)
# matrix: [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]

decode([1, 0, 0], categories)
# "blue"

Notes

  • Categories should be sorted for consistent ordering
  • Each one-hot vector has exactly one 1
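
The second note can be turned into a small sanity check. The helper below, is_one_hot, is hypothetical and not one of the functions to implement; it is only a sketch of how a row might be validated:

```python
def is_one_hot(row, categories):
    # Hypothetical helper: not part of the required functions
    return (
        len(row) == len(categories)          # one slot per category
        and all(x in (0, 1) for x in row)    # only 0s and 1s
        and sum(row) == 1                    # exactly one 1
    )
```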