One-Hot Encoding
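Implement one-hot encoding for categorical features: each value is represented as a binary vector with one position per category, set to 1 at the value's category and 0 everywhere else.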
Functions to implement
1. get_categories(column)
Extract unique categories from a column.
- Input: List of category values
- Output: Sorted list of unique categories
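A minimal sketch in Python, assuming the column values are mutually comparable (for example, all strings) so they can be sorted. A set removes duplicates, and `sorted` fixes the order so it does not depend on the order of the input.

```python
def get_categories(column):
    """Return the unique values of `column` in sorted order."""
    # A set drops duplicates; sorting makes the result independent of input order.
    return sorted(set(column))


print(get_categories(["red", "blue", "red", "green"]))  # ['blue', 'green', 'red']
```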
2. encode_value(value, categories)
Encode a single value as a one-hot vector.
- Input: A value and list of categories
- Output: One-hot vector (a list of 0s with a single 1 at the value's position in categories)
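One possible sketch: find the value's position in the category list, then set only that entry to 1. The spec does not say what to do with a value that is not in `categories`; this version assumes it is an error and lets `list.index` raise `ValueError`.

```python
def encode_value(value, categories):
    """Return a one-hot list with a 1 at the position of `value` in `categories`."""
    # list.index raises ValueError if `value` is not one of the categories.
    position = categories.index(value)
    vector = [0] * len(categories)
    vector[position] = 1
    return vector


print(encode_value("blue", ["blue", "green", "red"]))  # [1, 0, 0]
```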
3. one_hot_encode(column)
Encode an entire column.
- Input: List of category values
- Output: A tuple (encoded matrix, categories), where row i of the matrix is the one-hot vector for the i-th value in the column
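A self-contained sketch of encoding a whole column. Rather than calling the two helpers above, it inlines the sorted-set step and builds a dictionary from category to index once, so each row needs a constant-time lookup instead of a scan of the category list. In the exercise itself you would likely just call `get_categories` and `encode_value`; this is one alternative design, not the required one.

```python
def one_hot_encode(column):
    """Encode every value in `column`; return (matrix, categories)."""
    categories = sorted(set(column))
    # Map each category to its column index once, so each row lookup is O(1)
    # instead of scanning the category list for every value.
    index = {category: i for i, category in enumerate(categories)}
    matrix = []
    for value in column:
        row = [0] * len(categories)
        row[index[value]] = 1
        matrix.append(row)
    return matrix, categories


matrix, cats = one_hot_encode(["red", "blue", "red", "green"])
print(cats)    # ['blue', 'green', 'red']
print(matrix)  # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```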
4. decode(one_hot_row, categories)
Decode a one-hot vector back to its original value.
- Input: One-hot vector and categories
- Output: Original category value
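Decoding is the inverse lookup: the position of the 1 is an index into `categories`. This sketch assumes that a row of the wrong length, or one without exactly one 1, should be rejected with `ValueError`; the spec itself does not define that case.

```python
def decode(one_hot_row, categories):
    """Return the category whose position holds the 1 in `one_hot_row`."""
    # Reject rows whose length does not match or that do not contain exactly one 1.
    if len(one_hot_row) != len(categories) or one_hot_row.count(1) != 1:
        raise ValueError("expected a one-hot row matching the categories")
    return categories[one_hot_row.index(1)]


print(decode([1, 0, 0], ["blue", "green", "red"]))  # blue
```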
Examples
column = ["red", "blue", "red", "green"]
categories = get_categories(column)
# categories: ["blue", "green", "red"]
encode_value("blue", categories)
# [1, 0, 0]
matrix, cats = one_hot_encode(column)
# matrix: [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
decode([1, 0, 0], categories)
# "blue"
Notes
- Categories should be sorted for consistent ordering
- Each one-hot vector has exactly one 1
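The second note can be checked mechanically. The helper below is hypothetical (`is_one_hot` is not part of the exercise); it is a small sketch of one way to test a row against the invariant: one position per category, only 0s and 1s, and exactly one 1.

```python
def is_one_hot(row, categories):
    """Hypothetical helper: check that `row` is a valid one-hot vector for `categories`."""
    return (
        len(row) == len(categories)        # one position per category
        and all(x in (0, 1) for x in row)  # only 0s and 1s
        and sum(row) == 1                  # exactly one 1
    )


print(is_one_hot([0, 1, 0], ["blue", "green", "red"]))  # True
print(is_one_hot([1, 1, 0], ["blue", "green", "red"]))  # False
```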