Word Similarity with Embeddings

medium · embeddings, similarity, nlp

Work with pre-computed word embeddings to find similar words and solve word analogies.

Background

Word embeddings represent words as dense vectors where similar words have similar vectors. You'll implement functions to work with these embeddings.

Functions to implement

1. cosine_similarity(v1, v2)

Compute cosine similarity between two vectors.
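Cosine similarity is the dot product of the two vectors divided by the product of their magnitudes. A minimal sketch in plain Python, assuming the vectors are plain lists of floats as in the example below (how to handle a zero-magnitude vector is not specified by the exercise):

import math

def cosine_similarity(v1, v2):
    # cos(theta) = (v1 . v2) / (|v1| * |v2|)
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)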

2. find_most_similar(word, embeddings, top_k)

Find the top_k most similar words to a given word; a sketch follows the bullets below.

  • embeddings is a dict mapping words to their vectors
  • Return a list of (word, similarity) tuples, sorted by similarity in descending order
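One possible sketch, reusing cosine_similarity from above. It assumes the query word itself should be excluded from the results, which the spec does not state explicitly:

def find_most_similar(word, embeddings, top_k):
    target = embeddings[word]
    # Score every other word against the target vector.
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != word  # assumption: the query word is excluded
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]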

3. word_analogy(a, b, c, embeddings)

Solve "a is to b as c is to ?" using vector arithmetic; see the sketch after the bullets below.

  • Compute: result = embeddings[b] - embeddings[a] + embeddings[c]
  • Return the word most similar to the result vector (excluding a, b, c)
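A sketch building on the functions above. Since the example embeddings are plain lists, the vector arithmetic is done element-wise:

def word_analogy(a, b, c, embeddings):
    # result = b - a + c, computed element-wise over the three vectors
    result = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # Pick the word closest to the result vector, skipping the inputs.
    candidates = (
        (word, cosine_similarity(result, vec))
        for word, vec in embeddings.items()
        if word not in (a, b, c)
    )
    return max(candidates, key=lambda pair: pair[1])[0]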

Examples

embeddings = {
    "king": [0.5, 0.7, 0.1],
    "queen": [0.6, 0.8, 0.1],
    "man": [0.4, 0.2, 0.1],
    "woman": [0.5, 0.3, 0.1],
}

# king - man + woman ≈ queen
word_analogy("man", "king", "woman", embeddings)  # "queen"
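With the same toy embeddings, the other two functions behave as you would expect (similarity values shown approximately):

cosine_similarity(embeddings["king"], embeddings["queen"])  # ≈ 0.9996
find_most_similar("king", embeddings, top_k=2)  # [("queen", ≈0.9996), ("woman", ≈0.9174)]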