Word Embeddings

Word embeddings are dense, continuous vector representations of words in a low-dimensional space (typically a few hundred dimensions, rather than one dimension per vocabulary word). Unlike sparse bag-of-words (BoW) or TF-IDF vectors, embeddings capture semantic relationships: words with similar meanings have similar vectors.
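The difference between sparse and dense vectors shows up directly in cosine similarity. The sketch below uses one-hot vectors as the simplest sparse case. The dense values are hypothetical and made up for illustration, not taken from a trained model:

```python
import numpy as np

def cos(u, v):
    """Cosine similarity between two 1-d vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

vocab = ["king", "queen", "apple"]
one_hot = np.eye(len(vocab))  # one dimension per vocabulary word

# Distinct one-hot vectors are orthogonal, so every pair of different words scores 0
print(cos(one_hot[0], one_hot[1]))  # 0.0  (king vs queen)
print(cos(one_hot[0], one_hot[2]))  # 0.0  (king vs apple)

# Hypothetical 4-d dense vectors with made-up values (not trained)
dense = {
    "king":  np.array([0.8, 0.6, 0.1, 0.0]),
    "queen": np.array([0.7, 0.7, 0.1, 0.1]),
    "apple": np.array([0.0, 0.1, 0.9, 0.4]),
}
print(cos(dense["king"], dense["queen"]) > cos(dense["king"], dense["apple"]))  # True
```

In a one-hot encoding, "king" is as far from "queen" as it is from "apple". Dense vectors can place related words closer together.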

Skip-gram Objective

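$$\max \sum_{t=1}^{T} \sum_{-c \le j \le c,\; j \ne 0} \log P(w_{t+j} \mid w_t)$$

Here $T$ is the number of words in the training corpus, $w_t$ is the center word at position $t$, and $c$ is the context window size. The model learns to predict every word within $c$ positions of $w_t$.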
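The double sum in the objective enumerates (center, context) pairs. Below is a minimal plain-Python sketch of that enumeration. It assumes a pre-tokenized sentence and skips offsets that fall outside the sentence; that boundary handling is a common choice, since the formula itself does not specify one. `skipgram_pairs` is a hypothetical helper, not a library function:

```python
def skipgram_pairs(tokens, c):
    """Return (center, context) pairs for every position t and offset j
    with -c <= j <= c, j != 0, skipping positions outside the sentence."""
    pairs = []
    for t, center in enumerate(tokens):
        for j in range(-c, c + 1):
            if j == 0:
                continue  # the center word is not its own context
            if 0 <= t + j < len(tokens):
                pairs.append((center, tokens[t + j]))
    return pairs

sentence = ["the", "king", "rules", "the", "land"]
pairs = skipgram_pairs(sentence, c=1)
print(pairs)
# [('the', 'king'), ('king', 'the'), ('king', 'rules'), ('rules', 'king'),
#  ('rules', 'the'), ('the', 'rules'), ('the', 'land'), ('land', 'the')]
```

Each pair contributes one $\log P(w_{t+j} \mid w_t)$ term to the objective.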

Skip-gram Probability

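$$P(w_O \mid w_I) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\left({v'_{w}}^{\top} v_{w_I}\right)}$$

Here $w_I$ is the input (center) word and $w_O$ is an output (context) word. $v_w$ and $v'_w$ are the input and output vectors of word $w$, and $W$ is the vocabulary size. In the skip-gram objective, $w_I = w_t$ and $w_O = w_{t+j}$.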
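To make the softmax concrete, the NumPy sketch below evaluates $P(w_O \mid w_I)$ for a toy vocabulary. The matrices `V_in` and `V_out` are random stand-ins for trained input and output vectors, an assumption made for illustration. `skipgram_prob` is a hypothetical helper:

```python
import numpy as np

rng = np.random.default_rng(0)
W, d = 5, 4                       # toy vocabulary size and embedding dimension
V_in = rng.normal(size=(W, d))    # row w is v_w, the input (center-word) vector
V_out = rng.normal(size=(W, d))   # row w is v'_w, the output (context-word) vector

def skipgram_prob(w_o, w_i):
    """Softmax probability P(w_O | w_I) over the whole vocabulary (word ids are ints)."""
    scores = V_out @ V_in[w_i]        # v'_w^T v_{w_I} for every w, shape (W,)
    scores = scores - scores.max()    # shift for numerical stability; softmax is unchanged
    probs = np.exp(scores) / np.exp(scores).sum()
    return float(probs[w_o])

probs = [skipgram_prob(w_o, w_i=2) for w_o in range(W)]
print(sum(probs))  # approximately 1.0: a distribution over the W words
```

The denominator touches all $W$ vocabulary words. This cost is why practical word2vec training replaces the full softmax with hierarchical softmax or negative sampling.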

Word Embedding Methods Comparison

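| Method | Training | Context Window | Captures Morphology | Year |
|---|---|---|---|---|
| Word2Vec (CBOW) | Shallow NN (predicts center word from context) | Fixed window | No | 2013 |
| Word2Vec (Skip-gram) | Shallow NN (predicts context from center word) | Fixed window | No | 2013 |
| GloVe | Matrix factorization (log co-occurrence counts) | Global co-occurrence | No | 2014 |
| FastText | Shallow NN over character n-gram (subword) vectors | Fixed window | Yes | 2017 |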
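FastText gets its "Yes" for morphology by representing each word through character n-grams. In the FastText paper, a word's vector is the sum of its n-gram vectors, so a word unseen in training can still be embedded from the subwords it shares with known words. The pure-Python sketch below shows only the n-gram extraction. It assumes the `<` and `>` boundary markers and the 3-to-6 length range of FastText's defaults. `char_ngrams` is a hypothetical helper, and real FastText also hashes n-grams into a fixed number of buckets:

```python
def char_ngrams(word, min_n=3, max_n=6):
    """FastText-style subwords: wrap the word in '<' and '>', take every
    character n-gram with min_n <= n <= max_n, and add the whole marked word."""
    marked = f"<{word}>"
    grams = [marked[i:i + n]
             for n in range(min_n, max_n + 1)
             for i in range(len(marked) - n + 1)]
    if marked not in grams:
        grams.append(marked)
    return grams

print(char_ngrams("where", 3, 3))
# ['<wh', 'whe', 'her', 'ere', 're>', '<where>']

# Related word forms share subwords, so their summed vectors tend to be similar
shared = set(char_ngrams("running")) & set(char_ngrams("runner"))
print(sorted(shared))
# ['<ru', '<run', '<runn', 'run', 'runn', 'unn']
```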

Using Pre-trained Embeddings

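```python
import numpy as np
from gensim.models import KeyedVectors

# Load pre-trained Word2Vec vectors (Google News, 300 dimensions).
# The .bin file must be downloaded first. It holds about 3 million
# words and phrases; pass e.g. limit=500000 to load only the most frequent.
model = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True
)

# Cosine similarity between two words (values are approximate)
print(model.similarity('king', 'queen'))  # ~0.65
print(model.similarity('king', 'man'))    # ~0.32

# Nearest neighbors as (word, cosine similarity) pairs, highest first
print(model.most_similar('computer', topn=5))
# a list like [('computers', ...), ...]; exact words and scores depend on the vectors

# Analogy: king - man + woman ≈ queen
result = model.most_similar(
    positive=['king', 'woman'],
    negative=['man'],
    topn=1
)
print(result)  # [('queen', 0.71)]
```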
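The analogy query amounts to vector arithmetic followed by a cosine-similarity search. The self-contained sketch below uses hypothetical 3-d toy vectors with made-up values, not trained embeddings. It assumes the common approach of combining unit-normalized vectors and excluding the query words from the candidates. gensim's `most_similar` works along similar lines, but `analogy` is a hypothetical helper, not its API:

```python
import numpy as np

# Hypothetical toy vectors; axes loosely mean [royalty, maleness, animal]
emb = {
    "king":  np.array([1.0,  1.0, 0.0]),
    "queen": np.array([1.0, -1.0, 0.0]),
    "man":   np.array([0.0,  1.0, 0.0]),
    "woman": np.array([0.0, -1.0, 0.0]),
    "cat":   np.array([0.0,  0.0, 1.0]),
}

def unit(v):
    return v / np.linalg.norm(v)

def analogy(positive, negative, vocab):
    # Add the unit vectors of positive words, subtract those of negative words
    target = sum(unit(vocab[w]) for w in positive) - sum(unit(vocab[w]) for w in negative)
    target = unit(target)
    # Rank the remaining words by cosine similarity, skipping the query words
    exclude = set(positive) | set(negative)
    scores = {w: float(unit(v) @ target) for w, v in vocab.items() if w not in exclude}
    return max(scores.items(), key=lambda kv: kv[1])

best = analogy(["king", "woman"], ["man"], emb)
print(best)  # ('queen', 0.9597...)
```

Excluding the query words matters. With real embeddings, the vector closest to king - man + woman is often "king" itself.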

Cosine Similarity for Embeddings

Cosine Similarity for Vectors

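$$\text{sim}(u, v) = \frac{u \cdot v}{\|u\| \cdot \|v\|}$$

```python
import numpy as np

def cosine_sim(v1, v2):
    # Dot product divided by the product of the vector lengths (L2 norms)
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

# Reuses the `model` loaded in the previous snippet
king = model['king']
queen = model['queen']
man = model['man']

print(f"king-queen: {cosine_sim(king, queen):.3f}")  # should agree with model.similarity('king', 'queen')
print(f"king-man: {cosine_sim(king, man):.3f}")
```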
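Nearest-neighbor search compares one vector against many. The NumPy sketch below applies the same formula to whole matrices at once by normalizing each row and then taking a single matrix product. `cosine_sim_matrix` is a hypothetical helper. The small vectors were chosen to show that cosine similarity depends only on direction, not on length:

```python
import numpy as np

def cosine_sim_matrix(A, B):
    """Pairwise cosine similarities between rows of A (m x d) and rows of B (n x d)."""
    A_n = A / np.linalg.norm(A, axis=1, keepdims=True)  # unit-length rows
    B_n = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A_n @ B_n.T  # shape (m, n)

u = np.array([[1.0, 2.0, 3.0]])
vecs = np.array([[2.0, 4.0, 6.0],     # same direction, twice as long
                 [-1.0, -2.0, -3.0],  # opposite direction
                 [3.0, 0.0, -1.0]])   # orthogonal: 1*3 + 2*0 + 3*(-1) = 0
S = cosine_sim_matrix(u, vecs)
print(S)  # approximately [[1, -1, 0]]
```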

Visualizing Embeddings

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

words = ['king', 'queen', 'man', 'woman', 'prince', 'princess',
         'cat', 'dog', 'fish', 'bird']

# Reuses the `model` loaded earlier; shape (10, 300)
vectors = np.array([model[w] for w in words])

# perplexity must be smaller than the number of points (the default is 30),
# so use a small value for only 10 words
tsne = TSNE(n_components=2, perplexity=5, random_state=42)
coords = tsne.fit_transform(vectors)

plt.figure(figsize=(10, 8))
plt.scatter(coords[:, 0], coords[:, 1], c='red', alpha=0.7)
for i, word in enumerate(words):
    plt.annotate(word, (coords[i, 0], coords[i, 1]))
plt.title("Word Embeddings Visualization")
plt.show()
```
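t-SNE is non-linear and stochastic, and distances between far-apart clusters in its output are not reliable. PCA is a deterministic, linear alternative, swapped in here and not part of the original snippet. The NumPy sketch below projects vectors onto their top two principal components via SVD. The random matrix `X` is a stand-in; with real embeddings you would pass the `vectors` array from the t-SNE snippet instead. `pca_2d` is a hypothetical helper:

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto their top-2 principal components using SVD."""
    X_centered = X - X.mean(axis=0)  # PCA requires mean-centered data
    # Rows of Vt are principal directions, sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T     # shape (n_words, 2)

rng = np.random.default_rng(42)
X = rng.normal(size=(10, 300))  # stand-in for 10 word vectors of dimension 300
coords_pca = pca_2d(X)
print(coords_pca.shape)  # (10, 2)
```

The resulting coordinates can be passed to the same `plt.scatter` and `plt.annotate` calls.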

Properties of Word Embeddings

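  • Semantic similarity: Words used in similar contexts have vectors with high cosine similarity
  • Linear relationships: Vector offsets encode relations, e.g. king - man + woman ≈ queen
  • Clustering: Semantically related words (e.g. animals, royalty) form clusters
  • Compositionality: Word vectors can be combined (e.g. averaged) to represent phrases, though simple averaging ignores word order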
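The simplest use of compositionality is averaging word vectors to get a phrase vector. The sketch below uses hypothetical 3-d toy vectors with made-up values; with real data you would look words up in the loaded `model`. `phrase_vector` is a hypothetical helper that skips out-of-vocabulary words:

```python
import numpy as np

# Hypothetical toy vectors (made-up values, not trained embeddings)
emb = {
    "small":  np.array([0.9, 0.1, 0.0]),
    "little": np.array([0.8, 0.2, 0.1]),
    "dog":    np.array([0.1, 0.9, 0.1]),
    "puppy":  np.array([0.2, 0.8, 0.2]),
    "tax":    np.array([0.0, 0.1, 0.9]),
    "law":    np.array([0.1, 0.0, 0.8]),
}

def phrase_vector(phrase, vocab):
    """Average the vectors of the in-vocabulary words in `phrase`."""
    vecs = [vocab[w] for w in phrase.lower().split() if w in vocab]
    if not vecs:
        raise ValueError(f"no known words in {phrase!r}")
    return np.mean(vecs, axis=0)

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

a = phrase_vector("little puppy", emb)
b = phrase_vector("small dog", emb)
c = phrase_vector("tax law", emb)
print(round(cos(a, b), 3), round(cos(a, c), 3))  # roughly 0.99 and 0.29
```

The mean does not depend on word order, so "dog bites man" and "man bites dog" get identical vectors under this scheme.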