Word2Vec and NLP in General

Word2vec is a technique for natural language processing that learns word associations from a large corpus of text. Previously, in techniques such as LSA, words were represented locally, without consideration of distributional similarity or any inherent notion of meaning.
In 2013, Mikolov et al. published the word2vec papers, and the approach remains one of the most influential word-representation techniques to date.

What is the main idea of word2vec? It is composed of two algorithms: Skip-gram and CBOW (continuous bag of words). Skip-gram predicts the surrounding context words given a center word, while CBOW predicts the center word from its surrounding context.

  1. Skip-Grams

Here is an updated, prettier version of this concept from Prof. Christopher Manning's lecture slides.
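
The slide itself is not reproduced here, but the standard Skip-gram objective it illustrates (from Mikolov et al., 2013) is to maximize the average log-probability of the context words within a window of radius c around each center word:

\[
J(\theta) = \frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t),
\qquad
p(w_O \mid w_I) = \frac{\exp\left(v'^{\top}_{w_O} v_{w_I}\right)}{\sum_{w=1}^{V} \exp\left(v'^{\top}_{w} v_{w_I}\right)}
\]

where \(v_w\) and \(v'_w\) are the "input" (center) and "output" (context) vectors of word \(w\), and \(V\) is the vocabulary size. In practice the full softmax is replaced by negative sampling or a hierarchical softmax.

As a quick hands-on sketch (not the original post's code), a Skip-gram model can be trained with the gensim library; the parameter names below assume gensim >= 4.0, where sg=1 selects Skip-gram:

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens. A real corpus would be far larger.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["a", "cat", "chased", "a", "dog"],
]

# sg=1 -> Skip-gram (sg=0 would be CBOW); min_count=1 keeps the rare toy-corpus words.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

print(model.wv["cat"][:5])           # first few dimensions of the learned vector
print(model.wv.most_similar("cat"))  # nearest neighbours in the embedding space
```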

To grasp word2vec, let's start from the very basics: the vector space model.
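
The measure in question (not shown explicitly in the original post) is presumably cosine similarity between bag-of-words count vectors:

\[
\operatorname{sim}(\mathbf{d}_1, \mathbf{d}_2) = \cos\theta = \frac{\mathbf{d}_1 \cdot \mathbf{d}_2}{\lVert \mathbf{d}_1 \rVert \, \lVert \mathbf{d}_2 \rVert}
\]

where each document \(\mathbf{d}\) is represented by its vector of word counts.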

This measure is obviously not adequate. First, the raw frequency of words is not taken into account: the count vectors D(1, 3) and D(10, 30) point in the same direction, so their cosine similarity is exactly one even though one document uses the words ten times as often. Second, we do not consider the distribution of words within a document: two documents could contain the same words, yet in one they are clustered together and in the other scattered around. Third, if we treat every distinct word as an orthogonal vector, we defy the reality that words such as "building" and "edifice" are synonyms, so their vectors should be closer to each other.
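
A minimal sketch of the first point, assuming plain Python with no external libraries; the two "documents" below are just term-count vectors over a two-word vocabulary:

```python
import math

def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Parallel count vectors: one document uses each word 10x more often,
# yet cosine similarity is still ~1.0 because only direction matters.
print(cosine([1, 3], [10, 30]))  # ~1.0
print(cosine([1, 3], [3, 1]))    # 0.6 -- different word proportions
```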

NLP's Parsing Methods
There are two main approaches: bottom-up and top-down. Efficient parsers rely on dynamic programming, which caches intermediate results (memoization). A famous method is the Cocke-Kasami-Younger (CKY) parser; a sketch follows.
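
A minimal CKY recognizer, assuming a toy grammar already in Chomsky normal form (the grammar and sentence below are invented for illustration):

```python
def cky_recognize(words, grammar, start="S"):
    """Return True if `words` can be derived from `start` under a CNF grammar.

    grammar maps each nonterminal to a list of right-hand sides: either a
    1-tuple (terminal,) or a 2-tuple (B, C) of nonterminals.
    """
    n = len(words)
    # table[i][j] = set of nonterminals that span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]

    # Length-1 spans from lexical rules A -> word
    for i, w in enumerate(words):
        for lhs, rhss in grammar.items():
            if (w,) in rhss:
                table[i][i + 1].add(lhs)

    # Longer spans, bottom-up, reusing the memoized shorter spans
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point
                for lhs, rhss in grammar.items():
                    for rhs in rhss:
                        if len(rhs) == 2 and rhs[0] in table[i][k] and rhs[1] in table[k][j]:
                            table[i][j].add(lhs)

    return start in table[0][n]


grammar = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP")],
    "NP": [("she",), ("fish",)],
    "V": [("eats",)],
}
print(cky_recognize("she eats fish".split(), grammar))  # True
```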

Probabilistic NLP: Bayes' theorem, on top of which we build language models such as the N-gram model; a small example follows.
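
A minimal sketch of a maximum-likelihood bigram language model (the toy corpus is invented): P(w2 | w1) is estimated as count(w1, w2) / count(w1).

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Estimate P(w2 | w1) = count(w1, w2) / count(w1) from whitespace-tokenized sentences."""
    unigram, bigram = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigram.update(tokens[:-1])                  # contexts (everything that precedes a word)
        bigram.update(zip(tokens[:-1], tokens[1:]))  # adjacent word pairs
    return lambda w1, w2: bigram[(w1, w2)] / unigram[w1] if unigram[w1] else 0.0

corpus = ["the cat sat", "the dog sat", "the cat ran"]
p = train_bigram_lm(corpus)
print(p("the", "cat"))  # 2/3: "the" is followed by "cat" in two of its three occurrences
```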

Information Retrieval Toolkits: SMART, MG, Lemur, Terrier, Clairlib, Lucene.

Sentiment Analysis: a classification problem, typically tackled with MaxEnt, SVM, and Naïve Bayes classifiers; see the sketch below.
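
A minimal Naïve Bayes sketch using scikit-learn (the tiny training set is invented; real sentiment corpora such as movie-review datasets are far larger):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy training data: text -> positive/negative label
texts = [
    "I loved this movie, great acting",
    "wonderful plot and beautiful photography",
    "terrible pacing and boring dialogue",
    "I hated every minute of it",
]
labels = ["pos", "pos", "neg", "neg"]

# Bag-of-words counts feeding a multinomial Naive Bayes classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["what a great and beautiful film", "boring, terrible plot"]))
# expected: ['pos' 'neg'] on this toy data
```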
