Keras and IMDB: Love, hate, sex and money

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.

I recently installed Anaconda, TensorFlow, and Keras on my laptop as part of my Deep Machine Learning (DML) plan.

I am reading two books in parallel as part of this effort: MIT’s Deep Learning book (free online), which provides a strong theoretical background, and Deep Learning with Python. The latter focuses on hands-on DML training and contains almost no math (I must confess I find this rather appealing).

One of the first examples in the book uses Keras’s built-in IMDB dataset. In this dataset, users’ comments from the IMDB movie database are encoded using a dictionary. In this dictionary, more frequently used words receive smaller encoding numbers, and vice versa.
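To make the encoding idea concrete, here is a minimal sketch of how a frequency-ranked dictionary could be built. The three reviews are made up, and `toy_word_index` is a hypothetical name; I am not claiming this is how the real IMDB dictionary was produced, only that it follows the same "more frequent, smaller number" rule.

```python
from collections import Counter

# Toy stand-in for the review corpus; these sentences are made up for illustration.
reviews = [
    "the movie was great and the cast was great",
    "the plot of the film was thin",
    "great movie",
]

# Count how often each word occurs across all reviews.
counts = Counter(word for review in reviews for word in review.split())

# Rank words by descending frequency (ties broken alphabetically) and
# number them from 1, so the most frequent word gets the smallest index.
ranked = sorted(counts, key=lambda w: (-counts[w], w))
toy_word_index = {word: rank for rank, word in enumerate(ranked, start=1)}

print(toy_word_index)
```

In this toy corpus "the" appears most often, so it gets number 1, while words that appear only once end up with the largest numbers.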

The first words are the expected ones: “the”, “and”, “this” and “that” (not strictly the very first ones, but you get the picture). Among the first ten words, there is one that puzzles me: “br”. Could it be the HTML tag for a line break?
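I have not checked how the dataset was actually tokenized, so take this as a guess. Still, a small sketch shows how a naive tokenizer could turn HTML line breaks into a very frequent “br” word. The review text and the regular expression are my own assumptions, for illustration only.

```python
import re

# Made-up review text containing HTML line breaks, for illustration only.
raw_review = "I loved it.<br /><br />The ending was a surprise."

# A naive tokenizer that keeps only runs of letters: characters such as
# "<", "/" and ">" are dropped, so each "<br />" tag leaves behind a "br" token.
tokens = re.findall(r"[a-z]+", raw_review.lower())

print(tokens)
```

If reviews are full of such tags, “br” would naturally climb to the top of the frequency ranking.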

Anyway, I asked myself almost right away: what are the most used words in the comments, once we set aside the trivial articles and personal pronouns?

My first three words to test were “love”, “hate” and “sex”. I loved the results and told my wife, who proposed several other words. For this article, I concentrated on four of them. You probably already guessed them: Love, hate, sex, and money.

The code to get the words is rather naive. Pardon me, I have just started with Python and Keras. Here it goes:

from keras.datasets import imdb

# word_index maps each word to its frequency rank (1 = most frequent).
word_index = imdb.get_word_index()
# Invert the mapping so that a rank looks up its word.
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

targets = ("money", "love", "sex", "hate")

# Scan ranks 1 to 9999 and report where each target word sits.
for i in range(1, 10000):
    word = reverse_word_index.get(i)
    if word in targets:
        print(word + ' is in position ' + str(i))
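
Since the dictionary already maps words to positions, the scan is not strictly needed: each word can be looked up directly. Below is a minimal sketch of that idea. To keep it runnable without downloading anything, `word_index` here is a small made-up stand-in for the result of `imdb.get_word_index()`, and the positions in it are placeholders, not the real IMDB values.

```python
# Small made-up stand-in for the dictionary returned by imdb.get_word_index();
# the ranks below are placeholders, not the real IMDB values.
word_index = {"the": 1, "and": 2, "love": 30, "money": 40, "sex": 50, "hate": 60}

words = ["love", "hate", "sex", "money", "cinematographically"]
max_rank = 10000  # same cutoff as the loop above

positions = {}
for word in words:
    rank = word_index.get(word)  # None if the word is not in the dictionary
    if rank is not None and rank < max_rank:
        positions[word] = rank

# Print the words from most to least frequent (smallest rank first).
for word, rank in sorted(positions.items(), key=lambda item: item[1]):
    print(word + ' is in position ' + str(rank))
```

With the real dictionary, replacing the stand-in by `word_index = imdb.get_word_index()` should be enough; words missing from it are simply skipped.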