Keras and IMDB: Love, hate, sex and money

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.

I recently installed Anaconda, TensorFlow, and Keras on my laptop as part of my Deep Machine Learning (DML) plan.

I am reading two books in parallel for my learning effort: MIT’s Deep Learning book (free online), which provides a strong theoretical background, and Deep Learning with Python. The latter focuses on hands-on DML training and contains almost no math (I must confess this is rather appealing to me).

One of the first examples in the book uses Keras’s built-in IMDB dataset. In this dataset, users’ comments from the IMDB movie database are encoded using a dictionary. In this dictionary, more frequently used words receive smaller encoding numbers, and vice versa.
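
To make the encoding idea concrete, here is a minimal sketch that builds a frequency-ranked dictionary from three made-up reviews and encodes one of them. The helper names (`build_word_index`, `encode`) and the toy reviews are my own illustration, not part of Keras, and the real dataset was of course prepared from a far larger corpus.

```python
from collections import Counter

def build_word_index(texts):
    """Rank words by frequency: the most common word gets index 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ranked, start=1)}

def encode(text, word_index):
    """Replace each known word with its rank; unknown words are skipped."""
    return [word_index[w] for w in text.lower().split() if w in word_index]

# Made-up reviews, only for illustration.
toy_reviews = [
    "the film was great",
    "the film was bad",
    "the plot was thin",
]
toy_index = build_word_index(toy_reviews)
encoded = encode("the film was great", toy_index)
print(toy_index)
print(encoded)
```

Note that when the reviews are loaded with `imdb.load_data`, Keras shifts every index by `index_from` (3 by default) to reserve slots for special tokens, so the numbers in the loaded reviews do not match the dictionary ranks one to one.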

The first words are as expected: “the”, “and”, “this” and “that” (not strictly the very first ones, but you get the picture). Among the first ten words, there is one that puzzles me: “br”. Could it be the HTML tag for a line break?
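
A list like this can be read straight off the dictionary by sorting its words by rank. The sketch below assumes a dictionary shaped like the one returned by `imdb.get_word_index()` (word to rank); `top_words` is a hypothetical helper and the sample ranks are invented so the snippet runs without downloading anything.

```python
def top_words(word_index, n=10):
    """Return the n words with the smallest rank, most frequent first."""
    return sorted(word_index, key=word_index.get)[:n]

# Hypothetical stand-in for imdb.get_word_index(); these ranks are made up.
sample_index = {"film": 5, "the": 1, "br": 4, "and": 2, "a": 3}
print(top_words(sample_index, n=3))
```

With the real dictionary, `top_words(imdb.get_word_index())` should give the ten most used words.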

Anyway, I asked myself almost right away: what are the most used words in the comments, once we set aside the trivial articles and personal pronouns?

My first three words to test were “love”, “hate” and “sex”. I loved the results and told my wife, who proposed several other words. For this article, I concentrated on four of them. You probably already guessed them: Love, hate, sex, and money.

The code to get the words is rather naive. Pardon me, I have just started with Python and Keras. Here it goes:

from keras.datasets import imdb

# word_index maps each word to its frequency rank (1 = most used).
word_index = imdb.get_word_index()
# Invert it so that a rank can be used to look up its word.
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

# Scan the 9,999 most used words and report the four of interest.
for i in range(1, 10000):
    word = reverse_word_index.get(i)
    if word in ("money", "love", "sex", "hate"):
        print(word + ' is in position ' + str(i))
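
Since the dictionary already maps words to positions, a less naive variant could look each word up directly instead of scanning every position. This is only a sketch: `find_ranks` is a hypothetical helper, and the toy dictionary with invented ranks stands in for `imdb.get_word_index()`.

```python
def find_ranks(word_index, words, limit=10000):
    """Look up each word's rank directly; keep only ranks below limit."""
    found = {
        w: word_index[w]
        for w in words
        if w in word_index and word_index[w] < limit
    }
    # Return (word, rank) pairs, most frequent word first.
    return sorted(found.items(), key=lambda pair: pair[1])

# Toy dictionary with made-up ranks, only for illustration.
toy_ranks = {"good": 3, "plot": 7, "bad": 5, "zeppelin": 12000}
for word, rank in find_ranks(toy_ranks, ["bad", "good", "zeppelin", "missing"]):
    print(word + " is in position " + str(rank))
```

Words that are missing from the dictionary, or ranked at or beyond the limit, are simply left out, which mirrors what the loop above does.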

If you are waiting for the results, here they are:

Love goes first. Then money, followed by sex. And hate is way at the back.

So if I had to judge the future of humanity, I would say it is quite good, at least based on IMDB reviews.

P.S.: The first two non-trivial words topping the list are, not surprisingly, “film” and “movie”. What words would YOU search for in the list?
