RNN – New lyricist in Bollywood

ये जीवन है क्या तेरा ही जल जाना
तेरा दिल कहें कितना हसीन ये सपना
ये ज़िंदगी दर्द है मेरी ज़िंदगी से कहूँ मैं
हर तरफ उसका होना है सब कुछ सपना

#RNN generated #hindi poem. #machinelearning #indianlanguages #deeplearning

Can a machine write songs as well as some of the well-known lyricists of Bollywood? Can a machine construct meaningful sentences, with context, that make a song complete? Deep learning has just opened a door for all of us to try this possibility.

The IT industry has already concluded that 2017 is going to be the Year of Deep Learning. Furthermore, it is one of those profound developments that is going to change the shape of our lives in the next 2-5 years.

What really is ‘Deep Learning’? The simplest way to explain deep learning is: it’s magic. Yes, magic is the only apt word one can use to describe it. It is a sub-domain of machine learning, and it is called ‘deep’ because of the multiple layers of interconnected neurons in its networks.

I started reading about it in early 2015. My interest went up when Google released the TensorFlow library under the Apache 2.0 open source license in November 2015, which led me to explore the topic further. I became really serious about deep learning after reading Andrej Karpathy’s famous blog post about RNNs – Recurrent Neural Networks. That’s when I decided to stop reading and get my hands dirty with a small deep learning project.

An RNN – Recurrent Neural Network – can process arbitrary sequences of inputs using its internal memory, which allows it to exhibit dynamic temporal behavior. I picked the LSTM (Long Short-Term Memory) variant for my project.
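To make the ‘internal memory’ idea concrete, here is a minimal single-step sketch of a vanilla RNN in Python with NumPy, in the spirit of Karpathy’s min-char-rnn. The sizes and names are made up for illustration; my actual project used Torch.

    import numpy as np

    vocab_size, hidden_size = 100, 128   # assumed sizes for illustration
    Wxh = np.random.randn(hidden_size, vocab_size) * 0.01   # input to hidden
    Whh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden to hidden: the "memory"
    Why = np.random.randn(vocab_size, hidden_size) * 0.01   # hidden to output
    bh = np.zeros((hidden_size, 1))
    by = np.zeros((vocab_size, 1))

    def rnn_step(x, h_prev):
        # The new hidden state mixes the current input with the previous
        # hidden state; this recurrence is what carries context forward.
        h = np.tanh(Wxh @ x + Whh @ h_prev + bh)
        y = Why @ h + by                      # scores for the next character
        p = np.exp(y) / np.sum(np.exp(y))     # softmax over the vocabulary
        return h, p

    # Feeding a sequence one one-hot character at a time:
    h = np.zeros((hidden_size, 1))
    x = np.zeros((vocab_size, 1)); x[0] = 1   # one-hot for some character
    h, p = rnn_step(x, h)

The LSTM I used replaces the plain tanh recurrence with gated cells, which helps the network remember context over longer stretches of text.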

I was looking for a unique project, something that possibly no one had tried before. A unique project has its own challenges: it would force me to develop my own solution, or at least keep me from being biased by what is readily available on the web.

My wife, Manjusha, is a die-hard fan of old Hindi songs (1950-1980). She introduced me to some of the finest lyrics in Bollywood history, and she collects and publishes these soulful, meaningful lyrics at geetmanjusha.com. I decided to take Hindi lyrics as my project for a couple of reasons:

  • A Unicode data set may have its own set of challenges
  • As far as I knew, no one had tried to use an RNN and Hindi together
  • Poetry doesn’t follow normal sentence structure
    • A normal sentence like ‘मेरा नाम उमेश है’ (My name is Umesh) can have 24 variations, all of them valid in poetry. Below are a few examples (see the small sketch after this list):
      • मेरा नाम है, उमेश
      • नाम है मेरा, उमेश
      • उमेश नाम है मेरा
      • नाम मेरा उमेश है
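A quick way to see where the number 24 comes from: the sentence has four words, and there are 4! = 24 orderings of four words. A small, purely illustrative Python check:

    from itertools import permutations

    # The four words of 'मेरा नाम उमेश है' (My name is Umesh).
    words = ['मेरा', 'नाम', 'उमेश', 'है']

    variations = [' '.join(p) for p in permutations(words)]
    print(len(variations))   # 24
    print(variations[:3])    # a few of the orderings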

Data

After deciding on the project, the next obvious step in deep learning is to construct the training set. This is the most crucial step, as your end results depend entirely on the quality and quantity of data. For data, the more the merrier: more data simply means more learning opportunity and better results. However, more data also poses its own problems.

In Hindi, both ‘ज़िंदगी’ (Life) and ‘ज़िन्दगी’ are valid spellings of the same word. If you have only a handful of examples of ‘ज़िन्दगी’, the machine cannot learn its usage in different sentences. Similarly, एक अकेला इस शहर में, रात में और दोपहर में (Ek Akela Is Shahar Mein) is the only Bollywood song I know of in which Gulzar (a well-known lyricist) used the word ‘आब-ओ-दाना’ (aab-o-dana). Having only one song with a word in your data is not going to help the machine understand its usage.

Another example is Shakeel Badayuni’s famous song from the Hindi movie Ganga Jamuna – नैन लड़ जइ हैं तो मनवामा कसक होईबे करी (Nain Lad Jai Hai To). I believe this song is in the Awadhi language. Because many of its words are similar to Hindi, a human being can understand it; a machine, however, would need multiple examples in order to learn the correct usage.

In short, you cannot take just anything as your training data. You need to scrub it and remove the noise. After this cleanup, I was ready with about 2MB of data.
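As a rough illustration of this kind of scrubbing, here is a hedged Python sketch, not my actual pipeline; the variant-spelling map, threshold and function names are made up for the example:

    import re
    import unicodedata
    from collections import Counter

    # Hypothetical map collapsing variant spellings to one canonical form.
    CANONICAL = {'ज़िन्दगी': 'ज़िंदगी'}

    def clean_line(line):
        # Unicode normalization folds equivalent code-point sequences
        # (precomposed nukta letters vs. base letter + combining nukta)
        # into a single canonical form.
        line = unicodedata.normalize('NFC', line)
        for variant, canonical in CANONICAL.items():
            line = line.replace(unicodedata.normalize('NFC', variant), canonical)
        return re.sub(r'\s+', ' ', line).strip()

    def rare_words(lines, min_count=3):
        # Words seen fewer than min_count times are candidates for review:
        # the model cannot learn usage from one or two examples.
        counts = Counter(w for line in lines for w in line.split())
        return {w for w, c in counts.items() if c < min_count}

Note that Unicode normalization only merges representations of the same characters; genuinely different spellings like ज़िंदगी vs. ज़िन्दगी still need an explicit mapping or enough examples of each.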

Framework

There are several frameworks available for deep learning. Some of the popular ones are TensorFlow, Torch, Theano and Caffe. Many people prefer the Python-based Keras library, as it can run on top of Theano or TensorFlow.
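For a flavour of why Keras is popular, a character-level LSTM can be defined in a few lines. This is a minimal sketch under assumed sizes, not the model I trained (I used Torch, as explained below):

    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    seq_len, vocab_size = 40, 100   # assumed sizes for illustration

    # Predict the next character from the previous seq_len characters.
    model = Sequential()
    model.add(LSTM(128, input_shape=(seq_len, vocab_size)))
    model.add(Dense(vocab_size, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam')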

I chose Torch because I was looking for a framework that runs in both GPU and non-GPU modes. Other libraries support these modes too, but I tried TensorFlow and felt it was slow on CPU. As I wanted to try a few things on my old faithful Mac, I needed a framework that works equally well on GPU and non-GPU hardware.

Bottom line: I did not do a comprehensive analysis while selecting the framework.

CPU/GPU

This is a big deal in deep learning. Compared to a GPU, a CPU seems to take ages to complete training. If you have access to GPU-based hardware, use it. If not, you can still achieve results, but it is painfully slow. I mean damn slow. Many a time, a 20-epoch run took 8 hours on CPU; the same load would take about an hour on GPU. Please do not conclude that the gain is exactly 1:8; my point is simply that a GPU adds a significant performance gain. I used Amazon P2 instances to carry out this experimentation, along with the Deep Learning AMI, which supports MXNet, Caffe, TensorFlow, CNTK and Torch.
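Before launching a long run, it is worth confirming that the framework actually sees the GPU. A TensorFlow 1.x-era sketch (APIs differ across versions, so treat this as an assumption):

    from tensorflow.python.client import device_lib

    # Lists every device TensorFlow can use; GPU entries appear with
    # device_type 'GPU' when the CUDA setup is correct.
    for device in device_lib.list_local_devices():
        print(device.device_type, device.name)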

Approach

Initially, I just wanted to try a few things, so I did not bother to keep track of all the parameters. I soon realized my mistake and started keeping a log of every change in chronological order: data changes, training parameters and results. Deep learning training is sensitive to parameter changes; minor changes can have a significant impact on the results. If you are training on a CPU, this really tests your patience. I must have run the experiment 100 times before settling on the final results. As I worked mostly on weekends, it took close to 4 months to reach a conclusion.
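Even something as simple as appending every run to a CSV file goes a long way. A sketch of the kind of log I mean (the file name and column choices are hypothetical):

    import csv
    from datetime import datetime

    LOG_FILE = 'experiments.csv'  # hypothetical file name

    def log_run(params, result):
        # Append one training run (hyperparameters + outcome) to the log.
        with open(LOG_FILE, 'a', newline='') as f:
            writer = csv.writer(f)
            writer.writerow([datetime.now().isoformat(),
                             params['rnn_size'], params['num_layers'],
                             params['learning_rate'], params['epochs'],
                             result['train_loss'], result['val_loss']])

    # Example usage:
    log_run({'rnn_size': 256, 'num_layers': 2,
             'learning_rate': 2e-3, 'epochs': 20},
            {'train_loss': 1.42, 'val_loss': 1.58})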

Live as if you were to die tomorrow, learn as if you were to live forever – Mahatma Gandhi