Friday, June 19, 2009

Hadoop On Ubuntu

I am ramping up on MapReduce and GFS as part of a very exciting new opportunity, so I needed to install Hadoop 0.20 on my Ubuntu desktop. I followed Michael Noll's excellent tutorial as well as the quickstart for the release. Michael's tutorial has not yet been updated for the split of the config files in Hadoop 0.20, and he mentions that clearly. So I followed his tutorial, and when it came to the configs, I used the quickstart information for core-site, hdfs-site and mapred-site.
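For reference, here is a minimal sketch of what that config split looks like for a single-node setup. It assumes the pseudo-distributed values from the 0.20 quickstart (fs.default.name, dfs.replication and mapred.job.tracker pointing at localhost); the helper names and the target directory argument are my own placeholders, not part of Hadoop.

```shell
#!/bin/sh
# Sketch only: writes the three Hadoop 0.20 site files for a single-node
# (pseudo-distributed) setup. Function names are hypothetical helpers.

write_site_file() {
    # $1 = target file, $2 = property name, $3 = property value
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

write_pseudo_distributed_conf() {
    # $1 = Hadoop conf directory (assumed to exist already)
    conf_dir="$1"
    write_site_file "$conf_dir/core-site.xml" fs.default.name hdfs://localhost:9000
    write_site_file "$conf_dir/hdfs-site.xml" dfs.replication 1
    write_site_file "$conf_dir/mapred-site.xml" mapred.job.tracker localhost:9001
}
```

Calling `write_pseudo_distributed_conf` with the path to your Hadoop conf directory overwrites the three files there, so back up anything you have already edited.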

However, I kept getting stuck on a strange issue. jps would show the jobtracker process for a while, and then the process would get killed. The logs had a java.io.IOException: /tmp/hadoop....jobtracker-info file could only be replicated to 0 nodes instead of 1.
The Hadoop mailing lists did not have any reference to this issue. All I found was this bug. I read on Cloudera's page that start-all and stop-all are deprecated, but I could not find any mention of that deprecation on Hadoop's wiki.

After a lot of trial and error, what worked for me was to edit start-all.sh. I first started the dfs daemons and then waited for 5 minutes. Only then did I bring up the mapred daemons. This solved it.
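The staggered start can be sketched roughly like this. It is not my exact edit to start-all.sh, just the same idea as a standalone function: it assumes the start-dfs.sh and start-mapred.sh scripts in Hadoop's bin directory, and the HADOOP_HOME default and the WAIT_SECONDS variable are placeholders I made up for illustration.

```shell
#!/bin/sh
# Sketch only: start HDFS, pause, then start the MapReduce daemons.
# HADOOP_HOME default and WAIT_SECONDS are assumptions, adjust to your install.

start_hadoop_staggered() {
    hadoop_home="${HADOOP_HOME:-/usr/local/hadoop}"
    wait_seconds="${WAIT_SECONDS:-300}"   # 5 minutes, as described above

    # Bring up the namenode and datanode(s) first; stop if that fails.
    "$hadoop_home/bin/start-dfs.sh" || return 1

    # Give the dfs daemons time to come up before the jobtracker
    # tries to write its jobtracker-info file.
    sleep "$wait_seconds"

    # Only now bring up the jobtracker and tasktracker(s).
    "$hadoop_home/bin/start-mapred.sh"
}
```

Run `start_hadoop_staggered` in place of start-all.sh. Five minutes is simply what worked on my desktop; a shorter wait may be enough on yours.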
