How to compile a Hadoop Program

Before compiling your first Hadoop program, see the instructions on how to run the WordCount example.

You can get the WordCount example code from GitHub (make sure you get the version that is compatible with your Hadoop installation):

wget https://github.com/apache/hadoop-common/raw/trunk/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/WordCount.java

Optionally, you can change package org.apache.hadoop.examples; to package org.janzhou;. The run command below assumes this change.

Set the HADOOP_CLASSPATH:

export HADOOP_CLASSPATH=$(bin/hadoop classpath)

Compile:

mkdir -p WordCount
javac -classpath "${HADOOP_CLASSPATH}" -d WordCount/ WordCount.java

Create JAR:

jar -cvf WordCount.jar -C WordCount/ .

Run:

bin/hadoop jar WordCount.jar org.janzhou.WordCount /wordcount/input /wordcount/output
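The class name passed to hadoop jar has to be the fully qualified name of the class as it is stored in the jar, including the exact capitalization. If the run fails because the class cannot be found, it can help to list what the jar actually contains. The following is only a minimal sketch using the standard java.util.jar API; the class name JarClasses and the default path WordCount.jar are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hadoop: prints the class names found in a jar.
class JarClasses {
    /** Converts an entry path such as org/janzhou/WordCount.class to org.janzhou.WordCount. */
    static String toClassName(String entryName) {
        String withoutSuffix = entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    /** Returns the fully qualified names of all classes stored in the jar, sorted. */
    static List<String> classNames(String jarPath) {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.endsWith(".class"))
                    .map(JarClasses::toClassName)
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed default; pass another jar path as the first argument.
        String jarPath = args.length > 0 ? args[0] : "WordCount.jar";
        classNames(jarPath).forEach(System.out::println);
    }
}
```

One of the printed names (without any $Inner suffix) is what goes after the jar file name in the run command.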

Using com.sun.tools.javac.Main

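You normally invoke javac from the command line, but you can also invoke it from within a Java program. The com.sun.tools.javac.Main class, located in ${JAVA_HOME}/lib/tools.jar (JDK 8 and earlier), accepts an array of Strings equivalent to the command-line parameters. Because the hadoop launcher can run any class found on HADOOP_CLASSPATH, adding tools.jar to it lets you run the compiler through bin/hadoop with the Hadoop libraries already on the classpath.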
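To illustrate what invoking the compiler from within a Java program looks like, here is a minimal sketch. Note that it swaps in the standard javax.tools API (available since Java 6) instead of calling the tools.jar class directly, because that API also works on newer JDKs; the idea of passing an array of javac-style arguments is the same. The class name CompileFromJava and the throwaway Hello class are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical example class, not part of Hadoop.
class CompileFromJava {
    /** Runs the compiler with javac-style arguments; returns 0 on success. */
    static int compile(String... args) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler found; run on a JDK, not a JRE");
        }
        // Passing null for the streams means System.in, System.out and System.err.
        return compiler.run(null, null, null, args);
    }

    /** Writes a trivial source file to a temp directory and compiles it with -d. */
    static boolean compileSample() {
        try {
            Path src = Files.createTempDirectory("src");
            Path out = Files.createTempDirectory("out");
            Path file = src.resolve("Hello.java");
            Files.write(file, "public class Hello {}".getBytes(StandardCharsets.UTF_8));
            // Equivalent to: javac -d <out> <src>/Hello.java
            int status = compile("-d", out.toString(), file.toString());
            return status == 0 && Files.exists(out.resolve("Hello.class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compileSample() ? "compiled" : "compilation failed");
    }
}
```

For WordCount.java the arguments would additionally need -classpath followed by the Hadoop classpath, just as on the command line.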

See the MapReduce Tutorial.

Set environment variables:

export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar

Compile WordCount.java and create a jar:

mkdir -p WordCount
bin/hadoop com.sun.tools.javac.Main -d WordCount/ WordCount.java
jar -cvf WordCount.jar -C WordCount/ .

Makefile

It is also nice to have a Makefile that does this automatically for you.

Here is a simple example:

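HADOOP = ${HOME}/hadoop-2.5.1/bin/hadoop

APP = WordCount
SRC = src/*.java
OUT = out

# Recipe lines must be indented with a tab, not spaces.
# The target is the jar itself, so make rebuilds only when sources change.
$(APP).jar: $(SRC)
	mkdir -p $(OUT)
	javac -classpath "`$(HADOOP) classpath`" -d $(OUT) $(SRC)
	jar -cvf $(APP).jar -C $(OUT) .

.PHONY: clean
clean:
	rm -rf $(OUT) *.jar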

You can find more comprehensive examples in Hadoop Example.