Starfish v0.3.0 Tutorial: 1. Installation and Compilation

Starfish Installation Instructions

If you have downloaded the Starfish Binaries, simply extract the package into a directory of your choice:

tar -xzf starfish-0.3.0.tar.gz

To use the Starfish functionality with a live Hadoop cluster, place Starfish on the node from which you plan to submit MapReduce jobs. You do not need to copy the Starfish directory to all the slave nodes of the Hadoop cluster. The Starfish Visualizer can be used from any machine that has JavaFX 1.2.3 installed.

After the extraction, you will see a directory called starfish-0.3.0 containing several subdirectories, as well as the starfish-0.3.0-*.jar executables. The subdirectories are:

  1. bin: Contains the Starfish shell scripts. Their use is explained later in this tutorial.
  2. btrace: Contains the compiled BTrace classes and jars that will be installed on the slave nodes of the cluster (explained below).
  3. contrib: Contains various MapReduce programs that can be used as examples.
  4. docs: Contains detailed usage documentation of each script and jar in Starfish.
  5. samples: Contains sample files for specifying the cluster specifications and input data properties (discussed later in the tutorial), as well as sample logs collected from actual MapReduce job executions.
  6. tools: Contains profiling and monitoring tools.
  7. visualizer: Contains the Starfish Visualizer.

BTrace Installation Instructions

In order to profile the execution of a MapReduce job on a Hadoop cluster, you must first install the pre-compiled BTrace scripts and jars (included in Starfish) on the slave nodes.

  1. Set the following global profiling parameters in bin/config.sh (sample settings are shown after this list):
    • SLAVES_BTRACE_DIR: The BTrace installation directory on the slave nodes. Please specify the full path and ensure you have the appropriate write permissions. The path will be created if it doesn't exist.
    • CLUSTER_NAME: A descriptive name for the cluster (e.g., test or production). Do not include spaces or special characters in the name.
    • PROFILER_OUTPUT_DIR: The local directory in which to place the collected logs and profile files. Please specify the full path and ensure you have the appropriate write permissions. The path will be created if it doesn't exist.
  2. Install BTrace using the provided bin/install_btrace.sh script from the master node of the cluster. The sole input to the script is the path to a file containing the host names or IP addresses of the slave nodes in the cluster, one per line (see the example below).
    ./bin/install_btrace.sh /path/to/slaves/file.txt
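
As a concrete illustration, the profiling settings in bin/config.sh might look as follows (all paths, names, and hosts here are hypothetical placeholders; adjust them for your own cluster):

    SLAVES_BTRACE_DIR=/usr/local/btrace
    CLUSTER_NAME=test
    PROFILER_OUTPUT_DIR=/home/hadoop/starfish/profiles

The slaves file is simply a plain-text file listing one slave host name or IP address per line:

    slave01.example.com
    slave02.example.com
    10.0.0.12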

Tip: If you are asked to enter your password multiple times during the BTrace installation, consider setting up SSH key authentication so that you do not have to type your password repeatedly. Follow the instructions in 'Hadoop Cluster Setup, SSH Key Authentication'.
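
For reference, a common way to set up passwordless SSH from the master node to a slave (this sketch assumes OpenSSH and a hypothetical user and host) is:

    ssh-keygen -t rsa -P ""                  # generate a key pair with an empty passphrase
    ssh-copy-id hadoop@slave01.example.com   # append the public key to the slave's authorized_keys

Repeat the ssh-copy-id step for each slave node.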

Starfish Compilation Instructions

If you have downloaded the Starfish Source Code, you will first need to compile the code before using it. In addition to the subdirectories listed above, the downloaded source code directory also contains:

  1. lib: Contains library jars for compiling the source code.
  2. src: Contains the source code for the Starfish project.
  3. build: Created after compilation and will contain the compiled classes.
  4. btrace: Created after compilation and will contain the compiled BTrace classes and jars.

You can compile the source code using ant. Please remember to set the environment variable JAVA_HOME before using ant. The main compilation commands are:

  1. Compile the entire source code and create the jar files:
    ant
  2. Execute all available JUnit tests and verify the code was compiled successfully:
    ant test
  3. Generate the javadoc documentation in docs/api:
    ant javadoc

Known issue: The BTrace compiler (used by the ant script) requires that the environment variable $JAVA_HOME points to a JDK installation directory (which is usually the case) and not to a JRE directory.
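
For example, on a Linux machine a full build might look as follows (the JDK path is illustrative; use the location of your own JDK):

    export JAVA_HOME=/usr/lib/jvm/java-6-sun   # must point to a JDK, not a JRE
    ant                                        # compile Starfish and create the jar files
    ant test                                   # optionally, run the JUnit tests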

Starfish also includes sample MapReduce programs in the contrib directory. These programs are compiled separately: simply enter each directory and run ant as above. The README files in the subdirectories explain how to use the corresponding MapReduce programs. Make sure to compile the contrib/examples project in order to follow the remainder of this tutorial.
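
For instance, compiling the example programs used in the rest of this tutorial might look like this (starting from the top-level Starfish directory):

    cd contrib/examples
    ant        # build the sample MapReduce programs
    cd ../..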

The Starfish Visualizer is located in the visualizer directory and is also compiled separately using ant. Please remember to install JavaFX version 1.2.3 and set the environment variable $JAVAFX_HOME before compiling the Visualizer.
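
As a sketch, assuming the JavaFX 1.2.3 SDK is installed under a hypothetical path:

    export JAVAFX_HOME=/opt/javafx-sdk1.2.3   # illustrative path to your JavaFX 1.2.3 SDK
    cd visualizer
    ant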

You are now ready to use Starfish! The remainder of this tutorial explains the basic use of Starfish.