Spark Notebook

Create and start a container

We pull the course’s rubigdata/hadoop image and create a container using that image:

docker pull rubigdata/hadoop
docker create --name snb -p 9000:9000 -p 4040-4045:4040-4045 rubigdata/hadoop

The -p options tell docker to publish container ports 4040-4045 and 9000 on the same port numbers on your host. The --name option is handy for the course instructions, but remember that container names must be unique: if you later create another container, e.g. with additional options, you will need to give it a different name and use that name to start and stop it. If you leave out the name, docker generates one for you.
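To illustrate the naming rule, here is a minimal sketch that checks whether a name is already taken before creating a second container. The name snb2 and the host port 9001 are hypothetical choices for illustration and are not part of the course setup; the helper names name_taken and create_second are likewise made up here.

```shell
# Succeeds if a container with exactly this name exists (running or stopped).
name_taken() {
  docker ps -a --format '{{.Names}}' | grep -qxF -- "$1"
}

# Hypothetical second container: "snb2" and host port 9001 are illustrative.
# The notebook still listens on port 9000 inside the container; only the
# host side of the mapping differs, so it does not clash with snb.
create_second() {
  if name_taken snb2; then
    echo "snb2 already exists; choose another name or remove it with: docker rm snb2"
  else
    docker create --name snb2 -p 9001:9000 rubigdata/hadoop
  fi
}
```

After calling create_second, you would start the new container with docker start snb2 and, under the assumptions above, reach its notebook at localhost:9001.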

You can start and stop the container you created; start it now to continue working on your lab session:

docker start snb
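If you are unsure whether the container is currently running, a small helper along the following lines may be useful. This is a sketch: the function name container_state is made up for illustration, and it relies on docker inspect reporting the State.Running field.

```shell
# Prints "running" if the named container (default: snb) is up,
# and "not running" if it is stopped or does not exist.
container_state() {
  if [ "$(docker inspect -f '{{.State.Running}}' "${1:-snb}" 2>/dev/null)" = "true" ]; then
    echo running
  else
    echo "not running"
  fi
}
```

At the end of a session you can stop the container with docker stop snb; files inside the container are kept until the container is removed, so a later docker start snb picks up where you left off.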

Now open localhost:9000 in your browser to access the Spark Notebook.

As always, you can execute a shell inside the running container:

docker exec -it snb /bin/bash

Mounting directories (advanced)

Note: this may not work in Huygens because of security restrictions.

If your host system supports it, you can mount a directory so that it is shared between the container and the host. For example, start a new container using docker run (you can also use separate create and start/stop commands as above):

docker run -p 9000:9000 -p 4040-4045:4040-4045 -v ${HOME}/tmp:/data -d rubigdata/hadoop

When the container is started this way, files you read and write in /data inside the container are the same files you see in tmp in your home directory. If you use SELinux on your own laptop, you may need to add :z to the volume option as follows:

docker run -p 9000:9000 -p 4040-4045:4040-4045 -v ${HOME}/tmp:/data:z -d rubigdata/hadoop
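One way to confirm that the shared directory works is to write a file on the host and read it back through the container. The sketch below assumes the container was started with the -v ${HOME}/tmp:/data option shown above; the function name check_mount and the file name mount-check.txt are made up for illustration.

```shell
# Writes a marker file on the host and reads it back from inside the container.
# $1 is the container name or ID (docker run -d prints the ID on startup).
check_mount() {
  mkdir -p "${HOME}/tmp"
  echo "hello from host" > "${HOME}/tmp/mount-check.txt"
  docker exec "$1" cat /data/mount-check.txt
}
```

If the mount works, check_mount followed by the container ID prints the marker text; an error such as a missing file or denied permission suggests the directory is not shared (or, with SELinux, that the :z suffix is needed).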

Need help?

For more information about the Spark Notebook environment, you may refer to the Spark Notebook documentation, e.g. glance over the brief intro to Spark and the UI.

Use the GitHub issue tracker on the forum so that everyone in class can help out and my overflowing email inbox does not become a bottleneck for your progress. See also: FAQ
