Spark Notebook
How to use the Spark Notebook in Huygens or on your own device
Create and start a container
We pull the course’s rubigdata/hadoop image and create a container using that image:
docker pull rubigdata/hadoop
docker create --name snb -p 9000:9000 -p 4040-4045:4040-4045 rubigdata/hadoop
The -p options tell docker to map the container's exposed ports 4040-4045 and port 9000 to the same ports on your host. The --name option is handy for the course instructions, but do remember that container names have to be unique: if you later decide to create another container, e.g. with additional options, you will need a different name, and you start and stop that container using its own name. If you leave out the name, docker generates one for you.
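Because container names must be unique, it can help to check whether a name is already taken before you create a container. The following is a minimal sketch, assuming a POSIX shell with docker on your PATH; the helper names container_exists and create_if_missing are hypothetical and not part of the course material.

```shell
# Hypothetical helper: succeeds if a container with exactly this name exists.
# `docker ps -a` lists running and stopped containers alike.
container_exists() {
    docker ps -a --format '{{.Names}}' | grep -Fxq -- "$1"
}

# Hypothetical helper: create the course container under the given name,
# but only if that name is still free.
create_if_missing() {
    name=$1
    if container_exists "$name"; then
        echo "container '$name' already exists; pick another name" >&2
        return 1
    fi
    docker create --name "$name" -p 9000:9000 -p 4040-4045:4040-4045 rubigdata/hadoop
}

# Example call (commented out so that sourcing this file has no side effects):
# create_if_missing snb
```

If the name is taken, you can either choose a new one or remove the old container first with docker rm.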
You can start and stop the container you created; now start it to continue work on your lab session:
docker start snb
Now open localhost:9000 in your browser to access the Spark Notebook.
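The notebook server may need a moment after docker start before it answers. As a rough sketch, assuming curl is installed on your host and that the notebook replies to plain HTTP requests on port 9000, you could poll the port from a shell; the function name wait_for_notebook and its defaults are made up for this example.

```shell
# Hypothetical helper: poll a URL once per second until it answers
# or the given number of attempts is used up.
wait_for_notebook() {
    url=${1:-http://localhost:9000}
    attempts=${2:-30}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: silent, -f: treat HTTP errors (>= 400) as failure, -o: discard the body
        if curl -sf -o /dev/null "$url"; then
            echo "notebook is up at $url"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "no response from $url after $attempts attempts" >&2
    return 1
}

# Example call (commented out so that sourcing this file has no side effects):
# docker start snb && wait_for_notebook
```

If the function gives up, docker logs snb is a reasonable next place to look.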
As always, you can execute a shell inside the running container:
docker exec -it snb /bin/bash
Mounting directories (advanced)
This may not work in Huygens due to security considerations.
If your host system supports this, you can mount a directory such that it is shared between the container and the host.
E.g., starting a new container using docker run (you can also use separate create and start/stop as above):
docker run -p 9000:9000 -p 4040-4045:4040-4045 -v ${HOME}/tmp:/data -d rubigdata/hadoop
When you start the container this way, you can read and write files in /data in the container and access them from tmp in your home directory.
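To convince yourself that the directory really is shared, you can write a file from inside the container and read it back on the host. This is only a sketch: the function name check_shared_dir and the file name mount-test.txt are invented for illustration, and it assumes the container was started with the -v option shown above.

```shell
# Hypothetical check: write a file into /data inside the container,
# then read the same file from the mounted directory on the host.
check_shared_dir() {
    container=$1
    host_dir=${2:-$HOME/tmp}
    docker exec "$container" sh -c 'echo hello-from-container > /data/mount-test.txt' || return 1
    cat "$host_dir/mount-test.txt"
}

# Example call (commented out so that sourcing this file has no side effects).
# Look up the name of your container with `docker ps` first:
# check_shared_dir <container-name>
```

If the mount works, the function prints the line written inside the container; an error from cat suggests that the directory is not shared.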
If you use SELinux on your own laptop, you may need to add :z as follows:
docker run -p 9000:9000 -p 4040-4045:4040-4045 -v ${HOME}/tmp:/data:z -d rubigdata/hadoop
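If you are not sure whether SELinux is active on your laptop, the getenforce command (available on SELinux-enabled systems) reports the current mode. A small sketch that picks the volume option accordingly, assuming that only the Enforcing mode requires the :z suffix; the function name volume_option is hypothetical.

```shell
# Hypothetical helper: print the -v argument for the shared directory,
# appending :z only when SELinux is present and in Enforcing mode.
volume_option() {
    opt="${HOME}/tmp:/data"
    if command -v getenforce >/dev/null 2>&1 && [ "$(getenforce)" = "Enforcing" ]; then
        opt="$opt:z"
    fi
    echo "$opt"
}

# Example call (commented out so that sourcing this file has no side effects):
# docker run -p 9000:9000 -p 4040-4045:4040-4045 -v "$(volume_option)" -d rubigdata/hadoop
```

On systems without SELinux the function simply falls back to the plain mapping.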
Need help?
For more information about the Spark Notebook environment, you may refer to the Spark Notebook documentation, e.g. glance over the brief intro to Spark and the UI.
Use the GitHub issue tracker on the forum so everyone in class can help out and my overflowing email inbox is not a bottleneck for your progress. See also: FAQ