Assignment 1B Docker
We will use Docker to remove the burden of unifying working environments.
First, go over the basic tutorial to gain familiarity with Docker usage.
The easiest solution is to use Docker on the computers in the terminal room; the only drawback is that you
have to be in the terminal room to boot the machine into Linux (note that you can use
ssh once it is booted, but
anyone can decide to reboot the machine, and all machines are shut down at the end of the day).
Alternatively, install Docker on your own laptop or home computer. While this route is not officially supported, I provide a few pointers below to get you started.
Setting up Docker and Vagrant
Unfortunately, Docker is not supported on the HG00.137 machines; this may or may not change during the semester (security concerns).
As a workaround, for now, we work inside a virtual machine that we manage using Vagrant.
Vagrant has been installed and is ready for use in the terminal room HG00.137.
If you have never worked with Vagrant before, first work through its getting started guide.
We use Vagrant with the
VirtualBox provider, and a configuration that stores your virtual
machines in directory
Leaving a single virtual machine in the terminal rooms is okay!
Use vagrant suspend to do that, and
vagrant up to continue where you left off.
(Note: after resuming, you may need to restart Docker containers that ran in the suspended VM; use
docker start HASH to do so.)
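The restart step in the note above can be sketched as a small shell helper. This is only an illustration, not part of the course setup: the function name ensure_running is made up here, and it assumes the container still exists inside the VM. It asks docker inspect whether the container is running and calls docker start only when it is not.

```shell
# Hypothetical helper: start a container only if it is not already running.
# Usage: ensure_running HASH   (HASH is a container ID or name from `docker ps -a`)
ensure_running() {
    id=$1
    # docker inspect prints "true" or "false" for the container's Running state.
    running=$(docker inspect -f '{{.State.Running}}' "$id") || return 1
    if [ "$running" = "true" ]; then
        echo "$id is already running"
    else
        docker start "$id"
    fi
}
```

After vagrant up and vagrant ssh, calling ensure_running HASH inside the VM would bring the container back without touching one that is still active.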
We do request that you:
- vagrant destroy unused virtual machines;
- use the same computer every week (whenever possible).
Virtual machines consume considerable disk space, and that volume has only 16G available (check with
df --si /var/tmp).
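To make the disk space check a habit before creating a virtual machine, it could be wrapped in a helper like the one below. This is a minimal sketch under assumptions: the function names avail_kb and check_space are invented for illustration, and the default threshold of roughly 4 GB is an arbitrary choice, not a course requirement.

```shell
# Hypothetical helper: print the space available on the volume holding DIR, in kilobytes.
avail_kb() {
    # -P forces POSIX single-line output, -k reports 1024-byte blocks;
    # the fourth column of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical helper: warn when less than MIN_KB is free on the volume holding DIR.
# Usage: check_space DIR [MIN_KB]   (MIN_KB defaults to an assumed ~4 GB)
check_space() {
    dir=$1
    min_kb=${2:-4000000}
    free_kb=$(avail_kb "$dir")
    if [ "$free_kb" -lt "$min_kb" ]; then
        echo "low disk space on $dir: ${free_kb} kB free"
        return 1
    fi
    echo "ok: ${free_kb} kB free on $dir"
}
```

For example, check_space /var/tmp && vagrant up would only start the virtual machine when enough space is left.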
Create the project directory and
cd into it:
mkdir bigdata
cd bigdata
Download the Vagrantfile that I prepared, and save it to the project directory you just created, or issue:
Start the virtual machine and
ssh into it:
vagrant up
vagrant ssh
When you are finished, exit the shell; you may then want to
vagrant suspend the virtual machine, which you can always restart with
vagrant up.
A minimal test shows that the docker client can start so-called containers:
docker run hello-world
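The same test can be wrapped in a small function that first checks whether the docker client is available at all. This is a sketch for illustration: the name docker_smoke_test is hypothetical, and the --rm flag is an addition that removes the hello-world container again after it exits, so that it does not linger on disk.

```shell
# Hypothetical helper: verify the docker client is on PATH, then run the test image.
docker_smoke_test() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker client not found; are you inside the virtual machine?"
        return 1
    fi
    # --rm removes the hello-world container again once it exits.
    docker run --rm hello-world
}
```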
If Docker is new to you, it is highly recommended to learn more about its architecture and usage on the Docker site; specifically, the rest of the course material will assume that you followed steps two and three from the excellent documentation.
Follow the tutorial starting from step two.
In the course, we will only use docker with images provided by others, so it is not necessary to continue to step four, although I will not stop you if you are starting to get the hang of it!
Continue only after having seen the ASCII art using your own setup!
If you opt to install Docker on your own hardware (and work independently of the availability of computers in the Huygens building), follow the link below that corresponds to your operating system, and begin with step one to get Docker up and running:
Students who install Docker on their own machine will need command
and are recommended to read the docker machine docs.
Vagrant is available on all platforms as well, so you could just do what we do on the terminal room machines;
but if your host is a Linux machine, I recommend checking whether you can simply run Docker directly.
In the assignments, we get hands-on experience with Spark Notebook.
Setup (first time only)
If this is the first time that you will start Spark Notebook, you need to use its image and initialize a container: follow the instructions given in Spark Notebook for the big data course.
Starting the Spark Notebook container
Otherwise, start up a container with
docker run (only if it is not already running, of course),
and simply open localhost:9001 in your browser.
If you successfully started the Spark Notebook container, then opening localhost:9001 will show you the Spark Notebook UI in the browser.
Why don’t you try out some of the Scala things you worked with in the first week?
(It is possible to run the docker container remotely, and open the Spark Notebook in a browser on your laptop, provided
that you know how to tunnel ports 4040 and 9001 to the laptop; for example using
ssh -L or the right tunneling
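As an illustration of the ssh -L option mentioned above, the helper below prints a command that forwards both ports. It is only a sketch under assumptions: the function name tunnel_cmd and the user and host names are placeholders, and it assumes that ports 9001 and 4040 are reachable as localhost on the remote machine.

```shell
# Hypothetical helper: print an ssh command that forwards local ports 9001 and
# 4040 to the same ports on the remote machine.
# Usage: tunnel_cmd USER HOST
tunnel_cmd() {
    user=$1
    host=$2
    # -N: do not run a remote command, only forward ports.
    # -L local_port:target_host:target_port, with the target resolved on the remote side.
    echo "ssh -N -L 9001:localhost:9001 -L 4040:localhost:4040 ${user}@${host}"
}
```

Running the printed command on the laptop and then opening localhost:9001 there should reach the remote Spark Notebook, as long as the tunnel stays open.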
Once you are fluent in using the docker client, it is easy to forget that every container and image used takes up disk space on the local machine. Please clean up regularly!
The following command lists the running containers:
docker ps -f status=running
Any running containers you do not use can be stopped using
docker stop HASH.
Next, you may remove all inactive, exited containers that you do not plan to restart by issuing one shell command:
docker ps -f status=exited -q | \
  xargs docker rm
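When there are no exited containers, GNU xargs still invokes docker rm once without arguments, which prints a usage error. A slightly more defensive variant could look like the sketch below; the function name remove_exited is made up for illustration, and the behaviour is otherwise the same as the one-liner above.

```shell
# Hypothetical helper: remove all exited containers, doing nothing when there are none.
remove_exited() {
    ids=$(docker ps -f status=exited -q) || return 1
    if [ -z "$ids" ]; then
        echo "no exited containers to remove"
        return 0
    fi
    # $ids is intentionally unquoted so that each ID becomes a separate argument.
    docker rm $ids
}
```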
Optional extra reading (not required for the course):
- Advanced Docker with the Docker book (not free, ~EUR 10).