Docker Environment

We will use Docker to remove the burden of unifying working environments.


First, go over the basic tutorial to gain familiarity with Docker usage. (Do not follow the instruction to install docker yet.)

The easiest solution is to use Docker on the computers in the terminal room. The only drawback is that you have to be in the terminal room to boot the machine into Linux. (You can use ssh once it is booted, but anyone can decide to reboot the machine, and all machines are shut down at the end of the day.)

Alternatively, install Docker on your own laptop or home computer. While this route is not officially supported, I will provide a few pointers to get you started; see below.

Setting up Docker

Docker can be used in terminal rooms HG00.027 and HG00.075.

You need to be a member of the docker group. In theory, this has already been done; check using the groups command.

If not, let me or the TAs know, or ask at CNCZ (where you received your science login).
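The group check can also be scripted. The following is a minimal sketch, not an official course script: has_group is a hypothetical helper name, and the only system command it relies on is id -nG, which prints the names of all groups of the current user.

```shell
# Succeeds if the group named in $1 occurs among the remaining arguments.
# (has_group is a hypothetical helper, introduced here for illustration.)
has_group() {
  wanted=$1
  shift
  for g in "$@"; do
    if [ "$g" = "$wanted" ]; then
      return 0
    fi
  done
  return 1
}

# id -nG prints the group names of the current user, separated by spaces;
# it is left unquoted on purpose so that each name becomes one argument.
if has_group docker $(id -nG); then
  echo "You are in the docker group."
else
  echo "Not in the docker group yet; ask to be added."
fi
```

Comparing whole names avoids a false match on a group whose name merely contains the word docker.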

Using Docker

A minimal test shows that the docker client can start so-called containers:

docker run hello-world

If Docker is new to you, it is highly recommended to learn more about its architecture and usage on the Docker site. Specifically, the rest of the course material assumes that you have followed step two of the excellent documentation.

Follow the tutorial step two.

In the course, we will only use docker with images provided by others, so it is not necessary to continue to steps three and beyond, although I will not stop you if you are starting to get the hang of it!


If you opt to install Docker on your own hardware (and work independently of the availability of computers in the Huygens building), follow the instructions for the right Docker Community Edition.

You can upgrade to Windows 10 Education via Surfspot (free for students); this gives access to the Windows features that Docker requires. Because I run Windows Insider preview versions on my laptop, the latest Docker for Windows stable release did not work and I had to install version 17.09.1 instead of 17.12.0. If you have an older version of Windows, you may need to use Docker Toolbox. An alternative would be to use a virtual machine with, e.g., Ubuntu, and install Docker there.


The practical assignments aim to give you basic knowledge of Spark, at the moment the de facto Big Data platform. We will use Spark from Scala, a functional language that is executed on the Java Virtual Machine (JVM), so usage of Java libraries can be mixed with pure Scala in a hybrid environment.

The usefulness of Docker is immediately clear when you explore the language and practice your Scala! Just run a Docker container that comes with a complete Scala environment pre-installed:

docker run -it --rm williamyeh/scala

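The first time, the image is downloaded from Docker Hub; a container is then created and starts up by automatically running the Scala interpreter (or REPL). The -it options attach an interactive terminal, and --rm removes the container as soon as you leave the interpreter.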

Links to explore:

This course is not about functional programming, so do not get carried away - you only want to acquire a basic understanding of the language.

If you skip the first section of the Scala tutorial (where they compile and run a Hello World program), you can follow along in the Scala interpreter. (Type :quit when you are done.)

To follow the Scala tutorial and compile and execute a HelloWorld.scala program, start the Docker container with a shell instead of the interpreter (notice how I leave out the --rm option, because we want to revisit the same container later!):

docker run -it williamyeh/scala /bin/bash

If you use an editor like vi or emacs, install it in the container:

apt-get update
apt-get install vim-nox

Create the file (vi HelloWorld.scala, use copy-paste to enter the code). Next, compile the program and run it:

scalac HelloWorld.scala
scala -classpath . HelloWorld
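If you would rather not install an editor, a here-document is one way to create the file directly from the shell inside the container. This is a minimal sketch: the program below is a standard hello-world that I assume is close to the one in the Scala tutorial, but the tutorial's own listing may differ in detail.

```shell
# Write a minimal Scala program to HelloWorld.scala in the current directory.
# The quoted 'EOF' delimiter stops the shell from expanding anything inside.
cat > HelloWorld.scala <<'EOF'
object HelloWorld {
  def main(args: Array[String]): Unit = {
    println("Hello, world!")
  }
}
EOF
```

After this, the scalac and scala commands shown above compile and run the program as before.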

Alternatively, quit the container by typing exit or ^D. You can create the file in your normal desktop environment, save it, and copy it into the root of the container:

docker cp HelloWorld.scala HASH:/

HASH stands for the ID or the name of the container. Find it using docker ps -a (or by using autocompletion in the shell, press TAB), then issue the following commands to copy the file and continue using the container (assuming the container is named vibrant_liskov):

docker cp HelloWorld.scala vibrant_liskov:/
docker start vibrant_liskov
docker attach vibrant_liskov
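When docker ps -a lists many containers, it can help to filter the list for the ones created from the Scala image. The sketch below is one possible way to do that: names_for_image is a hypothetical helper, and it assumes the container was started with the image name exactly as written above (williamyeh/scala, without a tag), since that is what the Image column then shows.

```shell
# Read lines of the form "NAME IMAGE" on stdin and print the names
# of the containers that were created from the image given in $1.
# (names_for_image is a hypothetical helper, introduced for illustration.)
names_for_image() {
  awk -v img="$1" '$2 == img { print $1 }'
}

# Only query Docker when the client is actually installed.
if command -v docker >/dev/null 2>&1; then
  docker ps -a --format '{{.Names}} {{.Image}}' | names_for_image williamyeh/scala
fi
```

Each name it prints can be used in place of HASH in the docker cp, docker start and docker attach commands.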

Clean up

Once you are fluent in using the docker client, it is easy to forget that every container and image used takes up disk space on the local machine. Please clean up regularly!

The following command lists the running containers:

docker ps -f status=running

Any running containers you do not use can be stopped using docker stop HASH (tab autocompletion is the easiest way to find the corresponding HASH). You can remove all inactive, exited containers that you do not plan to restart by issuing:

docker container prune

Images that are no longer needed can also be removed: use docker image ls to list them, followed by docker image rm (or, equivalently, docker rmi) for each image you want to delete.
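To see at a glance how much there is to clean up, you could summarise the containers by status. This is a rough sketch under the assumption that the first word of the Status column ("Up", "Exited", "Created", and so on) is a good enough label; count_by_state is a hypothetical helper name.

```shell
# Read container status lines (e.g. "Up 3 minutes", "Exited (0) 2 hours ago")
# on stdin and print how many containers share each leading status word.
# (count_by_state is a hypothetical helper, introduced for illustration.)
count_by_state() {
  awk '{ count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

# Only query Docker when the client is actually installed.
if command -v docker >/dev/null 2>&1; then
  docker ps -a --format '{{.Status}}' | count_by_state
fi
```

A large number next to Exited is a good hint that it is time for docker container prune.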

See also

Optional extra reading (not required for the course):