Using Docker

If Docker is new to you, it is highly recommended to learn more about its architecture and usage on the Docker site; the getting started guide is a useful exercise. In this course, you use Docker with images provided by others (well, me), so you do not need to follow the steps for building and uploading your own image.


The actual practical assignments of the course aim to give you a basic knowledge of Spark, currently the de facto standard platform for Big Data processing. We will use Spark from Scala, a functional (and object-oriented) language that runs on the Java Virtual Machine (JVM), so Java libraries can be used directly alongside pure Scala code.
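A small sketch of what this hybrid use looks like (the names JavaInterop and fromJava are hypothetical, chosen for illustration): Scala code creates and fills a java.util.ArrayList, copies it into an immutable Scala List, and calls the static Java method java.lang.Math.max. The index-based copy deliberately avoids converter imports, whose package differs between Scala versions (scala.collection.JavaConverters vs. scala.jdk.CollectionConverters).

```scala
import java.util.ArrayList

// Sketch: mixing a Java library class with ordinary Scala code
object JavaInterop {
  // Builds a Java list from Scala, then copies it into an immutable Scala List
  def fromJava(): List[String] = {
    val javaList = new ArrayList[String]()
    javaList.add("Spark")
    javaList.add("Scala")
    javaList.add("JVM")
    // Index-based copy: works without version-specific converter imports
    (0 until javaList.size()).map(i => javaList.get(i)).toList
  }

  def main(args: Array[String]): Unit = {
    val names = fromJava()
    // Regular Scala collection methods work on the copied list
    println(names.map(_.toUpperCase).mkString(", "))
    // Static methods from the Java standard library are directly callable
    println(java.lang.Math.max(3, 7))
  }
}
```

Running JavaInterop.main prints SPARK, SCALA, JVM on the first line and 7 on the second.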

The usefulness of Docker is immediately clear when you set out to explore the language and practice your Scala. Just run a Docker container that comes with a complete Scala environment pre-installed:

docker run -it --rm williamyeh/scala

The first time you run this, the image is downloaded from Docker Hub; a container is then created and started, and it automatically runs the Scala interpreter (known as the REPL, short for read-eval-print loop).

Links to explore:

The course is not about functional programming, so do not get carried away: at this point, you only want a very basic understanding of the language.

If you skip the first section of the Scala tutorial (where a Hello World program is compiled and run) and start right after it, you can follow along in the Scala interpreter. (Type :quit when you are done.)
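If you want something to type before opening the tutorial, the hypothetical ReplWarmup object below (a sketch, not taken from the tutorial) shows a function value, a method, and a few List operations. In the REPL you can enter the definitions and the lines of main one at a time, without the surrounding object; to enter the whole object at once in the Scala 2 REPL, type :paste, paste the code, press Ctrl-D, and then call ReplWarmup.main(Array()).

```scala
// Hypothetical warm-up exercises for the Scala REPL
object ReplWarmup {
  // A function value: functions are first-class values in Scala
  val square: Int => Int = x => x * x

  // A method that applies the function to every element and adds up the results
  def sumOfSquares(xs: List[Int]): Int = xs.map(square).sum

  def main(args: Array[String]): Unit = {
    val numbers = List(1, 2, 3, 4)
    println(numbers.map(square))        // List(1, 4, 9, 16)
    println(sumOfSquares(numbers))      // 30
    println(numbers.filter(_ % 2 == 0)) // List(2, 4)
  }
}
```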

To follow the Scala tutorial from the start, and also compile and execute the HelloWorld.scala program, start the Docker container with a shell instead:

docker run -it williamyeh/scala /bin/bash

(Notice that I left out the --rm option from the previous command: --rm removes the container as soon as it exits, and this time we want to revisit the same container later!)

You are now working on a Linux system as root. You can, for example, install additional software, but first you need to refresh apt’s package lists:

apt-get update

Now install your preferred editor in the container, e.g., nano, vi or emacs (my personal preference, but use nano if all of this is new to you!):

apt-get install nano

Create the file (run nano HelloWorld.scala and paste in the code from the tutorial). Then compile the program and run it:

scalac HelloWorld.scala
scala -classpath . HelloWorld
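If you do not have the tutorial’s code at hand, a minimal HelloWorld.scala along the following lines should work with the two commands above (a sketch; the tutorial’s own version may differ slightly). What matters is that the object name matches the name passed to scala: scalac compiles object HelloWorld into HelloWorld.class (plus a helper HelloWorld$.class) in the current directory, which -classpath . makes visible.

```scala
// HelloWorld.scala: the object name must match the name given to
// `scala -classpath . HelloWorld`
object HelloWorld {
  // Entry point invoked when the program is run
  def main(args: Array[String]): Unit = {
    println("Hello, world!")
  }
}
```

Running scala -classpath . HelloWorld then prints Hello, world!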

Alternatively, quit the container by typing exit or ^D. You can create the file in your normal desktop environment, save it, and copy it into the root of the container’s file system (directory /) using Docker’s copy command:

docker cp HelloWorld.scala HASH:/

Here HASH stands for the container’s ID or name; you can find either using docker ps -a (or press TAB to autocomplete it in the shell). To copy the file and continue using the container, issue the following commands (assuming the container is named vibrant_liskov):

docker cp HelloWorld.scala vibrant_liskov:/
docker start vibrant_liskov
docker attach vibrant_liskov

Pro tip: docker container ls -lq prints only the ID of the most recently created container (whatever its state). Using bash command substitution, you can combine it with docker cp into a one-liner that copies your file into user root’s home directory:

docker cp HelloWorld.scala $(docker container ls -lq):/root

Clean up

Once you are fluent in using the Docker client, it is easy to forget that every container and image you use takes up disk space on your local machine. Clean up regularly!

The following command lists the running containers:

docker ps -f status=running

Stop any running containers you no longer use with docker stop HASH (tab autocompletion is the easiest way to find the corresponding HASH). Note that the following command removes all stopped containers at once, so only issue it when you no longer plan to restart any of them:

docker container prune

Images that are no longer needed can also be removed: list them with docker image ls, then remove the ones you can free up with docker image rm (or, equivalently, docker rmi).

Finalize the assignment

Don’t forget to read all instructions under “finalize the assignment” in Assignment 1 part I.


Optional extra reading (not required for the course):
