Assignment 1B Docker
We will use Docker to remove the burden of unifying working environments.
First, go over the basic tutorial to gain familiarity with Docker usage. (Do not follow the instruction to install docker yet.)
The easiest solution is to use Docker on the computers in the terminal room; the only drawback is that you have to be in the terminal room to boot the machine into Linux (note that you can use ssh once it is booted, but anyone can decide to reboot the machine, and all machines are shut down at the end of the day).
Alternatively, install Docker on your own laptop or home computer. While this route is not officially supported, I will provide a few pointers to get you started; see below.
Setting up Docker
Docker can be used in terminal rooms HG00.027 and HG00.075.
You need to be added to the Docker group - in theory, this has already been done, check using the
If not, let me or the TAs know, or ask at CNCZ (where you received your science login).
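One possible way to check group membership from a terminal is sketched below. It assumes the group is literally named docker (the usual default on Linux) and is not necessarily the check intended above. Note that a newly added group membership normally only takes effect after you log in again.

```shell
# Sketch: report whether the current user belongs to the "docker" group.
# Assumption: the group is named exactly "docker".
if id -nG | tr ' ' '\n' | grep -qx docker; then
  in_docker_group=yes
else
  in_docker_group=no
fi
echo "member of docker group: $in_docker_group"
```

If the answer is no, the docker run hello-world test will most likely fail with a permission error on the Docker socket.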
A minimal test shows that the docker client can start so-called containers:
docker run hello-world
If Docker is new to you, it is highly recommended to learn more about its architecture and usage on the Docker site; specifically, the rest of the course material will assume that you followed step two from the excellent documentation.
Follow step two of the tutorial.
In the course, we will only use docker with images provided by others, so it is not necessary to continue to steps three and beyond, although I will not stop you if you are starting to get the hang of it!
If you opt to install docker on your own hardware (and work independently from the availability of computers in the Huygens building), follow the instructions for the right Docker Community Edition.
You can upgrade to Windows 10 Education via Surfspot (free for students); this gives access to the Windows features that Docker requires. Because I run Windows Insider preview versions on my laptop, the latest Docker for Windows stable release did not work and I had to install version 17.09.1 instead of 17.12.0. If you have an older version of Windows, you may need to use Docker Toolbox. An alternative would be to use a virtual machine with, e.g., Ubuntu, and install Docker there.
The practical assignments aim to give you basic knowledge of Spark, at the moment the de facto Big Data platform. We will use Spark from Scala, a functional language that is executed on the Java Virtual Machine (JVM), so usage of Java libraries can be mixed with pure Scala in a hybrid environment.
The usefulness of Docker is immediately clear when you explore the language and practice your Scala! Just run a Docker container that comes with a complete Scala environment pre-installed:
docker run -it --rm williamyeh/scala
The first time you run this, the image is downloaded from Docker Hub; a container is then initialised and starts up by automatically running the Scala interpreter (or REPL).
Links to explore:
- Scala tutorial for Java Programmers
- Excellent interactive tutorial
- Main Scala site and documentation
This course is not about functional programming, so do not get carried away - you only want to acquire a basic understanding of the language.
If you skip the first section of the Scala tutorial (where they compile and run a Hello World program), you can follow along in the Scala interpreter. (Type :quit when you are done.)
To follow the Scala tutorial and compile and execute a Hello World program, start the Docker container with a shell (notice how I leave out the --rm flag, because we want to revisit the same container later!):
docker run -it williamyeh/scala /bin/bash
If you use an editor like emacs, install it in the container:
apt update
apt-get install vim-nox
Create the file (vi HelloWorld.scala, use copy-paste to enter the code).
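If you prefer not to use an editor, the file can also be written directly from the shell. The sketch below assumes the program is the usual minimal Hello World object; the exact code in the tutorial may differ slightly (for example in how main is declared), so treat the contents as an illustration.

```shell
# Sketch: write a minimal Hello World program to HelloWorld.scala
# in the current directory, without opening an editor.
# Assumption: this mirrors the tutorial's example; compare with the tutorial text.
cat > HelloWorld.scala <<'EOF'
object HelloWorld {
  def main(args: Array[String]): Unit = {
    println("Hello, world!")
  }
}
EOF
```

The quoted 'EOF' delimiter keeps the shell from expanding anything inside the Scala source.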
Next, compile the program and run it:
scalac HelloWorld.scala
scala -classpath . HelloWorld
Alternatively, quit the container by typing
You can create the file in your normal desktop environment, save it, and copy it into the root of the container:
docker cp HelloWorld.scala HASH:/
Find the HASH value using docker ps -a (or use autocompletion in the shell by pressing TAB), then issue the following commands to copy the file and continue using the container (assuming the value of HASH is vibrant_liskov):
docker cp HelloWorld.scala vibrant_liskov:/
docker start vibrant_liskov
docker attach vibrant_liskov
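The copy-and-resume steps can also be wrapped in a small shell function. This is only a sketch: the function name resume_with_file is hypothetical, and it uses docker start -ai, which starts the container and attaches to it in one step instead of running separate start and attach commands.

```shell
# Hypothetical helper (not part of the course material):
# copy a file into the root of a stopped container, then re-enter it.
# Usage: resume_with_file <container-name-or-id> <file>
resume_with_file() {
  # Copy first; only start the container if the copy succeeded.
  docker cp "$2" "$1":/ && docker start -ai "$1"
}
```

With the example container above, this would be called as resume_with_file vibrant_liskov HelloWorld.scala.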
Once you are fluent in using the docker client, it is easy to forget that every container and image used takes up disk space on the local machine. Please clean up regularly!
The following command lists the running containers:
docker ps -f status=running
Any running containers you do not use can be stopped using docker stop HASH (tab autocompletion is the easiest way to find the corresponding HASH).
You can remove all inactive, exited containers that you do not plan to restart by issuing:
docker container prune
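Before pruning, it can help to see what is actually taking up space. The sketch below is one way to do that, assuming a reasonably recent docker client; the function name docker_usage_report is hypothetical.

```shell
# Sketch: show what is using disk space before removing anything.
docker_usage_report() {
  # List exited containers (these are among the ones "docker container prune" removes).
  docker ps -a --filter status=exited --format '{{.Names}}  {{.Image}}  {{.Status}}'
  # Summarise disk space used by images, containers, and volumes.
  docker system df
}

# Only run the report when a docker client is available on this machine.
if command -v docker >/dev/null 2>&1; then
  docker_usage_report || echo "docker daemon not reachable"
else
  echo "docker client not found"
fi
```

The report only reads information; nothing is deleted until you run the prune or remove commands yourself.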
Images that are no longer needed can also be removed: use docker image ls followed by docker image rm
for the image you can free up (or, equivalently,
Optional extra reading (not required for the course):