Assignment 3 Part A: Hands-on session with Spark RDDs
The goal of assignment 3A is to get hands-on experience in using the Spark Notebook to write your Spark programs.
Note: if you need to restart from scratch, the easiest approach is to follow the detailed instructions from assignment 2, skipping the middle part (the instructions that set up Hadoop).
Access Spark Notebook in your browser by navigating to localhost:9001.
You may navigate a notebook with the keyboard by pressing Shift-Enter to execute a cell, and Enter to add lines to a cell.
To get files onto the notebook Docker container that you use, the easiest approach is to start a shell inside the container and download the files there:
docker exec -it HASH /bin/bash
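If you do not know the container's hash, docker ps lists the running containers; replace HASH above with the container ID (first column) or the name it reports:

# Look up the ID of the running Spark Notebook container
docker ps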
Inside the shell, you can simply use
cd to move to the directory you need, and
git clone to copy the files you need, as illustrated below.
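For example, assuming your files live in a Git repository (the URL below is purely illustrative; substitute the repository you actually need):

cd /opt/docker/notebooks
git clone https://github.com/your-username/your-files.git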
Create the following directory and copy the course notebook BigData-big-data-spark-rdd.snb into that directory:
mkdir -p /opt/docker/notebooks/BigData
cd /opt/docker/notebooks/BigData
wget http://rubigdata.github.io/course/assignments/BigData-big-data-spark-rdd.snb
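You can verify that the notebook file arrived with a quick listing (a sanity check, not part of the assignment):

ls -l /opt/docker/notebooks/BigData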
Navigate to localhost:9001/tree/BigData in your browser to open the notebook you just installed inside your Spark Notebook container.
Follow the steps in the course notebook to get comfortable using Spark and Scala, and make sure that you understand what you find in the Spark UI, available at localhost:4040.
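If you want a quick sanity check of your setup, you can run a minimal RDD word count in a fresh cell. This is just a sketch: it assumes the notebook's predefined SparkContext is in scope under the name sparkContext (Spark Notebook's convention), not anything specific to this assignment.

// Build a small RDD from in-memory data and count the words in it
val lines = sparkContext.parallelize(Seq("hello spark", "hello scala"))
val counts = lines
  .flatMap(_.split(" "))    // split each line into words
  .map(word => (word, 1))   // pair every word with a count of 1
  .reduceByKey(_ + _)       // sum the counts per word
counts.collect().foreach(println)

Executing the cell triggers a Spark job, which you can then inspect in the Spark UI at localhost:4040.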
When you get stuck, open an issue in the Forum to find help from your fellow students and/or me!
Back to Assignment 3 overview.