Nijmegen - Spark DataFrame API

The goal of assignment 3 Part B is to gain hands-on experience with Spark SQL and DataFrames by carrying out an analysis of structured open data.

The repository for the assignment contains the course’s notebook and three extra data files.

Big Data Spark Nijmegen notebook

Copy the course’s notebook (BigData-open-data-Nijmegen.snb) from your assignment 3 repo, or from the course repository, into the Docker container, in the directory /opt/docker/notebooks/BigData (just as you did in assignment 3 Part A).
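One way to do this copy is with docker cp. The snippet below is only a sketch: the helper function copy_notebook is a hypothetical name introduced here, and it assumes that the notebook file sits in the directory you pass in (or the current directory) and that the target directory already exists in the container from Part A.

```shell
# File name and target directory as given in the assignment text.
NOTEBOOK="BigData-open-data-Nijmegen.snb"
TARGET_DIR="/opt/docker/notebooks/BigData"

# copy_notebook CONTAINER [SOURCE_DIR]   (hypothetical helper, not part of the course material)
# Copies the notebook from SOURCE_DIR (default: current directory) into the
# given container. The trailing slash on the destination makes docker cp fail
# with an error if the target directory does not exist.
copy_notebook() {
  container="${1:?usage: copy_notebook CONTAINER [SOURCE_DIR]}"
  source_dir="${2:-.}"
  docker cp "$source_dir/$NOTEBOOK" "$container:$TARGET_DIR/"
}
```

Look up the name or ID of your own container with docker ps, then run copy_notebook followed by that name from the directory that holds the notebook.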

Open localhost:9001/tree/BigData in your browser, and open the notebook you just installed inside your Spark Notebook container.

Follow the instructions in the notebook, but do not just click through every single cell: experiment with alternatives. You will need that experience when you write your blog post!

When you get stuck, open an issue in the Forum to get help from your fellow students and/or me!

Useful background information: Spark SQL and DataFrame documentation.

