Lab Sessions

Assignments in this course are completed with peer-reviewed blog posts. The assignments complement the theory taught in the lectures and give you some practical experience; more importantly, they help you better understand the design principles of Big Data platforms.

Assignments are a standalone component of the course. The lab sessions are crucial for building the competences you need to complete a final project on your own.

The other component of your grade, the final exam, is designed to test your understanding of the theory. You will most likely do better on the exam if you take the assignments seriously, but these are separate parts, and exam questions do not directly relate to the lab sessions.

Assignment 1

The goal in this first assignment is to acquire two skills:

  1. basic GitHub Pages
  2. basic Docker expertise

Let’s start with the first objective: GitHub Pages.

Content over layout

Every assignment in this course is completed with a brief report created as a blog post. Grading does not, however, take into account how beautifully the results are formatted; it focuses on the quality of the content. To make sure you do not put form over function, these blog posts are written in Markdown, an increasingly popular language used to document code and to write blogs published on the web.

Markdown is especially convenient for publishing on the web when you use it with GitHub Pages and git. GitHub Pages is a service that lets you manage web pages by modifying a GitHub repository. This course website is constructed that way too!

Note: if an assignment page includes links, you are expected to follow them and look at the linked material before proceeding! Take some time to glance over the GitHub Pages and git documents linked above, and be prepared to revisit them when you need more detail later on during the assignment.

Using GitHub Pages, the web page that contains your blog post corresponds 1:1 to a text file (written in Markdown) that is published the same way as we version our code: git add, git commit, and git push the files that make up your own course site, collaboratively and versioned. Once set up correctly, a tool called Jekyll automagically converts your Markdown into HTML (without you doing anything; Jekyll runs on the GitHub servers upon git push).
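
The add, commit, push cycle described above can be sketched as follows. This is only a minimal sketch under my own assumptions: it runs in a throwaway directory, a local bare repository stands in for the repository on GitHub, and the commit identity, page content, and commit message are placeholders.

```shell
# Work in a throwaway directory so nothing touches a real repository.
workdir="$(mktemp -d)"

# Assumption: a local bare repository stands in for the one hosted on GitHub.
git init --quiet --bare "$workdir/remote.git"

# Clone it, as you would clone your assignment repository.
git clone --quiet "$workdir/remote.git" "$workdir/site"
cd "$workdir/site"

# Placeholder identity, stored in this sandbox repository only.
git config user.name "Example Student"
git config user.email "student@example.org"

# Write a Markdown page into the docs/ folder.
mkdir -p docs
printf '# Hello\n\nMy first page.\n' > docs/index.md

# The three steps: stage the file, record a version, publish it.
git add docs/index.md
git commit --quiet -m "Add first page"
git push --quiet origin HEAD
```

In your real assignment repository only the last three commands are needed for each change; the push is what triggers Jekyll on the GitHub servers.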

GitHub for Education

You now understand why you had to create a GitHub account for the course: apart from GitHub Pages, we also use GitHub for Education to initialize your assignment repositories.

If you did not request your GitHub student pack, I recommend visiting GitHub for Education to get student benefits, including your own private repositories and other goodies such as some free credits on cloud services. BTW: repositories created during the course are private unless you change that setting yourself, so do not worry about whatever silly things you write here; they won’t follow you online into your future.

Git and GitHub background

Let’s now continue to your first blog post composed in Markdown, published online for free; make sure the process is familiar, because every assignment is completed using these same steps. Use the results of Assignment 1 as your personal course website and blog. Of course, at the end of the course, you can always fork this repository and use it as the start of a personal online showcase portfolio - that is totally up to you!

If you are not yet fluent in git and GitHub, the first step is to study the online training material provided by GitHub to gain familiarity with git version control. If you are not that fluent on the command line (a skill I recommend taking the effort to acquire), consider the GitHub Desktop app (Windows or Mac). Don’t forget that MIT’s The Missing Semester of Your CS Education includes both command-line and git lectures. Recommended stuff!

Create the Blog Post repository

After familiarizing yourself with git (or refreshing your memory) with the resources above, continue to write the first post on your blog:

  1. Follow the link to the Classroom for GitHub Bigdata Blog 2021 assignment, log in with your GitHub account, and accept the assignment. Be sure to link your student number to the repo: the identifiers should be entered as #sXXXXXXX (including the # and the s). This creates a private repository under the rubigdata “organization”, named rubigdata/bigdata-blog-2021-USERNAME. It contains the structure that you use throughout the practical assignments for the course.
  2. Go to your repository on GitHub (you can follow the link that is shown after accepting the assignment) and navigate to the settings tab. Scroll down to the GitHub Pages settings and select master branch /docs folder as source to enable GitHub Pages. After a short delay your site should appear at rubigdata.github.io/bigdata-blog-2021-USERNAME.
  3. If you would like your blog to look better than plain HTML, select a theme in your repository settings after enabling GitHub Pages.

Edit your Blog post

To add your own content, you should now clone the private assignment repository. The README.md demonstrates some Markdown markup that you may use. The docs/ directory contains the GitHub Pages source for your blog (if you followed the instructions above).

Proceed as follows:

  1. Create a new markdown file in the docs/ folder, e.g., blogpost1.md, and write something like ‘Assignment 1’ in it (more content will follow later). Do the git add, git commit, git push sequence.
  2. Edit docs/index.md to write a short introduction about yourself (name, student number, course programme; if you want, add some extra information, e.g., hobbies, favourite study/sport/student association, why you follow this course, etc.).

    Also, create a table of contents (ToC) (just use a bulleted list) with a link to your assignment.

  3. Markdown has several ways to link; the simplest one is [anchor text](blogpost1.md) (using a relative URL, assuming the blog post is located in the /docs/ directory in your repository). Don’t forget to git commit and git push (again) upon making changes.
  4. Go to your GitHub Pages website rubigdata.github.io/bigdata-blog-2021-USERNAME and check whether the HTML has been generated correctly.
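
Steps 1 to 3 above might look roughly like this. Treat it as a sketch under my own assumptions: it runs in a scratch directory so that it cannot overwrite your existing docs/index.md, and the headings and placeholder sentences are illustrative only.

```shell
# Assumption: a scratch directory stands in for the root of your cloned repository.
cd "$(mktemp -d)"
mkdir -p docs

# Step 1: a first, minimal blog post.
cat > docs/blogpost1.md <<'EOF'
# Assignment 1

More content will follow later.
EOF

# Steps 2 and 3: a short introduction, followed by a bulleted table of
# contents that links to the post with a relative URL.
cat > docs/index.md <<'EOF'
# About me

Name, student number, and course programme go here.

## Blog posts

- [Assignment 1](blogpost1.md)
EOF
```

In your own repository you would edit the existing files instead, then publish them with the usual git add, git commit, git push sequence.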

If you are just starting out with git, it is fine to use only simple markup and limited styling. I have made a screencast that walks you through cloning (which you usually do only once), editing a file, committing, and pushing, just to illustrate:

asciicast

Note: You do not have to install Jekyll yourself; the GitHub servers run it for you.

Have you been committing and pulling for months or years? Then level up! Figure out how to use the GitHub Pages features for a blog series instead of having to create separate files manually (background info and this excellent blog).
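
One feature that may be meant here is Jekyll's posts convention: files in a _posts directory named YYYY-MM-DD-title.md are treated as blog posts, and a page can list them by looping over site.posts. The sketch below is my own assumption of how that could look with docs/ as the Pages source; the date, titles, and text are placeholders, and it runs in a scratch directory.

```shell
# Assumption: a scratch directory stands in for the root of your repository.
cd "$(mktemp -d)"
mkdir -p docs/_posts

# A post: the file name carries the date and a short title (both placeholders).
cat > docs/_posts/2021-02-15-assignment-1.md <<'EOF'
---
title: Assignment 1
---

First post of the series.
EOF

# An index page that lists all posts with a Liquid loop,
# so the table of contents no longer has to be maintained by hand.
cat > docs/index.md <<'EOF'
---
title: My course blog
---

{% for post in site.posts %}
- [{{ post.title }}]({{ post.url | relative_url }})
{% endfor %}
EOF
```

Whether a post also needs a layout entry in its front matter depends on the theme you selected, so check the Jekyll documentation before relying on this.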

Finalize assignment 1

The second part of Assignment 1 is to learn how to use Docker.

For easier reference and better reading, parts II and III have their own web pages. After completing those parts of the assignment, return here and complete Assignment 1.

Add a brief description of your computer setup and the way you run Docker to your blog post, and perhaps include a screenshot of executing a few Scala commands (a screenshot is not mandatory, but it is a good Markdown exercise). Images should be added to the docs/ directory and work the same as linking to other files (something you have already learnt!).
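
Embedding a screenshot differs from a normal link only by the leading exclamation mark. A small sketch, assuming a hypothetical image file scala-session.png stored next to the post in docs/ (the heading and alt text are placeholders too), run in a scratch directory:

```shell
# Assumption: a scratch directory stands in for the root of your repository.
cd "$(mktemp -d)"
mkdir -p docs

# Append a section with an embedded image to the blog post.
# The path is relative to the post, just like a link to another page.
cat >> docs/blogpost1.md <<'EOF'

## My setup

![Scala commands running in a Docker container](scala-session.png)
EOF
```

Remember to git add the image file itself as well as the Markdown file; otherwise the published page shows a broken image.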

Complete the assignment by handing in the URL of your published blog post using the PeerGrade add-on for Brightspace (precise instructions to follow in a Brightspace announcement).
