Content for Chapter 2, Big Data Technologies

Here is the detailed content of the class sessions for Chapter 2 of the Big Data Computing Technologies module (4th semester), BSc in Data Science for Responsible Business (Centrale Lyon & EM Lyon).

Caution: the content of this page will evolve as the lessons progress.

Part 1. Linked Open Data (LOD) technology (6h) and project (7h).

Part 2. Hadoop framework, including HDFS and the MrJob Python library (8h).

  • Preparation (note that installing the software and the container requires up to 3 GB of free space on your hard drive!)

    • Install Docker on your personal machine by following this link.
    • Launch Docker (it will run in the background).
    • Open a Terminal (Windows PowerShell for Windows users) and run the following command:
        docker pull stephanederrode/docker-cluster-hadoop-spark-python-16:3.6
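If you want to check beforehand that your drive has the roughly 3 GB of free space the preparation note asks for, the following is a minimal sketch using only Python's standard library (`shutil.disk_usage`). The constant `REQUIRED_BYTES` and the function `has_enough_space` are hypothetical names introduced here, and checking the drive that holds your home folder is an assumption: Docker may store its images on a different disk or inside its own virtual machine image.

```python
import shutil
from pathlib import Path

# Hypothetical threshold matching the "up to 3 GB" preparation note
# (1 GB taken as 10**9 bytes here).
REQUIRED_BYTES = 3 * 10**9


def has_enough_space(path: str = str(Path.home()), required: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem holding `path` has at least `required` free bytes."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), all in bytes
    return usage.free >= required


if __name__ == "__main__":
    free_gb = shutil.disk_usage(Path.home()).free / 10**9
    print(f"Free space on the drive holding your home folder: {free_gb:.1f} GB")
    print("OK" if has_enough_space() else "Less than 3 GB free: clean up before installing")
```

Run it with `python check_space.py` (file name of your choice) before installing Docker and pulling the image.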