Content for Chapter 2, Big Data Technologies
Here is the detailed content of the class sessions for Chapter 2 of the module Big Data computing technologies (4th semester), BSc in Data Science for Responsible Business (Centrale Lyon & EM Lyon).
Caution: the content of this page will evolve as the lessons progress.
Part 1. Linked Open Data (LOD) technology (6h) and project (7h).
- Teaching materials
  - Slides
  - IPython notebook with examples from the course (and other examples).
- Educational resources
  - Book (available at the Centrale Lyon library)
    - Learning SPARQL, by Bob DuCharme, 2nd Edition, 2013, O’Reilly. (PDF copies can be found on the Internet!)
  - SPARQL language reference
    - SPARQL language specification from the W3C.
    - SPARQL By Example: The Cheat Sheet.
    - SPARQL query validator. As a bonus, it re-indents your code and improves its readability!
  - Videos
    - Big Data in 5 minutes
    - What is Linked Open Data? (Introduction for students)
    - What is Linked Data? (A short non-technical introduction to Linked Data)
    - SPARQL in 11 minutes
- Practical work
- LOD Project (in groups of 3 students)
Part 2. Hadoop framework, including HDFS and the mrjob Python library (8h).
- Preparation (note that installing the software and the container requires up to 3 GB of free space on your hard drive!)
  - Install Docker on your personal machine by following this link.
  - Launch Docker (it will run in the background).
  - Open a terminal (Windows PowerShell for Windows users) and execute the following command:
docker pull stephanederrode/docker-cluster-hadoop-spark-python-16:3.6
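Once the pull has finished, the preparation can be checked from Python. The snippet below is only a minimal sketch and is not part of the course material: the helper names `docker_available` and `image_present` are hypothetical, and it assumes nothing more than the `docker` command-line client being on your PATH. It relies on `docker image inspect`, which exits with a non-zero status when the image is not stored locally (or when Docker is not running).

```python
import shutil
import subprocess

# Image name copied from the pull command above.
IMAGE = "stephanederrode/docker-cluster-hadoop-spark-python-16:3.6"


def docker_available() -> bool:
    """Return True if a `docker` executable is found on the PATH."""
    return shutil.which("docker") is not None


def image_present(image: str = IMAGE) -> bool:
    """Return True if `image` is already stored locally (hypothetical helper)."""
    if not docker_available():
        return False
    try:
        # `docker image inspect` returns a non-zero exit code for a missing image.
        result = subprocess.run(
            ["docker", "image", "inspect", image],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # The executable could not be started at all.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if image_present():
        print("Image found: the preparation for Part 2 is complete.")
    else:
        print("Image not found: start Docker and run the pull command again.")
```

If the script reports that the image is missing, first make sure Docker is running in the background, then repeat the pull command.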
- Teaching materials
- Educational resources
  - Videos
    - Hadoop In 5 Minutes
    - What Is Hadoop? A 30-minute introduction for beginners.
    - HDFS Tutorial For Beginners. 43 minutes.
- Practical work