Data Analysis, M1

Module supervisor

Stéphane Derrode, Centrale Lyon, Mathematics and Computer Science Department

Table of Contents

General information
Prerequisites
Organisation
Schedule
Sessions

This page contains all resources (notebooks, data) for the Data Analysis module, taught as part of the M1 programme at Centrale Digital Lab @ Ecole Centrale Lyon.

General information

Programme: Centrale Digital Lab @ Ecole Centrale Lyon
Level: M1
Total duration: 12 hours (4 sessions × 3 hours)
Language: English
Format: Interactive (Jupyter notebooks, plus a closed-book paper quiz at the end of each session)

Prerequisites

Prerequisite	Expected level
Python programming	Comfortable with functions, loops, lists, and basic OOP
NumPy	Basic array manipulation
Statistics	Mean, variance, correlation — no regression or probability theory required
Pandas	Not required — introduced in Session 1

Organisation

The module follows the full lifecycle of a data analysis project, from raw data to actionable decisions:

Session 1 — From Raw Data to Clean Data · Titanic dataset
Session 2 — From Clean Data to Insight · Spotify Tracks dataset
Session 3 — From Insight to Decision · Heart Disease UCI dataset
Session 4 — Beyond Supervised Learning · Spotify + Heart Disease

Each session is structured as follows:

Instructor-led introduction (10–15 min) — context, objectives, concepts
Guided notebook — blocks of explanation + code + exercises (🏋️)
Paper quiz (15 min, closed book) — at the end of each session

Schedule

Session	Title	Dataset	Duration
1	From Raw Data to Clean Data	Titanic	3h
2	From Clean Data to Insight	Spotify Tracks	3h
3	From Insight to Decision	Heart Disease UCI	3h
4	Beyond Supervised Learning	Spotify + Heart Disease	3h

Sessions

Session 1. From Raw Data to Clean Data

Dataset: Titanic passenger records (891 rows, 12 columns)

Key topics: Pandas by practice · Missing value detection and imputation · Categorical encoding · DataFrame merging · Aggregation with groupby
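
As a taste of these topics, here is a minimal sketch of missing value detection, imputation, categorical encoding and a groupby aggregation. It assumes a tiny hand-made table whose column names (Pclass, Sex, Age, Survived) imitate the Titanic data; the values are invented for illustration, and median imputation is only one possible choice, not necessarily the one used in the session notebook.

```python
import pandas as pd

# Hypothetical mini-table imitating a few Titanic columns (values are made up).
df = pd.DataFrame({
    "Pclass": [1, 1, 3, 3, 3],
    "Sex": ["female", "male", "male", "female", "male"],
    "Age": [38.0, None, 22.0, None, 30.0],
    "Survived": [1, 0, 0, 1, 0],
})

# Missing value detection: number of missing entries per column.
missing = df.isna().sum()

# Imputation: replace missing ages with the median of the observed ages.
df["Age"] = df["Age"].fillna(df["Age"].median())

# Categorical encoding: map the two categories to integers.
df["Sex_code"] = df["Sex"].map({"male": 0, "female": 1})

# Aggregation with groupby: survival rate per passenger class.
survival_by_class = df.groupby("Pclass")["Survived"].mean()

print(missing)
print(survival_by_class)
```

The median is used here because it is less sensitive to outliers than the mean; other strategies (for example, imputing per group) may fit the real data better.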

→ Go to Session 1

Session 2. From Clean Data to Insight

Dataset: Spotify Tracks (~3,000 tracks, 8 genres, 19 audio features)

Key topics: Univariate and bivariate analysis · Histograms and boxplots · Correlation matrix · Pair plot · PCA with linear algebra
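
To illustrate what "PCA with linear algebra" can look like, the sketch below computes principal components from the eigendecomposition of the covariance matrix using NumPy only. The data are synthetic (random numbers, not the Spotify dataset), and the features are centred but not standardised; the session notebook may make different choices.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 200 samples, 3 features,
# where the first two features are strongly correlated.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X = np.hstack([
    z + 0.1 * rng.normal(size=(200, 1)),
    2 * z + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

# Step 1: centre each feature.
Xc = X - X.mean(axis=0)

# Step 2: sample covariance matrix (features x features).
cov = Xc.T @ Xc / (len(X) - 1)

# Step 3: eigendecomposition; eigh suits symmetric matrices and
# returns eigenvalues in ascending order, so we reorder them.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: share of variance explained by each component.
explained = eigvals / eigvals.sum()

# Step 5: project the data onto the first two principal axes.
scores = Xc @ eigvecs[:, :2]

print(explained)
```

Each eigenvalue is the variance of the data along its eigenvector, which is why the projected coordinates in `scores` are uncorrelated.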

→ Go to Session 2

Session 3. From Insight to Decision

Dataset: Heart Disease UCI (~920 patients, 13 clinical features)

Key topics: Feature engineering · Targeted EDA · Logistic Regression · Random Forest · Confusion matrix · Precision / Recall / F1 · ROC-AUC · Ethics of model errors
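
As a reminder of how the evaluation metrics relate to the confusion matrix, here is a small pure-Python sketch. The labels are invented for illustration (1 = disease, 0 = healthy), and the helper names `confusion_counts` and `precision_recall_f1` are hypothetical, not taken from the course notebooks.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1, returning 0.0 when a ratio is undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example: 4 sick patients, 6 healthy patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(counts, precision, recall, f1)
```

In this example two sick patients are missed (false negatives) and one healthy patient is flagged (false positive); in a clinical setting these two kinds of error rarely carry the same cost.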

→ Go to Session 3

Session 4. Beyond Supervised Learning

Datasets: Spotify Tracks (K-Means) · Heart Disease UCI (Naive Bayes, MLP)

Key topics: K-Means clustering · Elbow method · Silhouette score · Naive Bayes and Bayes’ theorem · Multi-Layer Perceptron · Model comparison · Introduction to Deep Learning
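
Bayes' theorem, on which Naive Bayes is built, can be sketched with a worked example. The prevalence, sensitivity and specificity below are invented numbers, not values from the Heart Disease dataset, and the function name `posterior` is hypothetical.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) from Bayes' theorem.

    prior        -- P(disease)
    sensitivity  -- P(positive | disease)
    specificity  -- P(negative | healthy)
    """
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1.0 - specificity
    # Total probability of a positive test (the "evidence").
    evidence = (p_pos_given_disease * prior
                + p_pos_given_healthy * (1.0 - prior))
    return p_pos_given_disease * prior / evidence


# Illustrative numbers only: 10% prevalence, 90% sensitivity, 80% specificity.
p = posterior(prior=0.10, sensitivity=0.90, specificity=0.80)
print(round(p, 3))
```

With these numbers the posterior is 0.09 / (0.09 + 0.18) = 1/3: even after a positive test, the disease remains less likely than not because the prior is low.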

→ Go to Session 4