Data Analysis, M1
Module supervisor
- Stéphane Derrode, Centrale Lyon, Mathematics and Computer Science Department
This page contains all resources (notebooks, data) for the Data Analysis module, taught as part of the M1 programme at Centrale Digital Lab @ Ecole Centrale Lyon.
General information
Programme: Centrale Digital Lab @ Ecole Centrale Lyon
Level: M1
Total duration: 12 hours (4 sessions × 3 hours)
Language: English
Format: Interactive — Jupyter notebooks + closed-book paper quiz at the end of each session
Prerequisites
| Prerequisite | Expected level |
|---|---|
| Python programming | Comfortable with functions, loops, lists, and basic OOP |
| NumPy | Basic array manipulation |
| Statistics | Mean, variance, correlation — no regression or probability theory required |
| Pandas | Not required — introduced in Session 1 |
Organisation
The module follows the full lifecycle of a data analysis project, from raw data to actionable decisions:
- Session 1 — From Raw Data to Clean Data · Titanic dataset
- Session 2 — From Clean Data to Insight · Spotify Tracks dataset
- Session 3 — From Insight to Decision · Heart Disease UCI dataset
- Session 4 — Beyond Supervised Learning · Spotify + Heart Disease
Each session is structured as follows:
- Instructor-led introduction (10–15 min) — context, objectives, concepts
- Guided notebook — blocks of explanation + code + exercises (🏋️)
- Paper quiz (15 min, closed book) — at the end of each session
Schedule
| Session | Title | Dataset | Duration |
|---|---|---|---|
| 1 | From Raw Data to Clean Data | Titanic | 3h |
| 2 | From Clean Data to Insight | Spotify Tracks | 3h |
| 3 | From Insight to Decision | Heart Disease UCI | 3h |
| 4 | Beyond Supervised Learning | Spotify + Heart Disease | 3h |
Sessions
Session 1. From Raw Data to Clean Data
Dataset: Titanic passenger records (891 rows, 12 features)
Key topics: Pandas by practice · Missing value detection and imputation · Categorical encoding · DataFrame merging · Aggregation with groupby
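As a rough preview of three of these topics (missing value imputation, categorical encoding, and aggregation with groupby), here is a minimal sketch. It assumes a tiny hand-made DataFrame that borrows Titanic-style column names; the values are invented for illustration and are not taken from the real dataset, and median imputation is only one possible strategy.

```python
import pandas as pd

# Toy data with Titanic-style column names; the values are invented for illustration.
df = pd.DataFrame({
    "Pclass": [1, 3, 3, 2, 1, 3],
    "Sex": ["female", "male", "female", "male", "male", "female"],
    "Age": [38.0, None, 26.0, 35.0, None, 4.0],
    "Survived": [1, 0, 1, 0, 0, 1],
})

# Missing value detection: count the missing entries in each column.
missing_per_column = df.isna().sum()

# Imputation: replace missing ages with the median of the observed ages.
median_age = df["Age"].median()
df["Age"] = df["Age"].fillna(median_age)

# Categorical encoding: map the two categories to integers (an arbitrary choice of codes).
df["Sex_code"] = df["Sex"].map({"male": 0, "female": 1})

# Aggregation: mean of the Survived column within each passenger class.
survival_by_class = df.groupby("Pclass")["Survived"].mean()

print(missing_per_column)
print(survival_by_class)
```

The median is used here because it is less sensitive to extreme values than the mean; whether that suits a given column depends on the data.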
Session 2. From Clean Data to Insight
Dataset: Spotify Tracks (~3 000 tracks, 8 genres, 19 audio features)
Key topics: Univariate and bivariate analysis · Histograms and boxplots · Correlation matrix · Pair plot · PCA with linear algebra
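To give an idea of what "PCA with linear algebra" can look like, the sketch below computes principal components from the eigendecomposition of the covariance matrix using NumPy only. The data are synthetic (random numbers with a fixed seed), not the Spotify dataset, and the features are centred but not standardised, which is an assumption that would need revisiting when features have very different scales.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 3 features; the second feature is strongly tied to the first.
x1 = rng.normal(size=200)
X = np.column_stack([
    x1,
    2.0 * x1 + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])

# Step 1: centre each feature on zero.
X_centered = X - X.mean(axis=0)

# Step 2: covariance matrix of the features (columns are variables).
cov = np.cov(X_centered, rowvar=False)

# Step 3: eigendecomposition; eigh is meant for symmetric matrices
# and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Step 4: reorder so that the largest-variance direction comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Step 5: share of total variance carried by each component.
explained_variance_ratio = eigenvalues / eigenvalues.sum()

# Step 6: project the centred data onto the first two components.
scores = X_centered @ eigenvectors[:, :2]

print(explained_variance_ratio)
```

The variance of each column of `scores` equals the corresponding eigenvalue, which is why the eigenvalues can be read as "variance explained".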
Session 3. From Insight to Decision
Dataset: Heart Disease UCI (~920 patients, 13 clinical features)
Key topics: Feature engineering · Targeted EDA · Logistic Regression · Random Forest · Confusion matrix · Precision / Recall / F1 · ROC-AUC · Ethics of model errors
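The evaluation metrics listed here all derive from the four cells of the confusion matrix. The following sketch computes them by hand in plain Python, assuming two short made-up label lists (1 = disease present, 0 = absent); these are not results from any model trained on the Heart Disease data.

```python
# Made-up ground truth and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

# Count the four cells of the binary confusion matrix.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

# Precision: among predicted positives, the fraction that are truly positive.
precision = tp / (tp + fp)

# Recall: among true positives, the fraction the model found.
recall = tp / (tp + fn)

# F1: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In a clinical setting the two error types are not equivalent: a false negative is a patient who is missed, while a false positive triggers an unnecessary follow-up, so the metric to prioritise depends on which error is more costly.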
Session 4. Beyond Supervised Learning
Datasets: Spotify Tracks (K-Means) · Heart Disease UCI (Naive Bayes, MLP)
Key topics: K-Means clustering · Elbow method · Silhouette score · Naive Bayes and Bayes’ theorem · Multi-Layer Perceptron · Model comparison · Introduction to Deep Learning
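As an illustration of the K-Means topic, here is a minimal NumPy sketch of the standard assign-then-update loop (Lloyd's algorithm). The function name `kmeans`, its parameters, and the two synthetic blobs are assumptions made for this example; in practice a library implementation would normally be preferred.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Hypothetical helper: basic K-Means with random initial centroids."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stabilised
        centroids = new_centroids
    return centroids, labels

# Synthetic data: two well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=10.0, scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

centroids, labels = kmeans(X, k=2)

# Inertia: sum of squared distances to the assigned centroid,
# the quantity plotted against k in the elbow method.
inertia = float(((X - centroids[labels]) ** 2).sum())
print(centroids)
print(inertia)
```

Running the same function for several values of `k` and plotting `inertia` against `k` gives the curve used by the elbow method.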