Session 4 — Beyond Supervised Learning
Instructor: Stéphane Derrode, Centrale Lyon
Formation: Centrale Digital Lab @ Ecole Centrale Lyon
← Back to course index
📦 Download all session files — notebook
Contents:
session4_beyond_supervised.ipynb
Note: data files from Sessions 2 and 3 are required — download them from their respective pages.
Overview¶
| Datasets | Spotify Tracks (K-Means) · Heart Disease UCI (Naive Bayes, MLP) |
| Duration | 3 hours |
| Format | Jupyter notebook + paper quiz (15 min) |
| Prerequisite notebooks | Sessions 2 and 3 — the data files from those sessions are reused |
This final session broadens the picture beyond the classifiers seen in Session 3. You will explore unsupervised clustering, probabilistic classification, and neural networks — then compare all models seen across the course and get a conceptual overview of Deep Learning as the next horizon.
Learning objectives¶
By the end of this session, you will be able to:
- Explain the difference between supervised, unsupervised, and probabilistic learning
- Apply K-Means, choose k with the elbow method and silhouette score, and interpret clusters
- State Bayes’ theorem and explain the “naive” independence assumption
- Describe the forward pass of a Multi-Layer Perceptron with one hidden layer
- Name the role of activation functions and of backpropagation
- Compare all models on the same dataset and articulate when to use each
- Name the three main Deep Learning architectures (CNN, RNN/LSTM, Transformer) and their use cases
Before the session — what you need to do¶
1. Verify your environment
All packages from previous sessions must be installed. No new installation required.
2. Download the session 4 notebook
3. Launch Jupyter and run the setup cell
You should see the message `All imports OK`.
Session content¶
The notebook is divided into 5 blocks:
| Block | Dataset | Topic | Key tools |
|---|---|---|---|
| 1 | Spotify | K-Means clustering | KMeans, silhouette_score, PCA projection, cluster profiling |
| 2 | Heart Disease | Naive Bayes | GaussianNB, Bayes’ theorem, class priors |
| 3 | Heart Disease | Neural Networks (MLP) | MLPClassifier, loss curve, early stopping |
| 4 | Heart Disease | Model comparison | Unified ROC plot, metric table, “when to use” guide |
| 5 | — | Introduction to Deep Learning | CNNs, RNNs/LSTMs, Transformers — markdown only |
Key formulas to know¶
K-Means — assignment step: $c^{(i)} = \arg\min_{j} \| \mathbf{x}^{(i)} - \boldsymbol{\mu}_j \|^2$
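The assignment step can be sketched in a few lines of NumPy. The points and centroids below are made-up toy values, not data from the notebook:

```python
import numpy as np

# Toy data: 6 points in 2D and 2 candidate centroids (illustrative values).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])

# Assignment step: each point joins the centroid with the smallest
# squared Euclidean distance, exactly as in the formula above.
dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
labels = dists.argmin(axis=1)
print(labels)  # first three points -> cluster 0, last three -> cluster 1
```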
Silhouette score: $s = \frac{b - a}{\max(a, b)} \in [-1, 1]$ where $a$ = mean intra-cluster distance, $b$ = mean distance to the nearest other cluster.
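In practice you rarely compute $s$ by hand; scikit-learn's `silhouette_score` averages it over all samples. A minimal sketch on synthetic blobs (a stand-in for the Spotify features used in the notebook):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in: 3 well-separated clusters.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
score = silhouette_score(X, labels)
print(f"silhouette = {score:.2f}")  # close to 1 for well-separated clusters
```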
Bayes’ theorem: $P(y \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid y) \cdot P(y)}{P(\mathbf{x})}$
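`GaussianNB` applies this theorem with per-class Gaussian likelihoods and priors estimated from class frequencies. A small sketch on made-up numbers (not the real Heart Disease features):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Tiny illustrative dataset: two numeric features, binary target.
X = np.array([[50, 120], [60, 140], [45, 118],
              [65, 160], [70, 155], [40, 110]])
y = np.array([0, 1, 0, 1, 1, 0])

clf = GaussianNB().fit(X, y)
print(clf.class_prior_)  # priors P(y), here 0.5 / 0.5 from class counts

# predict_proba returns the posterior P(y | x) via Bayes' theorem.
proba = clf.predict_proba([[62, 150]])
print(proba)  # the two posteriors sum to 1
```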
MLP — forward pass (one hidden layer): $\mathbf{a}^{(1)} = f(W^{(1)} \mathbf{x} + \mathbf{b}^{(1)}), \qquad \hat{y} = \sigma(W^{(2)} \mathbf{a}^{(1)} + \mathbf{b}^{(2)})$
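The forward pass is just two matrix products with non-linearities in between. A minimal NumPy sketch with made-up shapes (3 inputs, 4 hidden units, 1 output) and random weights, using ReLU as $f$ and the logistic sigmoid as $\sigma$:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative weights; in MLPClassifier these are learned by backpropagation.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

x = np.array([0.5, -1.0, 2.0])
a1 = relu(W1 @ x + b1)         # a^(1) = f(W^(1) x + b^(1))
y_hat = sigmoid(W2 @ a1 + b2)  # y_hat = sigma(W^(2) a^(1) + b^(2))
print(y_hat)  # a single value in (0, 1), read as P(y = 1 | x)
```

Without `relu`, the two layers would compose into a single affine map, which is why the activation function is essential.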
Quiz¶
A 15-minute paper quiz (closed book, no devices) will be held at the end of the session.
It covers:
- True/False on K-Means (supervised vs unsupervised), silhouette score, Naive Bayes independence assumption, MLP activation functions
- Multiple choice: inertia definition, Gaussian Naive Bayes parameters, role of backpropagation
- Short questions: reading an elbow plot, explaining the “naive” assumption with a concrete example, arguing against a deep network on small tabular data
💡 Tip: For the elbow plot question, practice computing the drop in inertia between successive values of k and identifying where the curve flattens. For Naive Bayes, think of pairs of features in the Heart Disease dataset that are clearly not independent.
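To practice, you can generate an elbow-style table of inertia drops without plotting anything. This sketch uses synthetic blobs as a stand-in for the Spotify features; `inertia_` is the within-cluster sum of squared distances that KMeans minimizes:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in with 4 true clusters.
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.7, random_state=0)

inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

# The "elbow" is where the drop from k to k+1 becomes small.
for k in range(1, 8):
    drop = inertias[k - 1] - inertias[k]
    print(f"k={k} -> k={k + 1}: inertia drop = {drop:.0f}")
```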
Key concepts to remember¶
- K-Means is unsupervised — it discovers structure without labels; you must interpret the clusters
- Elbow + silhouette together — neither alone is sufficient to choose k
- Naive Bayes is fast and surprisingly robust — a violated assumption does not necessarily mean poor predictions
- Without activation functions, an MLP collapses to a linear model
- On small tabular data, simpler models often win — do not reach for deep networks by default
- Deep Learning = same math, different scale and architecture
Model comparison summary¶
| Model | Best when… | Main limitation |
|---|---|---|
| Logistic Regression | Interpretability needed; linearly separable data | Cannot capture non-linear patterns |
| Random Forest | Good default for tabular data | Less interpretable |
| Naive Bayes | Small dataset; fast inference | Independence assumption often violated |
| MLP | Large dataset; complex patterns | Data-hungry; requires tuning |
| K-Means | No labels; want to find natural groups | Must specify k; assumes spherical clusters |
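The supervised rows of this table can be compared head-to-head in a few lines, as in Block 4's unified metric table. This sketch uses scikit-learn's bundled breast-cancer dataset as a stand-in, since the Heart Disease file ships with Session 3 rather than with scikit-learn:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in binary-classification dataset (the notebook uses Heart Disease UCI).
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)),
}

# Same split, same metric (ROC AUC) for every model -> fair comparison.
aucs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name:20s} AUC = {aucs[name]:.3f}")
```

Note the `StandardScaler` for the linear model and the MLP: both are sensitive to feature scale, while the tree ensemble and Naive Bayes are not.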