Session 4 — Beyond Supervised Learning
Instructor: Stéphane Derrode, Centrale Lyon
Formation: Centrale Digital Lab @ Ecole Centrale Lyon
← Back to course index
📦 Download all session files — notebook
Contents:
session4_beyond_supervised.ipynb
Note: data files from Sessions 2 and 3 are required — download them from their respective pages.
Overview¶
| Datasets | Spotify Tracks (K-Means) · Heart Disease UCI (Naive Bayes, MLP) |
| Duration | 3 hours |
| Format | Jupyter notebook + paper quiz (15 min) |
| Prerequisite notebooks | Sessions 2 and 3 — the data files from those sessions are reused |
This final session broadens the picture beyond the classifiers seen in Session 3. You will explore unsupervised clustering, probabilistic classification, and neural networks — then compare all models seen across the course and get a conceptual overview of Deep Learning as the next horizon.
Learning objectives¶
By the end of this session, you will be able to:
- Explain the difference between supervised, unsupervised, and probabilistic learning
- Apply K-Means, choose k with the elbow method and silhouette score, and interpret clusters
- State Bayes’ theorem and explain the “naive” independence assumption
- Describe the forward pass of a Multi-Layer Perceptron with one hidden layer
- Name the role of activation functions and of backpropagation
- Compare all models on the same dataset and articulate when to use each
- Name the three main Deep Learning architectures (CNN, RNN/LSTM, Transformer) and their use cases
Before the session — what you need to do¶
1. Verify your environment
All packages from previous sessions must be installed. No new installation required.
2. Download the session 4 notebook
3. Launch Jupyter and run the setup cell
You should see the message `All imports OK`.
Session content¶
The notebook is divided into 5 blocks:
| Block | Dataset | Topic | Key tools |
|---|---|---|---|
| 1 | Spotify | K-Means clustering | KMeans, silhouette_score, PCA projection, cluster profiling |
| 2 | Heart Disease | Naive Bayes | GaussianNB, Bayes’ theorem, class priors |
| 3 | Heart Disease | Neural Networks (MLP) | MLPClassifier, loss curve, early stopping |
| 4 | Heart Disease | Model comparison | Unified ROC plot, metric table, “when to use” guide |
| 5 | — | Introduction to Deep Learning | CNNs, RNNs/LSTMs, Transformers — markdown only |
Key formulas to know¶
K-Means — assignment step: $c^{(i)} = \arg\min_{j} \| \mathbf{x}^{(i)} - \boldsymbol{\mu}_j \|^2$
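The assignment step can be sketched in a few lines of NumPy. The points and centroids below are made-up toy values, not data from the notebook:

```python
import numpy as np

# Toy data: 6 points in 2D and 2 candidate centroids (illustrative values).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])

# Assignment step: each point joins the centroid with the smallest
# squared Euclidean distance, exactly as in the formula above.
dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
labels = dists.argmin(axis=1)
print(labels)  # first three points -> cluster 0, last three -> cluster 1
```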
Silhouette score: $s = \frac{b - a}{\max(a, b)} \in [-1, 1]$ where $a$ = mean intra-cluster distance, $b$ = mean distance to the nearest other cluster.
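In practice you rarely compute $s$ by hand; scikit-learn's `silhouette_score` averages it over all samples. A minimal sketch on synthetic blobs (a stand-in for the Spotify features used in the notebook):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in: 3 well-separated clusters.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
score = silhouette_score(X, labels)
print(f"silhouette = {score:.2f}")  # close to 1 for well-separated clusters
```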
Bayes’ theorem: $P(y \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid y) \cdot P(y)}{P(\mathbf{x})}$
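`GaussianNB` applies this theorem with per-class Gaussian likelihoods and priors estimated from class frequencies. A small sketch on made-up numbers (not the real Heart Disease features):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Tiny illustrative dataset: two numeric features, binary target.
X = np.array([[50, 120], [60, 140], [45, 118],
              [65, 160], [70, 155], [40, 110]])
y = np.array([0, 1, 0, 1, 1, 0])

clf = GaussianNB().fit(X, y)
print(clf.class_prior_)  # priors P(y), here 0.5 / 0.5 from class counts

# predict_proba returns the posterior P(y | x) via Bayes' theorem.
proba = clf.predict_proba([[62, 150]])
print(proba)  # the two posteriors sum to 1
```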
MLP — forward pass (one hidden layer): $\mathbf{a}^{(1)} = f(W^{(1)} \mathbf{x} + \mathbf{b}^{(1)}), \qquad \hat{y} = \sigma(W^{(2)} \mathbf{a}^{(1)} + \mathbf{b}^{(2)})$
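The forward pass is just two matrix products with non-linearities in between. A minimal NumPy sketch with made-up shapes (3 inputs, 4 hidden units, 1 output) and random weights, using ReLU as $f$ and the logistic sigmoid as $\sigma$:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative weights; in MLPClassifier these are learned by backpropagation.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

x = np.array([0.5, -1.0, 2.0])
a1 = relu(W1 @ x + b1)         # a^(1) = f(W^(1) x + b^(1))
y_hat = sigmoid(W2 @ a1 + b2)  # y_hat = sigma(W^(2) a^(1) + b^(2))
print(y_hat)  # a single value in (0, 1), read as P(y = 1 | x)
```

Without `relu`, the two layers would compose into a single affine map, which is why the activation function is essential.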
Quiz¶
A 15-minute paper quiz (closed book, no devices) will be held at the end of the session.
It covers:
- True/False on K-Means (supervised vs unsupervised), silhouette score, Naive Bayes independence assumption, MLP activation functions
- Multiple choice: inertia definition, Gaussian Naive Bayes parameters, role of backpropagation
- Short questions: reading an elbow plot, explaining the “naive” assumption with a concrete example, arguing against a deep network on small tabular data
💡 Tip: For the elbow plot question, practice computing the drop in inertia between successive values of k and identifying where the curve flattens. For Naive Bayes, think of pairs of features in the Heart Disease dataset that are clearly not independent.
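To practice, you can generate an elbow-style table of inertia drops without plotting anything. This sketch uses synthetic blobs as a stand-in for the Spotify features; `inertia_` is the within-cluster sum of squared distances that KMeans minimizes:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in with 4 true clusters.
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.7, random_state=0)

inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

# The "elbow" is where the drop from k to k+1 becomes small.
for k in range(1, 8):
    drop = inertias[k - 1] - inertias[k]
    print(f"k={k} -> k={k + 1}: inertia drop = {drop:.0f}")
```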
Key concepts to remember¶
- K-Means is unsupervised — it discovers structure without labels; you must interpret the clusters
- Elbow + silhouette together — neither alone is sufficient to choose k
- Naive Bayes is fast and surprisingly robust — a violated assumption does not necessarily mean poor predictions
- Without activation functions, an MLP collapses to a linear model
- On small tabular data, simpler models often win — do not reach for deep networks by default
- Deep Learning = same math, different scale and architecture
Model comparison summary¶
| Model | Best when… | Main limitation |
|---|---|---|
| Logistic Regression | Interpretability needed; linearly separable data | Cannot capture non-linear patterns |
| Random Forest | Good default for tabular data | Less interpretable |
| Naive Bayes | Small dataset; fast inference | Independence assumption often violated |
| MLP | Large dataset; complex patterns | Data-hungry; requires tuning |
| K-Means | No labels; want to find natural groups | Must specify k; assumes spherical clusters |
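The supervised rows of this table can be compared head-to-head in a few lines, as in Block 4's unified metric table. This sketch uses scikit-learn's bundled breast-cancer dataset as a stand-in, since the Heart Disease file ships with Session 3 rather than with scikit-learn:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in binary-classification dataset (the notebook uses Heart Disease UCI).
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)),
}

# Same split, same metric (ROC AUC) for every model -> fair comparison.
aucs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name:20s} AUC = {aucs[name]:.3f}")
```

Note the `StandardScaler` for the linear model and the MLP: both are sensitive to feature scale, while the tree ensemble and Naive Bayes are not.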