Orange 3 Workflow
1. Source: File (XLSX) - white-wine-quality
2. Prepare: Data Table (inspect 4,898 rows) → Edit Domain (quality → Categorical)
3. Models: the six classifiers compared below
4. Evaluate: Test & Score (5-fold stratified CV) → ROC Analysis (AUC curves) → Confusion Matrix (per-class accuracy)
Model Comparison - 5-fold stratified cross-validation (columns: Model, AUC, Accuracy, F1, Precision, Recall, MCC)
Feature Importance (Decision Tree, depth 5)
Quality Score Distribution
Classes 5 & 6 = 74% of all 4,898 instances
Python Analysis - Jupyter Notebook Deep Dive
What is this section?
The Orange 3 analysis above gives us a visual workflow. Below, the same dataset was analyzed using Python (pandas, scikit-learn, matplotlib) in a Jupyter Notebook - replicating every result and going deeper. Each chart below includes an explanation so you can follow along even without a data science background.
Step 1 - Quality Score Distribution
What is this chart?
The bar chart (left) shows how many wines exist at each quality score from 3 to 9. The pie chart (right) groups them into Low (3–4), Medium (5–6), and High (7–9) buckets.
Why does this matter?
Quality scores 5 and 6 make up 74% of all 4,898 wines. This is called class imbalance - the model sees far more medium wines than low or high ones. As a result, accuracy alone is misleading: a model that just guesses '6' for everything would still be ~45% accurate. This is why we also look at AUC and F1.
Key Finding
Only 20 wines scored 3 and only 5 scored 9 - these rare classes are nearly impossible to classify correctly.
Python
vc = df['quality'].value_counts().sort_index()
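The ~45% baseline quoted above can be checked directly from the class counts. A minimal sketch, using the published per-class counts for this dataset in place of loading the XLSX (in the notebook, df comes from the white-wine-quality file):

```python
import pandas as pd

# Stand-in for the wine DataFrame: the per-class counts for the
# white-wine dataset (consistent with the figures quoted above:
# 4,898 rows, 20 threes, 5 nines, classes 5 & 6 = 74%).
df = pd.DataFrame({'quality': [3] * 20 + [4] * 163 + [5] * 1457
                   + [6] * 2198 + [7] * 880 + [8] * 175 + [9] * 5})

vc = df['quality'].value_counts().sort_index()

# Accuracy of a model that always predicts the most common class (6):
baseline = vc.max() / vc.sum()
print(f"Majority-class baseline accuracy: {baseline:.1%}")
```

This confirms that always guessing '6' already scores about 45%, which is why accuracy alone is not a trustworthy metric here.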
Step 2 - Feature Correlation Heatmap
What is this chart?
A heatmap shows the correlation between every pair of features. Red means positively correlated (when one goes up, so does the other). Blue means negatively correlated. White/neutral means no strong relationship.
Why does this matter?
Naive Bayes assumes all features are independent of each other - but this heatmap shows that's not true here. Density and residual sugar are strongly correlated (+0.84), and density and alcohol are strongly negatively correlated (-0.78). This explains why Naive Bayes performs poorly on this dataset.
Key Finding
Alcohol has the strongest positive correlation with quality (r = 0.44), confirming it as the top predictor across all models.
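A heatmap like the one described here is a few lines of pandas and matplotlib. A minimal sketch on a tiny synthetic stand-in (three features constructed so that density rises with residual sugar and falls with alcohol, as in the real data; the notebook uses all 11 chemical features from the XLSX):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt

# Synthetic stand-in for the wine features.
rng = np.random.default_rng(0)
n = 300
sugar = rng.uniform(0.5, 20.0, n)      # residual sugar (g/L)
alcohol = rng.uniform(8.0, 14.0, n)    # alcohol (% vol)
density = 0.99 + 0.0004 * sugar - 0.001 * alcohol + rng.normal(0, 0.0005, n)
df = pd.DataFrame({"residual sugar": sugar, "alcohol": alcohol,
                   "density": density})

# Pairwise Pearson correlations, then render as a red/blue grid.
corr = df.corr()

fig, ax = plt.subplots(figsize=(4, 3))
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax)
fig.tight_layout()
```

The same red cell (density vs residual sugar) and blue cell (density vs alcohol) described above show up in this toy version.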
Step 3 - Decision Tree Visualization
What is this chart?
This is the actual decision tree trained on the wine data, limited to 3 levels deep so it's readable. Each node shows: the feature being split on, the threshold value, and how many samples land in each quality class.
Why does this matter?
Decision trees are powerful because they're interpretable - you can follow the branches and understand exactly why a prediction was made. This is what the Orange Tree Viewer shows visually. The root split (the very first question the tree asks) is on alcohol content.
Key Finding
The first question is: 'Is alcohol ≤ 10.85?' Wines with lower alcohol tend to score lower. This single feature alone creates strong separation between classes.
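Training and rendering such a tree is straightforward in scikit-learn. A minimal sketch on synthetic data built so that alcohol determines quality at the 10.85 threshold, as the real root split does (the notebook trains on the actual wine features):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for X (chemical features) and y (quality scores);
# quality is driven entirely by alcohol here, mirroring the root split.
rng = np.random.default_rng(42)
alcohol = rng.uniform(8.0, 14.0, 500)
volatile_acidity = rng.uniform(0.1, 0.7, 500)
X = np.column_stack([alcohol, volatile_acidity])
y = np.where(alcohol > 10.85, 6, 5)

dt = DecisionTreeClassifier(max_depth=3, random_state=0)
dt.fit(X, y)

# Text rendering of the tree; plot_tree(dt) draws the graphical version.
print(export_text(dt, feature_names=["alcohol", "volatile acidity"]))
```

With a perfect alcohol rule in the toy data, the fitted tree's root split lands on alcohol near 10.85, just as described above.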
Step 4 - Model Comparison
What is this chart?
A horizontal bar chart comparing all 6 models by two metrics: Accuracy (how often it's exactly right) and F1 score (a balance of precision and recall, more useful when classes are imbalanced).
Why does this matter?
Looking at both metrics together gives a fairer picture than accuracy alone. A model might have high accuracy by always predicting the most common class, but a low F1 score would expose that weakness.
Key Finding
Tree has the highest accuracy (60.4%) in Python, consistent with the Orange results. Neural Network and LDA (CN2 proxy) are close behind. SVM has the widest gap between accuracy and F1, suggesting it struggles with rare classes.
Python
cross_val_score(model, X, y, cv=5, scoring='accuracy')
cross_val_score(model, X, y, cv=5, scoring='f1_weighted')
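The two calls above generalize to a loop over several models. A minimal sketch on synthetic multiclass data with three illustrative classifiers (the notebook runs all six models on the wine features):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic multiclass data as a stand-in for the wine features.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

models = {
    "Tree": DecisionTreeClassifier(random_state=0),
    "kNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}

# For classifiers, cv=5 means stratified 5-fold cross-validation.
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    f1 = cross_val_score(model, X, y, cv=5, scoring="f1_weighted").mean()
    print(f"{name:12s} acc={acc:.3f} f1_weighted={f1:.3f}")
```

Comparing the two columns side by side is exactly the accuracy-vs-F1 check discussed above: a large gap flags a model leaning on the majority class.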
Step 5 - Feature Importance
What is this chart?
A horizontal bar chart showing how much each feature contributes to the Decision Tree's decisions. The longer the bar, the more that feature is used to split the data at decision nodes.
Why does this matter?
Feature importance helps us understand what actually drives wine quality predictions. It can guide winemakers on which chemical properties to focus on - and tells us which features we could potentially remove without losing much accuracy (e.g. citric acid and total SO₂ contribute very little).
Key Finding
Alcohol (48.8%) and volatile acidity (20.4%) together explain nearly 70% of the tree's decisions. Density, pH, and chlorides contribute a smaller but meaningful portion.
Python
dt.feature_importances_
# Top feature: alcohol at 48.8%
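To get a chart like the one described, each importance needs to be paired with its feature name and sorted. A minimal sketch on synthetic data where alcohol carries most of the signal (the notebook uses the real tree and all 11 features):

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: alcohol dominates the label, volatile acidity helps a
# little, citric acid is pure noise here.
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.2, 600),
    "volatile acidity": rng.normal(0.28, 0.1, 600),
    "citric acid": rng.normal(0.33, 0.12, 600),
})
y = (X["alcohol"] + 2 * X["volatile acidity"] > 11.0).astype(int)

dt = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)

# Pair importances with names and sort, as in the bar chart;
# importances always sum to 1.
imp = pd.Series(dt.feature_importances_, index=X.columns)
imp = imp.sort_values(ascending=False)
print(imp)
```

`imp.plot.barh()` would then draw the horizontal bar chart directly.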
Step 6 - Confusion Matrix
What is this chart?
A grid where each row is the actual quality score and each column is what the model predicted. The diagonal (top-left to bottom-right) shows correct predictions highlighted in red. Off-diagonal cells are mistakes.
Why does this matter?
Accuracy gives one number, but the confusion matrix shows where exactly the model fails. You can see which classes get confused with each other - for example, quality 5 and 6 are frequently swapped because they're so similar and so common in training.
Key Finding
Class 3 (only 20 samples) and Class 9 (only 5 samples) are never predicted correctly - the model simply hasn't seen enough examples to learn them. Classes 5 and 6 dominate predictions.
Python
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y, y_pred)
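The "never predicted correctly" failure mode for rare classes is easy to see on a toy matrix. A minimal illustrative sketch (not the wine data): class 3 appears only twice and is always misclassified, so its row has a zero on the diagonal, exactly as happens for quality 3 and 9 in the real matrix:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels: class 3 is rare and never predicted correctly.
y_true = np.array([5, 5, 5, 6, 6, 6, 3, 3])
y_pred = np.array([5, 5, 6, 6, 6, 5, 5, 6])

# Rows = actual class, columns = predicted class, in label order [3, 5, 6].
cm = confusion_matrix(y_true, y_pred, labels=[3, 5, 6])
print(cm)
# Row for class 3 is [0, 1, 1]: zero correct predictions for the rare class.
```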
Step 7 - Does Normalization Help? (Bonus)
What is this chart?
A bar chart comparing SVM and kNN performance with raw features versus normalized features (using StandardScaler, which rescales every feature to have mean=0 and std=1).
Why does this matter?
SVM and kNN are distance-based algorithms - they measure how 'close' data points are to each other. If one feature ranges from 0.1 to 1.0 and another ranges from 1 to 300, the larger-scale feature dominates the distance calculation unfairly. Normalizing puts all features on equal footing.
Key Finding
kNN improved from 47.5% to 55.7% accuracy just by normalizing - a gain of 8.2 percentage points with zero other changes. SVM improved from 44.9% to 56.7%. This validates the suggestion in the report.