Machine Learning

04 Classification: Introduction to Classification

Imron Rosyadi

00. Introduction to Classification

Classification vs. Regression: A Quick Review

Regression: Predicting Continuous Values

  • Predicts a numeric, continuous output.
  • Examples: House prices, temperature, signal strength.
  • Evaluation: Measures like Mean Squared Error (MSE).

Classification: Predicting Categories

  • Predicts a categorical, discrete output.
  • Examples: Spam/Not Spam, Object presence (cat/dog), Fault detection.
  • Evaluation: Focuses on correct vs. incorrect assignments.

What Does It Mean to Classify?

Classification models often return their results as a list of confidence scores, one per class, each estimating how likely it is that the data point belongs to that class.

Understanding Classification Confidence

Example Output:

  • Tiger: 0.96
  • Lion: 0.75
  • Cougar: 0.68

Tip

In ECE, such confidence levels are critical in systems like autonomous vehicles (identifying pedestrians with high certainty), medical image diagnosis, or anomaly detection in power grids.

Ambiguous Cases

  • Orange: 0.97
  • Grapefruit: 0.96
  • Sun: 0.45

Common Classification Models

  • Logistic Regression:
    • Adapts linear regression to classification by passing its output through a sigmoid function, giving a probability for a binary outcome. Simple and interpretable.
  • Nearest Neighbors:
    • Classifies a point by the majority class among its k closest training points. Intuitive, but sensitive to local data structure.
  • Decision Trees:
    • Tree-like structure where each node tests a feature, leading to a class decision. Good for interpretability.
  • Random Forests:
    • An ensemble of many decision trees, combining their predictions for robustness and better accuracy. Often very powerful.
  • Naive Bayes:
    • Based on Bayes’ theorem; assumes features are conditionally independent given the class. Useful for text classification and spam detection.
  • Deep Learning (Neural Networks):
    • Multi-layered networks capable of learning complex patterns. Highly effective for image, speech, and sensor data classification.

The Machine Learning Classification Workflow

graph TD
    A["Raw Sensor/System Data"] --> B{"Data Preprocessing"}
    B --> C["Feature Engineering/Extraction"]
    C --> D{"Split Data <br> (Training & Testing Sets)"}
    D -- Training Data --> E["Choose & Train ML Model"]
    E -- Trained Model --> F["Model Evaluation & Tuning"]
    D -- Testing Data --> F
    F -- Performance OK? --> G[Deploy Model to ECE System]
    F -- Needs Improvement --> E
    G --> H[New Live Data Input]
    H --> I["Real-time Prediction / Classification"]
    I --> J["Action/Decision <br> (e.g., Control Signal, Alert)"]

    style A fill:#f9f,stroke:#333,stroke-width:2px;
    style G fill:#bbf,stroke:#333,stroke-width:2px;
    style J fill:#fcf,stroke:#333,stroke-width:2px;
    style E fill:#ccf,stroke:#333,stroke-width:2px;
    style F fill:#dfd,stroke:#333,stroke-width:2px;

Classification Model Performance

Unlike regression, we can’t measure continuous “distance” to evaluate classification. Instead, we count correct vs. incorrect predictions. These counts form the basis for various performance metrics.

The Confusion Matrix

  • True Positive (TP): Model predicted positive, was actually positive.
  • False Positive (FP): Model predicted positive, was actually negative (Type I error).
  • False Negative (FN): Model predicted negative, was actually positive (Type II error).
  • True Negative (TN): Model predicted negative, was actually negative.
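Counting the four outcomes takes one pass over paired labels. A minimal pure-Python sketch, assuming 1 marks the positive class; the fault-detector labels are hypothetical. (scikit-learn's sklearn.metrics.confusion_matrix returns the same four counts as a 2x2 array with rows = actual and columns = predicted.)

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, FN, TN) for a binary problem.

    y_true, y_pred: equal-length sequences of actual and predicted labels.
    positive: the label treated as the positive class (assumed 1 by default).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative (Type I error)
        elif actual == positive:
            fn += 1  # predicted negative, actually positive (Type II error)
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


# Hypothetical fault-detector output: 1 = fault, 0 = no fault
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 3)
```

The positive parameter lets the same function handle string labels, for example positive="spam".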

Accuracy

  • The fraction of all predictions that a classification model got right.
  • Simply the sum of True Positives and True Negatives, divided by the total.

\[ \text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} \]

Motivation for Precision

When the model predicted positive, how often was it correct?

What is the probability that a detected anomaly in our sensor data is an actual anomaly, given that our model flagged it?

Precision

  • The fraction of correct positive predictions out of all positive predictions made by the model.

\[ \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} \]

Motivation for Recall

Out of all the actual positive cases, how many did the model correctly identify?

What is the probability that our model will detect a ‘critical’ electromagnetic interference event, given that it actually occurred?

Recall

  • The fraction of correct positive predictions out of all actual positive cases.

\[ \text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} \]

Precision vs. Recall: A Trade-Off

  • Increasing one often decreases the other.
  • The optimal balance depends on the application’s cost of FP vs. FN.
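The trade-off comes from the decision threshold applied to the model's scores. A minimal sketch with hypothetical scores and labels (not from any real model) that recomputes precision and recall at three thresholds:

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when every score >= threshold is called positive (1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    # 0.0 when nothing is called positive / no positives exist (a convention)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical model scores and true labels (1 = positive), for illustration only
scores = [0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

for t in (0.9, 0.5, 0.25):
    p, r = precision_recall_at(t, scores, labels)
    print(f"threshold={t:.2f}  precision={p:.3f}  recall={r:.2f}")
```

Lowering the threshold from 0.9 to 0.25 raises recall from 0.2 to 1.0 while precision falls from 1.0 to 0.625: each extra positive call catches more real positives but also admits more false alarms.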

F1 Score

  • The harmonic mean of precision and recall.
  • High F1 indicates both precision and recall are high.

\[ F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]

F1 Score: Simplified

The F1 formula can be reduced to:

\[ F_1 = \frac{2 \cdot \text{TP}}{2 \cdot \text{TP} + \text{FP} + \text{FN}} \]
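The reduction follows from substituting the definitions of precision and recall into the harmonic-mean formula (the last step cancels a common factor of TP, so it assumes TP > 0):

\[ \text{Precision} \cdot \text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FP}} \cdot \frac{\text{TP}}{\text{TP} + \text{FN}} = \frac{\text{TP}^2}{(\text{TP} + \text{FP})(\text{TP} + \text{FN})} \]

\[ \text{Precision} + \text{Recall} = \frac{\text{TP}(\text{TP} + \text{FN}) + \text{TP}(\text{TP} + \text{FP})}{(\text{TP} + \text{FP})(\text{TP} + \text{FN})} = \frac{\text{TP}(2 \cdot \text{TP} + \text{FP} + \text{FN})}{(\text{TP} + \text{FP})(\text{TP} + \text{FN})} \]

\[ F_1 = 2 \cdot \frac{\text{TP}^2}{\text{TP}(2 \cdot \text{TP} + \text{FP} + \text{FN})} = \frac{2 \cdot \text{TP}}{2 \cdot \text{TP} + \text{FP} + \text{FN}} \]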

Metric Calculator

Changing TP, FP, FN, and TN one at a time shows how Accuracy, Precision, Recall, and F1 Score respond to each kind of correct and incorrect prediction.
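All four metrics can be computed directly from the confusion-matrix counts. A minimal pure-Python sketch; the counts are hypothetical, and reporting 0.0 for an undefined ratio is an assumed convention:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    A metric whose denominator is zero is reported as 0.0; this is one common
    convention (an assumption here), not the only possible choice.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # simplified F1 formula
    }


# Hypothetical counts; move 20 actual negatives from TN to FP
base = classification_metrics(tp=40, fp=10, fn=20, tn=30)
more_false_alarms = classification_metrics(tp=40, fp=30, fn=20, tn=10)
for name in base:
    print(f"{name:9s}  {base[name]:.3f} -> {more_false_alarms[name]:.3f}")
```

Extra false positives leave recall unchanged (0.667) because FP does not appear in its formula, while precision drops from 0.800 to 0.571 and accuracy from 0.700 to 0.500.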

Which Metric Do I Use?

Note

The answer is always: it depends on your specific ECE application!

  • Accuracy: Rarely a sufficient standalone metric, especially with imbalanced classes.
  • Precision: Crucial when False Positives are costly (e.g., discarding good products in QA, false alarms in security).
  • Recall: Critical when False Negatives are costly (e.g., missing a fault in critical infrastructure, failing to detect a disease).
  • F1 Score: A good general measure when you need to balance both precision and recall, particularly with imbalanced datasets.

Confusion Matrix Example

Scenario: A model predicts if a tumor is malignant.

(Positive Class: Malignant, Negative Class: Benign)

Confusion Matrix Example: Data

Model to predict if a tumor is malignant

Given these values:

  • TP = 1
  • FP = 1
  • FN = 8
  • TN = 90

Solution: Accuracy

\[\text{Accuracy} = \frac{1 + 90}{1 + 1 + 8 + 90} = \frac{91}{100} = 0.91\]

Solution: Precision

\[\text{Precision} = \frac{1}{1 + 1} = \frac{1}{2} = 0.50\]

Solution: Recall

\[\text{Recall} = \frac{1}{1 + 8} = \frac{1}{9} \approx 0.11\]

Solution: F1 Score

\[ F_1 = \frac{2 \cdot 0.50 \cdot 0.11}{0.50 + 0.11} = \frac{0.11}{0.61} \approx 0.18 \]

Or, using the simplified formula:

\[ F_1 = \frac{2 \cdot 1}{2 \cdot 1 + 1 + 8} = \frac{2}{11} \approx 0.18 \]
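This example shows why accuracy alone can mislead. A minimal pure-Python sketch that recomputes the four metrics and compares them with a hypothetical classifier that always answers "benign" (the same 100 patients: 9 malignant, 91 benign):

```python
def four_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1 from confusion-matrix counts (0.0 when undefined)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return accuracy, precision, recall, f1


tumor_model = four_metrics(tp=1, fp=1, fn=8, tn=90)
# Hypothetical baseline that always answers "benign": the 9 malignant cases
# (TP + FN) all become FN, the 91 benign cases (FP + TN) all become TN
always_benign = four_metrics(tp=0, fp=0, fn=9, tn=91)

print("model:         acc=%.2f  prec=%.2f  rec=%.2f  F1=%.2f" % tumor_model)
print("always benign: acc=%.2f  prec=%.2f  rec=%.2f  F1=%.2f" % always_benign)
```

Both reach 0.91 accuracy, yet the model finds only 1 of 9 malignant tumors and the baseline finds none; recall and F1 expose a difference that accuracy hides.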

Weather Prediction

Scenario: Predict “Rain” or “No Rain”.

Create a Confusion Matrix from this data.

Your Turn: Calculate Metrics

Now that you have constructed the confusion matrix for the weather prediction:

  • Accuracy = ?
  • Precision = ?
  • Recall = ?
  • F1 Score = ?

Solution: Weather Prediction

Confusion Matrix:

  • TP (Actual Rain, Predicted Rain): 2
  • FP (Actual No Rain, Predicted Rain): 2
  • FN (Actual Rain, Predicted No Rain): 2
  • TN (Actual No Rain, Predicted No Rain): 1

Metrics:

  • Accuracy: (2 + 1) / (2 + 2 + 2 + 1) = 3/7 ≈ 0.43
  • Precision: 2 / (2 + 2) = 2/4 = 0.50
  • Recall: 2 / (2 + 2) = 2/4 = 0.50
  • F1 Score: (2 · 2) / (2 · 2 + 2 + 2) = 4/8 = 0.50

Graphical Measurements for Classification

Beyond single scalar metrics, graphical tools offer deeper insights into model performance across different decision thresholds.

Precision vs. Recall Curve

  • Plots Precision against Recall for different threshold values.
  • Helps select an optimal operating point based on FP/FN costs.

Receiver Operating Characteristic (ROC) Curve

  • Plots True Positive Rate (Recall) against False Positive Rate.
  • Helps compare models across all possible thresholds.

ROC Curve: True Positive Rate (TPR) / Recall

\[ \text{TPR (Recall)} = \frac{\text{TP}}{\text{TP} + \text{FN}} \]

ROC Curve: False Positive Rate (FPR)

\[ \text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}} \]

  • FPR is 1 minus the True Negative Rate (TNR, Specificity).
  • Measures how many actual negative examples were falsely predicted as positive.

Interpreting the ROC Curve

  • TPR (Y-axis): Proportion of actual positives correctly identified.
  • FPR (X-axis): Proportion of actual negatives incorrectly identified as positive.
  • Diagonal line (TPR = FPR): Represents a random classifier (AUC = 0.5).
  • Area Under Curve (AUC): Single scalar metric to summarize the curve.
    • AUC near 1.0 indicates excellent discriminative power.
    • AUC near 0.5 suggests poor or random classification.
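The curve can be traced by using each distinct score as a threshold and recording (FPR, TPR); the AUC is then the area under the resulting polyline. A minimal pure-Python sketch with hypothetical scores and labels (for real work, scikit-learn provides sklearn.metrics.roc_curve and roc_auc_score):

```python
def roc_points(scores, labels):
    """(FPR, TPR) points from lowering the threshold through each distinct score.

    Assumes labels contain at least one positive (1) and one negative (0).
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))  # (FPR, TPR)
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores and true labels (1 = positive), for illustration only
scores = [0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(round(auc(roc_points(scores, labels)), 3))  # 0.84
```

Tied scores share one threshold, so a tie between a positive and a negative produces a diagonal step. For this toy data the AUC of 0.84 equals the fraction of (positive, negative) pairs in which the positive gets the higher score (21 of 25); a model that gives every example the same score yields the diagonal and an AUC of 0.5.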

01. Binary Classification

Binary Classification: Two Outcomes

Yes or No

  • Predicts one of two discrete values or states.
  • Commonly encoded as 0 or 1.
  • Examples:
    • Spam / Not Spam
    • Fault / No Fault
    • Signal Present / Signal Absent
    • Pass / Fail for product testing

Binary Classification: Common Models

  • Logistic Regression: Passes a weighted sum of the features through a sigmoid, producing a probability between 0 and 1.
  • Decision Trees & Random Forests: Can naturally split data into two categories.
  • Support Vector Machines (SVM): Finds an optimal hyperplane to separate classes with the largest margin.
  • Bayesian Networks: Probabilistic graphical models used for classification.
  • Neural Networks: Highly versatile, learn complex non-linear boundaries.

Binary Classification: Logistic Regression Example

Classification

  • Finds a logistic function to separate two classes.
  • Outputs a probability value (0-1) which is then thresholded for classification.
  • Relatively easy to interpret.
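To make "outputs a probability which is then thresholded" concrete, here is a from-scratch sketch in plain Python: a one-feature logistic model fitted by batch gradient descent on the log loss. The feature values and labels are hypothetical (think of a standardized size measurement, 0 = orange, 1 = grapefruit); the learning rate, epoch count, and 0.5 threshold are illustrative choices.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean binary cross-entropy with respect to w and b
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(err * x for err, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


def predict(x, w, b, threshold=0.5):
    """Return (probability of class 1, predicted class) for one input."""
    p = sigmoid(w * x + b)
    return p, int(p >= threshold)


# Hypothetical standardized size measurements: 0 = orange, 1 = grapefruit
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic_1d(xs, ys)
for x in (-1.2, 0.1, 1.7):
    p, label = predict(x, w, b)
    print(f"x={x:+.1f}  p(grapefruit)={p:.3f}  class={label}")
```

In the lab you would normally use a library implementation (for example scikit-learn's LogisticRegression) rather than hand-written gradient descent; the sketch only exposes what such a model computes.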

Lab Preview: Fruit Classification

Fruit Classification

  • Objective: Differentiate between oranges and grapefruit.
  • Dataset: Contains features like weight, size, and color.
  • Model: We will build a logistic regression model.

Lab Preview: Confusion Matrix Generation

Confusion Matrix

  • You will generate your first confusion matrix.
  • Visualizing TP, FP, FN, TN for your fruit classifier.

Your Turn: Binary Classification Lab

Let’s apply these concepts and build a binary classifier!

03. Multiclass Classification

Multiclass Classification: Many Outcomes

  • Classification problems with more than two classes.
  • Examples:
    • Digit recognition (0-9)
    • Speech command recognition (e.g., “activate,” “mute,” “volume up”)
    • Modulation scheme identification (e.g., BPSK, QPSK, 16-QAM)
    • Component type classification

Multiclass Strategies: One-vs-All (OvA) & One-vs-One (OvO)

  • One-vs-All (OvA):
    • Trains k binary classifiers for k classes.
    • Each classifier distinguishes one class from all others.
    • Final prediction is the class with the highest confidence.
  • One-vs-One (OvO):
    • Trains k * (k-1) / 2 binary classifiers.
    • Each classifier distinguishes one class from another specific class.
    • Final prediction is derived by a voting scheme among classifiers.
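The two strategies differ only in how many binary models they train and how their outputs are combined. A minimal pure-Python sketch in which hypothetical 1-D class centroids stand in for trained binary classifiers (real OvA/OvO models would be trained on data; scikit-learn ships wrappers as sklearn.multiclass.OneVsRestClassifier and OneVsOneClassifier):

```python
from collections import Counter
from itertools import combinations

# Hypothetical 1-D class centroids standing in for trained binary models
centroids = {"A": 0.0, "B": 5.0, "C": 10.0}
classes = sorted(centroids)
k = len(classes)


def make_ova_scorer(c):
    """Stand-in for 'c vs. all others': confidence grows as x nears c's centroid."""
    return lambda x: -abs(x - centroids[c])


def make_ovo_classifier(a, b):
    """Stand-in for 'a vs. b': returns whichever of the two centroids is closer."""
    return lambda x: a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b


ova = {c: make_ova_scorer(c) for c in classes}  # k models
ovo = {(a, b): make_ovo_classifier(a, b)  # k(k-1)/2 models
       for a, b in combinations(classes, 2)}


def predict_ova(x):
    # Class whose one-vs-all model is most confident
    return max(classes, key=lambda c: ova[c](x))


def predict_ovo(x):
    # Each pairwise model votes for one class; the most votes wins.
    # Ties go to the first class counted here; some libraries also use
    # the models' confidence scores to break ties.
    votes = Counter(clf(x) for clf in ovo.values())
    return votes.most_common(1)[0][0]


print(len(ova), len(ovo))                  # 3 3
print(predict_ova(6.0), predict_ovo(6.0))  # B B
```

With three classes both strategies train three models, but the counts diverge quickly: for digit recognition (k = 10), OvA needs 10 classifiers and OvO needs 45.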

Lab Preview: The Iris Dataset

  • Classic ML Dataset: Widely used for multiclass classification.
  • Features: Sepal length, sepal width, petal length, petal width.
  • Target: Three species of Iris flowers (Setosa, Versicolor, Virginica).

Lab Preview: k-Fold Cross-Validation

  1. Shuffle data.
  2. Split into k groups (folds).
  3. Iterate k times:
    • Use one fold as test set.
    • Use remaining k-1 folds as training set.
    • Train model on training data.
    • Evaluate on test data and record performance.
  4. Average performance metrics across k iterations.
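The four steps map almost line for line onto code. A minimal pure-Python sketch that produces index splits; the labels and the majority-class "model" are hypothetical placeholders for "train on these indices, score on those" (scikit-learn offers sklearn.model_selection.KFold and cross_val_score for real projects):

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)               # 1. shuffle
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:                            # 2. split into k folds
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):                                 # 3. each fold is the test set once
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, folds[i]


def cross_validate(score_fn, n_samples, k=5, seed=0):
    """4. Average score_fn(train_idx, test_idx) over the k iterations."""
    scores = [score_fn(tr, te) for tr, te in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


# Hypothetical labels and a trivial "model" that predicts the training majority class
labels = [0, 0, 0, 1, 1, 1, 1, 0, 1, 1]


def majority_class_accuracy(train_idx, test_idx):
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


print(cross_validate(majority_class_accuracy, len(labels), k=5))
```

When n_samples is not divisible by k, the first n_samples % k folds get one extra sample, so every sample is tested exactly once.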

Lab Preview: Wine Producer Identification

  • Challenge: Identify wine producers based on chemical properties.
  • Dataset: Chemical analysis of different wines.
  • Your Task: Apply your ML skills with minimal guidance.

Your Turn: Multiclass Classification Lab

Time to apply your knowledge to solve multiclass problems!

Classification with TensorFlow

Dataset: UCI Heart Disease

Predicting the presence of heart disease

Note

This will be a binary classification problem: 0 = does not have heart disease, 1 = has heart disease.

Dataset: Features

Feature    Description
age        Age in years.
sex        Sex (0 = female, 1 = male).
cp         Chest pain type (1 = typical angina, 2 = atypical angina, 3 = non-anginal pain, 4 = asymptomatic).

Dataset: Features (continued)

Feature    Description
trestbps   Resting blood pressure in mm Hg.
chol       Serum cholesterol in mg/dl.
fbs        Is fasting blood sugar > 120 mg/dl (0 = false, 1 = true).
restecg    Results of a resting electrocardiogram (0 = normal, 1 = ST-T wave abnormality, 2 = left ventricular hypertrophy).

Dataset: Features (continued)

Feature    Description
thalach    Maximum heart rate achieved.
exang      Exercise induced angina (0 = no, 1 = yes).
oldpeak    ST depression induced by exercise relative to rest.
slope      Slope of the peak exercise ST segment (1 = upslope, 2 = flat, 3 = downslope).

Dataset: Features (continued)

Feature    Description
ca         Count of major blood vessels colored by fluoroscopy (0, 1, 2, 3, or 4).
thal       Presence of a heart defect (0 = unknown, 1 = normal, 2 = fixed defect, 3 = reversible defect).

The Model: Output Layer Activation

    import tensorflow as tf
    output_layer = tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
  • For binary classification, the final layer typically has 1 neuron.
  • Uses a sigmoid activation function.
  • Output range [0.0, 1.0] interpreted as prediction confidence.
  • A threshold (commonly 0.5) converts the confidence into the final class.

The Model: Loss Function & Optimizer

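    import tensorflow as tf
    # Hypothetical minimal model (a single sigmoid unit) so the call runs standalone
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)])
    model.compile(
        loss='binary_crossentropy',
        optimizer='Adam',
        metrics=['accuracy']
    )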
  • Loss Function: binary_crossentropy is standard for binary classification.
  • Optimizer: Adam is an adaptive learning rate optimization algorithm.
    • Adapts the step size for each parameter, which often speeds up convergence.
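Binary cross-entropy averages, over the N examples, the penalty below, where y_i is the 0/1 label and p_i the sigmoid output:

\[ L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right] \]

A minimal pure-Python sketch of that formula (the clipping constant is an assumption; Keras clips probabilities in a similar way to avoid log(0)):

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities.

    Probabilities are clipped to [eps, 1 - eps] so log(0) cannot occur; the
    eps value here is an assumption.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(binary_crossentropy([1, 0], [0.9, 0.1]))  # confident and correct: about 0.105
print(binary_crossentropy([1, 0], [0.1, 0.9]))  # confident and wrong: about 2.303
print(binary_crossentropy([1, 0], [0.5, 0.5]))  # undecided: about 0.693 (ln 2)
```

The loss punishes confident mistakes far more than hesitant ones, which is why it suits a sigmoid output read as a probability.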

The Model: Early Stopping

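    import tensorflow as tf
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor='loss',
        min_delta=1e-3,
        patience=5,
    )
    # Used during training via model.fit(..., callbacks=[early_stop])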
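  • Purpose: Reduces training time; when it monitors validation loss, it also helps prevent overfitting.
  • Mechanism: Stops training when a monitored metric (e.g., loss) stops improving significantly.
    • monitor='loss': Watches the training loss. Use monitor='val_loss' to watch the validation loss (this requires validation data, e.g. validation_split in model.fit).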
    • min_delta=1e-3: Minimum change in the monitored quantity to qualify as an improvement.
    • patience=5: Number of epochs with no improvement after which training will be stopped.
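To see how min_delta and patience interact, the stopping rule can be replayed on a list of per-epoch loss values. This is a simplified re-implementation for illustration (lower-is-better metric, no best-weight restoring), not Keras's actual source; the loss values are hypothetical.

```python
def early_stopping_epoch(losses, min_delta=1e-3, patience=5):
    """Return the 0-based epoch at which training would stop, or None if it never stops.

    Simplified rule modeled on tf.keras.callbacks.EarlyStopping for a loss
    (lower is better): an epoch improves only if its loss beats the best loss
    so far by more than min_delta; training stops after `patience`
    consecutive epochs without improvement.
    """
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical per-epoch losses: fast progress, then a long shallow plateau
losses = [0.70, 0.50, 0.40, 0.35, 0.3496, 0.3494, 0.3493, 0.3492, 0.3491, 0.30]
print(early_stopping_epoch(losses))               # 8
print(early_stopping_epoch(losses, patience=10))  # None
```

Epochs 4 to 8 each improve by less than min_delta, so they count as "no improvement"; with patience=5 training stops at epoch 8 and never sees the drop at epoch 9, while a larger patience waits for it at the cost of more training.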

Your Turn: TensorFlow Lab

Now, it’s your turn to perform binary classification using TensorFlow Keras and deep neural networks!

04. Classification Project: Predicting Titanic Survivors

The Titanic Shipwreck Challenge

  • Goal: Achieve a high accuracy score in predicting passenger survival.
  • Application: A canonical challenge for applying binary classification.

Review: Types of Classification

What types of classification have we learned about?

Review: ML Tools for Classification

What tools have we learned about for classification?

Review: Evaluation Metrics

What metrics have we learned for evaluating classification models?

Review: Other Useful Techniques

What other useful techniques have we learned, and what are they used for?

Classification Project: The Data

Column     Type     Description
Survived   number   1 or 0 (target)
Name       string   Passenger name
Pclass     number   Ticket class
Sex        string   male or female
Age        number   Passenger age
SibSp      number   # of siblings/spouses on board
Embarked   string   Port of Embarkation

Classification Project: Kaggle Competition

Titanic: Machine Learning from Disaster

  • Engage with a global community of ML practitioners.
  • Upload your results to compare your model’s performance.

Classification Project: Your Turn

  1. Exploratory Data Analysis (EDA):
    • Understand the data, identify obvious problems, and perform initial cleaning.
    • Consider pros/cons of using ML for this problem.
  2. Model Building & Evaluation:
    • Choose your model (scikit-learn or TensorFlow).
    • Train and evaluate your model, discussing chosen metrics.
  3. Make Predictions & Upload to Kaggle:
    • Generate predictions for the test dataset.
    • Submit your predictions to the Kaggle competition.
  4. Iterate on Your Model:
    • Tweak hyperparameters, try different models, explore new features.
    • Discuss your methodical approach to improvement.
    • Research and compare with other solutions for deeper insights.