Image Classification: Data-driven Approach, k-Nearest Neighbor, train/val/test splits
Given: A set of images, each labeled with a single category (e.g., “cat”, “dog”, “car”).
Goal: Train a model to predict the category of new, unseen images.
ECE Relevance:
![]()
The core idea is simple: - Store all training data. - To classify a new point: Find the closest training example. - Assign the new point the label of its closest training example.
Important
How do we define “closest”? We use Distance Metrics!
Common choices for d(I_1, I_2) (distance between images I_1 and I_2):
Tip
In ECE, these distances are fundamental for comparing signals or data vectors.
Instead of just one neighbor, k-NN considers the k closest training examples. The label of the new point is determined by a majority vote among its k nearest neighbors.
Key Hyperparameter: k
k = 1 is the basic Nearest Neighbor.k > 1 provides a smoother decision boundary and can reduce sensitivity to noisy data points.Note
A small k can be sensitive to noise. A large k can blur boundaries.
Hyperparameters are settings that control the learning process, not learned from data.
For k-NN, key hyperparameters include:
k.Caution
Problem: If we choose k based on how well the model performs on the test data, we are essentially “cheating.” The model would seem to perform better than it would on truly unseen data. This is called overfitting to the test set.
To correctly tune hyperparameters, we divide our data into three distinct sets:

Important
Golden Rule: The Test Set is used only once, at the very end!
k# assume we have Xtr_rows, Ytr, Xte_rows, Yte (e.g., 50,000 CIFAR-10 images)
# Xtr_rows is 50,000 x 3072 matrix (image data)
# Ytr are the 50,000 corresponding labels
# 1. Split training data into a smaller training set and a validation set
Xval_rows = Xtr_rows[:1000, :] # Take first 1000 for validation
Yval = Ytr[:1000]
Xtr_rows = Xtr_rows[1000:, :] # Use remaining 49,000 for actual training
Ytr = Ytr[1000:]
# 2. Iterate through different k values and evaluate on the validation set
validation_accuracies = []
for k in [1, 3, 5, 10, 20, 50, 100]: # Try various 'k' values
# (Imagine a NearestNeighbor class here with a 'train' and 'predict' method)
# nn = NearestNeighbor()
# nn.train(Xtr_rows, Ytr) # Model stores the (now smaller) training data
# Yval_predict = nn.predict(Xval_rows, k = k) # Predict on validation data for current 'k'
# acc = np.mean(Yval_predict == Yval) # Calculate accuracy
# --- Simplified for demonstration using placeholder values ---
# In a real scenario, the above commented lines would compute actual accuracy.
# For this interactive demo, assume dummy accuracy based on 'k'.
if k == 1: acc = 0.38
elif k == 3: acc = 0.42
elif k == 5: acc = 0.44
elif k == 10: acc = 0.41
elif k == 20: acc = 0.35
elif k == 50: acc = 0.28
elif k == 100: acc = 0.20
# --- End simplified part ---
print(f'k = {k}: accuracy = {acc:.4f}')
validation_accuracies.append((k, acc))
# 3. Choose the 'k' that gave the best accuracy on the validation set
best_k = max(validation_accuracies, key=lambda item: item[1])[0]
print(f"\nBest k on validation set: {best_k}")How it works (e.g., 5-fold CV):
N equal “folds” (e.g., 5).N times:
N-1 folds for training.N iterations.Tip
Common folds: 3, 5, or 10. More folds offer a better estimate but are more computationally expensive.

Note
Cross-validation is computationally more expensive. Choose between a single validation split and cross-validation based on dataset size, available computational resources, and the number of hyperparameters to tune.
Simplicity & Interpretability:
No Training Time:
Non-parametric:
Pixel-based L1 or L2 distances often correlate more with background and general color distribution than with semantic content.

Warning
The image on the left is the original. The three images next to it are all equally far away by L2 pixel distance. Notice: The L2 distance suggests they are equally similar, despite huge perceptual differences.
t-SNE (t-Distributed Stochastic Neighbor Embedding) helps visualize high-dimensional data in 2D or 3D, preserving local neighborhood structures.
Important
Observation: Images nearby in this embedding (meaning they are pixel-wise similar) are clustered by background/color, not by their semantic class (e.g., “dog”, “cat”, “car”).
The failure of pixel-wise distances indicates a need for more robust feature representations. ECE Connection: Image processing often involves filtering — a form of feature extraction.
Note
Convolution is a fundamental operation for extracting meaningful features from image and signal data.
How it works:
viewof k11 = Inputs.range([-1, 1], {step: 0.1, value: 0, label: "k[0,0]"});
viewof k12 = Inputs.range([-1, 1], {step: 0.1, value: -1, label: "k[0,1]"});
viewof k13 = Inputs.range([-1, 1], {step: 0.1, value: 0, label: "k[0,2]"});
viewof k21 = Inputs.range([-1, 1], {step: 0.1, value: -1, label: "k[1,0]"});
viewof k22 = Inputs.range([-1, 1], {step: 0.1, value: 4, label: "k[1,1]"});
viewof k23 = Inputs.range([-1, 1], {step: 0.1, value: -1, label: "k[1,2]"});
viewof k31 = Inputs.range([-1, 1], {step: 0.1, value: 0, label: "k[2,0]"});
viewof k32 = Inputs.range([-1, 1], {step: 0.1, value: -1, label: "k[2,1]"});
viewof k33 = Inputs.range([-1, 1], {step: 0.1, value: 0, label: "k[2,2]"});k and distance metric.test set overfitting.If you consider k-NN (perhaps not for images, but for other sensor data):
k and distance metrics on the validation set.