Machine Learning

5. Deep Learning: CNN, RNN, NLP

Imron Rosyadi

Convolutional Neural Network

The Biological Inspiration: Visual Cortex

In the 1960s, research by David Hubel and Torsten Wiesel revealed how the visual cortex processes visual information.

Neurons respond to specific regions of the visual field.
Each neuron has a “receptive field.”
Spatially close neurons have similar, overlapping receptive fields.

Receptive Fields and Image Formation

Our visual system integrates information from these small receptive fields.

This process forms the complete images we perceive.

This biological mechanism provided a key inspiration for Convolutional Neural Networks (CNNs).

Introduction to Convolutional Neural Networks (CNNs)

Inspired by the visual cortex, CNNs emerged in the 1980s.

CNNs are specialized neural networks for processing structured grid data, like images.

They incorporate unique layer types:

Convolutional Layers
Pooling (Downsampling) Layers

CNN Architecture Flexibility

The power of CNNs lies in their flexible architecture.

Different combinations and ordering of layers lead to varied results.

This allows for optimization based on the specific task.

Recall: The Perceptron

The simplest building block of a typical neural network.

A basic computational unit that processes inputs to produce an output.

Issues with Multi-Layer Perceptrons (Plain ANNs) for Images

Sensitivity to Input Changes:

Small shifts in the image move each feature onto different input weights, so a plain ANN must relearn the same pattern at every position.
E.g., a cat’s position in an image should not change its recognition.
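The sensitivity to position can be illustrated with a toy 1-D "image". This is a minimal sketch, not the course's lab code: the signals, the weights, and the helper names `dot` and `conv_then_max` are all hypothetical. A fully connected unit tuned to a pattern at one position loses it after a shift, while a small filter slid across the input (followed by taking the maximum) responds the same either way.

```python
def dot(a, b):
    """Weighted sum of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def conv_then_max(signal, kernel):
    """Slide `kernel` along `signal`, then keep the strongest response."""
    k = len(kernel)
    responses = [dot(kernel, signal[i:i + k])
                 for i in range(len(signal) - k + 1)]
    return max(responses)

# Hypothetical 1-D "images": the same two-pixel pattern at two positions.
original = [0, 1, 1, 0, 0, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 1, 0, 0]

# A fully connected unit whose weights happen to match the original position.
dense_weights = [0, 1, 1, 0, 0, 0, 0, 0]
print(dot(dense_weights, original), dot(dense_weights, shifted))  # 2 0

# A small shared filter finds the pattern wherever it sits.
kernel = [1, 1]
print(conv_then_max(original, kernel), conv_then_max(shifted, kernel))  # 2 2
```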

CNNs: A Solution for Image Data

CNNs introduce specialized layers to address the limitations of plain ANNs for image processing.

The processed output then feeds into a fully connected neural network.

Convolution: Analyzing Pixel Influence

A fundamental operation in CNNs.

It uses a filter (also called a kernel, mask, or convolution matrix) to analyze the influence of nearby pixels.

Convolution Example: Input Image

Consider a simple image: a rectangle with two shaded halves.

Pixel intensities are represented numerically.

Convolution Example: Blurring Filter

We apply a 3x3 filter designed to create a blurring effect.

Each element in the filter has a specific weight.

Convolution Example: Filter Application

The filter is applied by sliding it over the image.

At each position, it computes a new pixel value.

Convolution Example: Calculating a New Pixel Value

To calculate the new value for a pixel:

Center the filter on the target pixel.
Multiply corresponding values from the filter and the image.
Sum the products.

Convolution Example: Result of Blurring

The new pixel value is an average of its neighbors.

This averaging effect reduces sharp intensity changes, leading to blurring.
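The center, multiply, and sum procedure can be sketched in plain Python. The image values and the helper name `convolve_valid` are assumptions for illustration; the filter is a 3x3 mean (box blur) filter in which every weight is 1/9. As is usual in CNN practice, the kernel is not flipped, which makes no difference for a symmetric kernel like this one.

```python
def convolve_valid(image, kernel):
    """Slide a square kernel over `image` (a list of lists), no padding."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - k + 1):
        row = []
        for c in range(cols - k + 1):
            # Multiply overlapping entries and sum the products.
            total = sum(kernel[i][j] * image[r + i][c + j]
                        for i in range(k) for j in range(k))
            row.append(total)
        out.append(row)
    return out

# Hypothetical 4x6 image: dark left half (0), bright right half (90).
image = [[0, 0, 0, 90, 90, 90] for _ in range(4)]

# 3x3 mean filter: every weight is 1/9.
blur = [[1 / 9] * 3 for _ in range(3)]

blurred = convolve_valid(image, blur)
print([round(v) for v in blurred[0]])  # [0, 30, 60, 90]
```

The hard jump from 0 to 90 becomes a ramp through 30 and 60, which is the blurring effect described above.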

Handling Edges: Padding

When the filter reaches the image edge, padding is often used.

Commonly, the original image is padded with zeros around its borders.

Edge Handling: Relevant Filter Area

The padding ensures the filter can be centered on edge pixels.

Only the part of the filter overlapping the original image contributes to the sum.
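Zero padding can be sketched as follows. The helper names `pad_zeros` and `convolve_same` and the 3x3 test image are hypothetical. Because the padded zeros add nothing to the sum, only the overlapping part of the filter contributes, and with a mean filter the border pixels come out darker than the interior.

```python
def pad_zeros(image, p=1):
    """Return a copy of `image` with `p` rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)
    padded += [[0] * width for _ in range(p)]
    return padded

def convolve_same(image, kernel):
    """Convolve with zero padding so the output matches the input size."""
    k = len(kernel)  # assumes an odd, square kernel
    padded = pad_zeros(image, k // 2)
    return [[sum(kernel[i][j] * padded[r + i][c + j]
                 for i in range(k) for j in range(k))
             for c in range(len(image[0]))]
            for r in range(len(image))]

# Hypothetical uniform 3x3 image and a 3x3 mean filter.
image = [[9, 9, 9],
         [9, 9, 9],
         [9, 9, 9]]
blur = [[1 / 9] * 3 for _ in range(3)]

out = convolve_same(image, blur)
print([[round(v) for v in row] for row in out])
# [[4, 6, 4], [6, 9, 6], [4, 6, 4]]
```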

Line Detectors: Gx and Gy Kernels

These kernels are designed to detect sharp intensity changes, indicating lines or edges.

\(G_x\): Detects vertical lines.
\(G_y\): Detects horizontal lines.

Line Detector Example: Input Image

An image with a vertical line where shading changes.

We will apply the \(G_x\) kernel to detect this line.

Line Detector Example: First Pixel

Applying \(G_x\) to the first 3x3 block yields 0.

No intensity change within this block.

Line Detector Example: Shifting Right

Shifting the \(G_x\) kernel one pixel to the right.

The kernel now straddles the intensity change.

Line Detector Example: Detecting the Edge

The calculation results in a non-zero value (\(200/9\)).

This indicates the presence of a vertical edge.

Line Detector Example: Further Shift

Shifting the \(G_x\) kernel one more pixel to the right.

The kernel still overlaps the intensity change.

Line Detector Example: Another Non-Zero

The calculation results in \(300/9\).

This continues to highlight the edge region.

Line Detector Example: Past the Edge

Shifting the \(G_x\) kernel one final pixel to the right.

The kernel is now past the intensity change.

Line Detector Example: Back to Zero

The calculation returns 0 again.

This demonstrates that \(G_x\) effectively detects vertical lines by identifying sharp horizontal intensity transitions.
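The zero, non-zero, non-zero, zero pattern can be reproduced with a small sketch. The kernel below is a Sobel-style kernel used as an assumed stand-in for \(G_x\), and the image is hypothetical, so the numbers differ from the values on the slides; only the pattern is the point.

```python
# Sobel-style horizontal-gradient kernel (an assumed stand-in for G_x).
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]

# Transposing it gives the matching vertical-gradient kernel (G_y).
GY = [list(col) for col in zip(*GX)]

# Hypothetical image: intensity steps from 0 to 100 between columns 2 and 3.
image = [[0, 0, 0, 100, 100, 100] for _ in range(3)]

def apply_at(image, kernel, col):
    """Apply a 3x3 kernel to the 3x3 block whose left column is `col`."""
    return sum(kernel[i][j] * image[i][col + j]
               for i in range(3) for j in range(3))

# Slide the kernel one pixel to the right at a time.
print([apply_at(image, GX, col) for col in range(4)])  # [0, 400, 400, 0]
print([apply_at(image, GY, col) for col in range(4)])  # [0, 0, 0, 0]
```

The \(G_y\)-style kernel stays at zero because this image has no vertical intensity change.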

CNNs Learn Features

In CNNs, the kernel values are learned during training.

The network automatically identifies important features.

We don’t explicitly tell the model what to look for (e.g., “vertical lines”).

Pooling

A downsampling technique often applied after convolution.

Objective: Reduce data size without losing critical information.

Window Size: Select a region (e.g., 2x2 or 3x3).
Stride: Define movement step (e.g., 2 pixels).
Window Movement: Slide the window across filtered images.
Value Selection: Take the maximum (Max Pooling) or average (Average Pooling) value within each window.
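The four pooling steps can be sketched as one function. The name `pool2d` and the feature-map values are hypothetical, and the sketch drops any partial window at the border.

```python
def pool2d(image, window=2, stride=2, mode="max"):
    """Max or average pooling over a list-of-lists feature map."""
    if mode == "max":
        reduce = max
    else:
        reduce = lambda vals: sum(vals) / len(vals)
    out = []
    for r in range(0, len(image) - window + 1, stride):
        row = []
        for c in range(0, len(image[0]) - window + 1, stride):
            # Collect the values inside the current window.
            vals = [image[r + i][c + j]
                    for i in range(window) for j in range(window)]
            row.append(reduce(vals))
        out.append(row)
    return out

# Hypothetical 4x4 filtered image.
feature_map = [[1, 3, 2, 0],
               [5, 6, 1, 2],
               [7, 2, 9, 4],
               [0, 1, 3, 8]]

print(pool2d(feature_map, mode="max"))  # [[6, 2], [7, 9]]
print(pool2d(feature_map, mode="avg"))  # [[3.75, 1.25], [2.5, 6.0]]
```

A 2x2 window with stride 2 halves each dimension, so 16 values shrink to 4.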

Hyperparameters in CNNs

While CNNs learn many parameters, users define several key hyperparameters:

Convolution:
- Number of filters (features)
- Size of filters

Pooling:
- Window size
- Stride

Fully Connected Layers:
- Number of nodes

Also, the number and order of each layer type.
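To see how these hyperparameters determine layer sizes and parameter counts, here is a rough sketch. The helper names and the example network (28x28 grayscale input, 32 filters of size 3x3, 2x2 pooling with stride 2, a 64-node fully connected layer) are assumptions for illustration, not the lab's architecture.

```python
def conv_output_size(n, f, padding=0, stride=1):
    """Output width/height for an n x n input and an f x f window."""
    return (n + 2 * padding - f) // stride + 1

def conv_param_count(f, in_channels, num_filters):
    """Learned weights of a convolutional layer, plus one bias per filter."""
    return (f * f * in_channels + 1) * num_filters

def dense_param_count(n_inputs, n_nodes):
    """Learned weights of a fully connected layer, plus one bias per node."""
    return (n_inputs + 1) * n_nodes

# Hypothetical network, traced layer by layer.
after_conv = conv_output_size(28, 3)                    # 3x3 filters, no padding
after_pool = conv_output_size(after_conv, 2, stride=2)  # 2x2 pooling, stride 2
flat = after_pool * after_pool * 32                     # 32 filtered images

print(after_conv, after_pool, flat)   # 26 13 5408
print(conv_param_count(3, 1, 32))     # 320
print(dense_param_count(flat, 64))    # 346176
```

In this example the fully connected layer holds far more parameters than the convolutional layer, which is one reason pooling before it is useful.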

Your Turn: CNNs in Practice

In the lab, you will build and experiment with a Convolutional Neural Network.

You’ll apply these concepts to a practical image classification task.

Recurrent Neural Networks (RNNs)

Beyond Feedforward: Introducing RNNs

Previous deep neural networks were primarily feedforward.

Data flows in one direction.
Weights adjusted via backpropagation.

Recurrent Neural Networks (RNNs) introduce a new dynamic.

Not strictly feedforward.
Designed for sequential data.

Note

RNNs excel in tasks where the order of data points is crucial, such as time series or natural language.

The Feedforward Neuron

Receives inputs from the previous layer.
Multiplies inputs by weights, adds bias.
Passes sum through an activation function.
Output goes to the next layer.
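These four steps can be written as a few lines of Python. The sigmoid activation and the example weights are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical inputs and weights: 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0
y = neuron([1.0, 2.0], [0.5, -0.25], bias=0.0)
print(y)  # 0.5, because sigmoid(0) = 0.5
```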

The Recurrent Neuron

A feedforward neuron with a crucial addition:

Its output feeds back into its own inputs.
This creates a “memory” over time.
Allows processing of sequential data.

Unrolling a Recurrent Neuron Over Time

Visualizing the data flow:

Starts with a seeded initial state (often zero) in place of a previous output.
At each time step, it processes current input and its own previous output.
Passes data to the next layer and forward in time to itself.
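Unrolling can be sketched as a simple loop over time steps. This is a minimal single-neuron sketch; the tanh activation, the weight values, and the names `rnn_step` and `run_rnn` are assumptions for illustration.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """Combine the current input with the neuron's own previous output."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0, h0=0.0):
    """Unroll one recurrent neuron over a sequence, starting from state h0."""
    h = h0
    outputs = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)  # passed forward in time to itself
        outputs.append(h)                  # and on to the next layer
    return outputs

# One pulse followed by silence: later outputs are still non-zero.
outputs = run_rnn([1.0, 0.0, 0.0])
print([round(h, 3) for h in outputs])
```

The input is zero at the second and third steps, yet the output is not, because the first input is still echoing through the feedback connection. The echo shrinks at every step, which is the short memory that motivates the LSTM.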

The Long Short-Term Memory (LSTM) Neuron

Addresses the “short memory” problem of standard RNNs.

Passes two states back to itself:
- Long-term memory
- Short-term memory
Uses “gates” (forget, input, output) to control information flow.

With a typical recurrent neural network, the network tends to have a very short memory. As the sequences passing through the network get longer, the network forgets what it first saw. There have been a few strategies to get around this, one of which is the “long short-term memory” (LSTM) neuron.

On this slide you can see a very simplified LSTM cell. If you look at the horizontal center, you can see the standard neuron: X-in, y-out. However, instead of having a single feedback like a standard recurrent neuron, this neuron passes two states back to itself. One represents the long-term memory, and the other represents the short-term memory.

You can see that the short-term state gets mixed with the weights in a set of activation functions labelled A1 through A4. The outputs of these functions, as well as the long-term state, then get passed through a series of gates that ultimately lead to the output of a new y, c, and h value.

The numbered gates in order are:
1. The forget gate
2. The input gate
3. Addition of the forget and input gate outputs
4. The output gate

LSTM cells often outperform standard recurrent cells, and their training often converges faster.
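The gate sequence can be sketched for a single scalar cell. This follows the commonly used LSTM formulation, which may differ in detail from the simplified cell on the slide; the parameter layout (a dict of `(w_x, w_h, b)` tuples) and all values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell.

    h_prev is the short-term state, c_prev the long-term state.
    `p` maps each gate name to a (w_x, w_h, b) tuple.
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # forget gate: how much of c_prev to keep
    i = sigmoid(pre("input"))         # input gate: how much new content to write
    g = math.tanh(pre("candidate"))   # candidate content for the long-term state
    o = sigmoid(pre("output"))        # output gate: how much state to expose
    c = f * c_prev + i * g            # addition of forget and input results
    h = o * math.tanh(c)              # new short-term state, also the output y
    return h, c

# With all-zero parameters every gate sits at 0.5 and the candidate is 0.
zero = {name: (0.0, 0.0, 0.0)
        for name in ("forget", "input", "candidate", "output")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, p=zero)
print(c)  # 1.0: half of the old long-term state is kept
```

A forget gate near 1 and an input gate near 0 would carry the long-term state through almost unchanged, which is how the cell can remember over long sequences.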

Other Recurrent Neuron Types

Gated Recurrent Unit (GRU) Neuron:
- Simpler than LSTM, with a single feedback channel managing both short- and long-term state.
- Often performs comparably to LSTMs with fewer parameters.
Convolutional Neurons:
- Can also be adapted for sequence tasks.
- Effective in identifying local patterns within sequences.
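For comparison, here is a scalar GRU step in the same hedged spirit. It uses one common convention for the update gate (some libraries swap the roles of `z` and `1 - z`), and the parameter layout is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, p):
    """One time step of a scalar GRU cell with a single state h.

    `p` maps "update", "reset", and "candidate" to (w_x, w_h, b) tuples.
    """
    w_x, w_h, b = p["update"]
    z = sigmoid(w_x * x + w_h * h_prev + b)   # how much of the state to replace
    w_x, w_h, b = p["reset"]
    r = sigmoid(w_x * x + w_h * h_prev + b)   # how much old state the candidate sees
    w_x, w_h, b = p["candidate"]
    h_tilde = math.tanh(w_x * x + w_h * (r * h_prev) + b)
    return (1 - z) * h_prev + z * h_tilde     # blend old state and candidate

# With all-zero parameters the gates sit at 0.5 and the candidate is 0.
zero = {name: (0.0, 0.0, 0.0) for name in ("update", "reset", "candidate")}
print(gru_step(x=1.0, h_prev=0.8, p=zero))  # 0.4
```

The cell needs three sets of weights where the LSTM sketch style needs four, which is where the parameter saving comes from.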

What Are RNNs Good For?

RNNs excel in tasks involving sequential data:

Language Translation
Sequence Prediction (e.g., stock prices, weather)
Sequence Generation (e.g., music, text)
Tagging (e.g., video annotation)
Summarization (e.g., text summarization)

… and many more!

Sequence Prediction

RNNs are particularly strong in sequence prediction.

Unlike earlier models, RNNs inherently consider the temporal dependency of data.

Previous models often assumed time-independent data.

Important

For example, predicting future sensor readings from past measurements in an ECE system.

Time Series Data

Time series data is an ordered set of data points indexed by time.

The inherent ordering makes it ideal for RNNs.

What Are We Predicting?

Sequence prediction aims to forecast future values based on historical data.

Example: Predicting the next quarter’s performance from a year of data.
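Before a model can forecast, the history is usually cut into (past window, next value) training pairs. A minimal sketch, with a hypothetical helper name and made-up readings:

```python
def make_windows(series, window):
    """Split a series into (past window, next value) training pairs."""
    X, y = [], []
    for start in range(len(series) - window):
        X.append(series[start:start + window])   # the last `window` values
        y.append(series[start + window])         # the value to predict
    return X, y

# Hypothetical sensor readings, ordered in time.
readings = [10, 12, 13, 15, 18, 21]
X, y = make_windows(readings, window=3)

for past, target in zip(X, y):
    print(past, "->", target)
# [10, 12, 13] -> 15
# [12, 13, 15] -> 18
# [13, 15, 18] -> 21
```

The window length is a hyperparameter: longer windows give the model more context but leave fewer training pairs.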

RNNs for Sequence Prediction: A New Tool

Traditional Approach: Statistical methods (e.g., ARIMA, Markov chains)
- Often involve numerous assumptions.
Machine Learning & RNNs: A largely non-parametric approach.
- Data “speaks for itself,” fewer assumptions.
- Requires more data for optimal performance.

Examples of Sequence Prediction

Example: Stock Price Prediction

Predicting future stock prices based on historical market data.

A complex task due to market volatility, but RNNs can capture trends.

Example: Weather Forecasting

Forecasting weather patterns based on past meteorological data.

While complex, RNNs can identify subtle temporal dependencies.

Example: Predicting Passenger Traffic

Predicting daily traveler numbers at a train station.

RNNs can learn seasonality (weekdays vs. weekends, holidays).

Requires sufficient historical data to capture these patterns.

Your Turn: RNNs for Vibration Prediction

In the lab, you will use an RNN to predict a sequence of vibration readings from an engine.

You’ll apply TensorFlow with Keras to build, test, and tune your model.

Natural Language Processing (NLP)

What is Natural Language Processing?

The interaction between computers and human (natural) language.

Enables computers to understand, interpret, and generate human language.

NLP Applications in Everyday Life

Autocorrect & Predictive Text
Translation Services (e.g., Google Translate)
Parsing Text (e.g., extracting information)
Chatbots & Virtual Assistants
Question Answering Systems
Speech Recognition

… and so much more!

Character vs. Word-Level Models

Character-Level Models:
- Process text one character at a time.
- Can handle out-of-vocabulary words and typos.

Character vs. Word-Level Models (Cont.)

Word-Level Models:
- Process text one word at a time.
- More common for English and similar languages.
- Often faster to train and perform well.

Which is better? Depends on the language and use case.
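The trade-off can be seen in a toy encoding sketch. The training text, the vocabularies, and the `UNK` id are all hypothetical.

```python
# Build tiny vocabularies from a hypothetical training text.
train_text = "the cat sat"
word_vocab = {w: i for i, w in enumerate(sorted(set(train_text.split())))}
char_vocab = {ch: i for i, ch in enumerate(sorted(set(train_text)))}
UNK = -1  # hypothetical id for anything outside the vocabulary

def encode_words(text):
    """One id per word; unseen words become UNK."""
    return [word_vocab.get(w, UNK) for w in text.split()]

def encode_chars(text):
    """One id per character; unseen characters become UNK."""
    return [char_vocab.get(ch, UNK) for ch in text]

typo = "teh cat"
print(encode_words(typo))  # [-1, 0]: the typo is out of vocabulary
print(encode_chars(typo))  # [6, 3, 4, 0, 2, 1, 6]: every character is known
```

The word-level sequence is much shorter (2 tokens instead of 7), which is part of why word-level models are often faster to train.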

Text Processing: Regular Expressions (Regex)

A powerful tool for pattern matching in strings.

Used for extracting, validating, or modifying text.

regex	matches
`[wW]ood`	wood, Wood
`beg.n`	begin, begun, beg3n
`o+h`	oh, oooooh
`[^a-zA-Z]`	a single non-alpha character

Before machine learning, NLP problems were usually solved by pattern matching. Even now, these text processing techniques can be very important in processing messy natural language.

Regular expressions are widely used in text processing. Imagine needing to extract all the email addresses from a block of text or remove prefixes/suffixes from a word. A regex defines a pattern that is used to match certain character combinations, following a set of rules. In this table we show a few examples of pattern matching rules:
* “.” matches any single character
* “+” matches 1 or more of the previous character
* “[^...]” matches any single character that is not listed in the brackets

Regex rules can be very powerful but also very complex. Many guides exist for effectively using regexes: https://www.rexegg.com/regex-quickstart.html
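The table's patterns can be tried with Python's built-in `re` module. The sample strings are made up, and the email pattern is a deliberately simplified assumption that will not cover every valid address.

```python
import re

text = "Wood and wood; begin, begun, beg3n; oh, oooooh!"

print(re.findall(r"[wW]ood", text))  # ['Wood', 'wood']
print(re.findall(r"beg.n", text))    # ['begin', 'begun', 'beg3n']
print(re.findall(r"o+h", text))      # ['oh', 'oooooh']

# Remove every single non-alpha character.
print(re.sub(r"[^a-zA-Z]", "", "R2-D2 & C-3PO"))  # RDCPO

# Simplified email pattern (an assumption, not a full validator).
EMAIL = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
note = "Contact alice@example.com or bob.smith@mail.example.org."
print(re.findall(EMAIL, note))
# ['alice@example.com', 'bob.smith@mail.example.org']
```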

Text Processing: Minimum Edit Distance

Also known as Levenshtein distance.

Measures the minimum number of single-character edits (insertions, deletions, substitutions) required to change one string into another.
Crucial for autocorrect, spell checkers, and evaluating text generation systems.
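One standard way to compute this distance is dynamic programming over prefixes of the two strings. The sketch below assumes every edit costs 1 (some textbooks charge 2 for a substitution) and keeps only two rows of the table at a time.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string `a` into string `b` (each edit costs 1)."""
    # prev[j] = distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("teh", "the"))         # 2
```

An autocorrect system could rank dictionary words by this distance to the typed word and suggest the closest ones.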

Feature Extraction in NLP

Transforming raw text into informative numerical features.

N-grams: Consider sequences of n words.
- Captures context beyond single words (e.g., “not horrible”).
TFIDF (Term Frequency-Inverse Document Frequency):
- Determines word importance in a document.
- Discounts common words like “the” or “and.”

Tip

N-grams help capture local word order, while TFIDF helps identify unique and important terms.

Before neural networks, the first step in NLP was “feature extraction,” or transforming raw text into informative features. The idea is that just the individual words in a text do not fully capture the meaning of the text.

One very common feature extraction technique is n-grams, which consider n-word sequences instead of just individual words. In the original sentence “that movie was not horrible,” the word “horrible” may cause a model to predict very strong negative emotion. But, if we extract bigrams (2-grams), then we would correctly pair “not horrible,” which is a much milder emotion.
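Extracting n-grams takes one line of Python. A minimal sketch using whitespace tokenization (an assumption; real pipelines usually tokenize more carefully):

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "that movie was not horrible".split()

print(ngrams(tokens, 1)[-1])  # ('horrible',)
print(ngrams(tokens, 2))
# [('that', 'movie'), ('movie', 'was'), ('was', 'not'), ('not', 'horrible')]
```

The bigram `('not', 'horrible')` is a single feature, so a model can learn a weight for it that differs from the weight of "horrible" alone.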

Another common technique is TFIDF, which calculates how important a word is to a text. This often has the effect of ignoring more common words like “the” and letting the model focus on more unique words in the text.
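A bare-bones version of the calculation is sketched below, assuming term frequency is the count divided by document length and inverse document frequency is log(N / document frequency). Libraries typically use smoothed variants, so their numbers will differ. The three documents are made up.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    # Number of documents each term appears in.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical mini-corpus of three reviews.
docs = [
    "the movie was great".split(),
    "the movie was horrible".split(),
    "the plot was great".split(),
]
scores = tf_idf(docs)
for term, score in sorted(scores[1].items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {score:.3f}")
```

In the second review, "the" and "was" score exactly zero because they appear in every document, while "horrible" scores highest because no other document contains it.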

Language Modeling: Bag-of-Words

Simplest language modeling approach.

Treats a sentence as an unordered collection of words.

Example: “I love love loved it!” and “I HATED it :-(”
- Meaning can often be inferred without word order.

To build models for NLP tasks, we must have some notion of how words fit together into sentences and text. Language modeling refers to determining how likely a certain sentence is. The simplest language modeling approach is a bag-of-words: treat a sentence like an unordered collection of words (a multiset, so repeated words still count).

Take an example movie review, “I love love loved it!”, and another, “I HATED it :-(”. As humans, we could deduce which review corresponded to a positive sentiment and which review corresponded to a negative sentiment, even if we looked at these sentences out of order (e.g., “it! I loved love love” and “HATED :-( I it”). So bag-of-words is like saying, “I’m pretty sure I can glean the meaning of sentences, with words in any order, so why bother keeping track of the order? Sounds like more work to me.”
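The idea that order is thrown away can be checked directly with `collections.Counter`. The tokenizer here is a simple assumption: lowercase the text and keep runs of letters, dropping punctuation and emoticons.

```python
import re
from collections import Counter

def bag_of_words(sentence):
    """Count each lowercase word; word order is discarded."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))

original = bag_of_words("I love love loved it!")
scrambled = bag_of_words("it! I loved love love")

print(original)               # counts for i, love, loved, it
print(original == scrambled)  # True: same bag, different order
print(original["love"])       # 2: repeated words are still counted
```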

But can you think of an example or two where this strategy would fail? Especially consider if you’re trying to predict more than just two sentiments (“good” and “bad”). Prompt class for discussion.

Language Modeling: Sequential Words

While bag-of-words is effective for some tasks (e.g., spam filtering), word order is crucial for complex NLP.

Sequential approaches preserve word order.

This is where Recurrent Neural Networks (RNNs) become indispensable.

The NLP Processing Pipeline

Understanding the typical flow for NLP tasks:

graph TD
    A["Raw Text"] --> B{"Feature Extraction / Embeddings"};
    B --> C["Machine Learning Model"];
    C --> D["Supervised Task (e.g., Classification, Generation)"];

Your Turn: NLP Lab

In the lab, you will perform sentiment analysis on reviews.

You will then build a classifier to determine authorship (e.g., Jane Austen vs. Charles Dickens).

Conclusion and Q&A

We’ve explored Convolutional Neural Networks for image processing, Recurrent Neural Networks for sequential data, and key concepts in Natural Language Processing.

These deep learning techniques are fundamental to many modern ECE applications.

What questions do you have about these powerful tools and their applications in engineering?