Machine Learning

CNN Model Zoo

Imron Rosyadi

Convolutional Neural Networks (CNNs) in ECE

Welcome to the fascinating world of Convolutional Neural Networks!

Today, we’ll dive into advanced CNN architectures commonly used in ECE for tasks like:

  • Image/Video Processing: Object detection, facial recognition
  • Signal Processing: Anomaly detection, medical imaging
  • Robotics & Autonomous Systems: Perception, navigation

What makes CNN architectures “Advanced”?

Advanced CNNs are designed to overcome limitations of simpler networks:

  • Deeper Networks: Learn more complex features.
    • Challenge: Vanishing/exploding gradients, computational cost.
  • Efficiency: Achieve high accuracy with fewer parameters/computations.
    • Crucial for: Embedded systems, mobile devices (common in ECE).
  • Better Generalization: Perform well on unseen data.
    • Important for: Robust real-world ECE applications.

Tip

These architectures often introduce innovative layers or connections to manage depth and efficiency.

Key Concepts Revisited: Convolutions and Pooling

Before diving into complex models, let’s quickly recap:

Convolutional Layer

Extracts features by sliding a filter (kernel) over the input.

\[ (I * K)(i, j) = \sum_m \sum_n I(i-m, j-n) K(m, n) \]

  • Parameters: Filter size, number of filters, stride, padding.
  • Output: Feature maps revealing specific patterns.

Pooling Layer

Reduces spatial dimensions, making the representation smaller and more manageable.

  • Max Pooling: Selects the maximum value from a region.

  • Average Pooling: Computes the average value from a region.

  • Benefits: Reduces computation and the number of parameters in later layers, helps control overfitting, and makes the network more tolerant of small shifts in the input.
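
Both pooling variants can be sketched in a few lines of NumPy. The function name `pool2d` and the sample values are illustrative assumptions; the sketch uses non-overlapping windows (stride equal to the window size), which is the common 2x2 setup.

```python
import numpy as np


def pool2d(x: np.ndarray, size: int = 2, mode: str = "max") -> np.ndarray:
    """Non-overlapping pooling over a 2D array (stride equals window size)."""
    h, w = x.shape
    # Trim rows/columns that do not fill a complete window.
    x = x[: h - h % size, : w - w % size]
    # Split into (rows, size, cols, size) blocks, then reduce over each block.
    blocks = x.reshape(x.shape[0] // size, size, x.shape[1] // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"unknown mode: {mode}")


feature_map = np.array(
    [[1, 3, 2, 0],
     [4, 6, 1, 1],
     [0, 2, 9, 5],
     [1, 1, 3, 7]],
    dtype=float,
)
print(pool2d(feature_map, mode="max"))  # per-window maxima: [[6, 2], [2, 9]]
print(pool2d(feature_map, mode="avg"))  # per-window means:  [[3.5, 1], [1, 6]]
```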

Interactive Convolution Visualization

Let’s visualize how a filter slides over an input!
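
In a static copy of these slides, the sliding operation can be approximated with a small NumPy sketch. The function name `slide_filter`, the toy image, and the edge kernel are illustrative assumptions. Like most deep learning frameworks, the sketch does not flip the kernel (it computes cross-correlation), which differs from the convolution formula above only by a flip of the kernel.

```python
import numpy as np


def slide_filter(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = image[top : top + kh, left : left + kw]
            # Element-wise product of window and kernel, summed to one value.
            out[i, j] = np.sum(window * kernel)
    return out


# A 5x5 image with a vertical edge: columns 0-1 are dark, columns 2-4 are bright.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# A simple vertical-edge kernel (responds where left and right columns differ).
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

print(slide_filter(image, kernel))  # each row is [3, 3, 0]
```

The feature map is large where the window straddles the edge and zero where the window covers a uniform region, which is what "revealing specific patterns" means in practice.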

Model Zoo: Advanced CNN Architectures

Now, let’s explore some of the most influential advanced CNN architectures available in TensorFlow.

Each model introduces unique strategies to build deeper, more efficient, and more accurate networks.

  • VGG (Visual Geometry Group)
  • ResNet (Residual Network)
  • Inception (GoogLeNet)
  • Xception (Extreme Inception)
  • MobileNet (Mobile-first design)
  • EfficientNet (Compound Scaling)

Note

These models are often pre-trained on large datasets like ImageNet, providing a powerful starting point for various ECE applications via transfer learning.
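
A typical transfer-learning setup with `tf.keras.applications` might look like the sketch below. The choice of MobileNetV2, the 224x224 input size, and the helper name `build_transfer_model` are assumptions made for illustration; any model from the list above could be substituted, each with its own preprocessing.

```python
import tensorflow as tf


def build_transfer_model(num_classes: int, weights="imagenet") -> tf.keras.Model:
    """Frozen MobileNetV2 feature extractor with a new classification head."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3),
        include_top=False,  # drop the original 1000-class ImageNet head
        weights=weights,    # "imagenet" downloads the pre-trained weights
    )
    base.trainable = False  # keep the pre-trained features fixed

    inputs = tf.keras.Input(shape=(224, 224, 3))
    # MobileNetV2 expects pixel values scaled from [0, 255] to [-1, 1].
    x = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)
    x = base(x, training=False)  # run batch norm layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

Calling `build_transfer_model(num_classes=5)` (5 being a placeholder for your own dataset), then `compile` and `fit`, trains only the new dense head while the ImageNet features stay frozen.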

VGG16 / VGG19

Developed by the Visual Geometry Group at Oxford, known for its simplicity and uniformity.

  • Architecture: Stacks 3x3 convolutional layers with 2x2 max-pooling layers.
    • VGG16 has 16 weight layers (13 convolutional + 3 fully connected); VGG19 has 19 (16 convolutional + 3 fully connected).
  • Key Idea: Proved that very deep networks with small filters (3x3) could achieve state-of-the-art performance.
  • Parameters: Very high (VGG16 ~138M, VGG19 ~143M).
    • Downside: Computationally expensive and memory-intensive.
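
The ~138M figure can be reproduced by counting weights layer by layer. The sketch below assumes the standard VGG16 configuration (3x3 convolutions with padding 1, five 2x2 poolings, a 224x224 RGB input, and a 4096-4096-1000 classifier); the names `VGG16_CFG` and `count_vgg_params` are illustrative.

```python
# VGG16 configuration: output channels per 3x3 conv layer; "M" marks 2x2 max pooling.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def count_vgg_params(cfg, in_channels=3, input_size=224, fc_units=(4096, 4096, 1000)):
    """Count weights and biases of a VGG-style network described by `cfg`."""
    conv_params, channels, size = 0, in_channels, input_size
    for item in cfg:
        if item == "M":
            size //= 2  # pooling halves height and width
        else:
            conv_params += 3 * 3 * channels * item + item  # 3x3 kernels + biases
            channels = item
    fc_params, features = 0, channels * size * size  # flattened conv output
    for units in fc_units:
        fc_params += features * units + units
        features = units
    return conv_params, fc_params


conv, fc = count_vgg_params(VGG16_CFG)
print(f"conv: {conv:,}  fully connected: {fc:,}  total: {conv + fc:,}")
```

Under these assumptions the total is 138,357,544, of which roughly 89% sits in the three fully connected layers. That is why later architectures replace the large classifier with global average pooling.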

ResNet (Residual Network)

Introduced by Microsoft Research to address the degradation and vanishing-gradient problems that make very deep networks hard to train.

  • Architecture: Features “skip connections” or “residual blocks”.
    • Allows gradients to flow directly through the network.
    • Enables training networks with 50 to more than 100 layers (e.g., ResNet-50, ResNet-101, ResNet-152); the original paper also experimented with networks of over 1,000 layers.
  • Key Idea: Instead of learning direct mappings, layers learn residual mappings.
    • \(H(x) = F(x) + x\), where \(F(x)\) is the residual function.
  • Parameters: ResNet-50 ~25M parameters. Much more efficient than VGG.
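
The residual mapping \(H(x) = F(x) + x\) can be illustrated with a toy block. This sketch swaps in two small dense layers for the convolutions and omits batch normalization and the ReLU that real ResNet blocks apply after the addition, so it shows the idea only, not the actual ResNet layer.

```python
import numpy as np


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Compute H(x) = F(x) + x with F(x) = w2 @ relu(w1 @ x)."""
    residual = w2 @ relu(w1 @ x)  # F(x): what the layers actually learn
    return residual + x           # skip connection adds the input back


rng = np.random.default_rng(0)
x = rng.normal(size=4)
w1 = rng.normal(size=(4, 4))
w2 = rng.normal(size=(4, 4))

# If the block's last weights are zero, F(x) = 0 and the block is an identity map.
identity_out = residual_block(x, w1, np.zeros((4, 4)))
print(np.allclose(identity_out, x))  # True
```

Because the output is \(F(x) + x\), its derivative with respect to \(x\) is \(\partial F / \partial x + I\). The identity term gives gradients a direct path backward even when \(\partial F / \partial x\) is small.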

Inception (GoogLeNet)

Developed by Google, emphasizing efficiency and “computational budget.”

  • Architecture: Uses Inception Modules (or blocks) that perform multiple parallel convolutions with different kernel sizes (1x1, 3x3, 5x5) and pooling.
    • 1x1 convolutions (bottleneck layers) are used to reduce dimensionality before larger convolutions, saving computation.
  • Key Idea: Allow the network to learn multiple feature representations at once, then concatenate them. Optimizes “width” and “depth” simultaneously.
  • Parameters: Very low for its accuracy (GoogLeNet ~5M parameters).
    • Highly efficient for deployment in real-time ECE systems.
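
The saving from a 1x1 bottleneck can be checked with simple arithmetic. The sizes below (a 28x28x192 input, 16 bottleneck channels, 32 filters of size 5x5) are illustrative assumptions, and `conv_mults` is a hypothetical helper that counts multiplications for a stride-1 convolution with "same" padding.

```python
def conv_mults(height: int, width: int, kernel: int, in_ch: int, out_ch: int) -> int:
    """Multiplications for a stride-1, same-padded convolution."""
    return height * width * kernel * kernel * in_ch * out_ch


H = W = 28        # feature-map size (illustrative)
IN_CH = 192       # input channels
REDUCED_CH = 16   # channels after the 1x1 bottleneck
OUT_CH = 32       # number of 5x5 filters

direct = conv_mults(H, W, 5, IN_CH, OUT_CH)
bottleneck = conv_mults(H, W, 1, IN_CH, REDUCED_CH) + conv_mults(H, W, 5, REDUCED_CH, OUT_CH)

print(f"direct 5x5:       {direct:,}")       # 120,422,400
print(f"1x1 then 5x5:     {bottleneck:,}")   # 12,443,648
print(f"reduction factor: {direct / bottleneck:.1f}x")  # 9.7x
```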

Xception (Extreme Inception)

Proposed by Google, building on the Inception idea by replacing standard convolutions with depthwise separable convolutions.

  • Architecture: Inception modules are replaced with depthwise separable convolutions.
    • Depthwise Conv: Applies a single filter to each input channel independently. For example, if an image has three color channels (red, green, and blue), a separate filter is applied to each color channel.
    • Pointwise Conv: A 1x1 convolution projects the output of the depthwise operation into a new channel space. Each 1x1 filter combines the depthwise outputs across all channels into one feature map, so N filters produce N output channels.
  • Key Idea: Separating spatial and channel-wise correlations.
    • More efficient parameter usage and computation than traditional convolutions.
  • Parameters: Xception ~22.9M parameters.
    • Achieves slightly better ImageNet accuracy than InceptionV3 with a similar number of parameters (InceptionV3 has ~23.9M).
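
The two steps of a depthwise separable convolution can be sketched directly in NumPy. The function names, the tensor layout (height, width, channels), and the toy sizes are assumptions for illustration. The sketch uses no padding and stride 1, and makes no attempt at efficiency.

```python
import numpy as np


def depthwise_conv(x: np.ndarray, dw: np.ndarray) -> np.ndarray:
    """x: (H, W, C), dw: (k, k, C). One k x k filter per channel, no padding."""
    k = dw.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((out_h, out_w, x.shape[2]))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i : i + k, j : j + k, :]              # (k, k, C)
            out[i, j, :] = np.sum(window * dw, axis=(0, 1))  # channels stay separate
    return out


def pointwise_conv(x: np.ndarray, pw: np.ndarray) -> np.ndarray:
    """x: (H, W, C_in), pw: (C_in, C_out). A 1x1 convolution that mixes channels."""
    return x @ pw


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 3))   # e.g. a small RGB patch
dw = rng.normal(size=(3, 3, 3))  # one 3x3 filter per input channel
pw = rng.normal(size=(3, 16))    # 16 pointwise filters -> 16 output channels

y = pointwise_conv(depthwise_conv(x, dw), pw)
print(y.shape)  # (6, 6, 16)

standard_params = 3 * 3 * 3 * 16      # a full 3x3x3 kernel per output channel
separable_params = dw.size + pw.size  # 27 depthwise + 48 pointwise weights
print(standard_params, separable_params)  # 432 75
```

Even in this tiny case the separable version needs 75 weights instead of 432 (biases ignored); the gap widens as the number of channels grows.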

MobileNet (V1, V2, V3)

Designed by Google specifically for mobile and embedded vision applications.

  • Architecture: Primarily uses depthwise separable convolutions, similar to Xception.
    • MobileNetV2 introduces “Inverted Residuals” and linear bottlenecks to improve efficiency and avoid information loss.
    • MobileNetV3 further optimizes the architecture through NAS (Neural Architecture Search) and adds new activation functions (hard-swish).
  • Key Idea: Achieve high accuracy with extremely low latency and small model size.
  • Parameters: MobileNetV1 ~4.2M, MobileNetV2 ~3.5M.
    • Crucial for: Real-time processing on edge devices, a core ECE application area.
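
The latency advantage comes mostly from the lower multiply-add count of depthwise separable layers. A rough cost model, with hypothetical helper names and an illustrative layer size, is sketched below; it counts multiply-adds only and ignores memory traffic, which also matters on real edge hardware.

```python
def standard_cost(k: int, m: int, n: int, f: int) -> int:
    """Multiply-adds of a standard k x k conv: M input, N output channels, F x F output map."""
    return k * k * m * n * f * f


def separable_cost(k: int, m: int, n: int, f: int) -> int:
    """Depthwise (k*k*M*F*F) plus pointwise (M*N*F*F) multiply-adds."""
    return k * k * m * f * f + m * n * f * f


# Illustrative layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
k, m, n, f = 3, 512, 512, 14
ratio = separable_cost(k, m, n, f) / standard_cost(k, m, n, f)
print(f"separable / standard = {ratio:.4f}")
print(f"1/N + 1/k^2          = {1 / n + 1 / k**2:.4f}")
```

The ratio simplifies to \(1/N + 1/k^2\), so with 3x3 kernels a separable layer needs roughly 8 to 9 times fewer multiply-adds than a standard one.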

EfficientNet (B0 to B7)

Developed by Google, achieving state-of-the-art accuracy with significantly fewer parameters and FLOPs than previous models.

  • Architecture: Uses a compound scaling method to uniformly scale width, depth, and resolution of the network.
    • Scales up from a baseline model (EfficientNet-B0) to larger versions (B1-B7).
  • Key Idea: A principled “recipe” for scaling CNNs that balances width, depth, and resolution together instead of scaling one dimension arbitrarily, leading to better accuracy and efficiency trade-offs.
  • Parameters: EfficientNet-B0 has ~5.3M parameters, B7 ~66M.
    • Outperforms ResNets and Inception variants at comparable accuracy with up to roughly an order of magnitude fewer parameters and FLOPs.
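
Compound scaling ties all three dimensions to a single coefficient \(\phi\): depth scales as \(\alpha^\phi\), width as \(\beta^\phi\), and resolution as \(\gamma^\phi\), with \(\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2\). The sketch below uses the base coefficients reported for EfficientNet (\(\alpha = 1.2\), \(\beta = 1.1\), \(\gamma = 1.15\)); the function names are illustrative, and the released B1 to B7 configurations use rounded values, so they will not match these multipliers exactly.

```python
# Base coefficients reported for EfficientNet (found by a small grid search on B0).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution


def compound_scale(phi: float) -> tuple[float, float, float]:
    """Depth, width, and resolution multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_multiplier(phi: float) -> float:
    """FLOPs grow roughly with depth * width^2 * resolution^2."""
    d, w, r = compound_scale(phi)
    return d * w ** 2 * r ** 2


for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, "
          f"FLOPs x{flops_multiplier(phi):.2f}")
```

Because \(\alpha \cdot \beta^2 \cdot \gamma^2 \approx 1.92\), each increment of \(\phi\) roughly doubles the compute budget.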

Conclusion: Choosing the Right CNN for ECE

Selecting a CNN architecture largely depends on your specific ECE application requirements:

  • VGG: Good for understanding basic depth, but often too heavy for deployment.
  • ResNet: Excellent for very deep networks, good accuracy. A strong general-purpose choice.
  • Inception / Xception: Great for balancing accuracy and efficiency; Xception achieves this with depthwise separable convolutions.
  • MobileNet: Your go-to for edge devices and real-time mobile applications.
  • EfficientNet: Achieves state-of-the-art results with remarkable efficiency, often the best choice when pushing performance limits.

Important

Always consider the trade-off between accuracy, inference speed, model size, and computational power available on your target hardware.
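
As a first-pass check on model size, the parameter counts quoted in this deck can be converted into weight storage. The helper names and the 32 MB budget below are hypothetical. The estimate covers weights only (float32 by default) and ignores activation memory and compute, so treat it as a lower bound.

```python
# Approximate parameter counts quoted in this deck (in millions).
PARAMS_M = {
    "VGG16": 138, "VGG19": 143, "ResNet-50": 25, "Xception": 22.9,
    "MobileNetV1": 4.2, "MobileNetV2": 3.5, "EfficientNet-B0": 5.3, "EfficientNet-B7": 66,
}


def weight_size_mb(params_millions: float, bytes_per_param: int = 4) -> float:
    """Storage for the weights alone, in megabytes (10^6 bytes)."""
    return params_millions * bytes_per_param


def models_within_budget(budget_mb: float, bytes_per_param: int = 4) -> list[str]:
    """Names of models whose weights fit in `budget_mb`, smallest first."""
    fits = [(weight_size_mb(p, bytes_per_param), name) for name, p in PARAMS_M.items()
            if weight_size_mb(p, bytes_per_param) <= budget_mb]
    return [name for _, name in sorted(fits)]


for name, p in PARAMS_M.items():
    print(f"{name:16s} {weight_size_mb(p):7.1f} MB (float32)")

print(models_within_budget(32))  # hypothetical 32 MB budget
```

Passing `bytes_per_param=1` approximates 8-bit quantized weights, which is a common way to fit larger models onto constrained hardware.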

Further Exploration & Discussion

  • How might these different architectures perform on custom datasets specific to ECE applications (e.g., medical images, SAR data, sensor readings)?
  • What are the challenges of deploying these models on FPGAs or custom ASICs in ECE systems?
  • Beyond classification, how are these models adapted for tasks like object detection, segmentation, or robotics in your field?

Tip

TensorFlow Keras Applications Documentation: tf.keras.applications. This is your starting point for loading pre-trained models.