Machine Learning

03 Regression: TensorFlow & Neural Networks

Imron Rosyadi

05. Introduction to TensorFlow

An end-to-end open source machine learning platform

What Is TensorFlow Good For?

Neural Networks: Advanced architectures, key for modern ML breakthroughs.
Distributed Computing: Handles massive datasets across multiple machines.
GPU and TPU Support: Specialized hardware acceleration for faster training.

We’ve been humming along pretty nicely performing machine learning tasks with NumPy, Pandas, and scikit-learn. Is TensorFlow really necessary?

We have been able to do quite a bit with the tools that we’ve seen so far. What TensorFlow adds to the equation is better support for neural networks. Neural networks are the technology behind many of the breakthroughs in machine learning we’ve seen in recent years. We’ll learn more about neural networks soon.

TensorFlow also provides support for distributed computing. Machine learning algorithms thrive with big data. TensorFlow helps you process massive amounts of data, across many machines if necessary.

TensorFlow also provides support for graphics processing units (GPUs) and tensor processing units (TPUs). These are specialized processors that can greatly accelerate machine learning.

That being said, TensorFlow isn’t the only toolkit that fills this space. Other options like Torch and Microsoft Cognitive Toolkit (CNTK), as well as many others, provide powerful machine learning capabilities.

Tensor

An N-dimensional array of data

Tensor

So where does the name TensorFlow come from?

In math, a simple number like 3 or 5 is called a scalar.

A vector is a one-dimensional array of numbers. In physics, a vector is something with magnitude and direction. In computer science, the term vector usually just means a 1D array.

A two-dimensional array is a matrix.

A three-dimensional array? This is sometimes called a cube.

And four-dimensional? That is typically just called a 4D or rank-4 tensor.

But it doesn’t have to stop there. You can create tensors with an arbitrarily high number of dimensions.
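To make the scalar, vector, matrix, cube progression concrete, here is a minimal sketch using NumPy arrays as stand-ins for tensors. The variable names and shapes are made up for illustration; TensorFlow tensors expose the same idea of rank and shape.

```python
import numpy as np

# Each array below has one more axis than the previous one.
# ndim is the rank (number of dimensions) of the array.
scalar = np.array(3.0)                # rank 0
vector = np.array([1.0, 2.0, 3.0])    # rank 1
matrix = np.ones((2, 3))              # rank 2
cube = np.ones((2, 3, 4))             # rank 3
rank4 = np.ones((2, 3, 4, 5))         # rank 4

tensors = {
    "scalar": scalar,
    "vector": vector,
    "matrix": matrix,
    "cube": cube,
    "rank4": rank4,
}
ranks = {name: t.ndim for name, t in tensors.items()}

for name, t in tensors.items():
    print(f"{name}: rank={t.ndim}, shape={t.shape}")
```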

So we now understand why the “tensor” part of the name exists, but what about “flow?”

Typically a sequence of operations is performed on tensors in a model. These tensors “flow” through the graph that constitutes the model, hence “TensorFlow.”

TensorFlow: Graphs

Graphs

TensorFlow: Versions

TensorFlow 1

Lazy execution by default
Awkward programming model

TensorFlow 2

Eager execution by default
Keras programming model

Version 1 of TensorFlow really emphasized the concept of graphs. It used a “lazy” execution model where you build a graph completely before anything is run. This graph was then put into a session where data was passed through the model.

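This programming model worked, but it was a little clunky. Luckily, a library called Keras showed that machine learning models could be built and trained through a more natural, high-level programming model, and TensorFlow added an eager execution mode in which operations run immediately as they are called.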

TensorFlow 2 was officially released in late 2019. TensorFlow 2 still supports much of the older programming model through a compatibility layer, but, if possible, new programs should be written in TensorFlow 2’s Keras API style.

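TensorFlow 1 placed more of an emphasis on the concept of estimators (similar to scikit-learn). Estimators carried over into TensorFlow 2, but they have since been deprecated in favor of Keras, and the tf.estimator API was removed in TensorFlow 2.16. The estimator examples in the next section therefore need TensorFlow 2.15 or earlier.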

TensorFlow Is Separated Into Abstraction Layers

TensorFlow Abstraction

Your Turn!

Note

Exercise: Explore basic tensor operations in TensorFlow.

06. Linear Regression With TensorFlow

But Why?

Scalability: Handling huge datasets and distributed training.
Unified Ecosystem: Prepares for advanced deep learning tasks.
Learning Tool: Practice with familiar concepts in a new framework.

`LinearRegressor`

An implementation of `Estimator`

`LinearRegressor`

import tensorflow as tf  # tf.estimator needs TensorFlow 2.15 or earlier

# Define feature columns (e.g., numeric_column, categorical_column_with_vocabulary_list)
feature_columns = [
    tf.feature_column.numeric_column("feature1"),
    tf.feature_column.numeric_column("feature2")
]

lr = tf.estimator.LinearRegressor(
    feature_columns=feature_columns
)

# Dummy input functions for demonstration
def training_input():
    features = {"feature1": [1.0, 2.0], "feature2": [10.0, 20.0]}
    labels = [30.0, 40.0]
    return tf.data.Dataset.from_tensor_slices((features, labels)).batch(1)

def testing_input():
    features = {"feature1": [3.0, 4.0], "feature2": [30.0, 40.0]}
    return tf.data.Dataset.from_tensor_slices(features).batch(1)

lr.train(input_fn=training_input, steps=1) # Train for a single step for demo

p = lr.predict(input_fn=testing_input)
# print(list(p)) # Uncomment to see predictions

`LinearRegressor`: Training Function Details

import tensorflow as tf
import pandas as pd

# Dummy DataFrame for demonstration
training_df = pd.DataFrame({
    "MedInc": [1.0, 2.0, 3.0, 4.0, 5.0],
    "HouseAge": [10.0, 20.0, 30.0, 40.0, 50.0],
    "target_charges": [100.0, 150.0, 200.0, 250.0, 300.0]
})
feature_columns = ["MedInc", "HouseAge"]
target_column = "target_charges"

def training_input():
  ds = tf.data.Dataset.from_tensor_slices((
    {c: training_df[c].values for c in feature_columns},  # feature map
    training_df[target_column].values                     # labels
  ))
  ds = ds.repeat(100)           # Repeat data 100 times
  ds = ds.shuffle(buffer_size=10000) # Shuffle data for better training
  ds = ds.batch(100)            # Process in mini-batches of 100
  return ds

# Example usage (not run, just definition)
# input_dataset = training_input()
# for element in input_dataset.take(1):
#    print(element)

Here you can see what an input function might look like. The function:

Creates a Dataset object. This particular Dataset is just wrapping a bunch of Pandas Series objects, but Dataset can represent other data acquisition and storage strategies.
Sets the number of times to pass the data to the model. Remember that our models will be using an optimizer to try to find good weights. In order to do this, it helps to pass the data to the model a few times.
Shuffles the data. Because the shuffle comes after the repeat, examples from different passes over the data can be mixed together.
Defines the mini-batch size. This is the number of data points that will be passed to the model in each training step.

Note that the repeat count and the batch size are hyperparameters that you can change. You might find that you don’t need to repeat the data as much or that you need to repeat it more. You might find that smaller batches work better than big batches.
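To see how the repeat count and batch size determine the number of training batches, here is a minimal plain-Python sketch that imitates the repeat, shuffle, batch sequence. It is not tf.data itself: `simulate_pipeline` is a hypothetical helper, and it assumes the shuffle buffer is at least as large as the repeated dataset, so the shuffle is a full one.

```python
import math
import random


def simulate_pipeline(rows, repeat_count, batch_size, seed=0):
    """Plain-Python stand-in for repeat -> shuffle -> batch."""
    examples = list(rows) * repeat_count      # repeat the data
    rng = random.Random(seed)
    rng.shuffle(examples)                     # full shuffle of all repeats
    # Cut the shuffled examples into consecutive mini-batches.
    return [
        examples[i:i + batch_size]
        for i in range(0, len(examples), batch_size)
    ]


rows = [0, 1, 2, 3, 4]                        # 5 rows, like the dummy DataFrame
batches = simulate_pipeline(rows, repeat_count=100, batch_size=100)
num_batches = len(batches)

# 5 rows * 100 repeats = 500 examples, cut into batches of 100.
print(num_batches, len(batches[0]))
```

Halving the batch size doubles the number of batches, and therefore the number of weight updates, for the same amount of data.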

`LinearRegressor`: Optimizer

import tensorflow as tf

# Example feature columns
feature_columns = [tf.feature_column.numeric_column("x")]

# Create an Adam optimizer with a specific learning rate
adam_optimizer = tf.keras.optimizers.Adam(
  learning_rate=0.001,
  epsilon=1e-08 # Small constant for numerical stability (Keras default is 1e-7)
)

# Instantiate LinearRegressor with the custom optimizer
linear_regressor = tf.estimator.LinearRegressor(
    feature_columns=feature_columns,
    optimizer=adam_optimizer,
)

# You would then call .train() and .predict() on linear_regressor
# print(linear_regressor) # Uncomment to inspect the estimator

`LinearRegressor`: Distribution

# Dummy for conceptual demonstration - actual distribution
# requires a multi-device setup not available in pyodide.
import tensorflow as tf

# Example feature columns
feature_columns = [tf.feature_column.numeric_column("x")]

# Define a distributed strategy (conceptually)
# This part won't execute effectively in pyodide, but shows the API.
try:
    mirrored_strategy = tf.distribute.MirroredStrategy()
    config = tf.estimator.RunConfig(
        train_distribute=mirrored_strategy,
        eval_distribute=mirrored_strategy,
    )
except RuntimeError as e:
    print(f"Distribution Strategy initialization skipped in Pyodide: {e}")
    config = None # Fallback if strategy cannot be initialized

linear_regressor = tf.estimator.LinearRegressor(
    feature_columns=feature_columns,
    config=config,
)

# print(linear_regressor) # Uncomment to inspect

Your Turn! Predicting Housing Prices

Important

Lab: Apply LinearRegressor to predict housing prices using the California census data.

07. Neural Networks

Neural Networks: Good?

Self-driving Car

Neural Networks: Bad?

Decepticons

Neural Networks: Hype?

Hype

And finally, there are those who think deep learning and neural networks are just hype. For every person who thinks a technological revolution is around the corner, there is another pointing out how specialized and controlled the environment has to be for machine learning algorithms to perform well.

Deep learning doesn’t progress at an even pace. We are currently in a deep learning boom, but this has happened before. There have been a few “AI winters” where researchers thought that we were on the cusp of a revolution, only to have research in neural networks go dormant for a while.

We’d like to think that this time might be different. Computation is finally fast enough and has enough scale that algorithms designed decades ago can finally be implemented and trained in an effective manner.

Only time will tell if deep learning can live up to expectations. What we can do now is learn about it, be thoughtful about how we train and use it, and continue to innovate cautiously.

History & Motivation

Neural Networks: Inspired by Nature

Inspired by Nature

Neural Networks: Inspired by Nature

Neuron in our Body

Similar to the examples in the last slide, neural networks are inspired by nature. The brain contains a massive network of neurons that send electrical signals that activate other neurons. Through this network we are able to think.

This is the building block of the brain: a neuron.

A neuron is a cell with a nucleus and cell body like any other cell. One of the distinguishing features of the neuron is the “axon,” which is the long tail of the neuron. The tip of the axon has synaptic terminals that attach to other neuron bodies. A neuron body receives signals from the synapses of the neurons before it. When those signals reach a critical point within a fixed period of time, the receiving neuron fires, sending a signal to later neurons.

Neural networks were inspired by neurons and connections between neurons in the brain, hence the name.

Neural Networks: Inspired by Nature

Web of neurons (neural networks)

Neural Networks: Cutting Edge?

Einstein?

Artificial Neural Networks (ANN)

Computational networks inspired by biological systems.
Feed-forward networks: Information flows in one direction.
Backpropagation: Algorithm for training ANNs by adjusting weights.
Specific types: Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN).

Artificial Neural Networks (ANN)

ANN architecture

These are the typical diagrams you see to depict an artificial neural network. On the left we have our “input layer.” This is where we feed our feature data into the model. In these two diagrams, there are three features (depicted by the blue dots on the far left of the schematic).

The feature information then flows into “hidden layers.” In these hidden layers, mathematical operations are performed to extract patterns from the feature data. We’ll talk more about this math on future slides.

Finally, the transformed feature data flows to the output layer, which returns our predicted target values.

The main idea is that if neurons in one layer “fire,” then, using the connections to the next layer, we can determine which neurons in the next layer will fire. For now, it is useful to think of a neuron firing as a 1 and not firing as a 0. It is true that more sophisticated neural networks take into account the intensity of a “fire” (i.e., fired at 50% vs fired at 100%), but for the sake of discussion, let’s stick with the 1 or 0 model.

Perceptron

Simplest neural network: perceptron

Perceptron: The Math

Perceptron math

The core computation: \[ \text{sum} = \sum_{i=1}^{m} w_i x_i + b = W^T X + b \] The result sum then goes through an activation function \(f(\text{sum})\) to produce the output.

Perceptron: The Math

Interactive Perceptron Demo

Adjust inputs and weights to see the output.

viewof x1 = Inputs.range([0, 1], {step: 1, label: "Input x1"});
viewof x2 = Inputs.range([0, 1], {step: 1, label: "Input x2"});
viewof w1 = Inputs.range([-2, 2], {value: 1, step: 0.1, label: "Weight w1"});
viewof w2 = Inputs.range([-2, 2], {value: 1, step: 0.1, label: "Weight w2"});
viewof bias = Inputs.range([-2, 2], {value: -0.5, step: 0.1, label: "Bias b"});

The green and blue compartments show the computations taking place in the connections between the input layer and output layer of a perceptron.

The features are denoted by \(x_{i}\). The weights \(w_{i}\) are playing the same role as the weights in our linear regression model. If we build a weight vector \(W = [w_{1}, w_{2}, ..., w_{m}]\) and a feature vector \(X = [x_{1}, x_{2}, ..., x_{m}]\), then the green computation is simply \(W^{T}X + b\) (which has exactly the same form as the prediction in a linear regression model: bias + \(w_{1}x_{1}\) + \(w_{2}x_{2}\) + … + \(w_{m}x_{m}\)).

This information is then sent to an “activation function,” which uses the information from the green computation to determine whether or not the next neuron should fire. In a linear regression example, the activation function might be \(f(x) = x\). In other words, the activation function plays no role. But let’s look at a slightly more interesting example and walk through these details in a little more depth.

The interactive demo on the right allows you to play with the inputs, weights, and bias to see how the neuron’s output changes based on the calculated weighted sum and a simple step function. This demonstrates the fundamental logic of a perceptron.
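The weighted sum followed by a step function can be sketched in a few lines of plain Python. The function names are made up for illustration; the weights and bias are the default slider values from the demo (w1 = 1, w2 = 1, b = -0.5). With those values the perceptron happens to behave like a logical OR of its two binary inputs.

```python
def step(z):
    """Step activation: fire (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0


def perceptron(inputs, weights, bias):
    """Compute f(W^T X + b) for a single neuron with a step activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(weighted_sum)


weights, bias = [1.0, 1.0], -0.5

# Evaluate every combination of the two binary inputs.
truth_table = {
    (x1, x2): perceptron([x1, x2], weights, bias)
    for x1 in (0, 1)
    for x2 in (0, 1)
}

for inputs, output in truth_table.items():
    print(inputs, "->", output)
```

Changing the bias to -1.5 would require both inputs to be 1 before the neuron fires, which turns the same neuron into a logical AND.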

Perceptron Example: Predicting ML Study

Perceptron example

Perceptron Example: Predicting ML Study

\(x_1\): Will make more money? (1=Yes, 0=No)
\(x_2\): Loves programming/math? (1=Yes, 0=No)
\(x_3\): Has project benefiting from ML? (1=Yes, 0=No)

\[\sum_{i=1}^{3} w_i x_i - \text{threshold} \geq 0 \implies \text{Studies ML (1)}\] \[\sum_{i=1}^{3} w_i x_i - \text{threshold} < 0 \implies \text{Does Not Study ML (0)}\]

Suppose we want to predict whether an individual will start studying machine learning. Our features are given by: \(x_{1}\) = will the person make more money? \(x_{2}\) = does the person love programming and mathematics? \(x_{3}\) = does the person have a project that would benefit from ML?

We compute \(W^{T}X + b = w_{1}x_{1} + w_{2}x_{2} + w_{3}x_{3} + b\), where \(b\) is the bias.

Now assume that we will say “yes”: the person will study machine learning if the result is \(\geq 0\) and “no”: the person will not study machine learning if the result is \(< 0\).

It might be helpful to flip back to the previous slide and explain that the specific activation function we’re working with in this example is \(f(z) = 1\) if \(z \geq 0\) and \(f(z) = 0\) if \(z < 0\), where \(z = W^{T}X + b\). Also, for notational convenience, we flip the sign of \(b\) and write \(w_{1}x_{1} + w_{2}x_{2} + w_{3}x_{3} - b\) going forward. If we use this model, then the algorithm will learn a negated form of \(b\).

That is, we ask whether \(w_{1}x_{1} + w_{2}x_{2} + w_{3}x_{3} - b \geq 0\), which is the same as asking whether \(w_{1}x_{1} + w_{2}x_{2} + w_{3}x_{3} \geq b\). With the sign flipped, \(b\) plays the role of the threshold shown on the slide.

Machine Learning Process (Review)

Infer/Predict/Forecast: Use the model to make predictions.
Calculate Error/Loss/Cost: Quantify prediction inaccuracy.
Train/Learn: Adjust model parameters (weights, biases) to minimize error.
Iterate: Repeat until a stopping condition is met.

Perceptron Example: Weights & Bias

Perceptron example: weights and bias

Perceptron Example: Weights & Bias

\(w_1 = 2\), \(w_2 = 2\), \(w_3 = 6\)
Threshold = 5

\[\text{Predict } 1 \text{ if } 2x_1 + 2x_2 + 6x_3 \geq 5\] \[\text{Predict } 0 \text{ if } 2x_1 + 2x_2 + 6x_3 < 5\]

Perceptron Example: Kelly’s Input

Perceptron example: input

Perceptron Example: Kelly’s Input

\(x_1 = 0\) (Won’t make more money)
\(x_2 = 0\) (Doesn’t love programming/math)
\(x_3 = 1\) (Has a project benefiting from ML)

Perceptron Example: Kelly’s Prediction

Perceptron example: prediction

Perceptron Example: Kelly’s Prediction

\[2(0) + 2(0) + 6(1) = 6\]

Since \(6 \geq 5\), the model predicts: YES, Kelly will study ML!

Perceptron Example: Riley’s Input

Perceptron example: input

Perceptron Example: Riley’s Input

\(x_1 = 1\) (Will make more money)
\(x_2 = 1\) (Loves programming/math)
\(x_3 = 0\) (No project benefiting from ML)

Perceptron Example: Riley’s Prediction

Perceptron example: prediction

Perceptron Example: Riley’s Prediction

\[2(1) + 2(1) + 6(0) = 4\]

Since \(4 < 5\), the model predicts: NO, Riley will not study ML.
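The two predictions can be checked with a short script. This is only a sketch of the arithmetic on the slides; `studies_ml` is a hypothetical helper name, while the weights (2, 2, 6) and the threshold of 5 come from the example.

```python
WEIGHTS = (2, 2, 6)   # w1, w2, w3 from the example
THRESHOLD = 5


def studies_ml(features):
    """Return (weighted sum, prediction) for one person's binary features."""
    total = sum(w * x for w, x in zip(WEIGHTS, features))
    return total, int(total >= THRESHOLD)


kelly = studies_ml((0, 0, 1))   # no extra money, no love of math, has a project
riley = studies_ml((1, 1, 0))   # extra money, loves math, no project

print("Kelly:", kelly)
print("Riley:", riley)
```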

Perceptron Example: Learning Process

Perceptron example: learning process

Perceptron Example: Learning Process

Kelly: Prediction = 1 (Correct, actual = 1)
Riley: Prediction = 0 (Incorrect, actual = 1)

The model needs to adjust weights and bias to correctly predict for Riley. This involves optimization (e.g., gradient descent) and backpropagation (applying the chain rule to update weights across layers).

But how does the model actually update the weights and bias during the learning process?

Let’s look back at our example. Note that both of these samples were technically training data. From our dataset, we know that both Kelly and Riley did study ML (y=1), but for Kelly we predicted \(\hat{y} = 1\), and for Riley we predicted \(\hat{y} = 0\). So Kelly’s prediction was correct, while Riley’s was not correct.

Now the model needs to adjust the weights. It seems like if a person stands to make more money from studying ML AND they love programming and math, then the model should predict a 1 (whether or not they have a current project that would benefit from ML).

So the model needs to update the weights and bias via some optimization algorithm like gradient descent. In order to compute the derivative (gradient) to discern the direction of steepest descent, we will need to unravel the many compositions of matrix multiplication. If you remember your calculus, how do we take the derivative of a composition? The chain rule! That is effectively what backpropagation does. It is a way to compute the gradient when many chain rules are involved through each layer of the network.

Machine Learning Process (Neural Networks)

Infer/Predict/Forecast: Compute \(f(X, W, B)\), involving compositions of activation functions and many matrix multiplications across layers.
Calculate Error/Loss/Cost: Use metrics like MSE, MAE to quantify discrepancy between predicted and actual values.
Train/Learn (Optimization):
- Adjust \(W\) and \(B\) in the direction that minimizes cost.
- This direction is typically found via gradient descent.
- Gradients for complex networks are computed efficiently using the chain rule, implemented through backpropagation.
Iterate: Repeat steps 1-3 until the model converges or a stopping condition (e.g., max epochs) is met.
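The four steps can be sketched for the smallest possible case: a single neuron with the identity activation \(f(x) = x\), trained by gradient descent on mean squared error. The toy data, learning rate, and iteration count below are assumptions chosen for illustration. A real network repeats the same loop with many more weights and uses backpropagation to obtain the gradients.

```python
# Toy data generated from y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0          # initial weight and bias
learning_rate = 0.05
losses = []

for epoch in range(2000):                      # 4. Iterate
    # 1. Predict with the current parameters.
    preds = [w * x + b for x in xs]

    # 2. Calculate the loss (mean squared error).
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / len(xs)
    losses.append(loss)

    # 3. Train: the chain rule gives dL/dw = mean(2*e*x), dL/db = mean(2*e).
    grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = sum(2 * e for e in errors) / len(xs)
    w -= learning_rate * grad_w                # step against the gradient
    b -= learning_rate * grad_b

print(f"w={w:.4f}, b={b:.4f}, final loss={losses[-1]:.2e}")
```

The learned values approach the slope 2 and intercept 1 that generated the data, and the recorded loss shrinks as the loop runs.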

Issues with this plan?

The simple step function:

\[ f(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases} \]

Drawbacks:

Not differentiable at 0: Problematic for gradient descent.
Zero gradient elsewhere: \(f'(x) = 0\) for \(x \neq 0\), hindering learning.
Binary output only: No confidence scores or continuous values.

There are many possible activation functions, and some work better than others in certain situations.

Let’s take a closer look at the activation function we used in our simple example. This function is called a step function.

There are a few drawbacks to using the step function:

- \(f\) is not differentiable at 0. This could create problems for gradient descent when we need to take a derivative.
- \(f'(x)\) is 0 whenever \(x\) is not 0. This could also create problems for gradient descent. The chain rule multiplies by \(f'(x)\), so the whole gradient goes to 0, which means no slope. So it can be hard to determine the direction of steepest descent.
- \(f\) only returns a no or a yes. It would be preferable for \(f\) to return a continuous value between 0 and 1. For example, if \(f\) returned .9, then we would say that we’re 90% confident the answer is “yes, this person will study ML.” That is far more powerful than just returning a “yes” or “no.” We will discuss this further in the section on classification.

Sigmoid Activation

Sigmoid

A differentiable function that “squashes” values between 0 and 1.
Addresses limitations of the step function.
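The sigmoid is \(\sigma(x) = 1 / (1 + e^{-x})\), and its derivative is \(\sigma'(x) = \sigma(x)(1 - \sigma(x))\), which is never exactly zero. The sketch below applies it to the earlier example, feeding in the weighted sum minus the threshold of 5 for Kelly (sum 6) and Riley (sum 4). Reusing those numbers with a sigmoid is an illustration, not part of the original example.

```python
import math


def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Derivative of the sigmoid, written in terms of the sigmoid itself."""
    s = sigmoid(x)
    return s * (1.0 - s)


THRESHOLD = 5
kelly_confidence = sigmoid(6 - THRESHOLD)   # weighted sum 6 from the example
riley_confidence = sigmoid(4 - THRESHOLD)   # weighted sum 4 from the example

print(f"Kelly: {kelly_confidence:.3f}, Riley: {riley_confidence:.3f}")
print(f"slope at 0: {sigmoid_derivative(0.0)}")
```

Instead of a hard yes or no, each person now gets a value that can be read as a confidence, and the non-zero slope gives gradient descent a direction to move in.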

Activation Functions

List of Activation Functions

Crucial for introducing non-linearity to the network.
Enables learning complex patterns.
Many types: ReLU (Rectified Linear Unit), Tanh, Leaky ReLU, etc.
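For reference, the activation functions named above are each a one-line formula. This is a minimal plain-Python sketch for single numbers. The leaky slope `alpha=0.01` is an assumed value chosen for illustration, and library implementations operate on whole tensors rather than scalars.

```python
import math


def relu(x):
    """Rectified Linear Unit: pass positives through, clamp negatives to 0."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Like ReLU, but negatives keep a small slope (alpha is an assumption)."""
    return x if x > 0 else alpha * x


def tanh(x):
    """Hyperbolic tangent: squashes values into the interval (-1, 1)."""
    return math.tanh(x)


samples = [-2.0, 0.0, 2.0]
relu_out = [relu(x) for x in samples]
leaky_out = [leaky_relu(x) for x in samples]
tanh_out = [tanh(x) for x in samples]

print("relu:      ", relu_out)
print("leaky_relu:", leaky_out)
print("tanh:      ", [round(v, 4) for v in tanh_out])
```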

08. Regression With TensorFlow (Keras)

Keras

The Python Deep Learning Library

High-level API for quickly building and training ML models.
Integrates seamlessly with TensorFlow 2.
Simplifies complex deep learning model design.

Keras: Sequential Models

from tensorflow import keras

# An empty sequential model
model = keras.Sequential()

# model.add(some_layer) # Layers can be added later
# model.summary() # Prints a layer table once the model has layers and a known input shape

Linear stack of layers: Each layer feeds directly into the next.
Ideal for simple feed-forward networks where data flows in one direction.
Alternative: Functional API for more complex, graph-like architectures.

Keras: Layers

from tensorflow.keras import layers

# Input layer (implicitly created) and 1st hidden layer with 32 nodes
layer_1 = layers.Dense(32, input_shape=[8]) 

# 2nd hidden layer with 16 nodes and ReLU activation
layer_2 = layers.Dense(16, activation='relu')

# Output layer with 1 node for regression
layer_3 = layers.Dense(1)

# These layers would typically be combined in a Sequential model
# e.g., model = keras.Sequential([layer_1, layer_2, layer_3])

Dense layer: Every node connects to every node in the previous/next layer.
input_shape: Defines the number of features in the input.
First argument: Number of nodes (\(\textit{units}\)) in the layer.
activation: Specifies the activation function (e.g., 'relu', 'sigmoid').

A model consists of layers of nodes. In the lab we are about to do, those layers are Dense layers. A dense layer in a neural network is a layer where every node is connected to every node in the next layer.

In the example we have on this slide, we create three Dense layer objects. Once they are stacked in a model, this actually gives a neural network that is four layers deep, though, because the input layer is created implicitly.

When we make the first layer, we pass in an input shape. This is the shape of the features you’ll be feeding into the model. In this case we chose an input shape of 8. That indicates we’ll be providing 8 features to the model. The input layer is the first layer.

But you should also notice that we passed the number 32 to the Dense constructor. This creates our first hidden layer with 32 nodes.

In review, this first line of code creates two layers. One layer is an input layer that accepts 8 features. That layer is densely connected to the next layer, which has 32 nodes. This means there are 8 × 32 = 256 connections (weights) between the layers.

The next line of code creates another dense layer. This layer is 16 nodes wide. Notice that we pass an activation function to this layer. The activation we chose is the relu activation. By default the activation for a dense layer in TensorFlow Keras is \(f(x) = x\). We can adjust the activation function layer by layer.

There are many activation functions available in the tensorflow.keras.activations namespace. Many of these can be referenced by name, as shown in this example. There are more activation functions available in tf.nn. For these you’ll need to pass in the function itself, such as tf.nn.leaky_relu, instead of just the name.

The final layer that we create is our output layer. Since we have been doing single output regressions, this output layer has only one node. That node will be our predicted regression value for a given set of input features.

You aren’t limited to one output though. As we move into classification, we’ll see examples with more than one output node.
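What a dense layer computes can be sketched with NumPy: multiply the inputs by a weight matrix, add a bias vector, and apply the activation. The `dense` helper, the random weights, and the batch of 5 examples are assumptions for illustration; the layer sizes 8, 32, 16, and 1 mirror layer_1, layer_2, and layer_3 above. This is not how Keras is implemented internally, only the same arithmetic.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    """Element-wise ReLU activation."""
    return np.maximum(z, 0.0)


def dense(x, weights, bias, activation=None):
    """Dense layer: activation(x @ W + b); no activation means f(x) = x."""
    z = x @ weights + bias
    return activation(z) if activation is not None else z


batch = rng.normal(size=(5, 8))                  # 5 examples, 8 features each

w1, b1 = rng.normal(size=(8, 32)), np.zeros(32)  # 8 inputs -> 32 nodes
w2, b2 = rng.normal(size=(32, 16)), np.zeros(16) # 32 -> 16 nodes
w3, b3 = rng.normal(size=(16, 1)), np.zeros(1)   # 16 -> 1 output node

h1 = dense(batch, w1, b1)                        # linear, like layer_1
h2 = dense(h1, w2, b2, activation=relu)          # ReLU, like layer_2
out = dense(h2, w3, b3)                          # one regression value per row

print(h1.shape, h2.shape, out.shape)
```

The weight matrix between two layers has one entry per connection, which is why its shape is (nodes in, nodes out).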

Keras: Dense Neural Network Architecture

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
  layers.Dense(64, input_shape=[8], activation='relu', name='hidden_layer_1'),
  layers.Dense(64, activation='relu', name='hidden_layer_2'),
  layers.Dense(1, name='output_layer')
])
model.summary()

Model Visualization

In our previous slide, we created layers, but we didn’t connect them. In this slide we’ll create the layers inside a sequential model. Now the layers are densely connected in sequence.

Questions to ask the class: How many layers are there in this model? Answer: 4 (Input, Hidden1, Hidden2, Output)

How many nodes are in the first (input) layer? Answer: 8 (implicitly from input_shape)

How many nodes are in the second (hidden_layer_1) and third (hidden_layer_2) layers? Answer: 64

How many nodes are in the final (output) layer? Answer: 1

How many connections are there between layer 1 (input) and layer 2 (hidden_layer_1)? Answer: 8x64 = 512 connections.

It may be helpful to draw a schematic of the model on the board while asking students the questions if the Graphviz diagram isn’t clear enough immediately.

The model.summary() command (if run in a Python environment) prints a table showing the layers, their output shapes, and the number of parameters. This helps verify the architecture.
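The parameter counts in that table can be predicted by hand: a dense layer has one weight per connection plus one bias per node. The sketch below does the arithmetic for the 8, 64, 64, 1 architecture on this slide; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters in a dense layer: weights plus one bias per node."""
    return n_inputs * n_units + n_units


layer_sizes = [8, 64, 64, 1]   # input, hidden_layer_1, hidden_layer_2, output

# Pair each layer with the one that follows it.
per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total = sum(per_layer)

print(per_layer, total)
```

The first entry is the 512 connections counted above plus 64 biases.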

Keras: Other Layer Types

from tensorflow.keras.layers import (
    AveragePooling1D,
    Conv3D,
    GRU,
    RNN,
    ZeroPadding3D,
    LSTM,
    BatchNormalization,
    Dropout,
    Reshape
)

# Not an exhaustive list, just examples for different ML tasks.
# Each serves a specific purpose in processing different data types.

# These are imported for conceptual understanding, not direct execution.
# Actual usage involves constructing models from these layers.

Dense is just one type; Keras offers many specialized layers:
- Convolutional layers (Conv2D, Conv3D): For spatial data (images, videos).
- Recurrent layers (LSTM, GRU): For sequential data (time series, text).
- Pooling layers (MaxPooling1D, AveragePooling2D): For downsampling.
- Normalization layers (BatchNormalization): For stabilizing training.
- Regularization layers (Dropout): For preventing overfitting.

Keras: Model Compilation

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
  layers.Dense(64, input_shape=[8], activation='relu'),
  layers.Dense(1)
])

model.compile(
  loss='mse',           # Mean Squared Error
  optimizer='Adam',     # Adaptive Moment Estimation optimizer
  metrics=['mae', 'mse'], # Track Mean Absolute Error and Mean Squared Error
)

# print(model.optimizer) # Uncomment to inspect optimizer

Configures the model for training.
loss function: Measures how well the model performs.
optimizer: Algorithm to adjust weights and minimize the loss.
metrics: Evaluation criteria, displayed during training.
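The 'mse' loss and 'mae' metric named in the compile call are simple averages over the prediction errors. Here is a minimal sketch of what they measure, using made-up target and prediction values.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values, for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

mse_value = mse(y_true, y_pred)
mae_value = mae(y_true, y_pred)

print(f"MSE={mse_value}, MAE={mae_value}")
```

Squaring makes MSE weigh the single error of 2 much more heavily than MAE does, which is one reason to track both.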

Keras: Model Training

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Dummy data for demonstration
training_df = {
    "feature1": np.random.rand(100, 8),
    "target_column": np.random.rand(100)
}
feature_columns = ["feature1"]
target_column = "target_column"

model = keras.Sequential([layers.Dense(1, input_shape=[8])])
model.compile(loss='mse', optimizer='Adam', metrics=['mae'])

EPOCHS = 5

history = model.fit(
  training_df["feature1"],
  training_df[target_column],
  epochs=EPOCHS,
  validation_split=0.2, # Use 20% of training data for validation
  verbose=0 # Suppress output for concise presentation
)

print(history.history) # Display training history

model.fit(): Method to train the model.
epochs: Number of times the entire dataset is passed forward and backward through the neural network.
validation_split: Fraction of data to use for validation during training.
Returns a History object containing loss and metric values per epoch.
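Keras takes the validation samples from the end of the data, before any shuffling. The sketch below imitates that behavior in plain Python to show which rows end up where. `split_for_validation` is a hypothetical helper, and its rounding is a simplification of what the library does.

```python
def split_for_validation(samples, validation_split):
    """Hold out the last fraction of the samples for validation."""
    n_validation = int(round(len(samples) * validation_split))
    split_at = len(samples) - n_validation
    return samples[:split_at], samples[split_at:]


samples = list(range(100))     # stand-in for 100 rows of training data
train, validation = split_for_validation(samples, validation_split=0.2)

print(len(train), len(validation))
print("first validation row:", validation[0])
```

Because the split is positional, data that is sorted (for example, by date or by target value) should be shuffled before it is passed to fit().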

Keras: Making Predictions

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Assume 'model' is already trained (from previous slide)
# Dummy testing data for demonstration
testing_df = {
    "feature1": np.random.rand(10, 8)
}
feature_columns = ["feature1"]

# Generate predictions
predictions = model.predict(testing_df["feature1"])

print("First 5 predictions:")
print(predictions[:5])

model.predict(): Generates output predictions for new input data.
Returns an array of predictions, matching the output layer’s shape.

Your Turn! Regression with TensorFlow

Tip

Lab: Build a deep neural network using Keras to predict California housing prices.

09. Regression Project

Predicting Insurance Charges

Review: What regression models have we learned about?

Review: What tools have we learned about?

Review: What data analysis and preparation techniques have we learned about?

Review: How do we measure the quality of a model?

Regression Project: The Data

| Column | Type | Description |
|---|---|---|
| `age` | `number` | age of primary beneficiary |
| `sex` | `string` | gender of the primary beneficiary |
| `bmi` | `number` | body mass index of the primary beneficiary |
| `children` | `number` | number of children covered by the plan |
| `smoker` | `string` | is the primary beneficiary a smoker |
| `region` | `string` | geographic region of the beneficiaries |
| `charges` | `number` | costs to the insurance company (target) |

Regression Project: Your Turn

Problem Framing: Understand the context, potential biases, and impact.
Exploratory Data Analysis (EDA): Acquire, clean, and visualize the data.
Model Building: Choose, train, and evaluate a regression model.

It is now your turn to perform a regression from end-to-end.

The lab you are about to be given is divided into three primary parts, shown on this slide.

In the “Problem Framing” section, you’ll be given the context for your insurance charges model and asked questions about how machine learning might or might not be the best tool for the job, how the data might be biased, and how the model fits in the overall solution. This section exists to remind you that we create these models to help drive decisions, and those decisions have impact. There aren’t necessarily right or wrong answers here. We are interested in you thinking through the issues and coming up with your own opinion.

In the next section, you’ll acquire and explore the data. In this section we expect you to write code and prose about the data. Does the data have obvious problems? Do any model-independent changes need to be made to the data? EDA is the place to reason about and perform these tasks.

The final section is the modeling section. In this section we expect you to build and train a model to perform regression. Then measure the quality of that model using, at minimum, a final root mean squared error. It doesn’t matter if you perform a linear regression or build a neural network. We just want to see a model built and trained. It would be good if your final RMSE was near or better than the benchmark mentioned in the lab, but that isn’t a strict requirement.
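A final RMSE is easier to interpret next to a simple reference point. The sketch below computes the RMSE of some predictions and compares it with a baseline that always predicts the mean charge. The numbers are invented for illustration, and the mean baseline is an assumption here, not the benchmark mentioned in the lab.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical charges and model predictions, for illustration only.
actual = [1000.0, 2000.0, 3000.0]
predicted = [1100.0, 1900.0, 3200.0]

# Baseline: predict the mean charge for everyone.
mean_charge = sum(actual) / len(actual)
baseline = [mean_charge] * len(actual)

model_rmse = rmse(actual, predicted)
baseline_rmse = rmse(actual, baseline)

print(f"model RMSE={model_rmse:.2f}, baseline RMSE={baseline_rmse:.2f}")
```

A model whose RMSE is not clearly below the mean baseline has learned very little from the features.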

Feel free to use any of the tools that we have covered in this course so far.

Take your time. Experiment. Don’t be afraid to throw away some work along the way.
