HW2. Linear and Logistic Regression via Gradient Descent
Overview
In this assignment you will implement two complete gradient-descent training pipelines using only NumPy.
- Problem 1 asks you to implement linear regression with an interaction term, trained using Mean Squared Error (MSE).
- Problem 2 asks you to implement logistic regression with an interaction term, trained using Binary Cross-Entropy (BCE).
The two problems are intentionally parallel. Both use synthetic features, both use the same four-parameter interaction model, and both ask you to derive gradients by hand before translating those derivatives into NumPy code.
You will work with two separate submission files:
- hw2_linear_regression.py
- hw2_logistic_regression.py
Background 1
Linear Regression
In Problem 1, you will generate synthetic regression data from the rule
\[ y = 1 + 2x_1 + 3x_2 + 4x_1x_2 + \varepsilon \]
where \(\varepsilon\) is Gaussian noise. The ground-truth parameters are
\[ b = 1, \qquad w_1 = 2, \qquad w_2 = 3, \qquad w_3 = 4 \]
Once the data has been generated, the prediction model is
\[ \hat{y}_i = b + w_1 x_{i1} + w_2 x_{i2} + w_3 x_{i1}x_{i2} \]
The MSE loss over \(N\) samples is
\[ \mathcal{L}_{\text{MSE}}(\mathbf{w}) = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2 \]
and the gradient descent update rule is
\[ \mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta \nabla_{\mathbf{w}} \mathcal{L}_{\text{MSE}}(\mathbf{w}^{(t)}) \]
Grading-Preview Doctests
The public doctests for Problem 1 use the tiny noiseless dataset
\[ \mathbf{x}^{(1)} = (0, 0), \quad \mathbf{x}^{(2)} = (1, 0), \quad \mathbf{x}^{(3)} = (0, 1), \quad \mathbf{x}^{(4)} = (1, 1) \]
which produces
\[ \mathbf{y} = \begin{bmatrix} 1 \\ 3 \\ 4 \\ 10 \end{bmatrix} \]
because it comes from the same ground-truth rule without any added noise.
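As a quick sanity check, the four doctest targets can be reproduced directly from the ground-truth rule (a standalone sketch, not part of your submission):

```python
import numpy as np

# The four noiseless doctest inputs (x1, x2) and the ground-truth rule.
x1 = np.array([0.0, 1.0, 0.0, 1.0])
x2 = np.array([0.0, 0.0, 1.0, 1.0])
y = 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

print(y)  # [ 1.  3.  4. 10.]
```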
Background 2
Logistic Regression
In Problem 2, you will generate synthetic classification data from the same continuous interaction score used in Problem 1:
\[ s = 1 + 2x_1 + 3x_2 + 4x_1x_2 + \varepsilon \]
where \(\varepsilon\) is Gaussian noise. You will then convert that score into binary labels using a threshold:
\[ y = \begin{cases} 1 & \text{if } s \ge \text{threshold} \\ 0 & \text{if } s < \text{threshold} \end{cases} \]
The logistic regression model uses the same interaction structure inside the sigmoid:
\[ p_i = \sigma(b + w_1 x_{i1} + w_2 x_{i2} + w_3 x_{i1}x_{i2}), \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \]
The BCE loss over \(N\) samples is
\[ \mathcal{L}_{\text{BCE}}(\mathbf{w}) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1-y_i)\log(1-p_i)\right] \]
and the gradient descent update rule is
\[ \mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta \nabla_{\mathbf{w}} \mathcal{L}_{\text{BCE}}(\mathbf{w}^{(t)}) \]
Grading-Preview Doctests
The public doctests for Problem 2 use the tiny deterministic dataset
\[ \mathbf{x}^{(1)} = (0, 0), \quad \mathbf{x}^{(2)} = (1, 0), \quad \mathbf{x}^{(3)} = (0, 1), \quad \mathbf{x}^{(4)} = (1, 1) \]
with labels
\[ \mathbf{y} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \]
These labels come from thresholding the noiseless scores \((1, 3, 4, 10)\) at threshold = 5.0. Hidden tests may use different threshold values, so your implementation must use the threshold argument rather than assuming it is always 5.0.
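For intuition, the thresholding step looks like this on the four noiseless scores (a standalone sketch; your submitted function must use its threshold argument, not a hard-coded 5.0):

```python
import numpy as np

# Noiseless interaction scores for the four doctest points.
s = np.array([1.0, 3.0, 4.0, 10.0])
threshold = 5.0

# Label is 1 where the score meets the threshold, else 0.
y = (s >= threshold).astype(int)
print(y)  # [0 0 0 1]
```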
Problem 1 and Problem 2 are intentionally parallel. In both cases, you will generate synthetic data, implement a vectorized prediction rule, compute a loss by reduction over samples, derive a gradient by hand, and write a full batch gradient descent training loop.
Problem 1: Linear Regression
Complete the five TODO functions in hw2_linear_regression.py. Each function builds on the previous one, so implement them in order. test_linear_regression.py provides doctests for each function. Do not modify it.
Before implementing compute_mse_gradient, derive the four partial derivatives
\[ \frac{\partial \mathcal{L}}{\partial b}, \qquad \frac{\partial \mathcal{L}}{\partial w_1}, \qquad \frac{\partial \mathcal{L}}{\partial w_2}, \qquad \frac{\partial \mathcal{L}}{\partial w_3} \]
for the interaction linear regression model above.
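As a starting point (a hint, not the full answer), applying the chain rule to the bias term gives
\[ \frac{\partial \mathcal{L}_{\text{MSE}}}{\partial b} = \frac{1}{N}\sum_{i=1}^{N} 2(\hat{y}_i - y_i)\,\frac{\partial \hat{y}_i}{\partial b} = \frac{2}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i) \]
since \(\partial \hat{y}_i / \partial b = 1\). The three weight derivatives follow the same pattern, with \(\partial \hat{y}_i / \partial w_k\) equal to whatever \(w_k\) multiplies in the model (for example, \(\partial \hat{y}_i / \partial w_3 = x_{i1}x_{i2}\)). Verify each one by hand before writing any NumPy.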
Function 1: generate_interaction_synthetic_data
Generate a synthetic regression dataset from
\[ y = 1 + 2x_1 + 3x_2 + 4x_1x_2 + \varepsilon \]
where \(\varepsilon \sim \mathcal{N}(0, \text{noise\_std}^2)\).
hw2_linear_regression.py
def generate_interaction_synthetic_data(n, noise_std=0.5, seed=42):
    ''' Generate synthetic regression data with an interaction term.
    Data is generated from
        y = 1 + 2*x1 + 3*x2 + 4*x1*x2 + eps
    where eps is Gaussian noise with standard deviation noise_std.
    Args
    ----
    n : int
        Number of samples to generate.
    noise_std : float, optional
        Standard deviation of Gaussian noise added to the targets.
    seed : int, optional
        Random seed for reproducibility.
    Returns
    -------
    x1_N : 1D np.array, shape (N,)
        First feature values.
    x2_N : 1D np.array, shape (N,)
        Second feature values.
    y_N : 1D np.array, shape (N,)
        Regression targets.
    '''
    n = int(n)
    if n < 1:
        raise ValueError("n must be at least 1")
    # ========== TODO ==========
    # 1. Create a random number generator using the given seed
    # 2. Sample x1_N and x2_N uniformly from [0, 2]
    # 3. Sample Gaussian noise with standard deviation noise_std
    # 4. Compute y_N = 1 + 2*x1_N + 3*x2_N + 4*x1_N*x2_N + noise
    # 5. Return x1_N, x2_N, y_N
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Function 2: predict_interaction_linear_regression
Compute predictions using
\[ \hat{y}_i = b + w_1 x_{i1} + w_2 x_{i2} + w_3 x_{i1}x_{i2} \]
for all samples at once.
hw2_linear_regression.py
def predict_interaction_linear_regression(x1_N, x2_N, w_D):
    ''' Predict targets for a 2-feature linear model with an interaction term.
    The model is
        yhat_i = b + w1*x1_i + w2*x2_i + w3*x1_i*x2_i
    where w_D = [b, w1, w2, w3].
    Args
    ----
    x1_N : 1D np.array, shape (N,)
        First feature value for each sample.
    x2_N : 1D np.array, shape (N,)
        Second feature value for each sample.
    w_D : 1D np.array, shape (4,)
        Parameter vector [b, w1, w2, w3].
    Returns
    -------
    yhat_N : 1D np.array, shape (N,)
        Predicted target for each sample.
    '''
    N = x1_N.shape[0]
    if x2_N.shape[0] != N:
        raise ValueError("x1_N and x2_N must have the same length")
    if w_D.shape[0] != 4:
        raise ValueError("w_D must have length 4")
    # ========== TODO ==========
    # Compute yhat_N using the formula
    #     yhat_i = b + w1*x1_i + w2*x2_i + w3*x1_i*x2_i
    # for all N samples without Python loops.
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Function 3: compute_mse_loss
Given the true targets and predictions, compute the Mean Squared Error.
hw2_linear_regression.py
def compute_mse_loss(y_N, yhat_N):
    ''' Compute mean squared error loss.
    Args
    ----
    y_N : 1D np.array, shape (N,)
        True target values.
    yhat_N : 1D np.array, shape (N,)
        Predicted target values.
    Returns
    -------
    loss : float
        Mean squared error over all samples.
    '''
    N = y_N.shape[0]
    if yhat_N.shape[0] != N:
        raise ValueError("y_N and yhat_N must have the same length")
    # ========== TODO ==========
    # 1. Compute residuals r_N = yhat_N - y_N
    # 2. Return the mean of r_N ** 2
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Function 4: compute_mse_gradient
Compute the gradient of the MSE loss with respect to the four parameters.
hw2_linear_regression.py
def compute_mse_gradient(x1_N, x2_N, y_N, w_D):
    ''' Compute the gradient of MSE loss with respect to [b, w1, w2, w3].
    You should derive the four partial derivatives by hand from the model and
    MSE definition, then implement the result directly in NumPy.
    Args
    ----
    x1_N : 1D np.array, shape (N,)
        First feature value for each sample.
    x2_N : 1D np.array, shape (N,)
        Second feature value for each sample.
    y_N : 1D np.array, shape (N,)
        True target values.
    w_D : 1D np.array, shape (4,)
        Parameter vector [b, w1, w2, w3].
    Returns
    -------
    grad_D : 1D np.array, shape (4,)
        Gradient of the MSE loss with respect to [b, w1, w2, w3].
    '''
    N = x1_N.shape[0]
    if x2_N.shape[0] != N or y_N.shape[0] != N:
        raise ValueError("x1_N, x2_N, and y_N must have the same length")
    if w_D.shape[0] != 4:
        raise ValueError("w_D must have length 4")
    # ========== TODO ==========
    # 1. Compute predictions yhat_N using predict_interaction_linear_regression
    # 2. Compute residuals r_N = yhat_N - y_N
    # 3. Derive and implement the four partial derivatives:
    #        dL/db, dL/dw1, dL/dw2, dL/dw3
    # 4. Return them as a NumPy array of shape (4,)
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Function 5: fit_linear_regression_gd
Train the model using full batch gradient descent and return both the learned weights and the loss history.
hw2_linear_regression.py
def fit_linear_regression_gd(x1_N, x2_N, y_N, lr=0.1, num_epochs=1000):
    ''' Fit the interaction linear regression model using batch gradient descent.
    Args
    ----
    x1_N : 1D np.array, shape (N,)
        First feature value for each sample.
    x2_N : 1D np.array, shape (N,)
        Second feature value for each sample.
    y_N : 1D np.array, shape (N,)
        True target values.
    lr : float, optional
        Learning rate.
    num_epochs : int, optional
        Number of gradient descent updates to perform.
    Returns
    -------
    w_D : 1D np.array, shape (4,)
        Learned parameter vector [b, w1, w2, w3].
    losses_L : 1D np.array, shape (num_epochs,)
        Loss value recorded before each parameter update.
    '''
    N = x1_N.shape[0]
    if x2_N.shape[0] != N or y_N.shape[0] != N:
        raise ValueError("x1_N, x2_N, and y_N must have the same length")
    num_epochs = int(num_epochs)
    if num_epochs < 1:
        raise ValueError("num_epochs must be at least 1")
    # ========== TODO ==========
    # 1. Initialize w_D as zeros of shape (4,)
    # 2. Create an empty Python list to store losses
    # 3. For each epoch:
    #    a. Compute predictions with predict_interaction_linear_regression
    #    b. Compute loss with compute_mse_loss
    #    c. Compute gradient with compute_mse_gradient
    #    d. Append loss to the list
    #    e. Update weights with w_D = w_D - lr * grad_D
    # 4. Convert the list of losses to a NumPy array before returning
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Problem 2: Logistic Regression
Complete the six TODO functions in hw2_logistic_regression.py. Each function builds on the previous one, so implement them in order. test_logistic_regression.py provides doctests for each function. Do not modify it.
Before implementing compute_bce_gradient, derive the four partial derivatives
\[ \frac{\partial \mathcal{L}}{\partial b}, \qquad \frac{\partial \mathcal{L}}{\partial w_1}, \qquad \frac{\partial \mathcal{L}}{\partial w_2}, \qquad \frac{\partial \mathcal{L}}{\partial w_3} \]
for the interaction logistic regression model above.
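A useful identity for this derivation (a hint; you should verify it yourself) is the sigmoid derivative
\[ \sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr) \]
Combined with the chain rule, it causes the \(\sigma'\) factor to cancel against the denominators from the log terms, so the BCE gradient ends up with the same structure as the MSE gradient but with prediction errors \(p_i - y_i\) in place of residuals (and without the factor of 2).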
Function 1: generate_interaction_synthetic_classification_data
Generate a synthetic classification dataset by first computing the continuous interaction score
\[ s = 1 + 2x_1 + 3x_2 + 4x_1x_2 + \varepsilon \]
and then thresholding it to create binary labels.
hw2_logistic_regression.py
def generate_interaction_synthetic_classification_data(
    n, noise_std=0.5, threshold=5.0, seed=42
):
    ''' Generate binary classification data with an interaction term.
    Data is generated from the continuous interaction score
        s = 1 + 2*x1 + 3*x2 + 4*x1*x2 + eps
    where eps is Gaussian noise with standard deviation noise_std.
    Binary labels are then assigned by thresholding the score:
        y = 1 if s >= threshold else 0
    Args
    ----
    n : int
        Number of samples to generate.
    noise_std : float, optional
        Standard deviation of Gaussian noise added to the score.
    threshold : float, optional
        Threshold used to convert the continuous score into binary labels.
    seed : int, optional
        Random seed for reproducibility.
    Returns
    -------
    x1_N : 1D np.array, shape (N,)
        First feature values.
    x2_N : 1D np.array, shape (N,)
        Second feature values.
    y_N : 1D np.array of ints, shape (N,)
        Binary labels in {0, 1}.
    '''
    n = int(n)
    if n < 1:
        raise ValueError("n must be at least 1")
    # ========== TODO ==========
    # 1. Create a random number generator using the given seed
    # 2. Sample x1_N and x2_N uniformly from [0, 2]
    # 3. Sample Gaussian noise with standard deviation noise_std
    # 4. Compute the interaction score
    #        s_N = 1 + 2*x1_N + 3*x2_N + 4*x1_N*x2_N + noise
    # 5. Convert scores to binary labels using the given threshold
    # 6. Return x1_N, x2_N, y_N
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Function 2: sigmoid
Implement the sigmoid function
\[ \sigma(z) = \frac{1}{1 + e^{-z}} \]
elementwise.
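A few reference values are handy for checking your implementation. The sketch below computes them with the formula above (assuming NumPy); note that for inputs beyond roughly \(|z| > 700\), np.exp can overflow and emit a runtime warning, which is acceptable for this assignment.

```python
import numpy as np

# Sigmoid should map large negative inputs near 0, zero to exactly 0.5,
# and large positive inputs near 1.
z = np.array([-100.0, 0.0, 100.0])
sigma = 1.0 / (1.0 + np.exp(-z))
print(sigma)  # values near 0, exactly 0.5, and near 1
```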
hw2_logistic_regression.py
def sigmoid(z_N):
    ''' Compute the sigmoid function elementwise.
    Args
    ----
    z_N : np.array or scalar
        Input values.
    Returns
    -------
    sigma_N : np.array or scalar
        Sigmoid-transformed values.
    '''
    # ========== TODO ==========
    # Compute sigma(z) = 1 / (1 + exp(-z)) elementwise.
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Function 3: predict_proba_logistic_regression
Predict class-1 probabilities using
\[ p_i = \sigma(b + w_1 x_{i1} + w_2 x_{i2} + w_3 x_{i1}x_{i2}) \]
for all samples at once.
hw2_logistic_regression.py
def predict_proba_logistic_regression(x1_N, x2_N, w_D):
    ''' Predict class-1 probabilities for logistic regression.
    The model is
        p_i = sigmoid(b + w1*x1_i + w2*x2_i + w3*x1_i*x2_i)
    where w_D = [b, w1, w2, w3].
    Args
    ----
    x1_N : 1D np.array, shape (N,)
        First feature value for each sample.
    x2_N : 1D np.array, shape (N,)
        Second feature value for each sample.
    w_D : 1D np.array, shape (4,)
        Parameter vector [b, w1, w2, w3].
    Returns
    -------
    p_N : 1D np.array, shape (N,)
        Predicted probability of class 1 for each sample.
    '''
    N = x1_N.shape[0]
    if x2_N.shape[0] != N:
        raise ValueError("x1_N and x2_N must have the same length")
    if w_D.shape[0] != 4:
        raise ValueError("w_D must have length 4")
    # ========== TODO ==========
    # 1. Compute z_N = b + w1*x1_N + w2*x2_N + w3*x1_N*x2_N
    # 2. Apply sigmoid to z_N
    # 3. Return the predicted probabilities
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Function 4: compute_bce_loss
Given the true labels and predicted probabilities, compute the Binary Cross-Entropy loss.
hw2_logistic_regression.py
def compute_bce_loss(y_N, p_N):
    ''' Compute binary cross-entropy loss.
    Args
    ----
    y_N : 1D np.array, shape (N,)
        True binary labels.
    p_N : 1D np.array, shape (N,)
        Predicted probabilities.
    Returns
    -------
    loss : float
        Mean binary cross-entropy over all samples.
    '''
    N = y_N.shape[0]
    if p_N.shape[0] != N:
        raise ValueError("y_N and p_N must have the same length")
    # ========== TODO ==========
    # 1. Add a small epsilon to guard against log(0)
    # 2. Compute the BCE loss
    #        -(1/N) * sum[y*log(p) + (1-y)*log(1-p)]
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Function 5: compute_bce_gradient
Compute the gradient of the BCE loss with respect to the four parameters.
hw2_logistic_regression.py
def compute_bce_gradient(x1_N, x2_N, y_N, w_D):
    ''' Compute the gradient of BCE loss with respect to [b, w1, w2, w3].
    You should derive the four partial derivatives by hand from the model and
    BCE definition, then implement the result directly in NumPy.
    Args
    ----
    x1_N : 1D np.array, shape (N,)
        First feature value for each sample.
    x2_N : 1D np.array, shape (N,)
        Second feature value for each sample.
    y_N : 1D np.array, shape (N,)
        True binary labels.
    w_D : 1D np.array, shape (4,)
        Parameter vector [b, w1, w2, w3].
    Returns
    -------
    grad_D : 1D np.array, shape (4,)
        Gradient of the BCE loss with respect to [b, w1, w2, w3].
    '''
    N = x1_N.shape[0]
    if x2_N.shape[0] != N or y_N.shape[0] != N:
        raise ValueError("x1_N, x2_N, and y_N must have the same length")
    if w_D.shape[0] != 4:
        raise ValueError("w_D must have length 4")
    # ========== TODO ==========
    # 1. Compute predicted probabilities p_N using predict_proba_logistic_regression
    # 2. Compute prediction errors e_N = p_N - y_N
    # 3. Derive and implement the four partial derivatives:
    #        dL/db, dL/dw1, dL/dw2, dL/dw3
    # 4. Return them as a NumPy array of shape (4,)
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Function 6: fit_logistic_regression_gd
Train the model using full batch gradient descent and return both the learned weights and the BCE loss history.
hw2_logistic_regression.py
def fit_logistic_regression_gd(x1_N, x2_N, y_N, lr=0.5, num_epochs=1000):
    ''' Fit the interaction logistic regression model using batch gradient descent.
    Args
    ----
    x1_N : 1D np.array, shape (N,)
        First feature value for each sample.
    x2_N : 1D np.array, shape (N,)
        Second feature value for each sample.
    y_N : 1D np.array, shape (N,)
        True binary labels.
    lr : float, optional
        Learning rate.
    num_epochs : int, optional
        Number of gradient descent updates to perform.
    Returns
    -------
    w_D : 1D np.array, shape (4,)
        Learned parameter vector [b, w1, w2, w3].
    losses_L : 1D np.array, shape (num_epochs,)
        BCE loss recorded before each parameter update.
    '''
    N = x1_N.shape[0]
    if x2_N.shape[0] != N or y_N.shape[0] != N:
        raise ValueError("x1_N, x2_N, and y_N must have the same length")
    num_epochs = int(num_epochs)
    if num_epochs < 1:
        raise ValueError("num_epochs must be at least 1")
    # ========== TODO ==========
    # 1. Initialize w_D as zeros of shape (4,)
    # 2. Create an empty Python list to store losses
    # 3. For each epoch:
    #    a. Compute probabilities with predict_proba_logistic_regression
    #    b. Compute loss with compute_bce_loss
    #    c. Compute gradient with compute_bce_gradient
    #    d. Append loss to the list
    #    e. Update weights with w_D = w_D - lr * grad_D
    # 4. Convert the list of losses to a NumPy array before returning
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Running Doctests
Problem 1
Run the linear regression doctests with:
terminal
pixi run python -m doctest test_linear_regression.py -v

Problem 2
Run the logistic regression doctests with:
terminal
pixi run python -m doctest test_logistic_regression.py -v

Submission
Upload hw2_linear_regression.py for Problem 1 and hw2_logistic_regression.py for Problem 2 to eCampus. Do not rename the files. Do not submit the notebook or the doctest files.
Common Mistakes
Problem 1: forgetting to use the provided seed in generate_interaction_synthetic_data, mixing up yhat_N - y_N and y_N - yhat_N, or dropping the interaction term in the gradient.
Problem 2: forgetting to threshold the synthetic score into binary labels, mixing up probabilities and labels inside BCE, or using the wrong sign in the BCE gradient update.
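When debugging either gradient function, a finite-difference check is a reliable tool: the numerical estimate below should agree closely with your hand-derived gradient. This sketch uses a toy quadratic loss for illustration (the function names here are illustrative, not part of the assignment); you can substitute a closure over your own loss and data in the same way.

```python
import numpy as np

def numerical_gradient(loss_fn, w, h=1e-6):
    '''Central-difference estimate of the gradient of loss_fn at w.'''
    grad = np.zeros_like(w)
    for k in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[k] += h
        w_minus[k] -= h
        grad[k] = (loss_fn(w_plus) - loss_fn(w_minus)) / (2 * h)
    return grad

# Toy check: loss(w) = sum(w**2) has analytic gradient 2*w.
w = np.array([1.0, -2.0, 0.5, 3.0])
analytic = 2 * w
numeric = numerical_gradient(lambda v: np.sum(v ** 2), w)
print(np.max(np.abs(analytic - numeric)))  # should be close to zero
```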