HW2. Linear and Logistic Regression via Gradient Descent

Author

Byungkook Oh

Modified

2026.03.22

Overview

In this assignment you will implement two complete gradient-descent training pipelines using only NumPy.

  • Problem 1 asks you to implement linear regression with an interaction term, trained using Mean Squared Error (MSE).
  • Problem 2 asks you to implement logistic regression with an interaction term, trained using Binary Cross-Entropy (BCE).

The two problems are intentionally parallel. Both use synthetic features, both use the same four-parameter interaction model, and both ask you to derive gradients by hand before translating those derivatives into NumPy code.

You will work with two separate submission files:

  • hw2_linear_regression.py
  • hw2_logistic_regression.py

Background 1

Linear Regression

In Problem 1, you will generate synthetic regression data from the rule

\[ y = 1 + 2x_1 + 3x_2 + 4x_1x_2 + \varepsilon \]

where \(\varepsilon\) is Gaussian noise. The ground-truth parameters are

\[ b = 1, \qquad w_1 = 2, \qquad w_2 = 3, \qquad w_3 = 4 \]

Once the data has been generated, the prediction model is

\[ \hat{y}_i = b + w_1 x_{i1} + w_2 x_{i2} + w_3 x_{i1}x_{i2} \]

The MSE loss over \(N\) samples is

\[ \mathcal{L}_{\text{MSE}}(\mathbf{w}) = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2 \]

and the gradient descent update rule is

\[ \mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta \nabla_{\mathbf{w}} \mathcal{L}_{\text{MSE}}(\mathbf{w}^{(t)}) \]
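As a concrete illustration of the update rule (not part of the assignment API), applying it to the one-parameter quadratic loss \(L(w) = (w - 3)^2\) drives \(w\) toward the minimizer 3. The names `lr` and the step count here are illustrative:

```python
# Minimal sketch of the update w <- w - lr * dL/dw on L(w) = (w - 3)^2.
w = 0.0
lr = 0.1
for _ in range(100):
    grad = 2 * (w - 3)   # dL/dw for L(w) = (w - 3)^2
    w = w - lr * grad

print(w)  # close to 3.0
```

The same pattern, with the scalar gradient replaced by the four-element gradient vector, is exactly what Function 5 asks you to implement.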

Grading-Preview Doctests

The public doctests for Problem 1 use the tiny noiseless dataset

\[ \mathbf{x}^{(1)} = (0, 0), \quad \mathbf{x}^{(2)} = (1, 0), \quad \mathbf{x}^{(3)} = (0, 1), \quad \mathbf{x}^{(4)} = (1, 1) \]

which produces

\[ \mathbf{y} = \begin{bmatrix} 1 \\ 3 \\ 4 \\ 10 \end{bmatrix} \]

because it comes from the same ground-truth rule without any added noise.
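You can verify the doctest targets yourself by plugging the four inputs into the noiseless rule:

```python
import numpy as np

# Check that the four doctest inputs reproduce y = [1, 3, 4, 10]
# under the noiseless rule y = 1 + 2*x1 + 3*x2 + 4*x1*x2.
x1 = np.array([0.0, 1.0, 0.0, 1.0])
x2 = np.array([0.0, 0.0, 1.0, 1.0])
y = 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2
print(y)  # [ 1.  3.  4. 10.]
```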

Background 2

Logistic Regression

In Problem 2, you will generate synthetic classification data from the same continuous interaction rule used in Problem 1, now treated as a score:

\[ s = 1 + 2x_1 + 3x_2 + 4x_1x_2 + \varepsilon \]

where \(\varepsilon\) is Gaussian noise. You will then convert that score into binary labels using a threshold:

\[ y = \begin{cases} 1 & \text{if } s \ge \text{threshold} \\ 0 & \text{if } s < \text{threshold} \end{cases} \]

The logistic regression model uses the same interaction structure inside the sigmoid:

\[ p_i = \sigma(b + w_1 x_{i1} + w_2 x_{i2} + w_3 x_{i1}x_{i2}), \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \]

The BCE loss over \(N\) samples is

\[ \mathcal{L}_{\text{BCE}}(\mathbf{w}) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1-y_i)\log(1-p_i)\right] \]

and the gradient descent update rule is

\[ \mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta \nabla_{\mathbf{w}} \mathcal{L}_{\text{BCE}}(\mathbf{w}^{(t)}) \]

Grading-Preview Doctests

The public doctests for Problem 2 use the tiny deterministic dataset

\[ \mathbf{x}^{(1)} = (0, 0), \quad \mathbf{x}^{(2)} = (1, 0), \quad \mathbf{x}^{(3)} = (0, 1), \quad \mathbf{x}^{(4)} = (1, 1) \]

with labels

\[ \mathbf{y} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \]

which comes from thresholding the noiseless scores at threshold = 5.0. Hidden tests may use different threshold values, so your implementation must use the threshold argument rather than assuming it is always 5.0.
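You can reproduce these labels by thresholding the noiseless scores from Problem 1's doctest dataset:

```python
import numpy as np

# Thresholding the noiseless scores [1, 3, 4, 10] at 5.0 yields labels [0, 0, 0, 1].
s = np.array([1.0, 3.0, 4.0, 10.0])
threshold = 5.0
y = (s >= threshold).astype(int)
print(y)  # [0 0 0 1]
```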

Note: NumPy Skills Exercised

Problem 1 and Problem 2 are intentionally parallel. In both cases, you will generate synthetic data, implement a vectorized prediction rule, compute a loss by reduction over samples, derive a gradient by hand, and write a full batch gradient descent training loop.

Problem 1: Linear Regression

Complete the five TODO functions in hw2_linear_regression.py. Each function builds on the previous one, so implement them in order. test_linear_regression.py provides doctests for each function. Do not modify it.

Before implementing compute_mse_gradient, derive the four partial derivatives

\[ \frac{\partial \mathcal{L}}{\partial b}, \qquad \frac{\partial \mathcal{L}}{\partial w_1}, \qquad \frac{\partial \mathcal{L}}{\partial w_2}, \qquad \frac{\partial \mathcal{L}}{\partial w_3} \]

for the interaction linear regression model above.
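A useful way to check your hand-derived gradient (as a debugging aid only, not part of the submission) is to compare it against a central-difference approximation. The helper `numerical_grad` below is a hypothetical name, and the quadratic loss is just a stand-in whose analytic gradient is known:

```python
import numpy as np

def numerical_grad(loss_fn, w, eps=1e-6):
    """Central-difference gradient of loss_fn at w (a debugging aid,
    not part of the assignment API)."""
    g = np.zeros_like(w)
    for j in range(w.size):
        wp, wm = w.copy(), w.copy()
        wp[j] += eps
        wm[j] -= eps
        g[j] = (loss_fn(wp) - loss_fn(wm)) / (2 * eps)
    return g

# Example: check the gradient of a simple quadratic loss,
# whose analytic gradient is A @ w.
A = np.diag([1.0, 2.0, 3.0, 4.0])
loss = lambda w: 0.5 * w @ A @ w
w0 = np.array([1.0, -1.0, 2.0, 0.5])
print(np.allclose(numerical_grad(loss, w0), A @ w0, atol=1e-4))  # True
```

Once compute_mse_gradient is implemented, you can pass a lambda wrapping compute_mse_loss to the same checker and compare against your analytic result.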

Function 1: generate_interaction_synthetic_data

Generate a synthetic regression dataset from

\[ y = 1 + 2x_1 + 3x_2 + 4x_1x_2 + \varepsilon \]

where \(\varepsilon \sim \mathcal{N}(0, \text{noise\_std}^2)\).

hw2_linear_regression.py
def generate_interaction_synthetic_data(n, noise_std=0.5, seed=42):
    ''' Generate synthetic regression data with an interaction term.

    Data is generated from
        y = 1 + 2*x1 + 3*x2 + 4*x1*x2 + eps
    where eps is Gaussian noise with standard deviation noise_std.

    Args
    ----
    n : int
        Number of samples to generate.
    noise_std : float, optional
        Standard deviation of Gaussian noise added to the targets.
    seed : int, optional
        Random seed for reproducibility.

    Returns
    -------
    x1_N : 1D np.array, shape (N,)
        First feature values.
    x2_N : 1D np.array, shape (N,)
        Second feature values.
    y_N : 1D np.array, shape (N,)
        Regression targets.
    '''
    n = int(n)
    if n < 1:
        raise ValueError("n must be at least 1")

    # ========== TODO ==========
    # 1. Create a random number generator using the given seed
    # 2. Sample x1_N and x2_N uniformly from [0, 2]
    # 3. Sample Gaussian noise with standard deviation noise_std
    # 4. Compute y_N = 1 + 2*x1_N + 3*x2_N + 4*x1_N*x2_N + noise
    # 5. Return x1_N, x2_N, y_N
    raise NotImplementedError("Remove this line and implement above")
    # ==========================
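The reproducible-sampling pattern the TODO steps describe follows NumPy's Generator API; a generic sketch (the seed and sizes here are illustrative, not the assignment's values):

```python
import numpy as np

# Generic reproducible-sampling pattern with NumPy's Generator API.
rng = np.random.default_rng(0)                # step 1: seeded generator
u = rng.uniform(low=0.0, high=2.0, size=5)    # uniform samples on [0, 2)
e = rng.normal(loc=0.0, scale=0.5, size=5)    # Gaussian noise, std 0.5
print(u.shape, e.shape)  # (5,) (5,)
```

Creating one Generator from the seed and drawing all samples from it guarantees the same data on every run, which the doctests rely on.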

Function 2: predict_interaction_linear_regression

Compute predictions using

\[ \hat{y}_i = b + w_1 x_{i1} + w_2 x_{i2} + w_3 x_{i1}x_{i2} \]

for all samples at once.

hw2_linear_regression.py
def predict_interaction_linear_regression(x1_N, x2_N, w_D):
    ''' Predict targets for a 2-feature linear model with an interaction term.

    The model is
        yhat_i = b + w1*x1_i + w2*x2_i + w3*x1_i*x2_i
    where w_D = [b, w1, w2, w3].

    Args
    ----
    x1_N : 1D np.array, shape (N,)
        First feature value for each sample.
    x2_N : 1D np.array, shape (N,)
        Second feature value for each sample.
    w_D : 1D np.array, shape (4,)
        Parameter vector [b, w1, w2, w3].

    Returns
    -------
    yhat_N : 1D np.array, shape (N,)
        Predicted target for each sample.
    '''
    N = x1_N.shape[0]
    if x2_N.shape[0] != N:
        raise ValueError("x1_N and x2_N must have the same length")
    if w_D.shape[0] != 4:
        raise ValueError("w_D must have length 4")

    # ========== TODO ==========
    # Compute yhat_N using the formula
    #     yhat_i = b + w1*x1_i + w2*x2_i + w3*x1_i*x2_i
    # for all N samples without Python loops.
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Function 3: compute_mse_loss

Given the true targets and predictions, compute the Mean Squared Error.

hw2_linear_regression.py
def compute_mse_loss(y_N, yhat_N):
    ''' Compute mean squared error loss.

    Args
    ----
    y_N : 1D np.array, shape (N,)
        True target values.
    yhat_N : 1D np.array, shape (N,)
        Predicted target values.

    Returns
    -------
    loss : float
        Mean squared error over all samples.
    '''
    N = y_N.shape[0]
    if yhat_N.shape[0] != N:
        raise ValueError("y_N and yhat_N must have the same length")

    # ========== TODO ==========
    # 1. Compute residuals r_N = yhat_N - y_N
    # 2. Return the mean of r_N ** 2
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Function 4: compute_mse_gradient

Compute the gradient of the MSE loss with respect to the four parameters.

hw2_linear_regression.py
def compute_mse_gradient(x1_N, x2_N, y_N, w_D):
    ''' Compute the gradient of MSE loss with respect to [b, w1, w2, w3].

    You should derive the four partial derivatives by hand from the model and
    MSE definition, then implement the result directly in NumPy.

    Args
    ----
    x1_N : 1D np.array, shape (N,)
        First feature value for each sample.
    x2_N : 1D np.array, shape (N,)
        Second feature value for each sample.
    y_N : 1D np.array, shape (N,)
        True target values.
    w_D : 1D np.array, shape (4,)
        Parameter vector [b, w1, w2, w3].

    Returns
    -------
    grad_D : 1D np.array, shape (4,)
        Gradient of the MSE loss with respect to [b, w1, w2, w3].
    '''
    N = x1_N.shape[0]
    if x2_N.shape[0] != N or y_N.shape[0] != N:
        raise ValueError("x1_N, x2_N, and y_N must have the same length")
    if w_D.shape[0] != 4:
        raise ValueError("w_D must have length 4")

    # ========== TODO ==========
    # 1. Compute predictions yhat_N using predict_interaction_linear_regression
    # 2. Compute residuals r_N = yhat_N - y_N
    # 3. Derive and implement the four partial derivatives:
    #       dL/db, dL/dw1, dL/dw2, dL/dw3
    # 4. Return them as a NumPy array of shape (4,)
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Function 5: fit_linear_regression_gd

Train the model using full batch gradient descent and return both the learned weights and the loss history.

hw2_linear_regression.py
def fit_linear_regression_gd(x1_N, x2_N, y_N, lr=0.1, num_epochs=1000):
    ''' Fit the interaction linear regression model using batch gradient descent.

    Args
    ----
    x1_N : 1D np.array, shape (N,)
        First feature value for each sample.
    x2_N : 1D np.array, shape (N,)
        Second feature value for each sample.
    y_N : 1D np.array, shape (N,)
        True target values.
    lr : float, optional
        Learning rate.
    num_epochs : int, optional
        Number of gradient descent updates to perform.

    Returns
    -------
    w_D : 1D np.array, shape (4,)
        Learned parameter vector [b, w1, w2, w3].
    losses_L : 1D np.array, shape (num_epochs,)
        Loss value recorded before each parameter update.
    '''
    N = x1_N.shape[0]
    if x2_N.shape[0] != N or y_N.shape[0] != N:
        raise ValueError("x1_N, x2_N, and y_N must have the same length")
    num_epochs = int(num_epochs)
    if num_epochs < 1:
        raise ValueError("num_epochs must be at least 1")

    # ========== TODO ==========
    # 1. Initialize w_D as zeros of shape (4,)
    # 2. Create an empty Python list to store losses
    # 3. For each epoch:
    #       a. Compute predictions with predict_interaction_linear_regression
    #       b. Compute loss with compute_mse_loss
    #       c. Compute gradient with compute_mse_gradient
    #       d. Append loss to the list
    #       e. Update weights with w_D = w_D - lr * grad_D
    # 4. Convert the list of losses to a NumPy array before returning
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Problem 2: Logistic Regression

Complete the six TODO functions in hw2_logistic_regression.py. Each function builds on the previous one, so implement them in order. test_logistic_regression.py provides doctests for each function. Do not modify it.

Before implementing compute_bce_gradient, derive the four partial derivatives

\[ \frac{\partial \mathcal{L}}{\partial b}, \qquad \frac{\partial \mathcal{L}}{\partial w_1}, \qquad \frac{\partial \mathcal{L}}{\partial w_2}, \qquad \frac{\partial \mathcal{L}}{\partial w_3} \]

for the interaction logistic regression model above.

Function 1: generate_interaction_synthetic_classification_data

Generate a synthetic classification dataset by first computing the continuous interaction score

\[ s = 1 + 2x_1 + 3x_2 + 4x_1x_2 + \varepsilon \]

and then thresholding it to create binary labels.

hw2_logistic_regression.py
def generate_interaction_synthetic_classification_data(
    n, noise_std=0.5, threshold=5.0, seed=42
):
    ''' Generate binary classification data with an interaction term.

    Data is generated from the continuous interaction score
        s = 1 + 2*x1 + 3*x2 + 4*x1*x2 + eps
    where eps is Gaussian noise with standard deviation noise_std.
    Binary labels are then assigned by thresholding the score:
        y = 1 if s >= threshold else 0

    Args
    ----
    n : int
        Number of samples to generate.
    noise_std : float, optional
        Standard deviation of Gaussian noise added to the score.
    threshold : float, optional
        Threshold used to convert the continuous score into binary labels.
    seed : int, optional
        Random seed for reproducibility.

    Returns
    -------
    x1_N : 1D np.array, shape (N,)
        First feature values.
    x2_N : 1D np.array, shape (N,)
        Second feature values.
    y_N : 1D np.array of ints, shape (N,)
        Binary labels in {0, 1}.
    '''
    n = int(n)
    if n < 1:
        raise ValueError("n must be at least 1")

    # ========== TODO ==========
    # 1. Create a random number generator using the given seed
    # 2. Sample x1_N and x2_N uniformly from [0, 2]
    # 3. Sample Gaussian noise with standard deviation noise_std
    # 4. Compute the interaction score
    #       s_N = 1 + 2*x1_N + 3*x2_N + 4*x1_N*x2_N + noise
    # 5. Convert scores to binary labels using the given threshold
    # 6. Return x1_N, x2_N, y_N
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Function 2: sigmoid

Implement the sigmoid function

\[ \sigma(z) = \frac{1}{1 + e^{-z}} \]

elementwise.

hw2_logistic_regression.py
def sigmoid(z_N):
    ''' Compute the sigmoid function elementwise.

    Args
    ----
    z_N : np.array or scalar
        Input values.

    Returns
    -------
    sigma_N : np.array or scalar
        Sigmoid-transformed values.
    '''
    # ========== TODO ==========
    # Compute sigma(z) = 1 / (1 + exp(-z)) elementwise.
    raise NotImplementedError("Remove this line and implement above")
    # ==========================
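Two identities are handy for sanity-checking any sigmoid implementation: \(\sigma(0) = 0.5\) and \(\sigma(z) + \sigma(-z) = 1\). A quick check of both, using the formula above directly:

```python
import numpy as np

# Sanity values for the sigmoid: sigma(0) = 0.5 and sigma(z) + sigma(-z) = 1.
z = np.array([-2.0, 0.0, 2.0])
s = 1.0 / (1.0 + np.exp(-z))
print(s[1])                           # 0.5
print(np.allclose(s + s[::-1], 1.0))  # True
```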

Function 3: predict_proba_logistic_regression

Predict class-1 probabilities using

\[ p_i = \sigma(b + w_1 x_{i1} + w_2 x_{i2} + w_3 x_{i1}x_{i2}) \]

for all samples at once.

hw2_logistic_regression.py
def predict_proba_logistic_regression(x1_N, x2_N, w_D):
    ''' Predict class-1 probabilities for logistic regression.

    The model is
        p_i = sigmoid(b + w1*x1_i + w2*x2_i + w3*x1_i*x2_i)
    where w_D = [b, w1, w2, w3].

    Args
    ----
    x1_N : 1D np.array, shape (N,)
        First feature value for each sample.
    x2_N : 1D np.array, shape (N,)
        Second feature value for each sample.
    w_D : 1D np.array, shape (4,)
        Parameter vector [b, w1, w2, w3].

    Returns
    -------
    p_N : 1D np.array, shape (N,)
        Predicted probability of class 1 for each sample.
    '''
    N = x1_N.shape[0]
    if x2_N.shape[0] != N:
        raise ValueError("x1_N and x2_N must have the same length")
    if w_D.shape[0] != 4:
        raise ValueError("w_D must have length 4")

    # ========== TODO ==========
    # 1. Compute z_N = b + w1*x1_N + w2*x2_N + w3*x1_N*x2_N
    # 2. Apply sigmoid to z_N
    # 3. Return the predicted probabilities
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Function 4: compute_bce_loss

Given the true labels and predicted probabilities, compute the Binary Cross-Entropy loss.

hw2_logistic_regression.py
def compute_bce_loss(y_N, p_N):
    ''' Compute binary cross-entropy loss.

    Args
    ----
    y_N : 1D np.array, shape (N,)
        True binary labels.
    p_N : 1D np.array, shape (N,)
        Predicted probabilities.

    Returns
    -------
    loss : float
        Mean binary cross-entropy over all samples.
    '''
    N = y_N.shape[0]
    if p_N.shape[0] != N:
        raise ValueError("y_N and p_N must have the same length")

    # ========== TODO ==========
    # 1. Add a small epsilon to guard against log(0)
    # 2. Compute the BCE loss
    #      -(1/N) * sum[y*log(p) + (1-y)*log(1-p)]
    raise NotImplementedError("Remove this line and implement above")
    # ==========================
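One common way to satisfy the epsilon guard in step 1 is to clip the probabilities away from exactly 0 and 1 before taking logs; adding eps inside each log also works. A sketch on illustrative values (the eps value here is an assumption, not a required constant):

```python
import numpy as np

# Clipping probabilities away from 0 and 1 keeps np.log finite.
p = np.array([0.0, 0.5, 1.0])
eps = 1e-12
p_safe = np.clip(p, eps, 1.0 - eps)
print(np.isfinite(np.log(p_safe)).all())        # True
print(np.isfinite(np.log(1.0 - p_safe)).all())  # True
```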

Function 5: compute_bce_gradient

Compute the gradient of the BCE loss with respect to the four parameters.

hw2_logistic_regression.py
def compute_bce_gradient(x1_N, x2_N, y_N, w_D):
    ''' Compute the gradient of BCE loss with respect to [b, w1, w2, w3].

    You should derive the four partial derivatives by hand from the model and
    BCE definition, then implement the result directly in NumPy.

    Args
    ----
    x1_N : 1D np.array, shape (N,)
        First feature value for each sample.
    x2_N : 1D np.array, shape (N,)
        Second feature value for each sample.
    y_N : 1D np.array, shape (N,)
        True binary labels.
    w_D : 1D np.array, shape (4,)
        Parameter vector [b, w1, w2, w3].

    Returns
    -------
    grad_D : 1D np.array, shape (4,)
        Gradient of the BCE loss with respect to [b, w1, w2, w3].
    '''
    N = x1_N.shape[0]
    if x2_N.shape[0] != N or y_N.shape[0] != N:
        raise ValueError("x1_N, x2_N, and y_N must have the same length")
    if w_D.shape[0] != 4:
        raise ValueError("w_D must have length 4")

    # ========== TODO ==========
    # 1. Compute predicted probabilities p_N using predict_proba_logistic_regression
    # 2. Compute prediction errors e_N = p_N - y_N
    # 3. Derive and implement the four partial derivatives:
    #       dL/db, dL/dw1, dL/dw2, dL/dw3
    # 4. Return them as a NumPy array of shape (4,)
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Function 6: fit_logistic_regression_gd

Train the model using full batch gradient descent and return both the learned weights and the BCE loss history.

hw2_logistic_regression.py
def fit_logistic_regression_gd(x1_N, x2_N, y_N, lr=0.5, num_epochs=1000):
    ''' Fit the interaction logistic regression model using batch gradient descent.

    Args
    ----
    x1_N : 1D np.array, shape (N,)
        First feature value for each sample.
    x2_N : 1D np.array, shape (N,)
        Second feature value for each sample.
    y_N : 1D np.array, shape (N,)
        True binary labels.
    lr : float, optional
        Learning rate.
    num_epochs : int, optional
        Number of gradient descent updates to perform.

    Returns
    -------
    w_D : 1D np.array, shape (4,)
        Learned parameter vector [b, w1, w2, w3].
    losses_L : 1D np.array, shape (num_epochs,)
        BCE loss recorded before each parameter update.
    '''
    N = x1_N.shape[0]
    if x2_N.shape[0] != N or y_N.shape[0] != N:
        raise ValueError("x1_N, x2_N, and y_N must have the same length")
    num_epochs = int(num_epochs)
    if num_epochs < 1:
        raise ValueError("num_epochs must be at least 1")

    # ========== TODO ==========
    # 1. Initialize w_D as zeros of shape (4,)
    # 2. Create an empty Python list to store losses
    # 3. For each epoch:
    #       a. Compute probabilities with predict_proba_logistic_regression
    #       b. Compute loss with compute_bce_loss
    #       c. Compute gradient with compute_bce_gradient
    #       d. Append loss to the list
    #       e. Update weights with w_D = w_D - lr * grad_D
    # 4. Convert the list of losses to a NumPy array before returning
    raise NotImplementedError("Remove this line and implement above")
    # ==========================

Running Doctests

Problem 1

Run the linear regression doctests with:

terminal
pixi run python -m doctest test_linear_regression.py -v

Problem 2

Run the logistic regression doctests with:

terminal
pixi run python -m doctest test_logistic_regression.py -v

Submission

Upload hw2_linear_regression.py for Problem 1 and hw2_logistic_regression.py for Problem 2 to eCampus. Do not rename the files. Do not submit the notebook or the doctest files.

Common Mistakes

Problem 1: forgetting to use the provided seed in generate_interaction_synthetic_data, mixing up yhat_N - y_N and y_N - yhat_N, or dropping the interaction term in the gradient.

Problem 2: forgetting to threshold the synthetic score into binary labels, mixing up probabilities and labels inside BCE, or using the wrong sign in the BCE gradient update.