Lecture 9 Finger Exercise

The questions below are due on Friday May 08, 2026; 11:59:00 PM.

Question 1. Which of the following are true? Check all that apply.

The R^2 value increases as the distance between the actual data points and the model's predicted curve decreases.

An R^2 value of 0.8 means that the model correctly predicts 80% of the data.

An R^2 value of 1 means that the curve of fit runs through every data point that the model was evaluated with.

If we expressed the measurements of an experiment's data in meters instead of in inches, the R^2 of a model fitted to the data would be approximately 30^2 times larger.

When choosing a degree for numpy.polyfit(), one should choose a degree that maximizes R^2 on the training data.

An R^2 of 0 indicates that the model performs no better than a horizontal line drawn at the mean of the dependent variable.

Question 2. You are given some experimental data, but you can't remember whether it was generated from a quadratic or a cubic polynomial relation. Write a function decide_fit_degree(x_vals, y_vals) that infers which degree of model best explains how the data was likely generated.

def decide_fit_degree(x_vals, y_vals):
    """
    Determine whether a quadratic or cubic polynomial model better
    explains the experimental data.

    Parameters:
        x_vals (list): Various float values representing the inputs to the experiment.
        y_vals (list): Various float values representing the corresponding measurements.

    Return either the int 2 or the int 3 to indicate the polynomial degree.
    """

You may assume that numpy is imported for you as np, and that the r_squared() and split_50_50() functions shown below are also available to you. You may also assume that random is imported for you, and you should not call random.seed() (the server will ignore any lines containing that text).

def r_squared(observed, predicted):
    """
    Return the R-squared value between observed and predicted values,
    which are lists of floats.
    """

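    # Sum of squared differences between predictions and observations.
    total_error_squared = 0
    for i in range(len(observed)):
        total_error_squared += (predicted[i] - observed[i]) ** 2

    # Mean squared error, compared against the variance of the observed values.
    mean_squared_error = total_error_squared / len(observed)
    return 1 - mean_squared_error / np.var(observed)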

def split_50_50(x_vals, y_vals):
    """
    Given lists x_vals and y_vals of equal length, partition the data
    they represent into two sets of roughly equal size.

    Return a list of four elements:
        x_a: A list of the first half of the x values.
        y_a: A list of the first half of the y values.
        x_b: A list of the remaining x values.
        y_b: A list of the remaining y values.
    """
    split = int(len(x_vals) / 2)
    x_a, y_a = x_vals[:split], y_vals[:split]
    x_b, y_b = x_vals[split:], y_vals[split:]
    return [x_a, y_a, x_b, y_b]
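
To illustrate how the two helpers fit together, here is a minimal sketch that fits a model on one half of some data and scores it on the other half. The data set is made up for illustration (exactly linear, y = 2x + 1), the helpers are compact copies included only so that the sketch runs on its own, and the degree used (1) is deliberately not one of the two degrees the question asks about.

```python
import numpy as np

def r_squared(observed, predicted):
    # Compact copy of the provided helper, included so the sketch is runnable.
    observed = np.array(observed)
    predicted = np.array(predicted)
    mean_squared_error = ((predicted - observed) ** 2).mean()
    return 1 - mean_squared_error / np.var(observed)

def split_50_50(x_vals, y_vals):
    # Compact copy of the provided helper: first half, then the rest.
    split = len(x_vals) // 2
    return [x_vals[:split], y_vals[:split], x_vals[split:], y_vals[split:]]

# Hypothetical data for illustration only: exactly linear, y = 2x + 1.
x_vals = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y_vals = [2 * x + 1 for x in x_vals]

# Unpack the four-element list returned by split_50_50().
x_a, y_a, x_b, y_b = split_50_50(x_vals, y_vals)

# Fit a degree-1 polynomial on the first half only.
coeffs = np.polyfit(x_a, y_a, 1)

# Evaluate the fitted polynomial on the x values that were held out.
predicted_b = np.polyval(coeffs, x_b)

# Score the predictions against the held-out measurements.
held_out_r2 = r_squared(y_b, predicted_b)
print(round(held_out_r2, 6))
```

Because the made-up data contains no noise and the model has the same form as the data, the held-out score comes out essentially equal to 1. With noisy data, the score on the held-out half will generally differ from the score on the half used for fitting.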

Note 1: The split_50_50() function we provide differs from what was shown in class, in that it always splits the data into a first half and second half. When you submit, read the test cases to see how the data is actually generated, and think about why this splitting strategy should work well for the data.

Note 2: Due to randomness in the data used by the tests, a correct solution may occasionally fail a test. If that happens and your solution is correct, resubmitting will likely pass the tests.