Old Exams Practice (topics may be different)


These questions are for study purposes. They come from past quizzes, so the content and topics may not match the current term exactly. We do not provide solutions, but we encourage you to come to office hours or post on Piazza if you have any questions.

Topic: Graphs

Which of the following are true? Check all that apply.
1-1. If the heuristic, h(n), used to estimate distance in A* underestimates the distance to the destination, A* will still always find an optimal answer.
1-2. If the heuristic, h(n), used to estimate distance in A* overestimates the distance to the destination, A* will still always find an optimal answer.
1-3. Dijkstra's algorithm is a variant of depth-first search.
1-4. In a tree, the depth-first search and breadth-first search algorithms both find the same path between a source and destination node.

Consider this graph:

Which of the following are true? Check all that apply.

2-1. This graph is a digraph.
2-2. This graph is a tree.
2-3. This graph is unweighted.
2-4. On this graph, breadth-first search visits fewer distinct nodes than depth-first search when finding the shortest path from node A to node D.
2-5. There is exactly one path between any pair of nodes.

Consider this graph:

Which of the following are true? Check all that apply.

3-1. This graph has exactly 1 cycle.
3-2. Breadth-first search can be used to find a path between any pair of nodes in this graph.

A town wants to build the minimum total length of bicycle lanes needed so that it will be possible to bike from any address in town to any other address on bicycle lanes. This can best be viewed as (select only one):
4-1. A shortest path problem.
4-2. A minimum spanning tree problem.
4-3. A graph flow problem.
4-4. A graph partition problem.

Here is the depth-first search code from lecture, which finds a shortest path between a source and target. Modify it so that it finds a longest path between a source and target. You can assume that the graphs the function will run on are acyclic, directed, and unweighted.

def dfs_graph_search(graph, source, target, verbose=False):
    def dfs_internal(path, shortest):
        if verbose:
            print('Current DFS path:', '->'.join(map(str, path)))

        last_node = path[-1]
        if last_node == target:
            if verbose:
                print('Path found')
            return path

        if shortest and len(path) + 1 >= len(shortest):
            return None

        best_path = None
        for next_node in graph.children_of(last_node):
            if next_node in path:
                continue
            new_path = dfs_internal(path + [next_node], shortest)
            if new_path:
                if not best_path or len(new_path) < len(best_path):
                    best_path = new_path
                if not shortest or len(new_path) < len(shortest):
                    shortest = new_path
        return best_path

    return dfs_internal([source], None)
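One possible way to approach the modification (a sketch under the stated assumptions, not an official solution; `graph.children_of` is the interface from the lecture code): because the graph is assumed acyclic, the length-based pruning against `shortest` can be dropped entirely, and the `<` comparison that keeps the shorter completed path flips to `>`.

```python
def dfs_longest_path(graph, source, target, verbose=False):
    """Sketch: DFS that returns a longest path from source to target
    in an acyclic, directed, unweighted graph, or None if no path exists."""
    def dfs_internal(path):
        if verbose:
            print('Current DFS path:', '->'.join(map(str, path)))

        last_node = path[-1]
        if last_node == target:
            return path

        best_path = None
        for next_node in graph.children_of(last_node):
            # No pruning here: a long prefix may still lead to the
            # longest overall path, so every branch must be explored.
            new_path = dfs_internal(path + [next_node])
            # Keep the longer completed path (> instead of <).
            if new_path and (not best_path or len(new_path) > len(best_path)):
                best_path = new_path
        return best_path

    return dfs_internal([source])
```

Note that the `if next_node in path` cycle check from the original is also unnecessary under the acyclicity assumption, though keeping it would be harmless.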

Topic: Regression and CLT

Implement a function that meets the specification below. You may use only the libraries imported for you below. We will manually grade submissions.
You are given the following helper functions:

def split_data(x_vals, y_vals, frac_training=0.7):
    training_size = int(len(x_vals)*frac_training)
    training_indices = random.sample(range(len(x_vals)), training_size)
    training_x, training_y, test_x, test_y = [], [], [], []
    for i in range(len(x_vals)):
        if i in training_indices:
            training_x.append(x_vals[i])
            training_y.append(y_vals[i])
        else:
            test_x.append(x_vals[i])
            test_y.append(y_vals[i])
    return (training_x, training_y), (test_x, test_y)

def r_squared(observed, predicted):
    error = ((predicted - observed)**2).sum()
    mean_error = error / len(observed)
    return 1 - mean_error / np.var(observed)

import numpy as np
import random

def find_best_fit(x, y, min_deg, max_deg):
    """ - x and y are numpy arrays of floats
        - x elements are in ascending order
        - min_deg and max_deg are ints > 0 with min_deg <= max_deg

    The values in y are known to be generated from the values in
    x by some process that can be modeled by a polynomial function
    of some degree between min_deg and max_deg, inclusive, plus noise.

    Return the degree of the polynomial that best models the
    underlying function without overfitting. """
    # your code here

# For example:
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([1, 4.2, 7.9, 15.3, 27.5, 38, 51.2, 63.1, 88, 98.9])
print(find_best_fit(x, y, 1, 4))    # prints 2
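One way to approach this (a sketch, not an official solution): hold out a test set with the given `split_data`, fit each candidate degree with `numpy.polyfit` on the training portion, and return the degree whose predictions score the highest `r_squared` on the held-out points, since overfit models score well on training data but poorly on held-out data. The handout helpers are copied here so the sketch runs standalone.

```python
import random
import numpy as np

# Helpers from the handout, copied so this sketch is self-contained.
def split_data(x_vals, y_vals, frac_training=0.7):
    training_size = int(len(x_vals) * frac_training)
    training_indices = random.sample(range(len(x_vals)), training_size)
    training_x, training_y, test_x, test_y = [], [], [], []
    for i in range(len(x_vals)):
        if i in training_indices:
            training_x.append(x_vals[i])
            training_y.append(y_vals[i])
        else:
            test_x.append(x_vals[i])
            test_y.append(y_vals[i])
    return (training_x, training_y), (test_x, test_y)

def r_squared(observed, predicted):
    error = ((predicted - observed) ** 2).sum()
    mean_error = error / len(observed)
    return 1 - mean_error / np.var(observed)

def find_best_fit(x, y, min_deg, max_deg):
    """Sketch: return the degree with the best R^2 on held-out data."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    (train_x, train_y), (test_x, test_y) = split_data(x, y)
    best_deg, best_r2 = min_deg, float('-inf')
    for deg in range(min_deg, max_deg + 1):
        model = np.polyfit(train_x, train_y, deg)          # fit on training data
        predictions = np.polyval(model, np.array(test_x))  # predict held-out x
        r2 = r_squared(np.array(test_y), predictions)
        if r2 > best_r2:
            best_deg, best_r2 = deg, r2
    return best_deg
```

Because `split_data` samples randomly, the selected degree can vary from run to run; averaging the test R^2 over several random splits per degree would make the choice more stable.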

Implement the function that meets the specification below. You may not import libraries. You have access to a function called correlation, which returns a float representing the correlation between two lists. Run your code assuming it exists.

def highest_window_correlation(A, B, w):
    """ - A and B are lists of numbers
        - w is an int <= min(len(A), len(B))

    Returns the highest correlation between two contiguous 
    windows of length w. We compare correlations by their
    numerical value. One window is over A and the other 
    over B. Round the highest correlation found to 3 digits.
    """
    # your code here


# For example:
A = [1, 2, 3, 4, 5, 6]
B = [3, 4, 2, 5, 7, 8]
print(highest_window_correlation(A, B, 3))  # prints 0.982

A = [1, 2, 3, 4, 5]
B = [2, 4, 6, 8, 10]
print(highest_window_correlation(A, B, 2))  # prints 1.0
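A sketch of one approach (not an official solution): slide a length-`w` window over each list, compare every pair of windows, and track the maximum. The exam says to assume `correlation` exists; a Pearson-correlation stand-in is included below so the sketch runs, but that definition is an assumption. Note the exact maximum also depends on whether the two windows may sit at different offsets, which the spec leaves open; this sketch compares all offset pairs.

```python
def correlation(xs, ys):
    """Stand-in for the provided helper (assumption): Pearson
    correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant window
    return cov / (var_x * var_y) ** 0.5

def highest_window_correlation(A, B, w):
    best = None
    for i in range(len(A) - w + 1):        # every window over A
        for j in range(len(B) - w + 1):    # every window over B
            c = correlation(A[i:i + w], B[j:j + w])
            if best is None or c > best:
                best = c
    return round(best, 3)
```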

Consider training a linear (polynomial) regression model of degree d < 100 on a dataset called A, consisting of 1000 points. Which of the following are True? Check all that apply.
3-1: An R^2 value of 0.8 on A means that the model accounts for approximately 80% of the variance in A.
3-2: A negative R^2 value is possible if the model is tested on A.
3-3: Increasing d will never decrease the R^2 value of evaluating the model on A.
3-4: If A is only 60% of the data in some larger dataset, then increasing d will never decrease the R^2 value of evaluating the model on the remaining 40%.
3-5: If A is only 10% of the data in some larger dataset, then increasing d might increase or might decrease the R^2 value of evaluating the model on the remaining 90%.
3-6: A danger of overfitting is that the model captures experimental error in the data.

Assume that X is a list of pseudo-random numbers, and that k is a constant. Which of the following are true? Check all that apply.
4-1: If Y = [k**x for x in X], then numpy.polyfit(X, Y, 1) will fit a straight line to the values.
4-2: If Y = [k for x in X], then numpy.polyfit(X, Y, 2) will fit an approximately straight line to the values.
4-3: numpy.polyval returns a list of the coefficients of a linear regression model.
4-4: numpy.polyfit can be used to fit a function of more than one variable.

Topic: Statistical Tests, Machine Learning

Which of the following are true? Check all that apply.
5-1: A p-value is the probability that the null hypothesis is true.
5-2: 1 - p-value is the probability that the null hypothesis is true.
5-3: In testing the hypothesis that African men are taller than Asian men, a small p-value implies that the difference in heights is large.

All applicants applying for a position at a company fill out a questionnaire. The company uses machine learning to decide whom to interview for a manager position. They keep only the questionnaires filled out by applicants who were HIRED FOR ANY position at the company (N = 10000; call this the data). In each data example, the label is 1 if the employee was hired as a manager and 0 otherwise. The company built a logistic regression model. When they evaluated it on the entire data set, it had very high accuracy, precision, and recall. Therefore, they decided to use the model for their purposes. What are the flaws with this approach? Check all that apply.
6-1: Non-response bias.
6-2: Survivor bias.
6-3: Using a logistic regression model instead of clustering the data.
6-4: Deciding to deploy the model because they expected similar results when applied to new applicants.
6-5: Failing to perform a multiple hypothesis test.
6-6: Failing to report a p-value.
6-7: Failing to evaluate the model with an R^2 value.

Consider a random forest classifier with N trees of maximum depth D. Which of the following are true? Check all that apply.
7-1: The 'random' in random forest refers to both random sampling of data and random selection of features.
7-2: The random forest reaches its answer by averaging the probabilities produced by the trees in the forest.
7-3: The probability that the classifier is overfit to the training data increases as the value of D increases.
7-4: The probability that the classifier is overfit to the training data increases as the value of N increases.
7-5: Random forests cannot be used when feature values are continuous.

Which of the following is true about k-means clustering? Check all that apply.
8-1: Once the initial centroids have been chosen, the algorithm is deterministic.
8-2: One problem with k-means clustering is that for small k it often takes a long time to converge.
8-3: As k grows, the average intra-cluster distance tends to grow.
8-4: The clustering found is independent of the distance metric used.