#############################################################
# overview
#############################################################


# In lecture tomorrow, we'll be looking at data from files and fitting
# lines/curves to them. Thus, this pre-lecture code explains Python's
# basic mechanism for reading data from files and writing to them as
# well.

# The last section in Pset 3 also involves reading in data and fitting
# lines to it. In the code we provided, we use the `pandas` module,
# which provides convenience functions for reading in data from various
# formats and manipulating it in code. Understanding pandas is out of
# scope for us, but you are welcome to take a look at its documentation
# on your own.

# https://pandas.pydata.org/
# https://pandas.pydata.org/docs/

# Note: Many concepts in pandas rely on Python dictionaries or NumPy
# arrays. We will cover Python dictionaries next week.

# Our running example in this pre-lecture code will be data from a
# scores.csv file. You can open it in any spreadsheet program (e.g.,
# Microsoft Excel, Google Sheets, Apple Numbers, LibreOffice Calc).
# However, you can also open it as a text file in your editor, and you
# will see that each line has the following structure:

#   <text>,<text>,<text>,...

# This format is known as **comma-separated values**, or CSV. It's not
# very pretty to look at the raw text, but it's a common format because
# programs can easily split each line on the commas and process the text
# in between.
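
# As a minimal sketch of that splitting step, here is a hypothetical
# helper (`split_csv_line` is our own name, not something built into
# Python), applied to a made-up line. It assumes no field contains a
# comma of its own; real CSV files may quote such fields, which this
# sketch does not handle.


def split_csv_line(line):
    # str.split(",") cuts the string at every comma and returns a list
    # of the pieces in between
    return line.split(",")


# print(split_csv_line("alice,90,85"))  # made-up example line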


#############################################################
# reading data from files
#############################################################


# To access the contents of a file, Python first needs to know where the
# file is. A file's location is known as a **path**, which can be
# absolute or relative. An absolute path specifies a series of nested
# folders from the top level of your filesystem, and finally the name of
# a file in the last folder. This is what absolute paths look like on
# Windows and macOS:

#   Windows: C:/Users/bob/Documents/6.100/lecture9/lec_pre.py
#   macOS:     /Users/bob/Documents/6.100/lecture9/lec_pre.py

# A relative path assumes that you are already in a folder, and it
# specifies a subpath from that point forward to a specific file. When
# we run Python code, it always runs inside a certain folder, called the
# **working directory**. A common convention is to run a code file from
# the folder where it's located, so it can access other files in that
# folder using just relative paths. This is why in our Install Python
# instructions, we ask you to enable the **Terminal: Execute in File
# Dir** setting.
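
# To check which folder your code is actually running in, here is a
# short sketch using Python's standard `os` module (not otherwise used
# in this file). `show_location` is a hypothetical helper name.
# `os.getcwd()` returns the working directory, and `os.path.abspath()`
# converts a relative path into the absolute path it refers to from
# that directory. The file itself does not need to exist.


import os


def show_location(relative_path):
    working_dir = os.getcwd()
    absolute_path = os.path.abspath(relative_path)
    print("working directory:", working_dir)
    print("absolute path:", absolute_path)
    return absolute_path


# show_location("scores.csv")
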

# Once we know where a file is, we simply pass that path as a string to
# Python's built-in `open()` function. This function returns an object
# that represents access to the file, so we can perform operations on
# it. The following example demonstrates this. We `open()` a provided
# path and assign the returned file object to a `file` variable. (When
# we inspect the `type()` of the object, it reports back as
# `...TextIOWrapper`. All you need to know for now is that this
# represents a file object.)

# Once we have a file object, we can `.read()` it, which returns a
# string of the file's contents. This only works once per open file:
# subsequent `.read()`s will return empty strings, because we have
# already reached the end of the file. The only remaining operation (for
# now) is to `.close()`, after which the file object is no longer
# available for reading.


def read_file(filepath):
    file = open(filepath)
    print(type(file))
    print("first read")
    print(file.read())
    print("second read")
    print(file.read())
    file.close()
    # print(file.read())  # ValueError: I/O operation on closed file


read_file("scores.csv")


# A single `.read()` can be pretty inconvenient, especially if the file
# is large. Thus, file objects also have the `.readline()` operation,
# which reads up to and including the newline character. (Newline
# characters represent the end of a line in text. They are normally
# invisible, but you can explicitly encode a newline in a Python string
# as "\n".)

# Further, because we often don't know how long a file is, Python makes
# its file objects iterable, so you can `for` loop over them. Each
# iteration reads the next line, much like a call to `.readline()`, and
# assigns it to the loop variable. The loop ends when the file runs out
# of lines.


def read_file_lines(filepath):
    file = open(filepath)
    print("read lines 1 through 10")
    for i in range(1, 11):
        print(f"line {i}", file.readline())
    file.close()

    file = open(filepath)
    print("read all available lines")
    for line in file:
        print(line)
    file.close()


# read_file_lines("scores.csv")


# EXERCISE: Each line that's printed out still includes its ending
# newline, which results in blank lines when printing. Figure out a way
# to print without those extra blank lines.
# (Hint: look into str.strip() or str.split().)

# In all the previous examples, we've explicitly called `.close()` on
# the file. This effectively lets Python tell the operating system that
# we don't need access to that file any more. For small programs, it's
# inconsequential to leave files open. However, operating systems have
# some (generous) limit on how many files you can have open at once.
# Thus, if you are opening many files, it is your responsibility to
# close them properly. Forgetting to close files is a common source of
# bugs in many languages, so Python provides the syntax:

#   `with open(path) as name:`

# This expects an indented block afterward. After that block finishes,
# even if it ends with an error, Python will automatically close the
# `name` file object for you. Using this syntax is a standard pattern in
# Python.


def read_file_safely(filepath):
    with open(filepath) as file:
        print("read all available lines")
        for line in file:
            print(line)


# read_file_safely("scores.csv")
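

# One way to confirm the automatic close is the file object's `.closed`
# attribute. The sketch below uses a hypothetical helper name,
# `check_closed`, and returns the attribute's value as seen inside the
# `with` block and again after it.


def check_closed(filepath):
    with open(filepath) as file:
        inside = file.closed  # False: the file is still open here
    after = file.closed  # True: closed as soon as the block ended
    return inside, after


# print(check_closed("scores.csv"))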


#############################################################
# writing data to files
#############################################################


# We won't be writing data to files (yet) in tomorrow's lecture.
# However, it's worth knowing that `open()` also supports writing to a
# file, through a second `mode` parameter. This mode defaults to "r" for
# reading, but you can also set it to "w" for writing or "a" for
# appending. The difference is that "w" will wipe the file's contents
# first, while "a" will preserve existing content and perform its writes
# at the end.

# Thus, be very careful when using "w"! Before proceeding, we suggest
# you make a copy of the original scores.csv file, calling it something
# like scores-copy.csv or scores-original.csv.
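
# To see the difference between "w" and "a" without risking scores.csv,
# here is a small sketch that works on a scratch file instead. The
# helper name `compare_modes` and the scratch.txt path are our own
# choices for illustration. Note that whatever file you name here gets
# overwritten.


def compare_modes(scratch_path):
    with open(scratch_path, "w") as file:  # creates or wipes the file
        file.write("first\n")
    with open(scratch_path, "a") as file:  # keeps "first", adds after it
        file.write("second\n")
    with open(scratch_path) as file:
        after_append = file.read()

    with open(scratch_path, "w") as file:  # wipes both earlier lines
        file.write("third\n")
    with open(scratch_path) as file:
        after_write = file.read()

    return after_append, after_write


# print(compare_modes("scratch.txt"))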


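def write_to_file(filepath, line):
    # "a" appends at the end of the file. Note that .write() does not
    # add a newline, so include "\n" in `line` if you want one.
    with open(filepath, "a") as file:
        file.write(line)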

    # check file contents
    read_file_safely(filepath)


# write_to_file("scores.csv", "THIS IS A TEST OF THE AUTOMATIC FIRE ALARM---wait, no...")


def remove_sally(filepath):
    # read all lines
    with open(filepath, "r") as file:
        lines = file.readlines()  # this returns a list of all the lines

    # write back all lines except those mentioning Sally
    with open(filepath, "w") as file:
        for line in lines:
            if "sally" not in line.lower():
                file.write(line)

    # check file contents
    read_file_safely(filepath)


# remove_sally("scores.csv")


# If you're curious about more details on `open()` and reading and
# writing to files, these are the links to Python's official
# documentation.

# https://docs.python.org/3/library/functions.html#open
# https://docs.python.org/3/tutorial/inputoutput.html#tut-files
