6.1. Reading and Writing Files
So far, the data we have used in this book have either been coded right into the program or entered by the user. In real life, data reside in files. For example, the images we worked with in the image processing unit ultimately live in files on your hard drive. Web pages, word processing documents, and music are other examples of data that live in files. In this short chapter we will introduce the Python concepts necessary to use data from files in our programs.
6.1.1. Working with Data Files
For our purposes, we will assume that our data files are text files, that is, files filled with characters. The Python programs that you write are stored as text files. We can create these files in any of a number of ways. For example, we could use a text editor to type in and save the data. We could also download the data from a website and then save it in a file. Regardless of how the file is created, Python will allow us to manipulate the contents.
In Python, we must open files before we can use them and close them when we are done with them. As you might expect, once a file is opened it becomes a Python object just like all other data. Table 1 shows the functions and methods that can be used to open and close files.
Method Name | Use | Explanation |
---|---|---|
open | open(filename,'r') | Open a file called filename and use it for reading. This will return a reference to a file object. |
open | open(filename,'w') | Open a file called filename and use it for writing. This will also return a reference to a file object. |
close | filevariable.close() | File use is complete. |
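As a minimal sketch of the three calls in the table, the following writes a file, closes it, and then reads it back. The file name greeting.txt and the use of a temporary folder are assumptions made only so the example can run anywhere.

```python
import os
import tempfile

# Hypothetical file inside a temporary folder, so nothing on your disk is overwritten.
folder = tempfile.mkdtemp()
filename = os.path.join(folder, "greeting.txt")

# Open for writing: creates the file, or empties it if it already exists.
outfile = open(filename, "w")
outfile.write("Hello, file!\n")
outfile.close()              # file use is complete

# Open for reading: returns a reference to a file object.
infile = open(filename, "r")
contents = infile.read()     # the whole file as one string
infile.close()

print(contents)
```

Note that opening an existing file with 'w' discards its previous contents, so it is worth double-checking the mode before running a program against real data.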
6.1.2. Finding a File on your Disk
Opening a file requires that you, as a programmer, and Python agree about the
location of the file on your disk. Files are located on disk by their path.
You can think of the filename as the short name for a file, and the path as
the full name. For example, on a Mac, if you save the file hello.txt in your
home directory, the path to that file is /Users/yourname/hello.txt. On a
Windows machine the path looks a bit different, but the same principles are in
use. For example, on Windows the path might be
C:\Users\yourname\My Documents\hello.txt.
You can access files in sub-folders, also called directories, under your home
directory by adding a slash and the name of the folder. For example, if you had
a file called hello.py in a folder called CS150 that is inside a folder called
PyCharmProjects under your home directory, then the full name for the file
hello.py is /Users/yourname/PyCharmProjects/CS150/hello.py. This is called an
absolute file path. An absolute file path typically only works on a specific
computer. Think about it for a second. What other computer in the world is
going to have an absolute file path that starts with /Users/yourname?
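To make the idea of a "full name" concrete, here is a small sketch that builds the absolute path from the example one folder at a time. It uses PurePosixPath from the standard pathlib module, which only manipulates the text of a path and never looks at the disk, so the folders do not need to exist on your computer.

```python
from pathlib import PurePosixPath

# The folder names below come from the example in the text; they are not
# expected to exist on the machine running this code.
home = PurePosixPath("/Users/yourname")
full_name = home / "PyCharmProjects" / "CS150" / "hello.py"

print(full_name)                # /Users/yourname/PyCharmProjects/CS150/hello.py
print(full_name.is_absolute())  # True
print(full_name.name)           # hello.py (the short name)
print(full_name.parent)         # /Users/yourname/PyCharmProjects/CS150
```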
If a file is not in the same folder as your Python program, you need to tell the
computer how to reach it. A relative file path starts from the folder that
contains your Python program and follows a computer’s file hierarchy. A file
hierarchy contains folders, which contain files and other sub-folders.
Specifying a sub-folder is easy: you simply specify the sub-folder’s name. To
specify a parent folder you use the special .. notation, which works because
every file and folder has one unique parent. You can use the .. notation
multiple times in a file path to move multiple levels up a file hierarchy. Here
is an example file hierarchy that contains multiple folders, files, and
sub-folders. Folders in the diagram are displayed in bold type.
Using the example file hierarchy above, the program myPythonProgram.py
could access each of the data files using the following relative file paths:
data1.txt
../myData/data2.txt
../myData/data3.txt
../../otherFiles/extraData/data4.txt
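The way .. climbs the hierarchy can be sketched with the standard posixpath module. The starting folder /home/me/courses/programs and the helper name resolve are hypothetical, chosen only for illustration; they are not taken from the diagram.

```python
import posixpath

def resolve(start_folder, relative_path):
    """Join a relative path onto a starting folder and collapse any '..' steps."""
    return posixpath.normpath(posixpath.join(start_folder, relative_path))

# Hypothetical location of myPythonProgram.py, chosen only for illustration.
program_folder = "/home/me/courses/programs"

for relative_path in ["data1.txt", "../myData/data2.txt",
                      "../../otherFiles/extraData/data4.txt"]:
    print(relative_path, "->", resolve(program_folder, relative_path))
```

Each .. removes one folder from the end of the starting location before the remaining names are appended, which is why two of them are needed to reach a folder two levels up.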
Here’s the important rule to remember: If your file and your Python program are
in the same directory, you can simply use the filename like this:
open('myfile.txt', 'r'). If your file and your Python program are in
different directories, then you must use a relative file path to the file like
this: open('../myData/data3.txt', 'r').
6.1.3. Glossary
- absolute file path
- The name of a file that includes a path to the file from the root
directory of a file system. On macOS and Linux, an absolute file path
always starts with a /.
- relative file path
- The name of a file that includes a path to the file from the current
working directory of a program. A relative file path never starts
with a /.
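The two glossary terms can be checked from inside a program. The sketch below uses the standard os module; myfile.txt is a hypothetical name, and the file does not need to exist because these calls only work with the text of the path.

```python
import os

cwd = os.getcwd()                          # the current working directory
expanded = os.path.abspath("myfile.txt")   # the relative name turned into an absolute path

print(cwd)
print(expanded)
print(os.path.isabs("myfile.txt"))   # False: a bare filename is a relative path
print(os.path.isabs(expanded))       # True
```

If a relative path that looks correct fails to open, printing os.getcwd() is a quick way to see which folder Python is actually starting from.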
6.1.4. Reading a File the Imperative Way
As an example, suppose we have a text file called qbdata.txt that contains
the following data representing statistics about NFL quarterbacks. Although it
would be possible to consider entering this data by hand each time it is used,
you can imagine that it would be time-consuming and error-prone to do this. In
addition, it is likely that there could be data from more quarterbacks and
other years. The format of the data file is as follows:
First Name, Last Name, Position, Team, Completions, Attempts, Yards, TDs, Ints, Comp%, Rating
Colt McCoy QB CLE 135 222 1576 6 9 60.8% 74.5
Josh Freeman QB TB 291 474 3451 25 6 61.4% 95.9
Michael Vick QB PHI 233 372 3018 21 6 62.6% 100.2
Matt Schaub QB HOU 365 574 4370 24 12 63.6% 92.0
Philip Rivers QB SD 357 541 4710 30 13 66.0% 101.8
Matt Hasselbeck QB SEA 266 444 3001 12 17 59.9% 73.2
Jimmy Clausen QB CAR 157 299 1558 3 9 52.5% 58.4
Joe Flacco QB BAL 306 489 3622 25 10 62.6% 93.6
Kyle Orton QB DEN 293 498 3653 20 9 58.8% 87.5
Jason Campbell QB OAK 194 329 2387 13 8 59.0% 84.5
Peyton Manning QB IND 450 679 4700 33 17 66.3% 91.9
Drew Brees QB NO 448 658 4620 33 22 68.1% 90.9
Matt Ryan QB ATL 357 571 3705 28 9 62.5% 91.0
Matt Cassel QB KC 262 450 3116 27 7 58.2% 93.0
Mark Sanchez QB NYJ 278 507 3291 17 13 54.8% 75.3
Brett Favre QB MIN 217 358 2509 11 19 60.6% 69.9
David Garrard QB JAC 236 366 2734 23 15 64.5% 90.8
Eli Manning QB NYG 339 539 4002 31 25 62.9% 85.3
Carson Palmer QB CIN 362 586 3970 26 20 61.8% 82.4
Alex Smith QB SF 204 342 2370 14 10 59.6% 82.1
Chad Henne QB MIA 301 490 3301 15 19 61.4% 75.4
Tony Romo QB DAL 148 213 1605 11 7 69.5% 94.9
Jay Cutler QB CHI 261 432 3274 23 16 60.4% 86.3
Jon Kitna QB DAL 209 318 2365 16 12 65.7% 88.9
Tom Brady QB NE 324 492 3900 36 4 65.9% 111.0
Ben Roethlisberger QB PIT 240 389 3200 17 5 61.7% 97.0
Kerry Collins QB TEN 160 278 1823 14 8 57.6% 82.2
Derek Anderson QB ARI 169 327 2065 7 10 51.7% 65.9
Ryan Fitzpatrick QB BUF 255 441 3000 23 15 57.8% 81.8
Donovan McNabb QB WAS 275 472 3377 14 15 58.3% 77.1
Kevin Kolb QB PHI 115 189 1197 7 7 60.8% 76.1
Aaron Rodgers QB GB 312 475 3922 28 11 65.7% 101.2
Sam Bradford QB STL 354 590 3512 18 15 60.0% 76.5
Shaun Hill QB DET 257 416 2686 16 12 61.8% 81.3
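Although the format description lists the fields with commas, the records themselves are separated by spaces. As a rough sketch of how one record maps onto the named fields, we can split a line on whitespace. The dictionary keys below are our own labels for this illustration, not part of the data file.

```python
# One record copied from the data above, including the newline that ends each line.
line = "Colt McCoy QB CLE 135 222 1576 6 9 60.8% 74.5\n"

fields = line.split()   # split on whitespace; this also discards the trailing newline

# Hypothetical field labels, following the order given in the format description.
record = {
    "first": fields[0],
    "last": fields[1],
    "position": fields[2],
    "team": fields[3],
    "completions": int(fields[4]),
    "attempts": int(fields[5]),
    "yards": int(fields[6]),
    "tds": int(fields[7]),
    "ints": int(fields[8]),
    "comp_pct": float(fields[9].rstrip("%")),   # drop the percent sign before converting
    "rating": float(fields[10]),
}

print(record["last"], record["yards"])   # McCoy 1576
```

Everything read from a text file arrives as a string, so numeric fields must be converted with int or float before doing arithmetic on them.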
To open this file, we would call the open function. The variable fileref
then holds a reference to the file object returned by open. When we are
finished with the file, we can close it by using the close method. After the
file is closed, any further attempts to use fileref will result in an error.
>>> fileref = open("qbdata.txt", "r")
>>>
>>> fileref.close()
>>>
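The error mentioned above can be seen in a short sketch. So that it runs anywhere, the example first creates a stand-in qbdata.txt in a temporary folder containing a single record; that setup is an assumption of the sketch, not part of the original example.

```python
import os
import tempfile

# Stand-in for qbdata.txt, created in a temporary folder for this sketch.
path = os.path.join(tempfile.mkdtemp(), "qbdata.txt")
setup = open(path, "w")
setup.write("Colt McCoy QB CLE 135 222 1576 6 9 60.8% 74.5\n")
setup.close()

fileref = open(path, "r")
first_line = fileref.readline()   # works: the file is open
fileref.close()

print(fileref.closed)             # True

try:
    fileref.readline()            # the file is closed, so this raises ValueError
except ValueError as err:
    message = str(err)
    print("Error:", message)
```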
The process of opening and closing files is very important: an open file ties
up operating system resources, and on some systems (Windows in particular)
other processes may be unable to rename or delete a file while your program
holds it open. A long-running process that keeps a file open might therefore
cause problems for other processes that also need that file. Consequently,
opening and closing a file should be accomplished using the with statement, as
shown below.
In [1]: with open('qbdata.txt') as f:
   ...:     lines = [line for line in f]
   ...:
In [2]: lines[:5]
Out[2]:
['Colt McCoy QB CLE 135 222 1576 6 9 60.8% 74.5\n',
'Josh Freeman QB TB 291 474 3451 25 6 61.4% 95.9\n',
'Michael Vick QB PHI 233 372 3018 21 6 62.6% 100.2\n',
'Matt Schaub QB HOU 365 574 4370 24 12 63.6% 92.0\n',
'Philip Rivers QB SD 357 541 4710 30 13 66.0% 101.8\n']
Using the with statement when working with files in Python is considered
a best practice, as it guarantees that the file is closed as soon as the
block ends, even if an error occurs inside the block.
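One way to convince yourself of this is to inspect the closed attribute of the file object after the block ends. The sketch below again uses a stand-in qbdata.txt in a temporary folder, and the RuntimeError is raised on purpose to simulate a problem in the middle of processing.

```python
import os
import tempfile

# Stand-in for qbdata.txt with two records, created just for this sketch.
path = os.path.join(tempfile.mkdtemp(), "qbdata.txt")
with open(path, "w") as out:
    out.write("Colt McCoy QB CLE 135 222 1576 6 9 60.8% 74.5\n")
    out.write("Josh Freeman QB TB 291 474 3451 25 6 61.4% 95.9\n")

print(out.closed)   # True: leaving the with block closed the file

try:
    with open(path) as f:
        lines = [line for line in f]
        raise RuntimeError("simulated problem while processing")
except RuntimeError:
    pass

print(len(lines))   # 2
print(f.closed)     # True: the file was closed even though an error occurred
```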
Note
There are other uses for the with statement, and it will work with any object
that supports context management. You can identify context managers by the
presence of the __enter__ and __exit__ methods.
# The file object returned by open is a context manager, as shown by the
# existence of its __enter__ and __exit__ methods
In [3]: f = open('qbdata.txt')
In [4]: f.__enter__
Out[4]: <function TextIOWrapper.__enter__>
In [5]: f.__exit__