This lesson introduces Python file processing.
Objectives and Skills
Objectives and skills for this lesson include:[1]
- Standard Library
  - os module
  - sys module
- Input Output
  - Files I/O
Readings
Multimedia
Examples
The os.getcwd() Method
The os.getcwd() method returns a string representing the current working directory.[2]
import os
print("Current working directory:", os.getcwd())
Output:
Current working directory: /home/ubuntu/workspace
The os.chdir() Method
The os.chdir() method changes the current working directory to the given path.[3]
import os
directory = os.getcwd()
print("Current working directory:", directory)
os.chdir("..")
print("Changed to:", os.getcwd())
os.chdir(directory)
print("Changed back to:", directory)
Output:
Current working directory: /home/ubuntu/workspace
Changed to: /home/ubuntu
Changed back to: /home/ubuntu/workspace
The os.path.isdir() Method
The os.path.isdir() method returns True if the given path is an existing directory.[4]
import os
path = os.getcwd()
if os.path.isdir(path):
    print("Current working directory exists.")
else:
    print("Current working directory does not exist.")
Output:
Current working directory exists.
The os.path.join() Method
The os.path.join() method joins one or more path components intelligently, avoiding extra directory separator (os.sep) characters.[5]
import os
path = os.getcwd()
directory = os.path.join(path, "__python_demo__")
print("path:", path)
print("directory:", directory)
Output:
path: /home/ubuntu/workspace
directory: /home/ubuntu/workspace/__python_demo__
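As a small illustration of the "intelligent" joining described above, the sketch below joins a few components and shows that a component already ending in a separator does not receive a second one. The directory and file names are hypothetical, and the separator that appears depends on the platform.

```python
import os

# Hypothetical path components, used only for illustration.
base = os.path.join("workspace", "demo")
filename = os.path.join(base, "notes.txt")

# A component that already ends with a separator does not get a second one.
trailing = os.path.join("workspace" + os.sep, "demo")

print("base:", base)
print("filename:", filename)
print("trailing:", trailing)
```

Building paths this way, rather than concatenating strings with "/" or "\\", keeps the same code usable on both Unix-like systems and Windows.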
The os.mkdir() Method
The os.mkdir() method creates a directory with the given path.[6]
import os
path = os.getcwd()
directory = os.path.join(path, "__python_demo__")
if os.path.isdir(directory):
    raise Exception("Path already exists. Can't continue.")
os.mkdir(directory)
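Instead of checking for the directory first, a program can also let os.mkdir() fail and catch the FileExistsError it raises. The following is only a sketch of that approach. It assumes the tempfile module (not otherwise covered in this lesson) so that the demonstration directory is created in a throwaway location and cleaned up automatically.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing in the workspace is touched.
with tempfile.TemporaryDirectory() as parent:
    directory = os.path.join(parent, "__python_demo__")

    os.mkdir(directory)
    created = os.path.isdir(directory)

    # A second os.mkdir() call on the same path raises FileExistsError.
    try:
        os.mkdir(directory)
        second_attempt = "created again"
    except FileExistsError:
        second_attempt = "already exists"

print("Created:", created)
print("Second attempt:", second_attempt)
```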
The os.rmdir() Method
The os.rmdir() method removes (deletes) the directory with the given path.[7]
import os
path = os.getcwd()
directory = os.path.join(path, "__python_demo__")
if os.path.isdir(directory):
    raise Exception("Path already exists. Can't continue.")
os.mkdir(directory)
print("Created directory:", directory)
os.chdir(directory)
print("Changed to:", os.getcwd())
os.chdir(path)
print("Changed back to:", os.getcwd())
os.rmdir(directory)
print("Removed directory:", directory)
Output:
Created directory: /home/ubuntu/workspace/__python_demo__
Changed to: /home/ubuntu/workspace/__python_demo__
Changed back to: /home/ubuntu/workspace
Removed directory: /home/ubuntu/workspace/__python_demo__
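Note that os.rmdir() only removes empty directories; if the directory still contains anything, it raises OSError. The sketch below illustrates this under the assumption that a temporary parent directory (from the tempfile module, which is not part of this lesson) is acceptable for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as parent:
    directory = os.path.join(parent, "__python_demo__")
    filename = os.path.join(directory, "__python_demo.tmp")

    os.mkdir(directory)
    with open(filename, "w") as file:
        file.write("Temporary Python Demo File")

    # os.rmdir() refuses to remove a directory that still contains a file.
    try:
        os.rmdir(directory)
        removed_while_full = True
    except OSError:
        removed_while_full = False

    # After the file is deleted, the directory is empty and can be removed.
    os.remove(filename)
    os.rmdir(directory)
    removed_when_empty = not os.path.isdir(directory)

print("Removed while it held a file:", removed_while_full)
print("Removed once empty:", removed_when_empty)
```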
The os.walk() Method
The os.walk() method walks the directory tree rooted at the given path and, for each directory in the tree, yields a 3-tuple of the directory path string, a list of subdirectory names, and a list of filenames.[8]
import os
for path, directories, files in os.walk(os.getcwd()):
    for directory in directories:
        print(os.path.join(path, directory))
    for file in files:
        print(os.path.join(path, file))
Output:
... <all subdirectories and files in the current working directory>
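Because the output of the example above depends on whatever is in the working directory, a self-contained variation may be easier to follow. This sketch builds a tiny, made-up directory tree in a temporary location (using the tempfile module, an assumption outside this lesson) and collects the relative path of every file that os.walk() reports.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny, made-up tree: root/a.txt and root/sub/b.txt
    os.mkdir(os.path.join(root, "sub"))
    for name in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(root, name), "w") as file:
            file.write("demo")

    found = []
    for path, directories, files in os.walk(root):
        for filename in files:
            # Store paths relative to root so the result is easy to read.
            full_path = os.path.join(path, filename)
            found.append(os.path.relpath(full_path, root))

found.sort()
print(found)
```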
The os.path.isfile() Method
The os.path.isfile() method returns True if the given path is an existing file.[9]
import os
path = os.getcwd()
filename = os.path.join(path, "__python_demo.tmp")
if os.path.isfile(filename):
    print("File exists.")
else:
    print("File does not exist.")
Output:
File does not exist.
The open() Function
The open() function opens the given file in the given mode (read, write, append) and returns a file object.[10]
file = open(filename, "r") # read
file = open(filename, "w") # write
file = open(filename, "a") # append
file = open(filename, "r+") # read + write
file = open(filename, "w+") # read + write (new / cleared file)
file = open(filename, "a+") # read + append (position starts at end of file)
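The practical difference between the "w" and "a" modes can be seen by writing to the same file several times. The sketch below is one way to show it; it relies on the write(), read(), and close() methods introduced in the sections that follow, and it assumes a temporary directory from the tempfile module so that no real files are affected.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    filename = os.path.join(directory, "__python_demo.tmp")

    file = open(filename, "w")      # creates the file
    file.write("first")
    file.close()

    file = open(filename, "a")      # keeps existing content, adds to the end
    file.write(" second")
    file.close()

    file = open(filename, "r")
    after_append = file.read()
    file.close()

    file = open(filename, "w")      # clears the existing content
    file.write("third")
    file.close()

    file = open(filename, "r")
    after_write = file.read()
    file.close()

print("After append:", after_append)
print("After write:", after_write)
```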
The file.write() Method
The file.write() method writes the contents of the given string to the file, returning the number of characters written.[11]
file.write("Temporary Python Demo File")
The file.close() Method
The file.close() method closes the file and frees any system resources taken up by the open file.[12]
import os
path = os.getcwd()
filename = os.path.join(path, "__python_demo.tmp")
file = open(filename, "w")
file.write("Temporary Python Demo File")
file.close()
if os.path.isfile(filename):
    print("Created %s" % filename)
Output:
Created /home/ubuntu/workspace/__python_demo.tmp
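Calling file.close() explicitly is easy to forget, especially when an exception interrupts the program. A common alternative is the with statement, which closes the file automatically when its block ends. The following sketch shows the idea; the temporary directory (from the tempfile module) is an assumption made so that the example cleans up after itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    filename = os.path.join(directory, "__python_demo.tmp")

    # The file is closed automatically when the with block ends,
    # even if an exception is raised inside it.
    with open(filename, "w") as file:
        file.write("Temporary Python Demo File")

    closed_after_block = file.closed
    exists = os.path.isfile(filename)

print("Closed after block:", closed_after_block)
print("File exists:", exists)
```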
The file.read() Method
The file.read() method reads up to the given number of characters (or bytes, in binary mode) from the file, or all remaining content if no size is given, and returns what was read as a string (or a bytes object in binary mode).[13]
import os
path = os.getcwd()
filename = os.path.join(path, "__python_demo.tmp")
if os.path.isfile(filename):
    file = open(filename, "r")
    text = file.read()
    file.close()
    print("File text:", text)
Output:
File text: Temporary Python Demo File
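The optional size argument can be illustrated with a short sketch. It recreates the demo file in a temporary directory (an assumption, using the tempfile module) and then reads it in pieces, showing that each call continues where the previous one stopped and that reading at the end of the file returns an empty string.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    filename = os.path.join(directory, "__python_demo.tmp")
    with open(filename, "w") as file:
        file.write("Temporary Python Demo File")

    with open(filename, "r") as file:
        first = file.read(9)    # the first 9 characters
        rest = file.read()      # everything that is left
        at_end = file.read()    # nothing left, so an empty string

print("first:", repr(first))
print("rest:", repr(rest))
print("at_end:", repr(at_end))
```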
Reading Lines
For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code.[14]
import os
path = os.getcwd()
filename = os.path.join(path, "__python_demo.tmp")
file = open(filename, "r")
for line in file:
    print(line, end='')
file.close()
Output:
Temporary Python Demo File
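The demo file above contains only one line, so a multi-line variation may make the loop clearer. This sketch writes three lines in the name and score format used in the practice activities, then reads them back with line numbers. Each line returned by the loop still ends with a newline character, which is removed here with rstrip(). The temporary directory is an assumption made to keep the example self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    filename = os.path.join(directory, "__python_demo.tmp")
    with open(filename, "w") as file:
        file.write("Larry Fine: 80\nCurly Howard: 70\nMoe Howard: 90\n")

    lines = []
    with open(filename, "r") as file:
        for number, line in enumerate(file, start=1):
            # Each line still ends with "\n"; rstrip removes it.
            lines.append("%d: %s" % (number, line.rstrip("\n")))

for line in lines:
    print(line)
```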
The file.tell() Method
The file.tell() method returns an integer giving the file object’s current position in the file.[15]
import os
path = os.getcwd()
filename = os.path.join(path, "__python_demo.tmp")
if os.path.isfile(filename):
    file = open(filename, "a+")
    print("Open file position:", file.tell())
    file.write(" - Appended to the end of the file")
    print("Write file position:", file.tell())
    file.close()
Output:
Open file position: 26
Write file position: 60
The file.seek() Method
The file.seek() method moves the file position to the given offset from the given reference point. Reference points are 0 for the beginning of the file, 1 for the current position, and 2 for the end of the file.[16]
import os
path = os.getcwd()
filename = os.path.join(path, "__python_demo.tmp")
if os.path.isfile(filename):
    file = open(filename, "a+")
    print("Open file position:", file.tell())
    file.seek(0, 0)
    print("Seek file position:", file.tell())
    text = file.read()
    file.close()
    print("File text:", text)
Output:
Open file position: 60
Seek file position: 0
File text: Temporary Python Demo File - Appended to the end of the file
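The example above only seeks relative to the beginning of the file. In text mode, seeks relative to the current position or the end of the file are generally not allowed (apart from special cases such as seek(0, 2)), so the sketch below opens the file in binary mode to show all three reference points. The file is recreated in a temporary directory, which is an assumption made for the sake of a self-contained example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    filename = os.path.join(directory, "__python_demo.tmp")
    with open(filename, "wb") as file:
        file.write(b"Temporary Python Demo File")

    with open(filename, "rb") as file:
        file.seek(10, 0)            # 10 bytes from the beginning
        word = file.read(6)
        file.seek(-4, 2)            # 4 bytes before the end
        last = file.read()
        file.seek(-9, 1)            # 9 bytes back from the current position
        position = file.tell()

print("word:", word)
print("last:", last)
print("position:", position)
```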
The os.rename() Method
The os.rename() method renames the given source file or directory to the given destination name.[17]
import os
path = os.getcwd()
filename = os.path.join(path, "__python_demo.tmp")
if not os.path.isfile(filename):
    raise Exception("File doesn't exist. Can't continue.")
filename2 = os.path.join(path, "__python_demo2.tmp")
if os.path.isfile(filename2):
    raise Exception("File already exists. Can't continue.")
os.rename(filename, filename2)
if os.path.isfile(filename2):
    print("Renamed %s to %s" % (filename, filename2))
Output:
Renamed /home/ubuntu/workspace/__python_demo.tmp to /home/ubuntu/workspace/__python_demo2.tmp
The os.remove() Method
The os.remove() method removes (deletes) the given file.[18]
import os
path = os.getcwd()
filename2 = os.path.join(path, "__python_demo2.tmp")
if not os.path.isfile(filename2):
    raise Exception("File doesn't exist. Can't continue.")
os.remove(filename2)
if not os.path.isfile(filename2):
    print("Removed %s" % filename2)
Output:
Removed /home/ubuntu/workspace/__python_demo2.tmp
The sys.argv Property
The sys.argv property returns the list of command line arguments passed to a Python script. argv[0] is the script name.[19]
import sys
for i in range(0, len(sys.argv)):
    print("sys.argv[%d]: %s" % (i, sys.argv[i]))
Output:
sys.argv[0]: /home/ubuntu/workspace/argv.py
sys.argv[1]: test1
sys.argv[2]: test2
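A frequent use of sys.argv is to check whether a parameter was supplied at all. The sketch below shows one possible way to do that. The get_filename() helper and the "scores.txt" default are hypothetical names invented for this example, and the helper takes the argument list as a parameter so it can be tried with sample lists as well as with sys.argv itself.

```python
import sys


def get_filename(arguments, default=None):
    """Return the first command line parameter, or default if there is none.

    get_filename is a hypothetical helper written for this example.
    """
    # arguments[0] is the script name, so parameters start at index 1.
    if len(arguments) > 1:
        return arguments[1]
    return default


print("From sys.argv:", get_filename(sys.argv))
print("With a parameter:", get_filename(["argv.py", "test1", "test2"]))
print("Without a parameter:", get_filename(["argv.py"], default="scores.txt"))
```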
Activities
Tutorials
- Complete one or more of the following tutorials:
- TutorialsPoint
- Codecademy
- Wikiversity
- Wikibooks
Practice
- Create a Python program that displays high, low, and average quiz scores based on input from a file. Check for a filename parameter passed from the command line. If there is no parameter, ask the user to input a filename for processing. Verify that the file exists and then use RegEx methods to parse the file and add each score to a list. Display the list of entered scores sorted in descending order and then calculate and display the high, low, and average for the entered scores. Include error handling in case the file is formatted incorrectly. Create a text file of names and grade scores to use for testing based on the following format:
Larry Fine: 80
Curly Howard: 70
Moe Howard: 90
- Create a Python program that asks the user for a file that contains HTML tags, such as:
<p><strong>This is a bold paragraph.</strong></p>
Check for a filename parameter passed from the command line. If there is no parameter, ask the user to input a filename for processing. Verify that the file exists and then use RegEx methods to search for and remove all HTML tags from the text, saving each removed tag in a dictionary. Print the untagged text and then use a function to display the list of removed tags sorted in alphabetical order and a histogram showing how many times each tag was used. Include error handling in case an HTML tag isn't entered correctly (an unmatched < or >). Use a user-defined function for the actual string processing, separate from input and output. For example:
</p>: *
</strong>: *
<p>: *
<strong>: *
- Create a Python program that asks the user for a file that contains lines of dictionary keys and values in the form:
Larry Fine: 80
Curly Howard: 70
Moe Howard: 90
Keys may contain spaces but should be unique. Values should always be an integer greater than or equal to zero. Check for a filename parameter passed from the command line. If there is no parameter, ask the user to input a filename for processing. Verify that the file exists and then use RegEx methods to parse the file and build a dictionary of key-value pairs. Then display the dictionary sorted in descending order by value (score). Include input validation and error handling in case the file accidentally contains the same key more than once.
- Create a Python program that checks all Python (.py) files in a given directory / folder. Check for a folder path parameter passed from the command line. If there is no parameter, ask the user to input a folder path for processing. Verify that the folder exists and then check all Python files in the folder for an initial docstring. If the file contains an initial docstring, continue processing with the next file. If the file does not start with a docstring, add a docstring to the beginning of the file similar to:
"""Filename.py"""
Add a blank line between the docstring and the existing file code and save the file. Test the program carefully to be sure it doesn't alter any non-Python files and doesn't delete existing file content.
Lesson Summary
File Concepts
- A file system is used to control how data is stored and retrieved. There are many different kinds of file systems. Each one has different structure and logic, properties of speed, flexibility, security, size and more.[20]
- File systems are responsible for arranging storage space; reliability, efficiency, and tuning with regard to the physical storage medium are important design considerations.[21]
- File systems allocate space in a granular manner, usually multiple physical units on the device.[22]
- A filename (or file name) is used to identify a storage location in the file system.[23]
- File systems typically have directories (also called folders) which allow the user to group files into separate collections.[24]
- A file system stores all the metadata associated with the file—including the file name, the length of the contents of a file, and the location of the file in the folder hierarchy—separate from the contents of the file.[25]
- Directory utilities may be used to create, rename and delete directory entries.[26]
- File utilities create, list, copy, move and delete files, and alter metadata.[27]
- All file systems have some functional limit that defines the maximum storable data capacity within that system.[28]
- A directory is a file system cataloging structure which contains references to other computer files, and possibly other directories.[29]
- A text file is a kind of computer file that is structured as a sequence of lines of electronic text.[30]
- MS-DOS and Windows use a common text file format, with each line of text separated by a two-character combination: CR and LF, which have ASCII codes 13 and 10.[31]
- Unix-like operating systems use a common text file format, with each line of text separated by a single newline character, normally LF.[32]
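The two line-ending conventions described above can be inspected directly in Python, where CR is written "\r" and LF is written "\n". The following is a minimal sketch; the sample text is made up for illustration.

```python
# CR and LF are written as "\r" and "\n" in Python strings.
cr_code = ord("\r")
lf_code = ord("\n")

windows_text = "first\r\nsecond\r\n"    # CR LF after each line
unix_text = "first\nsecond\n"           # LF after each line

# str.splitlines() recognizes both conventions.
same_lines = windows_text.splitlines() == unix_text.splitlines()

print("CR:", cr_code, "LF:", lf_code)
print("Same lines:", same_lines)
```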
Python Files
- The os.getcwd() method returns a string representing the current working directory.[33]
- The os.chdir() method changes the current working directory to the given path.[34]
- The os.path.isdir() method returns True if the given path is an existing directory.[35]
- The os.path.join() method joins one or more path components intelligently, avoiding extra directory separator (os.sep) characters.[36]
- The os.mkdir() method creates a directory with the given path.[37]
- The os.rmdir() method removes (deletes) the directory with the given path.[38]
- The os.walk() method walks the directory tree rooted at the given path and, for each directory in the tree, yields a 3-tuple of the directory path string, a list of subdirectory names, and a list of filenames.[39]
- The os.path.isfile() method returns True if the given path is an existing file.[40]
- The open() function opens the given file in the given mode (read, write, append) and returns a file object.[41]
- The file.write() method writes the contents of the given string to the file, returning the number of characters written.[42]
- The file.close() method closes the file and frees any system resources taken up by the open file.[43]
- The file.read() method reads up to the given number of characters (or bytes, in binary mode) from the file, or all remaining content if no size is given, and returns what was read as a string (or a bytes object in binary mode).[44]
- For reading lines from a file, you can loop over the file object using a for loop. This is memory efficient, fast, and leads to simple code.[45]
- The file.tell() method returns an integer giving the file object’s current position in the file.[46]
- The file.seek() method moves the file position to the given offset from the given reference point. Reference points are 0 for the beginning of the file, 1 for the current position, and 2 for the end of the file.[47]
- The os.rename() method renames the given source file or directory to the given destination name.[48]
- The os.remove() method removes (deletes) the given file.[49]
- The sys.argv property returns the list of command line arguments passed to a Python script. argv[0] is the script name.[50]
- Python text mode file processing converts platform-specific line endings (\n on Unix, \r\n on Windows) to just \n on input and \n back to platform-specific line endings on output.[51]
- Binary mode file processing must be used when reading and writing non-text files to prevent newline translation.[52]
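The difference between text mode and binary mode in the last two points can be demonstrated with a short sketch. It writes Windows-style line endings in binary mode and then reads the same file back in both modes. The temporary directory comes from the tempfile module, an assumption made so that the example leaves nothing behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    filename = os.path.join(directory, "__python_demo.tmp")

    # Write Windows-style line endings exactly as given (binary mode).
    with open(filename, "wb") as file:
        file.write(b"first\r\nsecond\r\n")

    # Text mode translates the line endings to "\n" on input.
    with open(filename, "r") as file:
        text = file.read()

    # Binary mode returns the bytes unchanged.
    with open(filename, "rb") as file:
        data = file.read()

print("text mode:", repr(text))
print("binary mode:", repr(data))
```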
Key Terms
- catch
- To prevent an exception from terminating a program using the try and except statements.[53]
- newline
- A special character used in files and strings to indicate the end of a line.[54]
- Pythonic
- A technique that works elegantly in Python. “Using try and except is the Pythonic way to recover from missing files”.[55]
- Quality Assurance
- A person or team focused on ensuring the overall quality of a software product. QA is often involved in testing a product and identifying problems before the product is released.[56]
- text file
- A sequence of characters stored in permanent storage like a hard drive.[57]
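The terms "catch" and "Pythonic" can be illustrated together. The sketch below shows one possible way to recover from a missing file with try and except instead of checking for the file in advance. The read_text() helper is a hypothetical name invented for this example, and the path it is given points into an empty temporary directory so the file is certain to be missing.

```python
import os
import tempfile


def read_text(filename):
    """Return the file's text, or None if the file is missing.

    read_text is a hypothetical helper written for this example.
    """
    try:
        with open(filename, "r") as file:
            return file.read()
    except FileNotFoundError:
        # The exception is caught here, so the program keeps running.
        return None


with tempfile.TemporaryDirectory() as directory:
    missing = os.path.join(directory, "__python_demo.tmp")
    result = read_text(missing)

print("Result for a missing file:", result)
```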
Review Questions
- A file system is _____.
  A file system is used to control how data is stored and retrieved. There are many different kinds of file systems. Each one has different structure and logic, properties of speed, flexibility, security, size and more.
- File systems are responsible for _____.
  File systems are responsible for arranging storage space; reliability, efficiency, and tuning with regard to the physical storage medium are important design considerations.
- File systems allocate _____.
  File systems allocate space in a granular manner, usually multiple physical units on the device.
- A filename (or file name) is used to _____.
  A filename (or file name) is used to identify a storage location in the file system.
- File systems typically have directories (also called folders) which _____.
  File systems typically have directories (also called folders) which allow the user to group files into separate collections.
- A file system stores all the metadata associated with the file, including _____.
  A file system stores all the metadata associated with the file (including the file name, the length of the contents of a file, and the location of the file in the folder hierarchy) separate from the contents of the file.
- Directory utilities may be used to _____.
  Directory utilities may be used to create, rename and delete directory entries.
- File utilities _____.
  File utilities create, list, copy, move and delete files, and alter metadata.
- All file systems have some functional limit that defines _____.
  All file systems have some functional limit that defines the maximum storable data capacity within that system.
- A directory is _____.
  A directory is a file system cataloging structure which contains references to other computer files, and possibly other directories.
- A text file is _____.
  A text file is a kind of computer file that is structured as a sequence of lines of electronic text.
- MS-DOS and Windows use a common text file format, with _____.
  MS-DOS and Windows use a common text file format, with each line of text separated by a two-character combination: CR and LF, which have ASCII codes 13 and 10.
- Unix-like operating systems use a common text file format, with _____.
  Unix-like operating systems use a common text file format, with each line of text separated by a single newline character, normally LF.
- The os.getcwd() method _____.
  The os.getcwd() method returns a string representing the current working directory.
- The os.chdir() method _____.
  The os.chdir() method changes the current working directory to the given path.
- The os.path.isdir() method _____.
  The os.path.isdir() method returns True if the given path is an existing directory.
- The os.path.join() method _____.
  The os.path.join() method joins one or more path components intelligently, avoiding extra directory separator (os.sep) characters.
- The os.mkdir() method _____.
  The os.mkdir() method creates a directory with the given path.
- The os.rmdir() method _____.
  The os.rmdir() method removes (deletes) the directory with the given path.
- The os.walk() method _____.
  The os.walk() method walks the directory tree rooted at the given path and, for each directory in the tree, yields a 3-tuple of the directory path string, a list of subdirectory names, and a list of filenames.
- The os.path.isfile() method _____.
  The os.path.isfile() method returns True if the given path is an existing file.
- The open() function _____.
  The open() function opens the given file in the given mode (read, write, append) and returns a file object.
- The file.write() method _____.
  The file.write() method writes the contents of the given string to the file, returning the number of characters written.
- The file.close() method _____.
  The file.close() method closes the file and frees any system resources taken up by the open file.
- The file.read() method _____.
  The file.read() method reads up to the given number of characters (or bytes, in binary mode) from the file, or all remaining content if no size is given, and returns what was read as a string (or a bytes object in binary mode).
- For reading lines from a file, you can _____.
  For reading lines from a file, you can loop over the file object using a for loop. This is memory efficient, fast, and leads to simple code.
- The file.tell() method _____.
  The file.tell() method returns an integer giving the file object’s current position in the file.
- The file.seek() method _____.
  The file.seek() method moves the file position to the given offset from the given reference point. Reference points are 0 for the beginning of the file, 1 for the current position, and 2 for the end of the file.
- The os.rename() method _____.
  The os.rename() method renames the given source file or directory to the given destination name.
- The os.remove() method _____.
  The os.remove() method removes (deletes) the given file.
- The sys.argv property _____. argv[0] is _____.
  The sys.argv property returns the list of command line arguments passed to a Python script. argv[0] is the script name.
- Python text mode file processing _____.
  Python text mode file processing converts platform-specific line endings (\n on Unix, \r\n on Windows) to just \n on input and \n back to platform-specific line endings on output.
- Binary mode file processing must be used when _____.
  Binary mode file processing must be used when reading and writing non-text files to prevent newline translation.
Assessments
See Also
References
- [1] Vskills: Certified Python Developer
- [2] Python.org: Miscellaneous operating system interfaces
- [3] Python.org: Miscellaneous operating system interfaces
- [4] Python.org: os.path
- [5] Python.org: os.path
- [6] Python.org: Miscellaneous operating system interfaces
- [7] Python.org: Miscellaneous operating system interfaces
- [8] Python.org: Miscellaneous operating system interfaces
- [9] Python.org: os.path
- [10] Python.org: Built-in Functions
- [11] Python.org: Input and Output
- [12] Python.org: Input and Output
- [13] Python.org: Input and Output
- [14] Python.org: Input and Output
- [15] Python.org: Input and Output
- [16] Python.org: Input and Output
- [17] Python.org: Miscellaneous operating system interfaces
- [18] Python.org: Miscellaneous operating system interfaces
- [19] Python.org: System-specific parameters and functions
- [20] Wikipedia: File system
- [21] Wikipedia: File system
- [22] Wikipedia: File system
- [23] Wikipedia: File system
- [24] Wikipedia: File system
- [25] Wikipedia: File system
- [26] Wikipedia: File system
- [27] Wikipedia: File system
- [28] Wikipedia: File system
- [29] Wikipedia: Directory (computing)
- [30] Wikipedia: Text file
- [31] Wikipedia: Text file
- [32] Wikipedia: Text file
- [33] Python.org: Miscellaneous operating system interfaces
- [34] Python.org: Miscellaneous operating system interfaces
- [35] Python.org: os.path
- [36] Python.org: os.path
- [37] Python.org: Miscellaneous operating system interfaces
- [38] Python.org: Miscellaneous operating system interfaces
- [39] Python.org: Miscellaneous operating system interfaces
- [40] Python.org: os.path
- [41] Python.org: Built-in Functions
- [42] Python.org: Input and Output
- [43] Python.org: Input and Output
- [44] Python.org: Input and Output
- [45] Python.org: Input and Output
- [46] Python.org: Input and Output
- [47] Python.org: Input and Output
- [48] Python.org: Miscellaneous operating system interfaces
- [49] Python.org: Miscellaneous operating system interfaces
- [50] Python.org: System-specific parameters and functions
- [51] Python.org: Input and Output
- [52] Python.org: Input and Output
- [53] PythonLearn: Files
- [54] PythonLearn: Files
- [55] PythonLearn: Files
- [56] PythonLearn: Files
- [57] PythonLearn: Files