There are a number of operations closely related to file processing. Deleting and renaming files are examples of operations that change the directory information that the operating system maintains to describe a file. Python provides numerous modules for these operating system operations.
We can't begin to cover all of the ways in which Python supports file handling. However, we can identify the essential modules that may help you avoid reinventing the wheel. Further, these modules can give you a view of the Pythonic way of working with data from files.
The following modules have features that are essential for supporting file processing. We'll cover selected features of each module that are directly relevant to file processing. We'll present these in the order you'd find them in the Python library documentation.
Chapter 11 - File and Directory Access. Chapter 11 of the Library reference covers many modules which are essential for reliable use of files and directories. We'll look closely at the following modules.
os.path: Common pathname manipulations. Use this to split and join full directory path names. This is operating-system neutral, with a correct implementation for all operating systems.
os: Miscellaneous OS interfaces. This includes parameters of the current process, additional file object creation, manipulations of file descriptors, managing directories and files, managing subprocesses, and additional details about the current operating system.
fileinput: This module has functions which will iterate over lines from multiple input streams. This allows you to write a single, simple loop that processes lines from any number of input files.
tempfile: Generate temporary files and temporary file names.
glob: UNIX shell style pathname pattern expansion. Unix shells translate name patterns like *.py into a list of files. This is called globbing. The glob module implements this within Python, which allows this feature to work even in Windows where it isn't supported by the OS itself.
fnmatch: UNIX shell style filename pattern matching. This implements the glob-style rules using *, ? and []. * matches any number of characters, ? matches any single character, [chars] encloses a list of allowed characters, and [!chars] encloses a list of disallowed characters.
shutil: High-level file operations, including copying and removal. The kinds of things that the shell handles with simple commands like cp or rm become available to a Python program, and are just as simple in Python as they are in the shell.
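The pattern rules described for fnmatch and glob can be tried out directly. The following is a minimal sketch, not a definitive recipe: the file names are hypothetical, and a temporary directory (created with tempfile and removed with shutil) stands in for a real working directory.

```python
import fnmatch
import glob
import os
import shutil
import tempfile

# Hypothetical file names, used only to illustrate the pattern rules.
names = ["notes.txt", "report1.py", "report2.py", "reportA.py", "setup.py"]

# * matches any number of characters.
py_files = fnmatch.filter(names, "*.py")

# ? matches exactly one character.
one_char = fnmatch.filter(names, "report?.py")

# [chars] allows only the listed characters; [!chars] excludes them.
digits_only = fnmatch.filter(names, "report[12].py")
no_digits = fnmatch.filter(names, "report[!12].py")

# glob applies the same rules to names that actually exist on disk.
work_dir = tempfile.mkdtemp()
for name in names:
    open(os.path.join(work_dir, name), "w").close()
found = sorted(os.path.basename(p)
               for p in glob.glob(os.path.join(work_dir, "*.py")))

# shutil removes the whole temporary tree in one call.
shutil.rmtree(work_dir)
```

Note that fnmatch works on any list of strings, while glob consults the file system; that is why only the glob call needs real files.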
Chapter 12 - Data Compression and Archiving. Data Compression is covered in Chapter 12 of the Library reference. We'll look closely at the following modules.
tarfile, zipfile: These modules help you read and write archive files: single files which hold an archive of a complex directory structure. This includes GNU/Linux tape archive (.tar) files, compressed GZip tar files (.tgz files or .tar.gz files), sometimes called tarballs, and ZIP files.
zlib, gzip, bz2: These modules are all variations on a common theme of reading and writing files which are compressed to remove redundant bytes of data. The zlib and bz2 modules have a more sophisticated interface, allowing you to use compression selectively within a more complex application. The gzip module has a different (and simpler) interface that applies only to complete files.
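The difference between the two styles of interface can be seen in a short sketch. It assumes some made-up, highly redundant sample data and a temporary file name; zlib compresses a byte string in memory, while gzip is used like an ordinary file.

```python
import gzip
import os
import tempfile
import zlib

# Made-up sample data with plenty of redundancy to squeeze out.
data = b"redundant " * 100

# zlib works selectively, on byte strings held in memory.
packed = zlib.compress(data)
restored = zlib.decompress(packed)

# gzip reads and writes a complete compressed file.
fd, gz_name = tempfile.mkstemp(suffix=".gz")
os.close(fd)
with gzip.open(gz_name, "wb") as target:
    target.write(data)
with gzip.open(gz_name, "rb") as source:
    from_file = source.read()
os.remove(gz_name)
```

Here packed is far shorter than data, and both restored and from_file are identical to the original bytes.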
Chapter 26 - Python Runtime Services. The modules described in Chapter 26 of the Library reference include some that are used for handling various kinds of files. We'll look closely at just one.
sys: This module has several system-specific parameters and functions, including definitions of the three standard files that are available to every program.
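The three standard files are available as sys.stdin, sys.stdout and sys.stderr. The sketch below uses two hypothetical helper functions, report and copy_lines, which are not part of any library; in-memory streams from the io module stand in for the standard files so that the example runs without any input.

```python
import io
import sys

def report(message, stream=None):
    """Write one line to the given stream (sys.stderr by default)."""
    if stream is None:
        stream = sys.stderr
    stream.write(message + "\n")

def copy_lines(source, target):
    """Copy every line from source to target; return the number of lines."""
    count = 0
    for line in source:
        target.write(line)
        count += 1
    return count

# In-memory streams stand in for the standard files in this demonstration.
demo_in = io.StringIO(u"first\nsecond\n")
demo_out = io.StringIO()
copied = copy_lines(demo_in, demo_out)

# Diagnostic messages belong on standard error, not standard output.
report("copied %d lines" % copied)
```

In a real filter program the call would be copy_lines(sys.stdin, sys.stdout), which leaves standard output free of diagnostic messages.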
The os.path module contains more useful functions for managing path and directory names. A serious mistake is to use ordinary string functions with literal strings for the path separators. A Windows program using \ as the separator won't work anywhere else. A less serious mistake is to use os.sep directly instead of the routines in the os.path module. (Note that os.pathsep is a different constant: it separates the entries of a search path such as PATH, not the components of a single path.)
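The three approaches can be compared side by side. This is only a sketch with a hypothetical path; note that os.sep is the separator between path components, whereas os.pathsep separates the entries of a search path such as PATH.

```python
import os
import os.path

# Hypothetical path components.
parts = ["project", "docs", "index.rst"]

# Fragile: a literal separator ties the program to one operating system.
windows_only = "project\\docs\\index.rst"

# Better, but still manual: os.sep is the separator for the current system.
by_hand = os.sep.join(parts)

# Best: let os.path assemble the path and take it apart again.
portable = os.path.join(*parts)
directory, filename = os.path.split(portable)
```

On any one system by_hand and portable come out the same; the advantage of os.path is that the splitting rules, not just the joining rules, are handled for you.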
The os.path module contains the following
functions for completely portable path and filename manipulation.
os.path.basename(path) → fileName: Return the base filename, the second half of the result created by os.path.split(path).
os.path.dirname(path) → dirName: Return the directory name, the first half of the result created by os.path.split(path).
os.path.exists(path) → boolean: Return True if the pathname refers to an existing file or directory.
os.path.getatime(path) → time: Return the last access time of a file, reported by os.stat. See the time module for functions to process the time value.
os.path.getmtime(path) → time: Return the last modification time of a file, reported by os.stat. See the time module for functions to process the time value.
os.path.getsize(path) → int: Return the size of a file, in bytes, reported by os.stat.
os.path.isdir(path) → boolean: Return True if the pathname refers to an existing directory.
os.path.isfile(path) → boolean: Return True if the pathname refers to an existing regular file.
os.path.join(string, ...) → path: Join path components using the appropriate path separator.
os.path.split(path) → tuple: Split a pathname into two parts: the directory and the basename (the filename, without path separators, in that directory). The result (s, t) is such that os.path.join(s, t) yields the original path.
os.path.splitdrive(path) → tuple: Split a pathname into a drive specification and the rest of the path. Useful on DOS/Windows/NT.
os.path.splitext(path) → tuple: Split a path into root and extension. The extension is everything starting at the last dot in the last component of the pathname; the root is everything before that. The result (r, e) is such that r+e yields the original path.
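The relationships among these functions can be checked with a short sketch. The path below is hypothetical and is built with os.path.join so that it suits the current system; the query functions need a real file, so a temporary one is created and removed again.

```python
import os
import os.path
import tempfile
import time

# A hypothetical path, built with os.path.join so it suits the current system.
path = os.path.join("archive", "2008", "chapter.rst")

directory, filename = os.path.split(path)
root, ext = os.path.splitext(path)

# dirname and basename are the two halves of split.
same_dir = os.path.dirname(path) == directory
same_name = os.path.basename(path) == filename

# join undoes split; simple concatenation undoes splitext.
rejoined = os.path.join(directory, filename) == path
recombined = root + ext == path

# The query functions need a real file; a temporary one serves here.
fd, temp_name = tempfile.mkstemp(suffix=".txt")
os.write(fd, b"hello\n")
os.close(fd)
is_file = os.path.isfile(temp_name)
is_dir = os.path.isdir(temp_name)
size = os.path.getsize(temp_name)

# The time module turns the modification time into a readable string.
modified = time.ctime(os.path.getmtime(temp_name))

os.remove(temp_name)
gone = not os.path.exists(temp_name)
```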
The following example is typical of the manipulations done with
os.path.
import sys, os.path

def process( oldName, newName ):
    # Some processing...
    pass

for oldFile in sys.argv[1:]:
    # Separate the directory from the base name, then the name from its extension.
    dir, fileext= os.path.split( oldFile )
    file, ext= os.path.splitext( fileext )
    if ext.upper() == '.RST':
        newFile= os.path.join( dir, file ) + '.HTML'
        print( "%s -> %s" % (oldFile, newFile) )
        process( oldFile, newFile )
1. This program imports the sys and os.path modules.
2. The process function is a placeholder for the real processing of each file.
3. The for statement sets the variable oldFile to each command-line argument in sys.argv[1:].
4. Each file name is split into the path name and the base name. The base name is further split to separate the file name from the extension.
5. The extension is tested to be '.RST'. A new file name is created from the path, base name and a new extension ('.HTML'). The old and new file names are printed and some processing, defined in the process function, is performed.
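One possible variation, offered only as a sketch, is to isolate the renaming rule in a function so that it can be exercised without command-line arguments. The function name html_name and the sample file names are hypothetical.

```python
import os.path

def html_name(old_name):
    """Return the .HTML name for an .RST file name, or None for other files."""
    directory, base = os.path.split(old_name)
    stem, ext = os.path.splitext(base)
    if ext.upper() != ".RST":
        return None
    return os.path.join(directory, stem) + ".HTML"

# Hypothetical file names for a quick check of the rule.
sample = os.path.join("docs", "intro.rst")
converted = html_name(sample)
skipped = html_name("notes.txt")
```

Here converted keeps the docs directory and the intro stem but ends in .HTML, while skipped is None because the extension is not .RST.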