Consistent with Python’s “Batteries Included” philosophy, there are hundreds of extension modules. It can be difficult to match a programming need with a specific module. The Python Library Reference document can be hard to pick through to locate an appropriate module. We’ll start at the top of the library organization and work our way down to a useful subset of the tremendous wealth that is Python.
In Overview of the Python Library we’ll take a very high-level overview of what’s in the Python library. We’ll look closely at the 50 or so most useful modules in Most Useful Library Sections.
The Python Library Reference organizes modules into the following sections. The current version of the Library documentation strives to present the most useful modules near the front of the list. The first 23 chapters, plus chapter 26, are the most useful. From chapter 24 onward (except for chapter 26), the modules are too highly specialized to cover in this book.
This section surveys about 50 of the most useful library modules. These modules are proven technology, widely used, heavily tested and constantly improved. The time spent learning these modules will reduce the time it takes you to build an application that does useful work.
We’ll dig more deeply into just a few of these modules in subsequent chapters.
Tip
Lessons Learned
As consultants, we’ve seen far too many programmers writing modules which overlap these. There are two causes: ignorance and hubris. In this section, we hope to tackle the ignorance cause.
Python includes a large number of pre-built modules. The more you know about these, the less programming you have to do.
Hubris sometimes comes from the feeling that the library module doesn’t fit our unique problem well enough to justify studying the library module. In other languages we can’t read the library module to see what it really does. In Python, however, the documentation is only an introduction; we’re encouraged to actually read the library module. This is called the “Use the Source, Luke” principle.
We find that hubris is most closely associated with calendrical calculations. It isn’t clear why programmers invest so much time and effort writing buggy calendrical calculations. Python provides many modules for dealing with times, dates and the calendar.
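To make the point about calendrical calculations concrete, here is a minimal sketch using the datetime and calendar modules. The specific dates are only illustrative; the point is that leap years and month lengths are handled by the library rather than by hand-written arithmetic.

```python
import calendar
import datetime

# Date arithmetic across a leap day: the library knows 2000 had a February 29.
start = datetime.date(1999, 3, 1)
end = datetime.date(2000, 3, 1)
elapsed_days = (end - start).days

# Adding an interval to a date.
two_days_later = datetime.date(2000, 2, 28) + datetime.timedelta(days=2)

# Count leap years in the half-open range 1996 up to (not including) 2005.
leap_count = calendar.leapdays(1996, 2005)

# Weekday of the first day (Monday is 0) and number of days in February 2000.
first_weekday, days_in_month = calendar.monthrange(2000, 2)
```

None of this requires knowing the leap-year rules; that knowledge lives in the library.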
8. String Services. The String Services modules contain string-related functions or classes. See Strings for more information on strings.
| re: | The re module is the core of text pattern recognition and processing. A regular expression is a formula that specifies how to recognize and parse strings. The re module is described in detail in Complex Strings: the re Module. |
|---|---|
| struct: | The avowed purpose of the struct module is to allow a Python program to access C-language APIs; it packs and unpacks C-language struct objects. It turns out that this module can also help you deal with files in packed binary formats. |
| difflib: | The difflib module contains the essential algorithms for comparing two sequences, usually sequences of lines of text. This has algorithms similar to those used by the Unix diff command (or the Windows COMP command). |
| StringIO: | |
| cStringIO: | There are two variations on StringIO which provide file-like objects that read from or write to a string buffer. The StringIO module defines the class StringIO , from which subclasses can be derived. The cStringIO module provides a high-speed C-language implementation that can’t be subclassed. Note that these modules have atypical mixed-case names. |
| textwrap: | This is a module to format plain text. While the word-wrapping task is sometimes handled by word processors, you may need this in other kinds of programs. Plain text files are still the most portable, standard way to provide a document. |
| codecs: | This module has hundreds of text encodings. This includes the vast array of Windows code pages and the Macintosh code pages. The most commonly used are the various Unicode schemes (utf-16 and utf-8). However, there are also a number of codecs for translating between strings of text and arrays of bytes. These schemes include base-64, zip compression, bz2 compression, various quoting rules, and even the simple rot_13 substitution cipher. |
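As a rough illustration of several of the string services described above, the following sketch exercises re, struct, difflib and textwrap. It assumes a Python 3 interpreter; the sample data, file names and format string are invented for the example.

```python
import difflib
import re
import struct
import textwrap

# re: recognize and parse an ISO-style date.
match = re.match(r"(\d{4})-(\d{2})-(\d{2})", "2000-01-01")
year, month, day = match.groups()

# struct: pack a 16-bit unsigned integer and a 32-bit float,
# little-endian with no padding, then unpack them again.
record = struct.pack("<Hf", 513, 1.5)
count, ratio = struct.unpack("<Hf", record)

# difflib: compare two sequences of lines, diff-style.
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "delta\n", "gamma\n"]
diff = list(difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt"))

# textwrap: break plain text into lines of at most 15 characters.
sentence = "The quick brown fox jumps over the lazy dog"
wrapped = textwrap.wrap(sentence, width=15)
```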
9. Data Types. The Data Types modules implement a number of widely-used data structures. These aren’t as useful as sequences, dictionaries or strings – which are built into the language. These data types include dates, general collections, arrays, and scheduled events. This section also includes modules for searching lists, copying structures or producing nicely formatted output for a complex structure.
| datetime: | The datetime module handles details of the calendar, including dates and times. Additionally, the time module provides some more basic functions for time and date processing. We’ll cover both modules in detail in Dates and Times: the time and datetime Modules. These modules mean that you never need to attempt your own calendrical calculations. One of the important lessons learned in the late 90’s was that many programmers love to tackle calendrical calculations, but their efforts had to be tested and reworked prior to January 1, 2000, because of innumerable small problems. |
|---|---|
| calendar: | This module contains routines for displaying and working with the calendar. This can help you determine the day of the week on which a month starts and ends; it can count leap days in an interval of years, etc. |
| collections: | This package contains some handy data types, plus the Abstract Base Classes that we use for defining our own collections. Data types include the collections.deque – a “double-ended queue” – that can be used as a stack (LIFO) or queue (FIFO). The collections.defaultdict class can return a default value instead of raising an exception for missing keys. The collections.namedtuple function helps us to create a small, specialized class that is a tuple with named positions. We made use of this library in Creating or Extending Data Types. |
| bisect: | The bisect module contains the bisect() function to search a sorted list for a specific value. It also contains the insort() function to insert an item into a list while maintaining the sorted order. This module performs faster than simply appending values to a list and calling the sort() method of a list. This module’s source is instructive as a lesson in well-crafted algorithms. |
| array: | The array module gives you a high-performance, highly compact collection of values. It isn’t as flexible as a list or a tuple, but it is fast and takes up relatively little memory. This is helpful for processing media like image or sound files. |
| sched: | The sched module contains the definition for the scheduler class that builds a simple task scheduler. When a scheduler is constructed, it is given two user-supplied functions: one returns the “time” and the other executes a “delay” waiting for the time to arrive. For real-time scheduling, the time module time() and sleep() functions can be used. The scheduler has a main loop that calls the supplied time function and compares the current time with the time for scheduled tasks; it then calls the supplied delay function for the difference in time. It runs the scheduled task, and calls the delay function with a duration of zero to release any resources. Clearly, this simple algorithm is very versatile. By supplying custom time functions that work in minutes instead of seconds, and a delay function that does additional background processing while waiting for the scheduled time, a flexible task manager can be constructed. |
| copy: | The copy module contains functions for making copies of complex objects. This module contains a function to make a shallow copy of an object, where any objects contained within the parent are not copied, but references are inserted in the parent. It also contains a function to make a deep copy of an object, where all objects contained within the parent object are duplicated. Note that Python’s simple assignment only creates a variable which is a label (or reference) to an object, not a duplicate copy. This module is the easiest way to create an independent copy. |
| pprint: | The pprint module contains some useful functions like pprint.pprint() for printing easy-to-read representations of nested lists and dictionaries. It also has a PrettyPrinter class from which you can make subclasses to customize the way in which lists or dictionaries or other objects are printed. |
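The data types above can be sketched in a few lines. This is only an illustration, assuming Python 3; the variable names are invented, and the sched portion uses a simulated clock (the hypothetical fake_time and fake_delay functions) in place of time.time() and time.sleep() so that nothing actually waits.

```python
import bisect
import collections
import copy
import sched

# deque: add and remove items at either end.
queue = collections.deque(["b", "c"])
queue.appendleft("a")
first = queue.popleft()  # queue-style: take from the left
last = queue.pop()       # stack-style: take from the right

# defaultdict: a missing key starts from int(), which is 0.
counts = collections.defaultdict(int)
for word in "to be or not to be".split():
    counts[word] += 1

# namedtuple: a tuple whose positions also have names.
Point = collections.namedtuple("Point", ["x", "y"])
origin = Point(x=0, y=0)

# bisect: insert into a list while keeping it sorted.
ordered = [10, 20, 40]
bisect.insort(ordered, 30)

# copy: a shallow copy shares nested objects; a deep copy duplicates them.
nested = {"scores": [1, 2]}
shallow = copy.copy(nested)
deep = copy.deepcopy(nested)
nested["scores"].append(3)

# sched: user-supplied "time" and "delay" functions drive the scheduler.
clock = [0]

def fake_time():
    return clock[0]

def fake_delay(amount):
    clock[0] += amount  # advance the simulated clock instead of sleeping

log = []
scheduler = sched.scheduler(fake_time, fake_delay)
scheduler.enter(5, 1, log.append, ("second",))
scheduler.enter(2, 1, log.append, ("first",))
scheduler.run()
```

Because the clock is simulated, the two tasks run in time order immediately; replacing the two functions with time.time and time.sleep would give real-time behavior.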
10. Numeric and Mathematical Modules. These modules include more specialized mathematical functions and some additional numeric data types.
| decimal: | The decimal module provides decimal-based arithmetic which correctly handles significant digits, rounding and other features common to currency amounts. |
|---|---|
| math: | The math module was covered in The math Module. It contains the math functions like sine, cosine and square root. |
| random: | The random module was covered in The math Module. |
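A short sketch shows why decimal suits currency amounts better than binary floating point. The price and tax rate are invented for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_matches = (0.1 + 0.2 == 0.3)

# Decimal values built from strings keep their exact decimal digits.
decimal_matches = (Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))

# Compute a tax amount and round it to whole cents, rounding halves up.
price = Decimal("19.99")
rate = Decimal("0.0825")
tax = (price * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```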
11. File and Directory Access. We’ll look at many of these modules in File Handling Modules. These are the modules which are essential for handling data files.
| os.path: | The os.path module is critical for creating portable Python programs. The popular operating systems (Linux, Windows and MacOS) each have different approaches to file names. A Python program that depends on os.path will behave more consistently in all environments. |
|---|---|
| fileinput: | The fileinput module helps your program process a large number of files smoothly and simply. |
| glob: | |
| fnmatch: | The glob and fnmatch modules help a Windows program handle wild-card file names in a standard manner. |
| shutil: | The shutil module provides shell-like utilities for file copy, file rename, directory moves, etc. This module lets you write short, effective Python programs that do things that are typically done by shell scripts. Why use Python instead of the shell? Python is far easier to read, far more efficient, and far more capable of writing moderately sophisticated programs. Using Python saves you from having to write long, painful shell scripts. |
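The following sketch shows how these modules tend to work together. It builds a scratch directory with tempfile (an assumption made so the example cleans up after itself), and the file names are invented.

```python
import fnmatch
import glob
import os
import os.path
import shutil
import tempfile

work = tempfile.mkdtemp()  # scratch directory, removed at the end
try:
    # os.path.join builds names with the right separator for this platform.
    for name in ("a.txt", "b.txt", "c.csv"):
        with open(os.path.join(work, name), "w") as target:
            target.write(name)

    # glob expands a wild-card pattern against the file system.
    text_files = sorted(
        os.path.basename(path)
        for path in glob.glob(os.path.join(work, "*.txt"))
    )

    # fnmatch applies the same style of pattern to names we already have.
    csv_names = fnmatch.filter(os.listdir(work), "*.csv")

    # shutil does the shell-like work of copying files.
    backup = os.path.join(work, "backup")
    os.mkdir(backup)
    shutil.copy(os.path.join(work, "a.txt"), backup)
    copied = os.path.exists(os.path.join(backup, "a.txt"))

    # os.path.splitext separates the final extension from the rest.
    stem, extension = os.path.splitext("report.final.csv")
finally:
    shutil.rmtree(work)
```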
12. Data Persistence. There are several issues related to making objects persistent. In Chapter 12 of the Python Library Reference, there are several modules that help deal with files in various kinds of formats. We’ll talk about these modules in detail in File Formats: CSV, Tab, XML, Logs and Others.
There are several additional techniques for managing persistence. We can “pickle” or “shelve” an object. In this case, we don’t define our file format in detail; instead, we leave it to Python to persist our objects.
We can map our objects to a relational database. In this case, we’ll use the SQL language to define our storage, create and retrieve our objects.
| pickle: | |
|---|---|
| shelve: | The pickle and shelve modules are used to create persistent objects: objects that persist beyond the one-time execution of a Python program. The pickle module produces a serial text representation of any object, however complex; it can also reconstitute an object from its text representation. The shelve module uses a dbm database to store and retrieve objects. The shelve module is not a complete object-oriented database, as it lacks any transaction management capabilities. |
| sqlite3: | This module provides access to the SQLite relational database. This database provides a significant subset of SQL language features, allowing us to build a relational database that’s compatible with products like MySQL or Postgres. |
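Both styles of persistence can be sketched briefly. The example below pickles a small object to a byte string and, separately, stores rows in an in-memory SQLite database; the table and column names are invented, and a real program would normally write to a file instead.

```python
import pickle
import sqlite3

# pickle: serialize an object and rebuild an equal copy from the result.
original = {"name": "example", "values": [1, 2, 3]}
serialized = pickle.dumps(original)
restored = pickle.loads(serialized)

# sqlite3: define storage with SQL, then insert and query rows.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE item (name TEXT, quantity INTEGER)")
connection.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [("bolt", 10), ("nut", 25), ("washer", 3)],
)
rows = connection.execute(
    "SELECT name, quantity FROM item WHERE quantity > ? ORDER BY name",
    (5,),
).fetchall()
connection.close()
```

The `?` placeholders let the database module handle quoting of values, which is safer than building SQL text by hand.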
13. Data Compression and Archiving. These modules handle the various file compression algorithms that are available. We’ll look at these modules in File Handling Modules.
| tarfile: | |
|---|---|
| zipfile: | These two modules create archive files, which contain a number of files that are bound together. The TAR format is not compressed, whereas the ZIP format is compressed. Often a TAR archive is compressed using GZIP to create a .tar.gz archive. |
| zlib: | |
| gzip: | |
| bz2: | These modules employ different compression algorithms. They all have similar features to compress or uncompress files. |
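As a small illustration, the sketch below compresses a byte string with zlib and builds a ZIP archive in memory. It assumes Python 3 and uses io.BytesIO in place of a real file; the member name is invented.

```python
import io
import zipfile
import zlib

# zlib: compress and restore a block of bytes.
text = b"spam " * 200
packed = zlib.compress(text)
unpacked = zlib.decompress(packed)

# zipfile: write one member into an archive, then read it back.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("notes/readme.txt", "hello")

with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    content = archive.read("notes/readme.txt")
```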
14. File Formats. These are modules for reading and writing files in a few of the amazing variety of file formats that are in common use. In addition to these common formats, modules in chapter 20, Structured Markup Processing Tools are also important.
| csv: | The csv module helps you parse and create Comma-Separated Value (CSV) data files. This helps you exchange data with many desktop tools that produce or consume CSV files. We’ll look at this in Comma-Separated Values: The csv Module. |
|---|---|
| ConfigParser: | Configuration files can take a number of forms. The simplest approach is to use a Python module as the configuration for a large, complex program. Sometimes configurations are encoded in XML. Many Windows legacy programs use .INI files. The ConfigParser module can gracefully parse these files. |
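Here is a brief sketch of both modules, assuming Python 3, where the configuration module is spelled configparser rather than ConfigParser. The data and the section and option names are invented; io.StringIO stands in for a real file.

```python
import configparser
import csv
import io

# csv: values containing commas or quotes are escaped on the way out
# and recovered intact on the way back in.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["name", "note"])
writer.writerow(["Smith, J.", 'said "hi"'])
buffer.seek(0)
rows = list(csv.reader(buffer))

# configparser: parse .INI-style text into sections and options.
config = configparser.ConfigParser()
config.read_string("[server]\nhost = localhost\nport = 8080\n")
host = config.get("server", "host")
port = config.getint("server", "port")
```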
15. Cryptographic Services. These modules aren’t specifically encryption modules. Many popular encryption algorithms are protected by patents. Often, encryption requires compiled modules for performance reasons. These modules compute secure digests of messages using a variety of algorithms.
| hashlib: | Compute a secure hash or digest of a message to ensure that it was not tampered with. The hashlib.md5 class creates an MD5 hash, which is often used for validating that a downloaded file was received correctly and completely. |
|---|
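A digest check might be sketched as follows. This example uses hashlib.sha256 rather than the MD5 class mentioned above; hashlib.md5 is used in exactly the same way. The payload and the chunk size are invented for the illustration.

```python
import hashlib

payload = b"The quick brown fox" * 1000

# Digest of the whole message in one call.
whole = hashlib.sha256(payload).hexdigest()

# The same digest built incrementally, as when reading a large file in chunks.
chunked = hashlib.sha256()
for start in range(0, len(payload), 4096):
    chunked.update(payload[start:start + 4096])
incremental = chunked.hexdigest()

# Any change to the message produces a different digest.
tampered = hashlib.sha256(payload + b"!").hexdigest()
```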
16. Generic Operating System Services. The following modules contain basic features that are common to all operating systems. Most of this commonality is achieved by using the C standard libraries. By using these modules, you can be assured that your Python application will be portable to almost any operating system.
| os: | The os (and os.path) modules provide access to a number of operating system features. The os module provides control over Processes, Files and Directories. We’ll look at os and os.path in The os Module and The os.path Module. |
|---|---|
| time: | The time module provides basic functions for time and date processing. Additionally, datetime handles details of the calendar more gracefully than time does. We’ll cover both modules in detail in Dates and Times: the time and datetime Modules. Having modules like datetime and time means that you never need to attempt your own calendrical calculations. One of the important lessons learned in the late 90’s was that many programmers love to tackle calendrical calculations, but their efforts had to be tested and reworked because of innumerable small problems. |
| getopt: | |
| optparse: | A well-written program makes use of the command-line interface. It is configured through options and arguments, as well as properties files. We’ll cover optparse in Programs: Standing Alone. Command-line programs for Windows will also need to use the glob module to perform standard file-name globbing. |
| logging: | Often, you want a simple, standardized log for errors as well as debugging information. We’ll look at logging in detail in Log Files: The logging Module. |
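As an illustration of command-line processing, the sketch below defines two options with optparse and parses a hand-built argument list instead of the real sys.argv. The option names and file names are invented for the example.

```python
import optparse

parser = optparse.OptionParser(usage="%prog [options] file...")
parser.add_option("-v", "--verbose", action="store_true",
                  dest="verbose", default=False,
                  help="print extra detail")
parser.add_option("-o", "--output", dest="output", default="out.txt",
                  help="name of the output file")

# Normally parse_args() reads sys.argv[1:]; here we pass a list explicitly.
options, arguments = parser.parse_args(
    ["-v", "-o", "result.csv", "a.txt", "b.txt"]
)
```

After parsing, the options are attributes of the options object and the remaining file names are in the arguments list.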
17. Optional Operating System Services. This section includes less-common modules for handling threading and other features that are more-or-less unavailable in Windows.
18. Interprocess Communication and Networking. This section includes modules for creating processes and doing simple interprocess communication (IPC) using the standard socket abstraction.
| subprocess: | The subprocess module provides the class required to create a separate process. The standard approach is called forking a subprocess. Under Windows, similar functionality is provided. Using this, you can write a Python program which can run any other program on your computer. This is very handy for automating complex tasks, and it allows you to replace clunky, difficult shell scripts with Python scripts. |
|---|---|
| socket: | This is a Python implementation of the standard socket library that supports the TCP/IP protocol. |
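Running another program from Python can be sketched in a few lines. To keep the example portable, the program it runs is the Python interpreter itself (found through sys.executable); any other command could be substituted.

```python
import subprocess
import sys

# Run a child process and capture what it writes to standard output.
command = [sys.executable, "-c", "print(6 * 7)"]
output = subprocess.check_output(command)
result = output.decode().strip()
```

Passing the command as a list avoids any dependence on shell quoting rules.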
19. Internet Data Handling. The Internet Data Handling modules contain a number of handy algorithms. A great deal of data is defined by the Internet Request for Comments (RFC). Since these effectively standardize data on the Internet, it helps to have modules already in place to process this standardized data. Most of these modules are specialized, but a few have much wider application.
| mimify: | |
|---|---|
| base64: | |
| binascii: | |
| binhex: | |
| quopri: | |
| uu: | These modules all provide various kinds of conversions, escapes or quoting so that binary data can be manipulated as safe, universal ASCII text. The number of these modules reflects the number of different clever solutions to the problem of packing binary data into ordinary email messages. |
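A minimal sketch of two of these conversions, using base64 and binascii on invented sample bytes:

```python
import base64
import binascii

raw = b"\x00\xffbinary\n"

# base64: arbitrary bytes become ASCII-safe text and back again.
encoded = base64.b64encode(raw)
decoded = base64.b64decode(encoded)

# binascii: a hexadecimal rendering of the same kind of data.
hex_text = binascii.hexlify(b"\x00\xff")
```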
20. Structured Markup Processing Tools. The following modules contain algorithms for working with structured markup: Standard Generalized Markup Language (SGML), Hypertext Markup Language (HTML) and Extensible Markup Language (XML). These modules simplify the parsing and analysis of complex documents. In addition to these modules, you may also need to use the csv module for processing files; that’s in chapter 14, File Formats.
| htmllib: | Ordinary HTML documents can be examined with the htmllib module. This module is based on the sgmllib module. The basic HTMLParser class definition is a superclass; you will typically override the various functions to do the appropriate processing for your application. One problem with parsing HTML is that browsers – in order to conform with the applicable standards – must accept incorrect HTML. This means that many web sites publish HTML which is tolerated by browsers, but can’t easily be parsed by htmllib. When confronted with serious horrors, consider downloading the Beautiful Soup module (http://www.crummy.com/software/BeautifulSoup/). This handles erroneous HTML more gracefully than htmllib. |
|---|---|
| xml.sax: | |
| xml.dom: | |
| xml.dom.minidom: | The xml.sax and xml.dom modules provide the classes necessary to conveniently read and process XML documents. A SAX parser separates the various types of content and passes a series of events to the handler objects attached to the parser. A DOM parser decomposes the document into the Document Object Model (DOM). The xml.dom module contains the classes which define an XML document’s structure. The xml.dom.minidom module contains a parser which creates a DOM object. |
Additionally, the formatter module, in chapter 34 (Miscellaneous Services), goes along with these.
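The difference between the DOM and SAX styles can be sketched with one tiny document. The XML text, the element and attribute names, and the TitleHandler class are all invented for this illustration.

```python
import xml.dom.minidom
import xml.sax

document = "<library><book title='A'/><book title='B'/></library>"

# DOM: parse the whole document into a tree, then navigate the tree.
dom = xml.dom.minidom.parseString(document)
dom_titles = [
    node.getAttribute("title")
    for node in dom.getElementsByTagName("book")
]

# SAX: the parser sends events to a handler as it reads the document.
class TitleHandler(xml.sax.ContentHandler):
    """Collect the title attribute of each book element."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.titles = []

    def startElement(self, name, attrs):
        if name == "book":
            self.titles.append(attrs.getValue("title"))

handler = TitleHandler()
xml.sax.parseString(document.encode("utf-8"), handler)
sax_titles = handler.titles
```

The DOM version holds the entire document in memory; the SAX version keeps only what the handler chooses to save.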
21. Internet Protocols and Support. The following modules contain algorithms for working with several of the most common Internet protocols. These modules greatly simplify developing applications based on these protocols.
| cgi: | The cgi module can be used for web server applications invoked as Common Gateway Interface (CGI) scripts. This allows you to put Python programming in the cgi-bin directory. When the web server invokes the CGI script, the Python interpreter is started and the Python script is executed. |
|---|---|
| wsgiref: | The Web Server Gateway Interface (WSGI) standard provides a much simpler framework for web applications and web services. See PEP 333 for more information. Essentially, this subsumes all of CGI, plus adds several features and a systematic way to compose larger applications from smaller components. |
| urllib: | |
| urllib2: | |
| urlparse: | These modules allow you to write relatively simple application programs which open a URL as if it were a standard Python file. The content can be read and perhaps parsed with the HTML or XML parser modules, described above. The urllib module depends on the httplib, ftplib and gopherlib modules. It will also open local files when the scheme of the URL is file:. The urlparse module includes the functions necessary to parse or assemble URLs. The urllib2 module handles more complex situations where authentication or cookies are involved. |
| httplib: | |
| ftplib: | |
| gopherlib: | The httplib, ftplib and gopherlib modules include relatively complete support for building client applications that use these protocols. Between the htmllib module and httplib module, a simple character-oriented web browser or web content crawler can be built. |
| poplib: | |
| imaplib: | The poplib and imaplib modules allow you to build mail reader client applications. The poplib module is for mail clients using the Post-Office Protocol, POP3 (RFC 1725), to extract mail from a mail server. The imaplib module is for mail clients using the Internet Message Access Protocol, IMAP4 (RFC 2060), to manage mail on an IMAP server. |
| nntplib: | The nntplib module allows you to build a network news reader. The newsgroups, like comp.lang.python, are processed by NNTP servers. You can build special-purpose news readers with this module. |
| SocketServer: | The SocketServer module provides the relatively advanced programming required to create TCP/IP or UDP/IP server applications. This is typically the core of a stand-alone application server. |
| SimpleHTTPServer: | |
| CGIHTTPServer: | |
| BaseHTTPServer: | The SimpleHTTPServer and CGIHTTPServer modules rely on the basic BaseHTTPServer and SocketServer modules to create a web server. The SimpleHTTPServer module provides the programming to handle basic URL requests. The CGIHTTPServer module adds the capability for running CGI scripts; it does this with the fork() and exec() functions of the os module, which are not necessarily supported on all platforms. |
| asyncore: | |
| asynchat: | The asyncore (and asynchat) modules help to build a time-sharing application server. When client requests can be handled quickly by the server, complex multi-threading and multi-processing aren’t really necessary. Instead, this module simply dispatches each client communication to an appropriate handler function. |
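Two of these ideas can be sketched without any network access. The example assumes Python 3, where the urlparse functions live in urllib.parse; the URL, the application function and the captured dictionary are invented. The WSGI application is called directly with a test environment rather than through a real web server.

```python
from urllib.parse import urlparse
from wsgiref.util import setup_testing_defaults

# Parsing a URL into its component parts.
parts = urlparse("http://www.example.com:8080/path/page.html?x=1")

# A WSGI application is a callable taking the request environment
# and a start_response function, and returning the body as byte strings.
def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from " + environ["PATH_INFO"].encode("utf-8")]

# Build a minimal fake request and call the application by hand.
environ = {}
setup_testing_defaults(environ)
environ["PATH_INFO"] = "/demo"

captured = {}

def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers

body = b"".join(application(environ, start_response))
```

Because the application is just a function, it can be tested this way and later handed to any WSGI-compliant server unchanged.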
22. Multimedia Services. This is beyond the scope of this book.
23. Internationalization. A well-written application avoids including messages as literal strings within the program text. Instead, all messages, prompts, labels, etc., are kept as a separate resource. These separate string resources can then be translated.
| locale: | The locale module fetches the current locale’s date, time, number and currency formatting rules. This provides functions which will format and parse dates, times, numbers and currency amounts. A user can change their locale with simple operating system settings, and your application can work consistently with all other programs. |
|---|
24. Program Frameworks. We’ll talk about a number of program-related issues in Programs: Standing Alone and Architecture: Clients, Servers, the Internet and the World Wide Web. Much of this goes beyond the standard Python library. Within the library are two modules that can help you create large, sophisticated command-line application programs.
| cmd: | The cmd module contains a superclass useful for building the main command-reading loop of an interactive program. The standard features include printing a prompt, reading commands, providing help and providing a command history buffer. A subclass is expected to provide functions with names of the form do_command(). When the user enters a line beginning with command, the appropriate do_command() function is called. |
|---|---|
| shlex: | The shlex module can be used to tokenize input in a simple language similar to the Linux shell languages. This module defines a basic shlex class with parsing methods that can separate words, quoted strings and comments, and return them to the requesting program. |
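The do_command() convention and shell-style tokenizing can be sketched together. The Calculator class and its add and quit commands are invented for this example; commands are fed in with onecmd() rather than typed at a prompt, so the sketch runs without any interaction.

```python
import cmd
import shlex

# shlex: split a line the way a shell would, honoring quotes and comments.
tokens = shlex.split('copy "my file.txt" backup/ # a comment', comments=True)

class Calculator(cmd.Cmd):
    """A tiny command interpreter with two commands."""

    prompt = "calc> "

    def __init__(self):
        cmd.Cmd.__init__(self)
        self.results = []

    def do_add(self, line):
        """add N [N ...] : sum the numbers given."""
        self.results.append(sum(float(word) for word in shlex.split(line)))

    def do_quit(self, line):
        """quit : leave the command loop."""
        return True  # a true value tells the command loop to stop

calculator = Calculator()
calculator.onecmd("add 1 2 3.5")
stop = calculator.onecmd("quit")
```

In an interactive program, calling calculator.cmdloop() would print the prompt and dispatch each typed line in the same way.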
25. Graphical User Interfaces with Tk. This is beyond the scope of this book.
26. Development Tools. The testing tools are central to creating reliable, complete and correct software.
| doctest: | When a function or a class docstring includes a snippet of interactive Python, the doctest module can use this snippet to confirm that the function or class works as advertised. For example:
def myFunction( a, b ):
    """>>> myFunction( 2, 3 )
    6
    >>> myFunction( 5.0, 7.0 )
    35.0
    """
    return a * b
The >>> myFunction( 2, 3 ) lines are parsed by doctest. They are evaluated, and the actual result is compared with the expected result in the docstring. |
|---|---|
| unittest: | This is a more sophisticated testing framework in which you create TestCases which define a fixture, an operation and expected results. |
| 2to3: | This module is used to convert Python 2 files to Python 3. Prior to using this, you should run your Python programs with the -3 option to identify any potential incompatibilities. Once you’ve fixed all of the incompatibilities, you can confidently convert your program to Python 3. Do not “tweak” the output from this conversion. If your converted program doesn’t work under Python 3, it’s almost always a problem with your original program playing fast and loose with Python rules. In the unlikely event that this module cannot convert your program, you should probably rewrite your program to eliminate the “features” that are causing problems. |
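To show how a docstring example gets checked, the sketch below runs the examples in one function's docstring programmatically. The multiply function is invented for the illustration, and the parser-and-runner approach is just one of several ways to invoke doctest.

```python
import doctest

def multiply(a, b):
    """Return the product of a and b.

    >>> multiply(2, 3)
    6
    >>> multiply(5.0, 7.0)
    35.0
    """
    return a * b

# Extract the interactive examples from the docstring and run them.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    multiply.__doc__, {"multiply": multiply}, "multiply", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
```

In everyday use it is more common to call doctest.testmod() at the bottom of a module, or to run `python -m doctest` on the file, which checks every docstring at once.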
27. Debugging and Profiling. Debugging is an important skill, as is performance profiling. Much of this is beyond the scope of this book.
| timeit: | This is a handy module that lets you get timing information to compare performance of alternative implementations of an algorithm. |
|---|
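A comparison of two implementations might be sketched like this. The two statements being timed are invented examples; the actual timings depend on the machine and the Python version, so none are shown.

```python
import timeit

# Build a string of digits two ways and time 1000 repetitions of each.
join_time = timeit.timeit(
    "''.join(str(i) for i in range(100))",
    number=1000,
)
concat_time = timeit.timeit(
    "s = ''\nfor i in range(100):\n    s += str(i)",
    number=1000,
)

# Each result is the total elapsed time in seconds for all repetitions.
faster = "join" if join_time < concat_time else "concatenation"
```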
28. Python Runtime Services. The Python Runtime Services modules support the Python runtime environment.
| sys: | The sys module contains execution context information. It has the command-line arguments (in sys.argv) used to start the Python interpreter. It has the standard input, output and error file definitions. It has functions for retrieving exception information. It defines the platform, byte order, module search path and other basic facts. This is typically used by a main program to get run-time environment information. |
|---|
Most of the remaining sections of the library, with one exception, are too advanced for this book.
34. Miscellaneous Services. This is a vague catch-all that only has one module.
| formatter: | The formatter module can be used in conjunction with the HTML and XML parsers. A formatter instance depends on a writer instance that produces the final (formatted) output. It can also be used on its own to format text in different ways. The HTML parser can produce a plain-text version of a web page. To do this, it uses the formatter module. |
|---|
Why are there multiple versions of some packages? Look at some places where there are two modules which clearly do the same or almost the same things. Examples include time and datetime, urllib and urllib2, pickle and cPickle, StringIO and cStringIO, subprocess and popen2, getopt and optparse.
Why allow this duplication? Why not pick a “best” module and discard the others?
Is it better to build an application around the library or simply design the application and ignore the library? Assuming that we have some clear, detailed requirements, what is the benefit of time spent searching through the library? What if most library modules are a near-miss? Should we alter our design to leverage the library, or just write the program without considering the library?
Which library modules are deprecated or disabled? Why are these still documented in the library?