This is an implementation of a simple XML parser for Common Lisp. PLEASE NOTE: This is now a project at CL.net; see http://common-lisp.net/project/s-xml.
This XML parser implementation has the following features:
This XML parser implementation has the following limitations:
This implementation was written by Sven Van Caekenberghe (http://homepage.mac.com/svc) at the end of 2002, beginning of 2003.
You can download this software from my homepage at http://homepage.mac.com/svc where you will find the archive http://homepage.mac.com/svc/xml/xml.tgz. This code is also meant to be included as an example in the OpenMCL distribution.
All software and documentation is Copyright (C) 2002, 2003 Sven Van Caekenberghe. You are granted the rights to distribute and use this software as governed by the terms of the Lisp Lesser GNU Public License (see http://opensource.franz.com/preamble.html), also known as the LLGPL.
The implementation consists of the following files:
Load the files in dependency order: dom depends on xml; sxml-dom, lxml-dom and xml-struct-dom each depend on both dom and xml. Two ASDF build files are included, as well as various test code.
This is open-source software. Have a look at the source code: it is reasonably clean, commented and meant to be read. The test directory contains some unit tests, one test file for each source file; just load them to run the tests. Reading the tests might help you understand how the parser is intended to be used. For the impatient, the XML Parser and DOM API can be found here.
Using a DOM parser is easier, but usually less efficient: see the next sections. To use the event-based API of the parser, you call the function start-parse-xml on a stream, specifying 3 hook functions:
As an example, consider the next function that will echo an XML input stream to an output stream:
(defun echo-xml (in out)
  (start-parse-xml
   in
   (make-instance 'xml-parser-state
                  :new-element-hook #'(lambda (name attributes seed)
                                        (declare (ignore seed))
                                        (format out "<~a~:{ ~a='~a'~}>"
                                                name
                                                (mapcar #'(lambda (p) (list (car p) (cdr p)))
                                                        attributes)))
                  :finish-element-hook #'(lambda (name attributes parent-seed seed)
                                           (declare (ignore attributes parent-seed seed))
                                           (format out "</~a>" name))
                  :text-hook #'(lambda (string seed)
                                 (declare (ignore seed))
                                 (princ string out)))))
The seed parameters and return values are not used here. As a simplification, we just print text and attribute values as they are; real code would have to use print-xml-string to properly escape special characters. Have a look at the implementations of the different DOM representations, as well as the XML-RPC code and the CLOS serialization code elsewhere, for more real-world examples.
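To make the escaping point above concrete, here is a minimal, self-contained sketch of what escaping text and attribute values involves. The function name escape-xml-text is hypothetical and was made up for this example; it is not the library's print-xml-string, and it only assumes the five standard XML entities.

```lisp
;; Hypothetical helper for illustration only; not part of the parser's API.
(defun escape-xml-text (string)
  "Return a copy of STRING with the five XML special characters replaced
by their standard entity references."
  (with-output-to-string (out)
    (loop for char across string
          do (case char
               (#\& (write-string "&amp;" out))
               (#\< (write-string "&lt;" out))
               (#\> (write-string "&gt;" out))
               (#\" (write-string "&quot;" out))
               (#\' (write-string "&apos;" out))
               (t (write-char char out))))))

;; (escape-xml-text "a < b & c")  =>  "a &lt; b &amp; c"
```

In echo-xml, a helper of this kind would be applied to each attribute value and to the string passed to the text hook before printing.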
The parse state can be used to specify the initial seed value (nil by default), and the set of known entities (by default the 5 standard entities lt, gt, amp, quot and apos, plus nbsp).
Using a DOM parser is easier, but usually less efficient. Currently three different DOMs are supported:
There is a generic API that is identical for each type of DOM, with an extra parameter input-type or output-type used to specify the type of DOM. The default DOM type is :lxml. Here are some examples:
? (in-package :xml)
#<Package "XML">
? (setf xml-string "<foo id='top'><bar>text</bar></foo>")
"<foo id='top'><bar>text</bar></foo>"
? (parse-xml-string xml-string)
((:|foo| :|id| "top") (:|bar| "text"))
? (parse-xml-string xml-string :output-type :sxml)
(:|foo| (:@ (:|id| "top")) (:|bar| "text"))
? (parse-xml-string xml-string :output-type :xml-struct)
#S(XML-ELEMENT :NAME :|foo| :ATTRIBUTES ((:|id| . "top"))
:CHILDREN (#S(XML-ELEMENT :NAME :|bar|
:ATTRIBUTES NIL
:CHILDREN ("text"))))
? (print-xml * :pretty t :input-type :xml-struct)
<foo id="top">
<bar>text</bar>
</foo>
NIL
? (print-xml '(p "Interesting stuff at " ((a href "http://slashdot.org") "SlashDot")))
<P>Interesting stuff at <A HREF="http://slashdot.org">SlashDot</A></P>
NIL
Tag and attribute names are converted to keywords. Note that XML is case-sensitive, while the Common Lisp reader normally upcases symbol names; this is why the keywords appear in the special literal symbol syntax (for example :|foo|), which preserves case.
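As a rough illustration of working with the default :lxml output shown in the transcript above, the following sketch defines three small accessors. The names lxml-tag, lxml-attributes and lxml-children are hypothetical helpers written for this example, not the library's API. They assume only the two element shapes visible above (a list starting with a tag keyword, or a list starting with a tag-plus-attributes list) and, as a further assumption, tolerate a bare keyword for an element with no attributes and no children.

```lisp
;; Hypothetical helpers for illustration; they are not part of the DOM API.
(defun lxml-tag (element)
  "Return the tag keyword of an LXML element."
  (cond ((symbolp element) element)                   ; assumed bare form, e.g. :|br|
        ((symbolp (first element)) (first element))   ; (:|bar| "text")
        (t (first (first element)))))                 ; ((:|foo| :|id| "top") ...)

(defun lxml-attributes (element)
  "Return the attributes of an LXML element as a property list, or NIL."
  (if (and (consp element) (consp (first element)))
      (rest (first element))
      nil))

(defun lxml-children (element)
  "Return the list of child nodes (strings and elements) of an LXML element."
  (if (consp element)
      (rest element)
      nil))

(defparameter *doc* '((:|foo| :|id| "top") (:|bar| "text")))

;; (lxml-tag *doc*)                          =>  :|foo|
;; (getf (lxml-attributes *doc*) :|id|)      =>  "top"
;; (lxml-tag (first (lxml-children *doc*)))  =>  :|bar|
;; (eq :|foo| :foo)                          =>  NIL
```

The last line shows why the vertical bars matter: with the standard reader settings, :foo is read as :FOO, which is a different keyword from :|foo|.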
You can find a full, automatically generated, listing of the XML Parser and DOM API here.
$Id: readme.html,v 1.9 2004/01/13 10:33:17 sven Exp $