Lightning Talk @ OSCON06: XGAWK
What it's all about
- Extensible GNU Awk, with an XML parsing extension
- core idea: replace the line stream with an XML event stream
- events (such as the start or end of an element) are exposed
through global variables (XMLEVENT, XMLNAME, ...)
- nice one-liners for XML processing
Example code
# example for 'core' features
BEGIN { XMLMODE=1 }    # switch to XML processing
XMLSTARTELEM {
    printf("XMLSTARTELEM %s\t", XMLSTARTELEM)
    for (i in XMLATTR)
        printf(" %s='%s'", i, XMLATTR[i])
    print ""
}
XMLENDELEM {
    print "XMLENDELEM", XMLENDELEM
}
XMLCHARDATA {
    print "XMLCHARDATA", "'" $0 "'"
}
#...
# books.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<books>
  <book publisher="O'Reilly" on-loan="Sanjay">
    <title>XML in a Nutshell</title>
    <author>Elliotte Rusty Harold, W. Scott Means</author>
  </book>
  <book publisher="Addison-Wesley">
    <title>The Mythical Man Month</title>
    <author>Frederick Brooks</author>
  </book>
  <!-- more to come here -->
</books>
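The core example tests the per-event variables (XMLSTARTELEM, ...). The bullets at the top also mention XMLEVENT and XMLNAME; a minimal sketch of that style follows. It assumes that XMLEVENT carries the event type as a string such as "STARTELEM" and that XMLNAME carries the element name for that event. Check both against the xgawk documentation before relying on them. The arrays `events` and `elems` are names made up for this illustration.

```awk
# sketch: tally events by type and start tags by element name
# assumption: XMLEVENT == "STARTELEM" marks a start tag, XMLNAME holds its name
BEGIN { XMLMODE = 1 }                 # records are XML events, not lines

{ events[XMLEVENT]++ }                # runs once per event of any type

XMLEVENT == "STARTELEM" { elems[XMLNAME]++ }

END {
    for (e in events) printf("%-12s %d\n", e, events[e])
    for (n in elems)  printf("<%s> %d\n", n, elems[n])
}
```

Saved as, say, count.awk (a hypothetical file name), it would be run as `xgawk -f count.awk books.xml`.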
# example for using the xmllib.awk
xgawk -f xmllib.awk --source '    # or shorter: xmlgawk
EE == "title"  { title  = CDATA }
EE == "author" { author = CDATA }
# the assignment is intentional: the rule fires only if on-loan is set
EE == "book" && (loaner = ATTR[PATH"@on-loan"]) {
    print loaner, "was loaned", title, "by", author
}' books2.xml
# same in XQuery (doc() is the XQuery 1.0 name of the document function)
for $b in doc("books2.xml")/books/book[@on-loan]
return (string($b/@on-loan),
        " was loaned ", $b/title/text(),
        " by ", $b/author/text())
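For comparison, here is a rough sketch of the same report written with the core variables only, without xmllib.awk. It shows the bookkeeping that the library's EE, CDATA and ATTR shortcuts save you; it is not meant to describe how xmllib.awk is actually implemented. The variable `text` is a helper invented for this sketch, and the script assumes that $0 holds the character data on an XMLCHARDATA event, as in the core example above.

```awk
# sketch: the on-loan report using only core xgawk variables
BEGIN { XMLMODE = 1 }

# remember the borrower (if any) when a <book> starts
XMLSTARTELEM == "book" {
    loaner = ("on-loan" in XMLATTR) ? XMLATTR["on-loan"] : ""
    title = author = ""
}

XMLSTARTELEM { text = "" }            # new element: reset collected text
XMLCHARDATA  { text = text $0 }       # character data may arrive in pieces

XMLENDELEM == "title"  { title  = text }
XMLENDELEM == "author" { author = text }

XMLENDELEM == "book" && loaner != "" {
    print loaner, "was loaned", title, "by", author
}
```

The seven rules here do the work of the three in the xmllib.awk version, which is the case for having the library.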
Goodies
- can handle streams (multiple documents in one file)
- xgawk: fast and simple core implementation (expat based)
- a library with convenience functions (xmllib.awk) for
shorter scripts (xmlgawk)
- many XML encodings supported (code borrowed from Perl ;-)
- dynamic loading of shared libraries/extensions is now possible
- GNU Awk @-commands are now built in
- PostgreSQL and MPFR extensions
Implementation
Room for improvement
- documentation!
- XML namespace support
- DTD and Schema validation; whether this is essential is
still under discussion
- whether to keep many special variables or switch to a single
array is still under discussion
Developers
- Jürgen Kahrs
- Andrew Schorr
- Stefan Tramm (me :-)
- and some others