The XML C parser and toolkit of Gnome
A real example
Here is a real-size example, where the actual content of the application
data is not kept in the DOM tree but in internal structures. It is based on
a proposal to keep a database of jobs related to Gnome, with an XML-based
storage structure. Here is an XML-encoded jobs base (gjobs.xml):
<?xml version="1.0"?>
<gjob:Helping xmlns:gjob="http://www.gnome.org/some-location">
  <gjob:Jobs>
    <gjob:Job>
      <gjob:Project ID="3"/>
      <gjob:Application>GBackup</gjob:Application>
      <gjob:Category>Development</gjob:Category>
      <gjob:Update>
        <gjob:Status>Open</gjob:Status>
        <gjob:Modified>Mon, 07 Jun 1999 20:27:45 -0400 MET DST</gjob:Modified>
        <gjob:Salary>USD 0.00</gjob:Salary>
      </gjob:Update>
      <gjob:Developers>
        <gjob:Developer>
        </gjob:Developer>
      </gjob:Developers>
      <gjob:Contact>
        <gjob:Person>Nathan Clemons</gjob:Person>
        <gjob:Email>nathan@windsofstorm.net</gjob:Email>
        <gjob:Company>
        </gjob:Company>
        <gjob:Organisation>
        </gjob:Organisation>
        <gjob:Webpage>
        </gjob:Webpage>
        <gjob:Snailmail>
        </gjob:Snailmail>
        <gjob:Phone>
        </gjob:Phone>
      </gjob:Contact>
      <gjob:Requirements>
      The program should be released as free software, under the GPL.
      </gjob:Requirements>
      <gjob:Skills>
      </gjob:Skills>
      <gjob:Details>
      A GNOME based system that will allow a superuser to configure
      compressed and uncompressed files and/or file systems to be backed
      up with a supported media in the system.  This should be able to
      perform via find commands generating a list of files that are passed
      to tar, dd, cpio, cp, gzip, etc., to be directed to the tape machine
      or via operations performed on the filesystem itself. Email
      notification and GUI status display very important.
      </gjob:Details>
    </gjob:Job>
  </gjob:Jobs>
</gjob:Helping>
While loading the XML file into an internal DOM tree is a matter of
calling only a couple of functions, browsing the tree to gather the data and
generate the internal structures is harder and more error-prone.
The suggested principle is to be tolerant with respect to the input
structure. For example, the ordering of the attributes is not significant;
the XML specification is clear about it. It's also usually a good idea not to
depend on the order of the children of a given node, unless doing so really
makes things harder. Here is some code to parse the information for a person:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/tree.h>

/* assumed definition; the excerpt uses DEBUG without showing it */
#define DEBUG(x) printf(x)

/*
 * A person record
 */
typedef struct person {
    xmlChar *name;
    xmlChar *email;
    xmlChar *company;
    xmlChar *organisation;
    xmlChar *smail;
    xmlChar *webPage;
    xmlChar *phone;
} person, *personPtr;

/*
 * And the code needed to parse it
 */
personPtr parsePerson(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    personPtr ret = NULL;

    DEBUG("parsePerson\n");
    /*
     * allocate the struct
     */
    ret = (personPtr) malloc(sizeof(person));
    if (ret == NULL) {
        fprintf(stderr,"out of memory\n");
        return(NULL);
    }
    memset(ret, 0, sizeof(person));

    /* We don't care what the top level element name is */
    cur = cur->xmlChildrenNode;
    while (cur != NULL) {
        /* node names are xmlChar strings, so compare them with xmlStrcmp */
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Person")) && (cur->ns == ns))
            ret->name = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Email")) && (cur->ns == ns))
            ret->email = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
        cur = cur->next;
    }

    return(ret);
}
Here are a couple of things to notice:
Usually a recursive parsing style is the more convenient one: XML data
is by nature subject to repetitive constructs and usually exhibits highly
structured patterns.
The two arguments of type xmlDocPtr and xmlNsPtr are the pointer to the
global XML document and the namespace reserved to the application.
Document-wide information is needed, for example, to decode entities, and
it's good coding practice to define a namespace for your application's set
of data and to test that the elements and attributes you're analyzing
actually pertain to your application space. This is done by a simple
equality test (cur->ns == ns).
To retrieve text and attribute values, you can use the function
xmlNodeListGetString to gather all the text and entity reference nodes
generated by the DOM output and produce a single text string.
Here is another piece of code used to parse another level of the
structure:
#include <libxml/tree.h>

/*
 * a Description for a Job
 */
typedef struct job {
    xmlChar *projectID;
    xmlChar *application;
    xmlChar *category;
    personPtr contact;
    int nbDevelopers;
    personPtr developers[100]; /* using dynamic alloc is left as an exercise */
} job, *jobPtr;

/*
 * And the code needed to parse it
 */
jobPtr parseJob(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    jobPtr ret = NULL;

    DEBUG("parseJob\n");
    /*
     * allocate the struct
     */
    ret = (jobPtr) malloc(sizeof(job));
    if (ret == NULL) {
        fprintf(stderr,"out of memory\n");
        return(NULL);
    }
    memset(ret, 0, sizeof(job));

    /* We don't care what the top level element name is */
    cur = cur->xmlChildrenNode;
    while (cur != NULL) {
        /* node and attribute names are xmlChar strings, hence the casts */
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Project")) && (cur->ns == ns)) {
            ret->projectID = xmlGetProp(cur, (const xmlChar *) "ID");
            if (ret->projectID == NULL) {
                fprintf(stderr, "Project has no ID\n");
            }
        }
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Application")) && (cur->ns == ns))
            ret->application = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Category")) && (cur->ns == ns))
            ret->category = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Contact")) && (cur->ns == ns))
            ret->contact = parsePerson(doc, ns, cur);
        cur = cur->next;
    }

    return(ret);
}
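The job structure above caps the developers at 100 entries and leaves dynamic allocation as an exercise. One possible shape for that exercise is a small growable list of pointers. This is only a sketch: the names ptrList, ptrListAdd and ptrListClear are hypothetical and not part of libxml2 or of the example program, and the list stores untyped pointers so it does not depend on the person type.

```c
#include <stdlib.h>

/* Hypothetical growable list of pointers, standing in for developers[100]. */
typedef struct ptrList {
    void **items;   /* heap-allocated array of slots */
    int nb;         /* number of slots in use */
    int max;        /* number of slots allocated */
} ptrList;

/*
 * Append item to the list, doubling the capacity when it is full.
 * Returns 0 on success, -1 if memory could not be allocated
 * (the list is left unchanged in that case).
 */
int ptrListAdd(ptrList *list, void *item) {
    if (list->nb >= list->max) {
        int newMax = (list->max == 0) ? 4 : list->max * 2;
        void **tmp = (void **) realloc(list->items,
                                       (size_t) newMax * sizeof(void *));
        if (tmp == NULL)
            return -1;
        list->items = tmp;
        list->max = newMax;
    }
    list->items[list->nb++] = item;
    return 0;
}

/* Release the slot array only; the caller still owns the items. */
void ptrListClear(ptrList *list) {
    free(list->items);
    list->items = NULL;
    list->nb = 0;
    list->max = 0;
}
```

A list must start zeroed (for example ptrList l = { NULL, 0, 0 };) before the first call to ptrListAdd. Doubling the capacity keeps the number of reallocations small even for long developer lists.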
Once you are used to it, writing this kind of code is quite simple, but
boring. Ultimately, it should be possible to write stubbers that take either
C data structure definitions, a set of XML examples, or an XML DTD and
produce the code needed to import and export the content between C data and
XML storage. This is left as an exercise to the reader :-)
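Short of a full code generator, one way to reduce the repetition is to describe the mapping between element names and structure fields in a table and let a single routine do the lookup. The sketch below is an assumption about how that could look, not something libxml2 or the example program provides: contact, fieldDesc, contactFields and setField are hypothetical names, and plain char * fields are used so the snippet stands on its own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record mirroring two fields of the person structure. */
typedef struct contact {
    char *name;
    char *email;
} contact;

/* One table entry: element name and offset of the matching char * field. */
typedef struct fieldDesc {
    const char *element;
    size_t offset;
} fieldDesc;

/* The mapping a stubber could generate from the structure definition. */
const fieldDesc contactFields[] = {
    { "Person", offsetof(contact, name) },
    { "Email",  offsetof(contact, email) },
};

/*
 * Store value in the field of rec that the table associates with element.
 * Returns 1 if a field was set, 0 if the element is not in the table,
 * in which case it is simply ignored.
 */
int setField(void *rec, const fieldDesc *table, size_t n,
             const char *element, char *value) {
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(table[i].element, element) == 0) {
            char **slot = (char **) ((char *) rec + table[i].offset);
            *slot = value;
            return 1;
        }
    }
    return 0;
}
```

With such a table, the body of a parsing loop reduces to one setField call per child element. Unknown elements are skipped, which keeps the tolerant behaviour recommended earlier, and supporting a new field means adding one table row.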
Feel free to use the code for the full C parsing example
(example/gjobread.c) as a template; it is also available, with a Makefile,
in the Gnome CVS base under gnome-xml/example.
Daniel Veillard
