The XML C parser and toolkit of Gnome
Entities or no entities
Entities in principle are similar to simple C macros. An entity defines an
abbreviation for a given string that you can reuse many times throughout the
content of your document. Entities are especially useful when a given string
may occur frequently within a document, or to confine the change needed to a
document to a restricted area in the internal subset of the document (at the
beginning). Example:
1 <?xml version="1.0"?>
2 <!DOCTYPE EXAMPLE SYSTEM "example.dtd" [
3 <!ENTITY xml "Extensible Markup Language">
4 ]>
5 <EXAMPLE>
6    &xml;
7 </EXAMPLE>
Line 3 declares the xml entity. Line 6 uses the xml entity by prefixing
its name with '&' and following it with ';', without any spaces. libxml2
recognizes the 5 entities predefined by XML, which let you escape characters
that have a special meaning in some parts of the XML document content:
&lt; for the character '<',
&gt; for the character '>',
&apos; for the character ''',
&quot; for the character '"', and
&amp; for the character '&'.
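The mapping above can be sketched in plain C. The escape_predefined() helper
below is hypothetical and not part of libxml2. It only illustrates replacing
each of the five special characters with its predefined entity, and it assumes
a NUL-terminated input string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a libxml2 function: returns a newly allocated
 * copy of `text` in which each of the five special characters is replaced
 * by its predefined entity. The caller frees the result. Returns NULL if
 * the allocation fails. */
char *escape_predefined(const char *text)
{
    size_t len;
    char *out, *p;

    assert(text != NULL);

    /* Worst case: every input byte becomes a 6-byte reference such as
     * &quot; or &apos;, plus one byte for the terminating NUL. */
    len = strlen(text);
    out = malloc(len * 6 + 1);
    if (out == NULL)
        return NULL;

    p = out;
    for (; *text != '\0'; text++) {
        const char *rep = NULL;

        switch (*text) {
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '\'': rep = "&apos;"; break;
        case '"':  rep = "&quot;"; break;
        case '&':  rep = "&amp;";  break;
        default:   break;
        }

        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(p, rep, n);   /* copy the entity reference */
            p += n;
        } else {
            *p++ = *text;        /* ordinary character, copy as-is */
        }
    }
    *p = '\0';
    return out;
}
```

A caller would pass raw text to escape_predefined(), write the result into
element content or an attribute value, then free() it.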
One of the problems related to entities is that you may want the parser to
substitute an entity's content so that you can see the replacement text in
your application. Or you may prefer to keep entity references as such in the
content so that you can save the document back without losing this usually
precious information (if the user went through the pain of explicitly
defining entities, they may have a rather negative attitude if you blindly
substitute them at saving time). The xmlSubstituteEntitiesDefault()
function allows you to check and change the behaviour, which is to not
substitute entities by default.
Here is the DOM tree built by libxml2 for the previous document in the
default case:
/gnome/src/gnome-xml -> ./xmllint --debug test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content=
     ENTITY_REF
       INTERNAL_GENERAL_ENTITY xml
       content=Extensible Markup Language
     TEXT
     content=
And here is the result when substituting entities:
/gnome/src/gnome-xml -> ./xmllint --debug --noent test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content=     Extensible Markup Language
So, entities or no entities? Basically, it depends on your use case. I
suggest that you keep the non-substituting default behaviour and avoid using
entities in your XML document or data if you are not willing to handle the
entity reference nodes in the DOM tree.
Note that at save time libxml2 converts characters to the predefined
entities where necessary to prevent well-formedness problems. When parsing,
it also transparently replaces predefined entities with the corresponding
characters (i.e. it will not generate entity reference nodes in the DOM tree
or call the reference() SAX callback when finding them in the input).
WARNING: handling entities on top of the libxml2 SAX interface is difficult!
If you plan to use non-predefined entities in your documents, then the
learning curve to handle them using the SAX API may be long. If you plan to
use complex documents, I strongly suggest you consider using the DOM
interface instead and let libxml2 deal with the complexity rather than trying
to do it yourself.
Daniel Veillard
