The XML C parser and toolkit of Gnome
Validation & DTDs
Table of Contents:
General overview
The definition
Simple rules
How to reference a DTD from a document
Declaring elements
Declaring attributes
Some examples
How to validate
Other resources
General overview
So what is validation, and what is a DTD?
DTD is the acronym for Document Type Definition. This is a description of
the content for a family of XML files. This is part of the XML 1.0
specification, and allows one to describe and verify that a given document
instance conforms to the set of rules detailing its structure and content.
Validation is the process of checking a document against a DTD (more
generally against a set of construction rules).
The validation process and building DTDs are the two most difficult parts
of the XML life cycle. Briefly, a DTD defines all the elements that may be
found within your document and the formal shape of your document tree, by
defining the allowed content of each element: either text, a regular
expression for the allowed list of children, or mixed content (both text
and children). The DTD also defines the valid attributes for all elements and
the types of those attributes.
The definition
The W3C XML Recommendation (http://www.w3.org/TR/REC-xml; see also Tim Bray's
annotated version of Rev1, http://www.xml.com/axml/axml.html):
Declaring elements: http://www.w3.org/TR/REC-xml#elemdecls
Declaring attributes: http://www.w3.org/TR/REC-xml#attdecls
Unfortunately, all of this is inherited from the SGML world, and the syntax
is ancient.
Simple rules
Writing DTDs can be done in many ways. The rules for building them can be
radically different depending on whether you need something permanent or
something which can evolve over time. Really complex DTDs like the DocBook
ones are flexible but much harder to design. I will just focus on DTDs for
formats with a fixed, simple structure. What follows is just a set of basic
rules; it is neither exhaustive nor usable for complex DTD design.
How to reference a DTD from a document:
Assuming the top element of the document is spec and the DTD is placed in
the file mydtd in the subdirectory dtds of the directory from which the
document was loaded:
<!DOCTYPE spec SYSTEM "dtds/mydtd">
Notes:
The system string is actually a URI-Reference (as defined in RFC 2396,
http://www.ietf.org/rfc/rfc2396.txt), so you can use a full URL string
indicating the location of your DTD on the Web. This is a really good thing
to do if you want others to validate your document.
It is also possible to associate a PUBLIC identifier (a magic string) so that
the DTD is looked up in catalogs on the client side without having to locate
it on the web.
A DTD contains a set of element and attribute declarations, but they don't
define what the root of the document should be. This is explicitly told to
the parser/validator as the first element of the DOCTYPE declaration.
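To make the pieces of a DOCTYPE declaration concrete, here is a minimal Python sketch that reads the root name and the PUBLIC and SYSTEM identifiers back from a document. It uses the standard library's xml.dom.minidom, which does not validate and does not fetch the external DTD; the helper name doctype_info and the sample PUBLIC identifier are assumptions made for illustration, not part of libxml2.

```python
from xml.dom import minidom


def doctype_info(xml_text):
    """Return (root name, PUBLIC id, SYSTEM id) of the DOCTYPE, or None.

    Hypothetical helper for illustration: it only inspects the declaration,
    it neither loads the DTD nor validates the document.
    """
    doctype = minidom.parseString(xml_text).doctype
    if doctype is None:
        return None
    return doctype.name, doctype.publicId, doctype.systemId


# The SYSTEM form used in the text above.
system_doc = '<!DOCTYPE spec SYSTEM "dtds/mydtd"><spec/>'
# A PUBLIC form; the identifier string is made up for this example.
public_doc = '<!DOCTYPE spec PUBLIC "-//Example//DTD Spec//EN" "dtds/mydtd"><spec/>'

if __name__ == "__main__":
    print(doctype_info(system_doc))
    print(doctype_info(public_doc))
```

The first value is the root element name announced by the DOCTYPE, which is the only place where the expected root is stated.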
Declaring elements:
The following declares an element spec:
<!ELEMENT spec (front, body, back?)>
It also expresses that the spec element contains one front, one body and one
optional back child element, in this order. An element and its content are
declared together in a single declaration. Similarly, the following declares
div1 elements:
<!ELEMENT div1 (head, (p | list | note)*, div2?)>
This means that div1 contains one head, then any number of p, list and note
elements (possibly none, in any order), and then an optional div2. Last but
not least, an element can contain text:
<!ELEMENT b (#PCDATA)>
Here b contains only text. An element can also be of mixed content (text and
elements in no particular order):
<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
Here p can contain text or a, ul, b, i or em elements in no particular order.
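Since an element content model is essentially a regular expression over child element names, the idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it handles pure element content (names, commas, |, ?, * and +) and ignores #PCDATA, EMPTY and ANY. The function names are invented here, and this is not how libxml2 implements validation.

```python
import re


def content_model_to_regex(model):
    """Translate a DTD element content model into a compiled regex.

    Each child name becomes the group '(?:name )', so a list of children
    can be matched as one space-terminated string.
    """
    parts = []
    for token in re.findall(r"[A-Za-z_:][\w.:-]*|[(),|?*+]", model):
        if token == "(":
            parts.append("(?:")
        elif token == ",":
            continue  # a sequence is plain concatenation in a regex
        elif token in ")|?*+":
            parts.append(token)  # same meaning in DTDs and in regexes
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def children_match(model, children):
    """Check whether a list of child element names fits the content model."""
    subject = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(subject) is not None


if __name__ == "__main__":
    spec_model = "(front, body, back?)"
    div1_model = "(head, (p | list | note)*, div2?)"
    print(children_match(spec_model, ["front", "body"]))
    print(children_match(spec_model, ["body", "front"]))
    print(children_match(div1_model, ["head", "p", "note", "list", "div2"]))
```

With the two declarations above, a spec with front then body is accepted, while body before front is rejected because the comma imposes an order.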
Declaring attributes:
Again, the declaration of an attribute includes the definition of its content:
<!ATTLIST termdef name CDATA #IMPLIED>
This means that the element termdef can have a name attribute containing text
(CDATA) and which is optional (#IMPLIED). The attribute value can also be
restricted to a set of allowed values:
<!ATTLIST list type (bullets|ordered|glossary) "ordered">
This means that list elements have a type attribute with 3 allowed values,
"bullets", "ordered" or "glossary", which defaults to "ordered" if the
attribute is not explicitly specified.
The content type of an attribute can be text (CDATA),
anchor/reference/references (ID/IDREF/IDREFS), entity(ies) (ENTITY/ENTITIES)
or name(s) (NMTOKEN/NMTOKENS). The following defines that a chapter element
can have an optional id attribute of type ID, which can be referenced from
attributes of type IDREF:
<!ATTLIST chapter id ID #IMPLIED>
The last value of an attribute definition can be #REQUIRED, meaning that the
attribute has to be given, #IMPLIED, meaning that it is optional, or the
default value (possibly prefixed by #FIXED if it is the only allowed value).
Notes:
Usually the attributes pertaining to a given element are declared in a
single declaration, but this is just a convention adopted by a lot of DTD
writers:
<!ATTLIST termdef
          id      ID      #REQUIRED
          name    CDATA   #IMPLIED>
The previous construct defines both id and name attributes for the element
termdef.
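The effect of #REQUIRED, #IMPLIED, #FIXED, default values and enumerated sets can be sketched as a small Python check. The data layout (a dictionary mapping each attribute name to a type, a keyword and a default) and the function name apply_attlist are assumptions made for this illustration; the sketch does not check the lexical rules of ID, IDREF or NMTOKEN values, and it is not the libxml2 implementation.

```python
def apply_attlist(declared, given):
    """Apply ATTLIST-style rules to the attributes given on one element.

    declared maps name -> (type, keyword, default) where type is a string
    such as "CDATA" or a tuple of allowed values, keyword is "#REQUIRED",
    "#IMPLIED", "#FIXED" or None, and default is a string or None.
    Returns (effective attributes, list of error messages).
    """
    effective = dict(given)
    errors = []
    for name in given:
        if name not in declared:
            errors.append(f"undeclared attribute: {name}")
    for name, (att_type, keyword, default) in declared.items():
        if name not in given:
            if keyword == "#REQUIRED":
                errors.append(f"missing required attribute: {name}")
            elif default is not None:
                effective[name] = default  # declared default is applied
            continue
        value = given[name]
        if isinstance(att_type, tuple) and value not in att_type:
            errors.append(f"value {value!r} not allowed for {name}")
        if keyword == "#FIXED" and value != default:
            errors.append(f"{name} must have the fixed value {default!r}")
    return effective, errors


# <!ATTLIST list type (bullets|ordered|glossary) "ordered">
LIST_ATTRS = {"type": (("bullets", "ordered", "glossary"), None, "ordered")}
# <!ATTLIST termdef id ID #REQUIRED name CDATA #IMPLIED>
TERMDEF_ATTRS = {"id": ("ID", "#REQUIRED", None), "name": ("CDATA", "#IMPLIED", None)}

if __name__ == "__main__":
    print(apply_attlist(LIST_ATTRS, {}))
    print(apply_attlist(TERMDEF_ATTRS, {"name": "entity"}))
```

A list element without a type attribute ends up with type set to "ordered", while a termdef without an id is reported as an error.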
Some examples
The directory test/valid/dtds/ in the libxml2 distribution contains some
complex DTD examples. The example in the file test/valid/dia.xml shows an XML
file where the simple DTD is directly included within the document.
How to validate
The simplest way is to use the xmllint program included with libxml2. The
--valid option turns on validation of the files given as input. For example,
the following validates a copy of the first revision of the XML 1.0
specification:
xmllint --valid --noout test/valid/REC-xml-19980210.xml
The --noout option disables output of the resulting tree.
The --dtdvalid dtd option allows validation of the document(s) against a
given DTD.
Libxml2 exports an API to handle DTDs and validation; see the associated
description: http://xmlsoft.org/html/libxml-valid.html
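If you want to drive xmllint from a script, the invocations above can be wrapped as in the following Python sketch. It only uses the --valid, --noout and --dtdvalid options described in this section; the helper names are invented for the example, and treating a non-zero exit status as "not valid" is an assumption you should check against your xmllint version.

```python
import shutil
import subprocess


def xmllint_command(xml_path, dtd_path=None):
    """Build the xmllint argument list for validating one document."""
    cmd = ["xmllint", "--noout"]  # do not print the resulting tree
    if dtd_path is None:
        cmd.append("--valid")  # validate against the document's own DOCTYPE
    else:
        cmd += ["--dtdvalid", dtd_path]  # validate against a given DTD
    cmd.append(xml_path)
    return cmd


def validate(xml_path, dtd_path=None):
    """Run xmllint and return (ok, messages written to stderr)."""
    if shutil.which("xmllint") is None:
        raise RuntimeError("xmllint was not found on PATH")
    result = subprocess.run(
        xmllint_command(xml_path, dtd_path), capture_output=True, text=True
    )
    # Assumption: xmllint exits with a non-zero status on validation errors.
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    print(" ".join(xmllint_command("test/valid/REC-xml-19980210.xml")))
```

Keeping the command construction separate from the subprocess call makes it possible to inspect or log the exact command line before running it.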
Other resources
DTDs are as old as SGML, so there are a number of examples on-line. I will
just list one for now; other pointers are welcome:
XML-101 DTD: http://www.xml101.com:8081/dtd/
I suggest looking at the examples found under test/valid/dtds and at any of
the large number of books available on XML. The dia example in test/valid
should be both simple and complete enough to allow you to build your own.
Daniel Veillard
