XML/Ada

a full XML suite

Release

The latest version of XML/Ada can now be downloaded directly from the GNAT GPL download area, in the tools/xmlada section.

This release of XML/Ada supports parsing XML files, including DTDs. It provides full support for SAX and nearly complete support for the core part of the DOM. It can also validate your XML files against XML schemas.

What is XML?

This page is not meant to be a full explanation of what XML is. Many such documents are available on the Web, especially on the official web page.

XML is the Extensible Markup Language. It is a format used to organize text files into a set of tags and their associated values. See the following small example:

<?xml version="1.0"?>
<!DOCTYPE root [
   <!ENTITY foo "foo_replacement">
   <!ELEMENT root (elem1 | elem2)*>
   <!ELEMENT elem1 (#PCDATA)>
   <!ATTLIST elem1 attr CDATA #IMPLIED>
   <!ELEMENT elem2 (#PCDATA|elem1)*>
 ]>

<root>
  <elem1 attr="foo">any text</elem1>
  <elem1/>
  <elem2>text&foo;</elem2>
  <elem2><elem1>text</elem1></elem2>
</root>

The above valid XML file can be split into three parts:

  • The first line is the XML declaration. It indicates that this is an XML file and specifies which version of the XML standard it follows.
  • The DOCTYPE section below, which is optional, specifies the grammar used to organize the data in the file. It defines the three valid tags that can be used in the document (root, elem1, and elem2), as well as their content and possible children. It also defines an entity foo, which is a shorthand for the associated string. This section is also known as the DTD (Document Type Definition).
  • The last part is the data itself. It is fully enclosed between the start of the root tag and its end. It includes an empty element on the second line (i.e. an element with no content). It also references the foo entity that was defined in the DTD.
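
To make the three parts concrete, here is a minimal sketch that parses a copy of the sample with Python's standard xml.etree.ElementTree module. This is not XML/Ada's API; it is only an illustration of what any XML parser reports for such a file. The embedded document writes the doctype name as root and the mixed-content model as (#PCDATA|elem1)*, and the variable names are chosen for this example.

```python
import xml.etree.ElementTree as ET

# A copy of the sample document: declaration, DTD, then the data.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE root [
   <!ENTITY foo "foo_replacement">
   <!ELEMENT root (elem1 | elem2)*>
   <!ELEMENT elem1 (#PCDATA)>
   <!ATTLIST elem1 attr CDATA #IMPLIED>
   <!ELEMENT elem2 (#PCDATA|elem1)*>
 ]>
<root>
  <elem1 attr="foo">any text</elem1>
  <elem1/>
  <elem2>text&foo;</elem2>
  <elem2><elem1>text</elem1></elem2>
</root>
"""

root = ET.fromstring(SAMPLE)
children = list(root)

# The tags allowed by the DTD, in document order.
tags = [child.tag for child in children]

# The second child is the empty element: no text and no children.
empty_elem = children[1]

# The parser replaces &foo; with the string declared in the DTD.
expanded = children[2].text

print(tags)
print(repr(empty_elem.text), len(empty_elem))
print(expanded)
```

The parser used here reads the DTD only to expand entities; like the basic XML/Ada parser described below, it does not validate the document against the DTD.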

What is XML/Ada?

XML/Ada is a set of modules that provide simple manipulation of XML streams. It supports the whole XML 1.1 specification, and can parse any file that follows this standard (including the contents of the DTD, although the document is not validated against it).

It also provides support for a number of other standards associated with XML, such as SAX, DOM and XML schemas.

In addition, it includes a module to manipulate Unicode streams, since this is required by the XML standard.

Implementation rationale

This toolkit is made of several modules, each of which addresses one part of the XML-related standards. Each module works in full cooperation with the others, so that XML/Ada can be used as a full XML toolkit. The modules currently supported are:

Unicode module

Unicode is a standard encoding for all characters used in the world (European, Asian, Arabic, mathematical symbols, and so on). Its 3.1 revision encodes more than 92000 characters.

The standard defines several encoding schemes: UTF-8, UTF-16 and UTF-32. The Unicode module provides a set of functions to convert from one encoding scheme to another, including to and from standard Ada strings.

This module also provides various functions to convert between character sets, including Unicode itself, Latin1, Latin2, Latin3, Latin4 and Latin5. Other character sets can easily be added.
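
As an illustration of what these conversions involve, the following sketch uses Python's built-in codecs rather than the XML/Ada Unicode module, whose own subprogram names are not shown on this page. The sample strings and variable names are assumptions made for the example.

```python
# One character from each UTF-8 length class: 1, 2, 3 and 4 bytes.
text = "A\u00e9\u20ac\U0001d11e"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")   # explicit byte order, so no BOM is added
utf32 = text.encode("utf-32-be")

# Converting between encoding schemes goes through the decoded characters.
utf16_from_utf8 = utf8.decode("utf-8").encode("utf-16-be")

# Character-set conversion: Latin-1 bytes to UTF-8 bytes.
latin1_bytes = "caf\u00e9".encode("latin-1")
as_utf8 = latin1_bytes.decode("latin-1").encode("utf-8")

print(len(utf8), len(utf16), len(utf32))
print(latin1_bytes, as_utf8)
```

The same four characters take 10 bytes in UTF-8, 10 bytes in UTF-16 and 16 bytes in UTF-32, which shows why the choice of encoding scheme depends on the text being stored.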

A number of automatically generated packages provide standard names for characters, similar to what the Ada standard defines in the Ada.Characters.Latin_1 package. These packages currently name about 10000 of the most commonly used characters.
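
The idea of referring to characters by a standard name can be sketched with Python's unicodedata module. This is only an analogy: the XML/Ada packages expose named constants, whereas the lookup below is done at run time, and the two characters are chosen arbitrarily for the example.

```python
import unicodedata

# From a standard Unicode name to the character it designates.
alpha = unicodedata.lookup("GREEK SMALL LETTER ALPHA")

# From a character back to its standard name.
name = unicodedata.name("\u00e9")

print(hex(ord(alpha)), name)
```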

A SAX 2.0 module

SAX, the Simple API for XML, is a set of callbacks that are automatically called by the XML parser. You can define your own callbacks to be called when specific events are seen in the XML file, such as start tags, end tags, character data or DTD declarations.

Although SAX is not an official standard, it is an interface that XML parsers often implement, since it provides a very efficient way to parse XML files and avoids the need to store whole trees in memory. Depending on your application, this might be the best way to process the file.

This Ada library includes full support for SAX 2.0, including the SAX extensions that report events such as comments and element declarations in the DTD.
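
The callback style described above can be sketched with Python's standard xml.sax module. The handler below is not XML/Ada code, and the class name EventLogger is made up for this example; it only shows how a SAX application receives start tags, end tags and characters without building a tree.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each SAX event as a tuple (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called for every start tag, with its attributes.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        # Called for every end tag, including the implicit one of <elem1/>.
        self.events.append(("end", name))

    def characters(self, content):
        # Character data may arrive in several pieces; skip pure whitespace.
        if content.strip():
            self.events.append(("chars", content))


handler = EventLogger()
xml.sax.parseString(
    b"<root><elem1 attr='foo'>any text</elem1><elem1/></root>", handler
)

for event in handler.events:
    print(event)
```

Because the handler keeps only what it chooses to record, memory use stays independent of the size of the document, which is the efficiency argument made above.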

A DOM 2.0 module

DOM is the Document Object Model. This is a set of standard subprograms to create and manipulate an XML tree.

A tree is the most natural representation for an XML stream. Each element can have several children, textual data, and so on. DOM is the standard interface to this tree.

This Ada library creates the DOM tree through a set of callbacks in the SAX parser described above.
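
The following sketch shows the general technique of building a DOM tree from SAX callbacks, using Python's standard xml.sax and xml.dom.minidom modules. It is an assumption-laden illustration, not XML/Ada's implementation: the class name TreeBuilder and its stack-based design are invented for this example.

```python
import xml.sax
from xml.dom import minidom


class TreeBuilder(xml.sax.ContentHandler):
    """Builds a minidom tree from SAX callbacks (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.document = minidom.Document()
        self._stack = [self.document]  # currently open nodes, innermost last

    def startElement(self, name, attrs):
        element = self.document.createElement(name)
        for key, value in attrs.items():
            element.setAttribute(key, value)
        self._stack[-1].appendChild(element)
        self._stack.append(element)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        text_node = self.document.createTextNode(content)
        self._stack[-1].appendChild(text_node)


builder = TreeBuilder()
xml.sax.parseString(
    b"<root><elem1 attr='foo'>any text</elem1><elem2/></root>", builder
)
doc = builder.document

# Once built, the tree is read and modified through standard DOM calls.
root = doc.documentElement
first = root.getElementsByTagName("elem1")[0]
attr_value = first.getAttribute("attr")
first_text = first.firstChild.data

added = doc.createElement("elem1")
added.appendChild(doc.createTextNode("added"))
root.appendChild(added)

serialized = root.toxml()
print(attr_value, first_text)
print(serialized)
```

The stack mirrors the nesting of open elements: each start tag pushes a new node, and each end tag pops it, so character data always attaches to the innermost open element.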

An XML schema module

XML schemas are a way to describe what makes an XML document valid, i.e. how nodes can be nested, which attributes they allow, and so on.

Using the XML schema module, XML/Ada takes care of verifying your documents, so that your code can assume the XML tree contains only valid data and does not need to be filled with extra tests.

Online Documentation

Related Web Sites

The following are some useful related web sites:

  • Libre software for Ada
    This page provides several packages and libraries that can be used with the Ada language, including GtkAda, a graphical toolkit, and GVD, a graphical debugger.
  • The official W3 web site.
    It contains a huge number of XML-related standards. It would be nice to implement them all, but this takes a lot of time. Contributions are more than welcome.