RE: XSL vs. XSLT and processors vs. parsers

Subject: RE: XSL vs. XSLT and processors vs. parsers
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Wed, 20 Sep 2000 15:15:59 +0100
Hi all,

This is an interesting question, so I'm going to take a shot at it. Mike
Kay has already stated quite succinctly what "XSL vs. XSLT" is. To what he
says about processors vs. parsers, I'd like to add an interpretation
[longish]....

An "XML processor" is a software program that does something with XML
documents. These documents may be input to the program either as static
entities in the notation described in the XML Recommendation (that is,
files using tags following the XML definition of "well-formed"), or more
generally, as "documents" constructed through some other method (for
example, presented by some other application as a pre-built DOM tree, or
fired as a series of SAX events). Accordingly, the term "XML Processor" is
somewhat loose with respect to its input, and wide open with respect to its
operations or output.
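
To make the "loose with respect to its input" point concrete, here is a minimal sketch in Python, using only the standard library's xml.etree.ElementTree. The function name count_elements is my own invention for illustration. It is a trivial "XML processor" that accepts a document either as text in XML notation or as a tree some other code has already built:

```python
import xml.etree.ElementTree as ET


def count_elements(doc):
    """Count elements in a document given as XML text or as a pre-built tree."""
    # Only the text form needs a parser; a pre-built tree is used as-is.
    root = ET.fromstring(doc) if isinstance(doc, str) else doc
    return sum(1 for _ in root.iter())


# Input 1: the document as XML notation.
from_text = count_elements("<a><b/><c><d/></c></a>")

# Input 2: the "same" document constructed directly, with no tags involved.
built = ET.Element("a")
ET.SubElement(built, "b")
c = ET.SubElement(built, "c")
ET.SubElement(c, "d")
from_tree = count_elements(built)

print(from_text, from_tree)
```

Only the first call involves a parser at all; the second never sees a tag.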

An XSLT processor is a species of XML processor that can take as input
both arbitrary XML documents and stylesheets ("transformation
specifications") as described in the XSLT Recommendation, and perform
operations on them as the Rec also describes. Since XSLT is a
transformation language, this usually means turning one kind of XML
document into another kind of (structured) output.

An "XML parser" is a class of XML processor whose job it is to interpret
the *notation* described in the XML Rec, and present the information in a
document (the Rec describes what is a "document") in some way to a
processor. (The notation indicates what, in a document, is an element, an
element type name, an attribute, an attribute value, etc.) The XML Rec
describes this operation as occurring in two stages, the second of which is
optional: in the first, a well-formed document is simply presented; in the
second, it is also validated against a formal model in DTD syntax (also
described in the XML Rec).

Since, in doing its job, the parser is compelled by the specification to
recognize some kinds of input as XML (conforming to the rules of the
notation) and others not, and further to recognize documents as being
valid to a given model (DTD), it is natural for it to produce error
messages as output in place of the hoped-for result, a "parsed document"
(whatever that may be; it depends on the parser). Accordingly, a parser may
be used to check whether something is (a) actually XML ("well-formed"), and
(b) valid to a given DTD (according to the XML definition of "valid").
Sometimes this error-reporting has been taken to be the main function of a
parser, which is not the case. But it is very useful in this role, even on
a stand-alone basis (that is, when parsing is the only operation performed,
essentially as a test of the input document).
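
As a rough illustration of the stand-alone use, the sketch below runs a parser purely as a well-formedness test. It assumes Python and its standard xml.etree.ElementTree module; check_well_formed is a hypothetical helper name. Note that this parser is non-validating, so it covers only check (a), not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The parser's "output" here is an error message, not a parsed document.
        return False, str(err)
    return True, None


good = "<doc><title>Hello</title></doc>"
bad = "<doc><title>Hello</doc>"  # <title> is never closed

print(check_well_formed(good))
print(check_well_formed(bad))
```

The parsed document is thrown away in both cases; the only thing kept is the verdict and, on failure, the parser's message.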

The XML Rec describes the syntax (i.e. the notations) both for documents
and for DTDs. But what it says about the data model implied by these
notations is not very formal or complete: much has been left up to
applications to determine. Consequently, there is quite a bit of play in
how any XML processor does its job, or even in the kind of information
that's presented by an XML parser (remember the parser's job is to go from
notation to data model), or the manner of its presentation (in-memory tree,
series of events, stack of punchcards). While this poses problems for
interoperability of tools, it may be a medium-term benefit, encouraging
experiments and allowing Darwin to do his thing. (After all, we always have
the notation to fall back on.)
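
The difference in "manner of presentation" can be sketched with two parsers from the Python standard library reading the same notation: xml.sax fires a series of events, while xml.etree.ElementTree hands back an in-memory tree. The class name EventLogger and the sample document are made up for the example.

```python
import xml.sax
import xml.etree.ElementTree as ET

SOURCE = "<list><item>one</item><item>two</item></list>"


class EventLogger(xml.sax.ContentHandler):
    """Record each SAX event as it is fired by the parser."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # A parser may deliver one run of text in several chunks.
        self.events.append(("text", content))


# Presentation 1: a series of events.
handler = EventLogger()
xml.sax.parseString(SOURCE.encode("utf-8"), handler)

# Presentation 2: an in-memory tree that can be navigated afterwards.
root = ET.fromstring(SOURCE)
tree_items = [item.text for item in root.findall("item")]

print(handler.events)
print(tree_items)
```

Same notation in, two quite different things out; which one counts as "the parsed document" is up to the application.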

XSLT processors are sometimes built so they may accept input in different
forms. However, since the XML notation is normative, but an XML data model
or "Infoset" is not, the usual case is for an XSLT processor to be wired to
a parser to accept XML files using that notation, as input. The parser
becomes a component of the processor. XSLT processors usually come with
their own parser, but many allow you to switch that parser out for some
other one (that provides the application with the same input).

Because of the nature of XSLT transformations, which operate on a tree-like
data model described in the XPath spec, the inputs and outputs take the
form of one or another kind of representation of a tree. Since these trees
often start and end life as documents in XML notation, we can think we're
changing tags in those documents, but we're not: what XSLT is doing goes
deeper than that, the tags being only a way of representing underlying data
structures. A whole class of misunderstandings about the way XSLT
stylesheets are best written stems from this misconception.
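
A small sketch of why the tags are only a representation, again assuming Python's standard xml.etree.ElementTree (summarize is a hypothetical helper, and ElementTree's tree is not the XPath data model, only something in the same spirit): two documents that differ as text, in quoting, attribute order, whitespace inside the tag, and how the ampersand is escaped, come out of the parser as the same structure.

```python
import xml.etree.ElementTree as ET

# Two different strings of tags...
A = '<para id="p1" class="note">Hi &amp; bye</para>'
B = "<para class='note'   id='p1'>Hi &#38; bye</para>"


def summarize(text):
    """Reduce a one-element document to its name, attributes, and text."""
    root = ET.fromstring(text)
    return (root.tag, dict(root.attrib), root.text)


# ...that a parser presents as one and the same structure.
print(A == B)
print(summarize(A) == summarize(B))
```

A stylesheet working on the tree cannot tell these two inputs apart, which is why thinking in terms of "changing tags" leads people astray.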

Perhaps the real savants on this list will weigh in if I've misstated
anything. Note that this take on it differs from what Andrew Watt said on
this list (an XML processor and XML parser are "one and the same"). I am
making a distinction between them, related to a distinction I am making
between software that does something with XML-the-notation (a processor
which must be or contain a parser), and software that does something with
XML-a-data-model (a processor which does not necessarily have a parser,
like an XSLT processor, though it may sit downstream from one).

Cheers,
Wendell

At 01:25 PM 9/20/00 +0100, you wrote:
>> I have recently become confused between the difference of 
>> XSLT and actual XSL.
>
>W3C uses XSL to mean the as-yet-unfinished family of standards of which XSLT
>is part.
>Microsoft uses XSL to mean the language they implemented in IE5, which is
>vaguely related to an early draft of XSLT.
>
>An XML processor (popularly called a parser, but called a processor in the
>XML Recommendation) reads a source XML file and identifies the syntactic
>units such as elements, attributes, and text content.
>
>An XSLT processor takes a stylesheet and applies it to the tree
>representation of a source XML document (produced by an XML parser), and
>generates a tree representation of an output XML document.  
>
>Mike Kay
>
>
> XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>
>

======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

