Subject: Re: Embedding HTML in XML documents using HTML dtd
From: Jeni Tennison <mail@xxxxxxxxxxxxxxxx>
Date: Mon, 09 Oct 2000 12:57:49 +0100
Mila,

>I would like to enable HTML tags within my XML file - using the HTML
>dtd. For example, if there is a list in the XML:
><list>
><UL>
> <LI>List item 1</LI>
> <LI>List item 2</LI>
> </UL>
></list>
>
>What do I have to add to the XML, the DTD and the XSL to be able to
>convert this to a list when I generate an HTML file using xalan? I would
>like to use the HTML dtd to make this work.
>
>I am currently using the whole section starting at <UL> to </UL> within
>a ![CDATA[ ]] element, and reading the output in xsl with
> <xsl:for-each select="list">
><xsl:value-of disable-output-escaping="yes" select="."/>
></xsl:for-each>
>This works, but we would prefer the solution of using the HTML dtd,
>except I am not sure how to implement that.

Certainly the CDATA section solution is less than optimal! There are
two approaches to this problem: either you can use what you know about
your XML to say "the content of a 'list' element is HTML and should be
copied directly" or you can explicitly put the UL and LI elements in
the HTML namespace within the source XML, and then within your
stylesheet say "all HTML elements should be copied".

In either case, you need to know about the xsl:copy and xsl:copy-of
elements. xsl:copy copies the current node, but none of its contents
or attributes. xsl:copy-of copies a node set that you select,
including all of its contents and any attributes or namespace nodes.

The first is simpler but less extensible: when you find a 'list'
element, you make a copy of its element content:

<xsl:template match="list">
  <xsl:copy-of select="*" />
</xsl:template>

Given an input of:

<list>
  <UL>
    <LI>List item 1</LI>
    <LI>List item 2</LI>
  </UL>
</list>

This will give:

<UL>
  <LI>List item 1</LI>
  <LI>List item 2</LI>
</UL>

The problem is that you have to do something similar anywhere else
where you have HTML elements within your XML elements and you want
them copied.
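As a rough sketch of what "something similar" could look like: assuming
two hypothetical elements, 'note' and 'summary' (neither appears in your
example), that also hold HTML, the same template could name them all in
one union pattern rather than being repeated for each:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- 'note' and 'summary' are hypothetical element names used only
       for illustration; replace them with whichever of your own
       elements contain HTML -->
  <xsl:template match="list | note | summary">
    <!-- copy each child element, with its attributes and descendants -->
    <xsl:copy-of select="*" />
  </xsl:template>

</xsl:stylesheet>
```

Note that select="*" copies only child elements. If one of these
elements could hold text mixed in with the HTML elements, select="node()"
would copy the text as well.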
It might be that 'lists' are the only elements where HTML elements
occur, in which case this is the easiest solution.

The second solution is to use namespaces to explicitly say that the UL
and LI elements are HTML elements. To do that, you associate a
namespace prefix (a string that you can choose) with a namespace name
(a string that you can choose, but that should probably be a URI
pointing to a DTD, schema, or human-readable documentation about the
elements you're using). For common XML dialects like HTML, there is
usually a namespace name defined somewhere, and using that namespace
name could enable you to use other people's stylesheets that also
process elements in that namespace.

In the case of XHTML, the namespace name is:

  http://www.w3.org/1999/xhtml

You can associate the prefix 'html' with this namespace name using a
namespace attribute:

  xmlns:html="http://www.w3.org/1999/xhtml"

You don't have to use the prefix 'html' - you can use anything you
want. This attribute should be put on an element that is an ancestor
of the HTML elements (or is itself an HTML element). A namespace
attribute makes a namespace 'in scope' (i.e. usable) for the element
that it's on and all its descendants. Usually you'd put it on your
document element (i.e. the top-most element). In your case, you could
put it on the 'list' element:

<list xmlns:html="http://www.w3.org/1999/xhtml">
  ...
</list>

Within the 'list' element, any elements that are within the HTML
namespace need to be given qualified names to indicate that fact. You
do this by adding the prefix (i.e.
'html') and a colon before the name of the element, so:

<list xmlns:html="http://www.w3.org/1999/xhtml">
  <html:UL>
    <html:LI>List item 1</html:LI>
    <html:LI>List item 2</html:LI>
  </html:UL>
</list>

As a quick aside, XHTML defines that element names should be in lower
case, so I'd make this:

<list xmlns:html="http://www.w3.org/1999/xhtml">
  <html:ul>
    <html:li>List item 1</html:li>
    <html:li>List item 2</html:li>
  </html:ul>
</list>

for compliance with that standard.

In terms of the DTD for the source XML, DTDs and namespaces don't mix
particularly well: you have to use the same qualified names within the
DTD as you use within your XML, which means that the prefix is fixed
within the DTD. [You could get around this using a parameter entity.]
If you have to validate your source XML against a DTD, then the DTD
should hold something like:

<!ELEMENT list (html:ul)>
<!ATTLIST list
  xmlns:html CDATA #FIXED 'http://www.w3.org/1999/xhtml'>
<!ELEMENT html:ul (html:li+)>
<!ELEMENT html:li (#PCDATA)>

You may be able to draw on some of the XHTML modularisation work to
import relevant parts of the HTML DTD, but they may not be using
qualified names, I'm not sure.

Within the XSLT stylesheet, you have to ensure that all the relevant
namespaces are declared so that whenever you use a qualified name
(like 'html:ul'), the namespace declaration for it is 'in scope'. This
usually means putting the namespace attribute on the xsl:stylesheet
document element:

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:html="http://www.w3.org/1999/xhtml">
  ...
</xsl:stylesheet>

Again, you don't have to use the 'html' prefix, but you *do* have to
make sure that the namespace name (the http://www.w3.org/1999/xhtml
URI) is the same in your source XML and your stylesheet. It's actually
the namespace name (or URI) that is used to determine the namespace
that an element is in, not its prefix.

Within your stylesheet, then, you can now place the rule "copy all
HTML elements".
The following template matches any element in the source that's
within the HTML namespace (whether it's within a 'list' or not):

<xsl:template match="html:*">
  <xsl:copy-of select="." />
</xsl:template>

However, when you're producing HTML output, copying is a bad idea
because while the XSLT processor will produce something that is
technically correct XML, it will not be interpreted correctly by the
vast majority of HTML browsers. The above, for example, produces:

<html:ul xmlns:html="http://www.w3.org/1999/xhtml">
  <html:li>List item 1</html:li>
  <html:li>List item 2</html:li>
</html:ul>

because it literally copies everything, including the namespace nodes.

Instead, then, you should create by hand the relevant elements and
attributes, giving them names corresponding to the local part of their
name, without the namespace prefix:

<xsl:template match="html:*">
  <xsl:element name="{local-name()}">
    <!-- attributes on HTML elements are normally unprefixed, and so in
         no namespace; @* selects them, whereas @html:* would not -->
    <xsl:for-each select="@*">
      <xsl:attribute name="{local-name()}">
        <xsl:value-of select="." />
      </xsl:attribute>
    </xsl:for-each>
    <xsl:apply-templates />
  </xsl:element>
</xsl:template>

This has the added advantage that if you have any specialised XML
embedded within your HTML elements, it will be processed by its own
templates as that specialised XML, rather than simply copied without
paying attention to what it is.

So, to summarise:

1. declare the HTML namespace within your source document (namespace
   attribute on document element)
2. change the names of HTML elements within your source document to
   give them the relevant namespace prefix
3. add the namespace attribute to the DTD and change the names of the
   relevant elements to reflect the namespace prefix
4. declare the HTML namespace within your stylesheet (namespace
   attribute on xsl:stylesheet element)
5. use the above template to copy all HTML elements into your result

I hope that this helps,

Jeni

Jeni Tennison
http://www.jenitennison.com/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list