Subject: RE: [xsl] Ingoring HTML - A Solution
From: Jay Burgess <lists@xxxxxxxxxxx>
Date: Tue, 21 Jun 2005 08:01:38 -0700

I thought I'd post a solution to my request last week to remove "HTML tags"
from a block of XML. There may be a better way to do this, but this seems to
work in my case. Thanks for everyone's input.

<xsl:template name="strip-HTML">
  <xsl:param name="text"/>
  <xsl:choose>
    <!-- A '>' remains, so there may still be a tag to drop. -->
    <xsl:when test="contains($text, '&gt;')">
      <xsl:choose>
        <!-- Emit the text in front of the next '<'. -->
        <xsl:when test="contains($text, '&lt;')">
          <xsl:value-of select="substring-before($text, '&lt;')"/>
        </xsl:when>
        <!-- No '<' left: emit the text in front of the stray '>'. -->
        <xsl:otherwise>
          <xsl:value-of select="substring-before($text, '&gt;')"/>
        </xsl:otherwise>
      </xsl:choose>
      <!-- Recurse on whatever follows the first '>'. -->
      <xsl:call-template name="strip-HTML">
        <xsl:with-param name="text" select="substring-after($text, '&gt;')"/>
      </xsl:call-template>
    </xsl:when>
    <!-- No '>' left: the remainder is plain text. -->
    <xsl:otherwise>
      <xsl:value-of select="$text"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

Jay

| Jay Burgess [Vertical Technology Group]
| "Essential Technology Links via RSS"
| http://www.vtgroup.com/

> Subject: Re: [xsl] Ingoring HTML
> From: "Sam D. Chuparkoff" <sdc@xxxxxxxxxx>
> Date: Fri, 17 Jun 2005 13:39:59 -0700
>
> On the dangerous side, I'd try something like:
>
> perl -ne '$c.=$_;eof&&($c=~s/&lt;(([^<>](?!&lt;))*?)&gt;//sg&print$c);' foo.xml
>
> Because it will probably be fine. For extra danger points, you can put
> it in a Makefile with no comment.
>
> You should be able to do something similar with xsl, but of course this
> isn't very safe, and I think it would be a lot more complicated.
>
> s/&lt;(([^<>](?!&lt;))*?)&gt;//sg;
>
> This is '&lt;' some text '&gt;' with no intervening '&lt;', '<', or '>'
> replaced with nothing. I thought about actually trying to turn this
> content into xml, but note there's no close quote on that style
> attribute! Watch out!
>
> sdc
>
> On Fri, 2005-06-17 at 15:13 -0500, Jon Gorman wrote:
> On 6/17/05, Jay Burgess <lists@xxxxxxxxxxx> wrote:
> > I apologize if this is in the FAQ, but I've searched and can't find it. (I'm
> > kind of new to XSL, so I may just have not seen it.)
>
> This is a FAQ of sorts, but I had a little bit of a difficult time
> finding an answer to it in Dave Pawson's FAQ as well. Of course, I
> just did a quick glance. I'd recommend skimming the CDATA section
> as well.
>
> > I've got some XML that contains HTML-formatted text. For example:
> >
> > <title>&lt;SPAN style="font-size: 13pt; font-family: Verdana; &gt;The
> > &lt;b&gt;Text&lt;/b&gt; That I Want&lt;/SPAN&gt;</title>
> >
>
> "HTML-formatted text" is a little bit nonsensical. HTML itself says
> that &lt; is meant as a stand-in for <, so when you have it it's not a
> tag. Since namespaces were rather slow to get off to a start, we ended
> up seeing people put so-called "HTML" in XML *cough* RSS *cough*. But
> to any XML application, this is one big chunk of text.
>
> So, some possible advice:
>
> 1) If you can change the input format so that it uses namespaces and
> actually embeds real XHTML into the documents you're creating, do so.
> Or at least have it be an option.
>
> 2) If you can't do that, I'm sure you can find a more general solution
> if you hunt through the archives. The essential solution will
> probably be along the lines of looking for &lt; and &gt;s and throwing
> any text in them out via some of the XPath/XSLT string functions.
> Might be much easier with XSLT 2.0.
>
> 3) It may be possible with a combination of d-o-e
> (disable-output-escaping) and doing multiple
> transformations, regex scripting or other techniques to replace the
> various &lt; and &gt; in certain elements but not others, then
> reprocess that document through your final stylesheet. Of course, this
> makes it slightly dangerous.
>
> Dig through the archives; there might be a more general solution
> already done, or someone else will be able to give you one instead of
> just giving you some ranting. (I blame Friday afternoon and a slow
> server for my current long-winded explanation of why this type of
> embedding is evil.)
>
> Short answer, it's probably not difficult as long as it's relatively
> straightforward.
> If the "html" inside the xml is complex at all or
> you are using &lt; in other places, you might have difficulty.
>
> Extremely simple if you can just have the input source use namespaces
> and you're comfortable with how XSLT deals with namespaces.
>
> Jon Gorman
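
Point 2 of Jon's reply notes that this might be much easier with XSLT 2.0. For comparison, here is a minimal sketch of what that could look like, assuming an XSLT 2.0 processor and assuming the escaped markup sits in a title element as in the example from the original question. It is not part of the solution posted above; the match pattern and the text output method are illustrative choices only. It leans on the XPath 2.0 replace() function to drop every '<' ... '>' run that has no '<' or '>' inside it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: plain text output is what is wanted. -->
  <xsl:output method="text"/>

  <!-- Assumption: the escaped markup lives in a title element. -->
  <xsl:template match="title">
    <!-- After XML parsing the pattern is <[^<>]*> : a '<', then any
         characters other than '<' or '>', then a '>'. Each match is
         replaced with the empty string. -->
    <xsl:value-of select="replace(., '&lt;[^&lt;&gt;]*&gt;', '')"/>
  </xsl:template>

</xsl:stylesheet>
```

Run against the example title from the original question, this should leave just "The Text That I Want". Like the recursive template, it treats any '<' ... '>' pair as a tag, so text that legitimately contains both characters would lose whatever sits between them.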