Subject: Re: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)
From: David Carlisle <davidc@xxxxxxxxx>
Date: Sat, 12 Jan 2002 17:04:21 GMT
Jeni,

>
> \para{\italic{this} is \bold{bold \italic{and italic}} text.}

Ohh, looks just like TeX, we'll get you using that yet...

I can think of two ways of attacking the above with regexps.

* Plan A (which is the way I'd do it in emacs) is to have a regexp replace

    \(\\[a-z]*\){\([^{}]*\)}   to   <\1>\2</\1>

This matches innermost groups first; they don't have any nested {}, so you can easily find the matching }. As the replace also removes the {}, you just need a loop which terminates once the regexp no longer matches, so the replacements go

    \para{\italic{this} is \bold{bold \italic{and italic}} text.}
    \para{<italic>this</italic> is \bold{bold <italic>and italic</italic>} text.}
    \para{<italic>this</italic> is <bold>bold <italic>and italic</italic></bold> text.}
    <para><italic>this</italic> is <bold>bold <italic>and italic</italic></bold> text.</para>

(generated the above using emacs :-)

That's fine, but it requires either that you consider the XML markup to be just part of the string (which is what I did here, but what we want to avoid in XSLT) or that your regexps can match across mixed content models, i.e. instead of [^{}]* meaning "any character other than a brace" you'd need something that says "any character-or-node other than a brace".

The alternative to Plan A is of course Plan 2: work from the outside in (this is the way I'd do it in omnimark). Basically the plan here is not to try to match a whole matching brace clause, but just to match each start and end in turn, maintaining a counter that increments on { and decrements on } so you know what matches with what.
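A minimal Python sketch of Plan A and of the counter idea behind Plan 2, assuming the input contains only `\name{...}` groups with lowercase command names and balanced braces (no bare `{`). Python's `re` module stands in for emacs regexps here (plain `(...)` groups instead of emacs's `\(...\)`), and the names `plan_a`, `plan_2`, `INNERMOST` and `START_OR_END` are hypothetical. Both functions work on plain strings, i.e. they treat the markup as part of the string, which is exactly the limitation noted above for XSLT.

```python
import re

# Plan A: an innermost group is a command name followed by a {...} body
# with no braces inside it (Python spelling of the emacs regexp
# \(\\[a-z]*\){\([^{}]*\)}).
INNERMOST = re.compile(r"\\([a-z]*)\{([^{}]*)\}")


def plan_a(text: str) -> str:
    """Rewrite innermost groups to elements until nothing matches (inside out)."""
    while True:
        text, count = INNERMOST.subn(r"<\1>\2</\1>", text)
        if count == 0:  # no brace groups left, so the loop terminates
            return text


# Plan 2: scan each group start (\name{) and each closing brace in order.
START_OR_END = re.compile(r"\\([a-z]*)\{|\}")


def plan_2(text: str) -> str:
    """Convert outside in, tracking nesting depth as we go."""
    out = []
    open_names = []  # stack of open groups; its length is the brace counter
    pos = 0
    for m in START_OR_END.finditer(text):
        out.append(text[pos:m.start()])  # plain text before this token
        if m.group(0) == "}":  # counter decrements: close the innermost open group
            if not open_names:
                raise ValueError(f"unmatched }} at offset {m.start()}")
            out.append(f"</{open_names.pop()}>")
        else:  # counter increments: open a new group
            open_names.append(m.group(1))
            out.append(f"<{m.group(1)}>")
        pos = m.end()
    if open_names:
        raise ValueError("unclosed group(s): " + ", ".join(open_names))
    out.append(text[pos:])  # trailing plain text
    return "".join(out)
```

Applied to Jeni's example, one `INNERMOST.subn` pass gives the second line of the emacs sequence, and both `plan_a` and `plan_2` end with the final `<para>...</para>` line. Keeping a stack of names rather than a bare integer counter is just a convenience: it remembers which end tag to emit when the counter goes down.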
It's a bit hard to fit that counter model into the XSLT world view, but there is a variant, plan 2'. I suspect that one way to attack this in xslt2 is just to have two simple regexp replaces

    \\\([a-z]*\){   ->   <start name="\1"/>
    }               ->   <end/>

so after doing the regexp matching I'd have:

    <start name="para"/><start name="italic"/>this<end/> is <start name="bold"/>bold <start name="italic"/>and italic<end/><end/> text.<end/>

So now we've got rid of that flat string and replaced it by something that's still flat, but is mixed content with empty element nodes and text. Getting from that flat mixed content to a hierarchical element tree is just the famous xslt grouping problem, which a typical Gumbie Cat ought to be able to do in her sleep, especially if given the xslt2 grouping constructs.

So while I'm tempted to see if plan A can be made to work, as the two-stage plan 2' doesn't seem so clean in some ways, I suspect that integrating plan 2' would be much simpler: you wouldn't have to extend regexp searching to search mixed content, just extend regexp replace so it can generate mixed content.

David

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
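A minimal Python sketch of the two-stage plan 2', assuming plain-string input made only of `\name{...}` groups and text. Stage 1 is just the two regexp replaces shown above (after escaping `&`, `<` and `>` so the result parses as XML). Stage 2 uses a stack over `xml.etree.ElementTree` nodes as a stand-in for the xslt2 grouping step, which in XSLT would presumably be written with the grouping constructs instead. The function names `to_flat` and `group_flat` and the wrapper elements `flat` and `root` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_flat(text: str) -> str:
    """Stage 1: two plain regexp replaces, with no nesting logic at all."""
    flat = re.sub(r"\\([a-z]*)\{", r'<start name="\1"/>', escape(text))
    return re.sub(r"\}", "<end/>", flat)


def group_flat(flat: str) -> ET.Element:
    """Stage 2: turn flat start/end markers plus text into nested elements."""
    wrapper = ET.fromstring(f"<flat>{flat}</flat>")  # hypothetical wrapper element
    root = ET.Element("root")  # hypothetical holder for the top-level nodes
    stack = [root]

    def add_text(s):
        if not s:
            return
        parent = stack[-1]
        if len(parent):  # text after a child belongs in that child's tail
            last = parent[-1]
            last.tail = (last.tail or "") + s
        else:  # otherwise it is the parent's leading text
            parent.text = (parent.text or "") + s

    add_text(wrapper.text)
    for marker in wrapper:
        if marker.tag == "start":
            stack.append(ET.SubElement(stack[-1], marker.get("name")))
        else:  # <end/> closes the most recently opened element
            if len(stack) == 1:
                raise ValueError("unbalanced <end/> marker")
            stack.pop()
        add_text(marker.tail)  # text following the marker in the flat stream
    if len(stack) != 1:
        raise ValueError("unclosed <start/> marker(s)")
    return root
```

On Jeni's example, `to_flat` reproduces the flat `<start name="para"/>...<end/>` line above, and serializing the single child of `group_flat`'s result gives the same `<para>...</para>` tree as Plan A. The point of the split is that stage 1 needs nothing beyond ordinary string regexps; all the nesting work happens on nodes.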