Subject: Re: [xsl] Implementation Advice: Grouping Strings by Character Range in XSLT 2
From: "G. Ken Holman g.ken.holman@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 29 Apr 2016 19:44:21 -0000
I have my generated analyze-string approach generally working. However, some of my regular expressions are not matching when I would expect them to.
For example, given this @regex value:
regex="'([©®℠™]+)|([¦²³¹¼&#xbd;¾Ð×ÝÞðýþŠš∂ ∏∑−∫≠≤≥]+)|([➤]+)'" >
And this text:
"©®"
The regular expression does not match, even though the first group should clearly match \uA9 and \uAE.
However, this text:
"ÝÞ"
does match (second group).
If I copy the entire regex, or any group from the @regex value, and try it in Oxygen against the same text, I get the expected matches.
Have I made a stupid syntax mistake in my regular expression? Is there some subtlety to matching groups that makes XSLT different from what Oxygen is doing? I can't see any obvious syntax error in the regular expression.
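One way to check the symptoms, offered as a sketch under an assumption rather than as a confirmed answer: in XSLT 2.0 the regex attribute of xsl:analyze-string is an attribute value template, not an XPath expression, so the outer apostrophes in regex="'...'" would reach the regex engine as literal characters. The Python below uses the standard re module as a stand-in for the XSLT regex engine (for plain character classes and alternation like these the two should agree); the names with_quotes, without_quotes and matches are illustrative only.

```python
import re

# The @regex value exactly as written in the stylesheet, outer apostrophes included.
# Assumption: because @regex is an attribute value template, those apostrophes
# are passed to the regex engine as literal characters.
with_quotes = "'([©®℠™]+)|([¦²³¹¼½¾Ð×ÝÞðýþŠš∂ ∏∑−∫≠≤≥]+)|([➤]+)'"

# The same pattern with the outer apostrophes stripped.
without_quotes = with_quotes[1:-1]


def matches(pattern: str, text: str) -> list[str]:
    """Return each non-overlapping match, scanning left to right
    (roughly what xsl:analyze-string reports as matching substrings)."""
    return [m.group(0) for m in re.finditer(pattern, text)]


for text in ["©®", "ÝÞ", "'©®"]:
    print(repr(text), matches(with_quotes, text), matches(without_quotes, text))
```

Under that assumption the first alternative can only match immediately after an apostrophe and the third only immediately before one, while the second is unaffected, which is consistent with "©®" failing and "ÝÞ" succeeding. If the Oxygen test used the pattern without the surrounding apostrophes, that would also explain why it matches there.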
Thanks,
Eliot
----
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com
On 4/29/16, 11:54 AM, "Eliot Kimber ekimber@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>Dimitre,
>
>I see how that can work.
>
>Cheers,
>
>E.
>
>On 4/29/16, 11:38 AM, "Dimitre Novatchev dnovatchev@xxxxxxxxx"
><xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>>I am at work and don't have the time for a complete/tested
>>implementation, but one can use the function string-to-codepoints()
>>and then perform on the result:
>>
>><xsl:for-each-group select="$theCodepoints"
>>    group-adjacent="f:codepointToRange(.)">
>>
>> . . . . . . . .
>></xsl:for-each-group>
>>
>>Cheers,
>>Dimitre
>>
>>On Fri, Apr 29, 2016 at 8:04 AM, Eliot Kimber ekimber@xxxxxxxxxxxx
>><xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>> Using XSLT 2, I have a requirement to take text and group contiguous
>>> sequences of characters in markup according to the character range
>>> the characters are in. This is to support the application of
>>> range-specific fonts to text in HTML.
>>>
>>> I have a static definition of the character ranges for a given national
>>> language, and there shouldn't be any overlap between ranges. Given this
>>> static definition, I'm generating XSLT code to operate on text nodes in
>>> order to apply the range markup.
>>>
>>> For example, given the text string "abcdefg" where range "R1" is "cde"
>>> and R2 is "g", the marked-up result should be:
>>> ab<span class="R1">cde</span>f<span class="R2">g</span>
>>>
>>> My initial approach is to generate a template that takes the current
>>> language and the text node and then applies templates in a
>>> language-specific mode.
>>>
>>> For each language I'm then generating a template to do the range
>>> matching.
>>>
>>> My question: once I'm in a language-specific template for a text node,
>>> what is the most efficient and/or easiest-to-code way to map the string
>>> to ranges? Since I'm generating the code, it doesn't have to be concise.
>>>
>>> I'm thinking along the lines of using analyze-string to match on any of
>>> the groups and then within the matching-substring clause have a choice
>>> group to determine which range actually matched. But it feels like I'm
>>> missing a more elegant way to determine the actual range.
>>>
>>> Or maybe there's a clearer/simpler/more efficient way using tail
>>> recursion?
>>>
>>> Thanks,
>>>
>>> Eliot
>>
>>--
>>Cheers,
>>Dimitre Novatchev
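Dimitre's suggestion amounts to grouping adjacent codepoints by the range each one belongs to. A minimal sketch of that idea in Python, with itertools.groupby standing in for xsl:for-each-group group-adjacent: the RANGES table and codepoint_to_range() are hypothetical (modeled on the "cde"/"g" example and on the f:codepointToRange() placeholder) and would in practice be generated from the static range definitions. Python strings iterate by codepoint, so no explicit string-to-codepoints() step is needed.

```python
from html import escape
from itertools import groupby
from typing import Optional

# Hypothetical range table for one language, modeled on the example in the
# thread: range "R1" covers "c", "d", "e" and range "R2" covers "g".
RANGES = {"R1": set("cde"), "R2": set("g")}


def codepoint_to_range(ch: str) -> Optional[str]:
    """Hypothetical analogue of f:codepointToRange(): the name of the range
    containing ch, or None for characters outside every range."""
    for name, chars in RANGES.items():
        if ch in chars:
            return name
    return None


def mark_up(text: str) -> str:
    """Wrap each maximal run of same-range characters in a span, like
    xsl:for-each-group with group-adjacent on the range key."""
    out = []
    for name, run in groupby(text, key=codepoint_to_range):
        chunk = escape("".join(run))
        if name is None:
            out.append(chunk)  # characters in no range pass through unwrapped
        else:
            out.append(f'<span class="{name}">{chunk}</span>')
    return "".join(out)


print(mark_up("abcdefg"))  # ab<span class="R1">cde</span>f<span class="R2">g</span>
```

Because each group is one maximal run, the grouping key already names the range, so no choice over regex groups is needed to find which range matched. One XSLT-specific caveat: in XSLT 2.0 the group-adjacent expression must yield exactly one atomic value per item, so a real f:codepointToRange() would need to return something like an empty string, not an empty sequence, for characters outside every range.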
--
Check our site for free XML, XSLT, XSL-FO and UBL developer resources
Streaming hands-on XSLT/XPath 2 training @US$45: http://goo.gl/Dd9qBK
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/s/
G Ken Holman mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Google+ blog http://plus.google.com/+GKenHolman-Crane/posts
Legal business disclaimers: http://www.CraneSoftwrights.com/legal