Subject: RE: [xsl] How to parse text into words, phrases, clauses, sentences, and paragraphs
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 7 Jun 2007 15:20:22 +0100

> This is my first problem. How to apply a template match using
> the tokenize() function. And which order to apply (from
> paragraph -> word or word -> paragraph).

It's generally easiest to do it top-down, I think. Something like this:

  <xsl:for-each select="tokenize(., $sentence-delimiter)">
    <sentence id="{position()}">
      <xsl:for-each select="tokenize(., $phrase-delimiter)">
        <phrase id="{position()}">
          <xsl:for-each select="tokenize(., $word-delimiter)">
            <word id="{position()}">
              <xsl:value-of select="."/>
            </word>
          </xsl:for-each>
        </phrase>
      </xsl:for-each>
    </sentence>
  </xsl:for-each>

> (d) doing the output numbering.

I think you just need position() as shown above.

Sometimes you need to work bottom-up if the "sentences" can't be recognized
until you've identified the "words", for example if you want to avoid
treating "." as ending a sentence when it appears in a number. You're then
sometimes in the domain of positional grouping: create a long flat list of
words, and then group it into sentences using some kind of test applied to
the individual words.

Michael Kay
http://www.saxonica.com/
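
The bottom-up approach described above (a flat list of words, then positional grouping into sentences) might look roughly like the following in XSLT 2.0. This is only a sketch under stated assumptions: the input element name para, the temporary element name w, and the end-of-sentence test (a word ending in ".", "!" or "?") are all hypothetical choices for illustration, not something taken from the original question. The words are wrapped in elements first because, in XSLT 2.0, group-ending-with takes a pattern and the grouped items must be nodes.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- Assumption: each paragraph is a <para> element holding plain text. -->
  <xsl:template match="para">
    <paragraph>
      <!-- Step 1: build a long flat list of words as temporary <w> elements. -->
      <xsl:variable name="words">
        <xsl:for-each select="tokenize(normalize-space(.), ' ')">
          <w><xsl:value-of select="."/></w>
        </xsl:for-each>
      </xsl:variable>

      <!-- Step 2: close a sentence at any word that ends in . ! or ?
           A token such as 3.14 has digits after the dot, so it does not
           end the sentence. -->
      <xsl:for-each-group select="$words/w"
                          group-ending-with="w[matches(., '[.!?]$')]">
        <sentence id="{position()}">
          <xsl:for-each select="current-group()">
            <word id="{position()}">
              <xsl:value-of select="."/>
            </word>
          </xsl:for-each>
        </sentence>
      </xsl:for-each-group>
    </paragraph>
  </xsl:template>

</xsl:stylesheet>
```

With this test, input such as "Pi is about 3.14 here. Next one." should give two sentences, with "3.14" kept as a single word inside the first. Abbreviations such as "Dr." would still be treated as sentence ends, so a real stylesheet would need a more careful test in the group-ending-with pattern.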