Subject: [xsl] faster complicated counting
From: Syd Bauman <Syd_Bauman@xxxxxxxxx>
Date: Wed, 29 Feb 2012 10:31:28 -0500

I am working with a relatively small dataset (~ 1 MiB) which uses a TEI encoding. In TEI, a line of verse is encoded with an <l> element (of which I have just about 306,000), and lines are gathered into units such as poems or stanzas using <lg> (for "line group").

In the output of the particular process I am working on now, I'd like to adorn each <l> element with three new attributes that give the count of the current <l> element in various contexts:

  wwp:num-global   = with respect to the entire document
  wwp:num-local    = with respect to the current stanza or other small unit of poetry
  wwp:num-regional = with respect to the current poem or other large unit of poetry

So, as a toy example, see tiny.in.xml and tiny.out.xml, below.

I have worked out code that gets me the desired counts. My problem is that all the tree-walking it does slows down my process by well over an order of magnitude. I am betting there is a much better way to do this, probably using keys or <xsl:number>, but I have not been able to wrap my mind around it.

The English-like pseudo-code for @num-local is "the count in the context of the closest ancestor <lg> that itself has > 4 metrical lines".

The English-like pseudo-code for @num-regional is "the count in the context of the closest ancestor <lg> that has a @type that contains "poem" or whose first descendant <l> has n='1'".

Here's what I have (note that we are only counting those <l> elements that have an @part of 'I' or do not have a @part attribute at all):

  <xsl:attribute name="wwp:num-global">
    <xsl:number count="l[not(@part)]|l[@part='I']" level="any"/>
  </xsl:attribute>
  <xsl:attribute name="wwp:num-regional">
    <xsl:variable name="region"
      select="(ancestor::lg[ contains( @type,'poem') ]
              |ancestor::lg[ descendant::l[ @n eq '1'] ])[last()]"/>
    <xsl:value-of
      select="count( (preceding::l[not(@part)]|preceding::l[@part='I'])
                     [ancestor::lg/generate-id() = $region/generate-id() ] ) +1"/>
  </xsl:attribute>
  <xsl:attribute name="wwp:num-local">
    <xsl:variable name="region"
      select="ancestor::lg[count( descendant::l[not(@part) or @part='I'] ) > 4 ][1]"/>
    <xsl:value-of
      select="count( (preceding::l[not(@part)]|preceding::l[@part='I'])
                     [ancestor::lg/generate-id() = $region/generate-id() ] ) +1"/>
  </xsl:attribute>

Thoughts appreciated.

Notes
-----
* Yes, I realize that the test above is for *any* descendant <l> with n='1', not the first. We simply don't have any that aren't the first, so I didn't worry about it.
* It's pretty likely we'll change the definition of what is "regional" in the near future, but it probably won't affect the basic problem I'm having. I.e., I'm hoping that if someone shows me how to do this "regional" better, I'll be able to do any future version on my own.

Cross your fingers :-)

toy input
--- -----
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0">
  <teiHeader>
    <!-- blah, blah, blah -->
  </teiHeader>
  <text>
    <body>
      <lg type="superStructure">
        <lg type="poem.duck">
          <l>one</l>
          <l>two</l>
          <l>three</l>
          <l>four</l>
          <l>five</l>
          <l>six</l>
          <l>seven</l>
          <l>eight</l>
          <l>nine</l>
          <l>ten</l>
        </lg>
        <lg type="poem.duck">
          <l>one</l>
          <l>two</l>
          <l>three</l>
          <l>four</l>
          <lg type="tercet">
            <l>five</l>
            <l>six</l>
            <l>seven</l>
          </lg>
          <l>eight</l>
          <l>nine</l>
          <l>ten</l>
        </lg>
        <lg type="poem.duck">
          <lg type="stanza">
            <l>one</l>
            <l>two</l>
            <l>three</l>
            <l>four</l>
            <l>five</l>
            <l>six</l>
            <l>seven</l>
            <l>eight</l>
          </lg>
          <lg type="stanza">
            <l>nine</l>
            <l>ten</l>
            <l>eleven</l>
            <l>twelve</l>
            <l>thirteen</l>
            <l>fourteen</l>
            <l>fifteen</l>
            <l>sixteen</l>
          </lg>
          <lg type="stanza">
            <l>seventeen</l>
            <l>eighteen</l>
            <l>nineteen</l>
            <l>twenty</l>
            <l>twentyone</l>
            <l>twentytwo</l>
            <l>twentythree</l>
            <l>twentyfour</l>
          </lg>
        </lg>
      </lg>
    </body>
  </text>
</TEI>

toy code
--- ----
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0"
  xmlns="http://www.tei-c.org/ns/1.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0"
  version="2.0">

  <xsl:template match="/">
    <xsl:text>&#x0A;</xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="@*|text()|processing-instruction()|comment()">
    <xsl:copy/>
  </xsl:template>

  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="l">
    <xsl:copy>
      <xsl:attribute name="wwp:num-global">
        <xsl:number count="l[not(@part)]|l[@part='I']" level="any"/>
      </xsl:attribute>
      <xsl:attribute name="wwp:num-regional">
        <xsl:variable name="region"
          select="(ancestor::lg[ contains( @type,'poem') ]
                  |ancestor::lg[ descendant::l[ @n eq '1'] ])[last()]"/>
        <xsl:value-of
          select="count( (preceding::l[not(@part)]|preceding::l[@part='I'])
                         [ancestor::lg/generate-id() = $region/generate-id() ] ) +1"/>
      </xsl:attribute>
      <xsl:attribute name="wwp:num-local">
        <xsl:variable name="region"
          select="ancestor::lg[count( descendant::l[not(@part) or @part='I'] ) > 4 ][1]"/>
        <xsl:value-of
          select="count( (preceding::l[not(@part)]|preceding::l[@part='I'])
                         [ancestor::lg/generate-id() = $region/generate-id() ] ) +1"/>
      </xsl:attribute>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

toy output
--- ------
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0">
  <teiHeader>
    <!-- blah, blah, blah -->
  </teiHeader>
  <text>
    <body>
      <lg type="superStructure">
        <lg type="poem.duck">
          <l wwp:num-global="1" wwp:num-regional="1" wwp:num-local="1">one</l>
          <l wwp:num-global="2" wwp:num-regional="2" wwp:num-local="2">two</l>
          <l wwp:num-global="3" wwp:num-regional="3" wwp:num-local="3">three</l>
          <l wwp:num-global="4" wwp:num-regional="4" wwp:num-local="4">four</l>
          <l wwp:num-global="5" wwp:num-regional="5" wwp:num-local="5">five</l>
          <l wwp:num-global="6" wwp:num-regional="6" wwp:num-local="6">six</l>
          <l wwp:num-global="7" wwp:num-regional="7" wwp:num-local="7">seven</l>
          <l wwp:num-global="8" wwp:num-regional="8" wwp:num-local="8">eight</l>
          <l wwp:num-global="9" wwp:num-regional="9" wwp:num-local="9">nine</l>
          <l wwp:num-global="10" wwp:num-regional="10" wwp:num-local="10">ten</l>
        </lg>
        <lg type="poem.duck">
          <l wwp:num-global="11" wwp:num-regional="1" wwp:num-local="1">one</l>
          <l wwp:num-global="12" wwp:num-regional="2" wwp:num-local="2">two</l>
          <l wwp:num-global="13" wwp:num-regional="3" wwp:num-local="3">three</l>
          <l wwp:num-global="14" wwp:num-regional="4" wwp:num-local="4">four</l>
          <lg type="tercet">
            <l wwp:num-global="15" wwp:num-regional="5" wwp:num-local="5">five</l>
            <l wwp:num-global="16" wwp:num-regional="6" wwp:num-local="6">six</l>
            <l wwp:num-global="17" wwp:num-regional="7" wwp:num-local="7">seven</l>
          </lg>
          <l wwp:num-global="18" wwp:num-regional="8" wwp:num-local="8">eight</l>
          <l wwp:num-global="19" wwp:num-regional="9" wwp:num-local="9">nine</l>
          <l wwp:num-global="20" wwp:num-regional="10" wwp:num-local="10">ten</l>
        </lg>
        <lg type="poem.duck">
          <lg type="stanza">
            <l wwp:num-global="21" wwp:num-regional="1" wwp:num-local="1">one</l>
            <l wwp:num-global="22" wwp:num-regional="2" wwp:num-local="2">two</l>
            <l wwp:num-global="23" wwp:num-regional="3" wwp:num-local="3">three</l>
            <l wwp:num-global="24" wwp:num-regional="4" wwp:num-local="4">four</l>
            <l wwp:num-global="25" wwp:num-regional="5" wwp:num-local="5">five</l>
            <l wwp:num-global="26" wwp:num-regional="6" wwp:num-local="6">six</l>
            <l wwp:num-global="27" wwp:num-regional="7" wwp:num-local="7">seven</l>
            <l wwp:num-global="28" wwp:num-regional="8" wwp:num-local="8">eight</l>
          </lg>
          <lg type="stanza">
            <l wwp:num-global="29" wwp:num-regional="9" wwp:num-local="1">nine</l>
            <l wwp:num-global="30" wwp:num-regional="10" wwp:num-local="2">ten</l>
            <l wwp:num-global="31" wwp:num-regional="11" wwp:num-local="3">eleven</l>
            <l wwp:num-global="32" wwp:num-regional="12" wwp:num-local="4">twelve</l>
            <l wwp:num-global="33" wwp:num-regional="13" wwp:num-local="5">thirteen</l>
            <l wwp:num-global="34" wwp:num-regional="14" wwp:num-local="6">fourteen</l>
            <l wwp:num-global="35" wwp:num-regional="15" wwp:num-local="7">fifteen</l>
            <l wwp:num-global="36" wwp:num-regional="16" wwp:num-local="8">sixteen</l>
          </lg>
          <lg type="stanza">
            <l wwp:num-global="37" wwp:num-regional="17" wwp:num-local="1">seventeen</l>
            <l wwp:num-global="38" wwp:num-regional="18" wwp:num-local="2">eighteen</l>
            <l wwp:num-global="39" wwp:num-regional="19" wwp:num-local="3">nineteen</l>
            <l wwp:num-global="40" wwp:num-regional="20" wwp:num-local="4">twenty</l>
            <l wwp:num-global="41" wwp:num-regional="21" wwp:num-local="5">twentyone</l>
            <l wwp:num-global="42" wwp:num-regional="22" wwp:num-local="6">twentytwo</l>
            <l wwp:num-global="43" wwp:num-regional="23" wwp:num-local="7">twentythree</l>
            <l wwp:num-global="44" wwp:num-regional="24" wwp:num-local="8">twentyfour</l>
          </lg>
        </lg>
      </lg>
    </body>
  </text>
</TEI>
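
One direction that might be worth trying, offered as an untimed sketch rather than a known fix: the preceding:: axis makes every <l> scan everything before it in the whole document, whereas a key can hand back just the counted <l> elements inside the chosen region. The key name counted-l-by-lg and the variable names below are made up for illustration. The sketch assumes an <l> is never nested inside another <l>, so that "earlier in document order" (<<) selects the same lines as preceding:: did. The global count is left as the original <xsl:number>.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0"
  version="2.0">

  <!-- Hypothetical key: each counted <l> is indexed once per ancestor <lg>,
       so looking up the id of an <lg> returns its counted <l> descendants. -->
  <xsl:key name="counted-l-by-lg"
    match="l[not(@part) or @part = 'I']"
    use="ancestor::lg/generate-id()"/>

  <!-- Identity copy for everything that is not an <l>. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="l">
    <xsl:variable name="this" select="."/>
    <!-- Same region definitions as in the original code; the parenthesised
         step is in document order, so [last()] picks the closest ancestor. -->
    <xsl:variable name="regional"
      select="(ancestor::lg[contains(@type, 'poem')
                            or descendant::l[@n eq '1']])[last()]"/>
    <!-- The key lookup stands in for counting descendant::l[...] directly. -->
    <xsl:variable name="local"
      select="ancestor::lg[count(key('counted-l-by-lg', generate-id())) > 4][1]"/>
    <xsl:copy>
      <xsl:attribute name="wwp:num-global">
        <xsl:number count="l[not(@part) or @part = 'I']" level="any"/>
      </xsl:attribute>
      <!-- Counted lines of the region that come before this line, plus one. -->
      <xsl:attribute name="wwp:num-regional"
        select="count(key('counted-l-by-lg', generate-id($regional))
                      [. &lt;&lt; $this]) + 1"/>
      <xsl:attribute name="wwp:num-local"
        select="count(key('counted-l-by-lg', generate-id($local))
                      [. &lt;&lt; $this]) + 1"/>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Indexing each line under every ancestor <lg>, rather than only under its closest region, is meant to mirror the original test [ancestor::lg/generate-id() = $region/generate-id()]. Each line still costs work proportional to the size of its region, so this would help only to the extent that regions are much smaller than the document.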