Subject: [xsl] Re: Efficiency and replace()
From: "Dimitre Novatchev" <dnovatchev@xxxxxxxxx>
Date: Sun, 10 Sep 2006 12:08:20 -0700
The Cyrillic characters in the quoted message have been replaced by spaces, because they caused gmail to use base64 encoding, which the xsl-list server rejected.
If you can send me the actual troff file and the definition of the mappings, I would be interested in looking for a better solution.
It seems to me that the str-map template of FXSL 1.x should be more efficient, as it makes only a single pass over the string and performs all the replacements in that pass.
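To illustrate the single-pass idea only: the following is a rough sketch, not the FXSL str-map template. It swaps in xsl:analyze-string, which should be available since the quoted code already uses the XSLT 2.0 replace() function. The file name mappings.xml, the key name by-troff and the variable names are assumptions made for the example; the mapping/troff/unicode element names follow the snippet in the quoted message.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- ASSUMPTION: mappings.xml is a hypothetical file name; it is expected
       to contain <mapping><troff>..</troff><unicode>..</unicode></mapping>
       elements, each with exactly one troff and one unicode child. -->
  <xsl:variable name="mapdoc" select="document('mappings.xml')"/>

  <!-- Index the mappings by their literal troff sequence. -->
  <xsl:key name="by-troff" match="mapping" use="troff"/>

  <!-- Build one regex: every troff sequence, regex-escaped, joined with '|'.
       Longer sequences are placed first, because the first matching
       alternative wins when several match at the same position. -->
  <xsl:variable name="regex" as="xs:string">
    <xsl:variable name="alternatives" as="xs:string*">
      <xsl:for-each select="$mapdoc//mapping">
        <xsl:sort select="string-length(troff)" data-type="number"
                  order="descending"/>
        <xsl:sequence
            select="replace(troff, '[\\\|\.\?\*\+\(\)\{\}\[\]\^\$\-]', '\\$0')"/>
      </xsl:for-each>
    </xsl:variable>
    <xsl:sequence select="string-join($alternatives, '|')"/>
  </xsl:variable>

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Each text node is scanned once; every matched troff sequence is
       looked up in the mappings and replaced by its unicode value. -->
  <xsl:template match="text()" priority="1">
    <xsl:analyze-string select="." regex="{$regex}">
      <xsl:matching-substring>
        <xsl:value-of select="key('by-troff', ., $mapdoc)/unicode"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

This sketch reads the mappings at run time; a generator stylesheet could presumably emit the same alternation as a literal regex instead. The remaining one-to-one mappings could still be handled by a translate() call on the non-matching substrings. Note that, unlike a chain of replace() calls, a single pass never rescans text produced by an earlier replacement, so the results may differ if any mapping's output can itself look like a troff sequence.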
--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
On 9/10/06, David J Birnbaum <djbpitt+xml@xxxxxxxx> wrote:
> Dear XSLTians,
>
> For a troff-to-XML/Unicode conversion I've implemented a strategy that
> produces the desired result, but that does the conversion to Unicode
> slowly, and I would be grateful for advice about improving the efficiency.
>
> I handle the conversion of the structural marked up XML first, and I
> wind up with all of my XML tagging in place, but the text strings use
> troff escape sequences, rather than Unicode. The text is almost all
> medieval Cyrillic, and most of the Cyrillic characters are represented
> in the troff with sequences of several ascii characters. The strategy I
> adopted to convert the troff character encoding to Unicode was to create
> a mapping file for the troff-to-Unicode character correspondences.
> Here's a snippet (a single mapping correspondence):
>
> <mapping>
>   <troff>\(qb</troff>
>   <unicode> </unicode>
> </mapping>
>
> I then wrote an XSLT script that reads the file of mappings and
> generates another XSLT script that will do the actual remapping. Here's
> a snippet of the generated XSLT script; this snippet is taken from
> within a template rule for text() nodes (the named template that gets
> called follows the snippet):
>
> <xsl:variable name="temp52">
>   <xsl:call-template name="replacement">
>     <xsl:with-param name="text">
>       <xsl:value-of select="$temp51"/>
>     </xsl:with-param>
>     <xsl:with-param name="troff">\\\(\?s</xsl:with-param>
>     <xsl:with-param name="unicode"> </xsl:with-param>
>   </xsl:call-template>
> </xsl:variable>
> <xsl:variable name="temp53">
>   <xsl:call-template name="replacement">
>     <xsl:with-param name="text">
>       <xsl:value-of select="$temp52"/>
>     </xsl:with-param>
>     <xsl:with-param name="troff">\\\(\?c</xsl:with-param>
>     <xsl:with-param name="unicode"> </xsl:with-param>
>   </xsl:call-template>
> </xsl:variable>
> . . .
> <xsl:template name="replacement">
>   <xsl:param name="text"/>
>   <xsl:param name="troff"/>
>   <xsl:param name="unicode"/>
>   <xsl:value-of select="replace($text, $troff, $unicode)"/>
> </xsl:template>
>
> The program logic is that for each text node, the template rule passes
> the textual contents to a replace() function that replaces a troff
> encoding with the corresponding Unicode value. The replace() function is
> then called again with the next mapping. The textual content is passed
> along through repeated remappings, and when it emerges on the other end,
> all multi-character troff sequences have been replaced with Unicode
> characters. There are 64 such mappings. I use replace() only for places
> where a multi-character troff string has to be replaced by a single
> Unicode character; at the end of the series of calls to replace() I use
> translate() to do the remaining one-to-one mappings (there are
> approximately 50 of them) in a single function call. The order of the
> mappings is (obviously) important; I need to remap longer strings before
> shorter ones, since the shorter ones may be subcomponents of the longer
> ones. In particular, I can remap individual characters (the one-to-one
> mappings) only after I've taken care of all of the many-to-one ones.
>
> The input file (XML with troff character coding instead of the desired
> Unicode) is 6.7MB and the Unicode output is 7.8MB. The transformation
> takes approximately five minutes to run, which feels like an eternity,
> but I'm not sure to what extent the execution time reflects the size of
> the input file and the number of replacements that needs to be
> performed, and to what extent it reflects inefficient program design.
> Can anyone suggest a revision that would provide a considerable
> improvement in efficiency (bearing in mind that the XSLT script that
> does the actual character remapping must be generated by XSLT from the
> mappings file)?
>
> Thanks,
>
> David
> djbpitt+xml@xxxxxxxx