Dear Dimitre (cc xsl-list),
> If you can send me the actual troff file and
> the definition of the mappings I will be
> interested to look for a better solution.
Thank you for your willingness to look at this. Because the troff file
is quite large (6.7MB), instead of sending it by mail I have uploaded it to:
http://clover.slavic.pitt.edu/~djb/troff-to-unicode.zip
troff-to-unicode.zip contains:
temp3.xml: xml file with troff character coding. Note that by this stage
I have already converted the troff structural and procedural markup to
xml; the only part of the conversion still to be done involves the
character coding of the textual data.
pvl_mappings.xml: xml file with troff/unicode mapping pairs
pvl_regex_fix.xsl: xsl stylesheet that inserts extra backslashes into
the mapping file so that replace() will work in the subsequent stylesheet. I
built the mapping file in two stages this way because that makes it
easier for me to read.
pvl_mappingGenerator.xsl: operates on the output of pvl_regex_fix.xsl to
produce a new stylesheet (which I call pvl_unicode.xsl), which can be
used to convert the character coding in temp3.xml from troff to unicode.
I don't include pvl_unicode.xsl in the zip file because it can be
generated from the included files (see below).
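For readers who do not want to unpack the archive, the backslash step can be illustrated outside XSLT. The following is only a Python sketch of the idea, not the contents of pvl_regex_fix.xsl. The two troff codes and the function name escape_for_regex are invented for illustration, and the real pairs are in pvl_mappings.xml.

```python
import re

# Hypothetical troff/Unicode pairs, invented for illustration only;
# the real pairs are in pvl_mappings.xml.
MAPPINGS = {
    r"\(oa": "\u00e5",
    r"\(ae": "\u00e6",
}

def escape_for_regex(troff_code):
    # Backslash-escape regex metacharacters (including the backslash
    # itself) so the troff code is matched literally.
    return re.escape(troff_code)

# The "fixed" mapping table: same pairs, but with regex-safe keys.
ESCAPED = {escape_for_regex(code): char for code, char in MAPPINGS.items()}
```

Without the escaping, the backslash and the opening parenthesis in a code such as \(oa would be read as regex syntax rather than as literal characters.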
To process:
saxon8 -o pvl_mappings1.xml pvl_mappings.xml pvl_regex_fix.xsl
saxon8 -o pvl_unicode.xsl pvl_mappings1.xml pvl_mappingGenerator.xsl
saxon8 -o temp4.xml temp3.xml pvl_unicode.xsl
Step 1 adds extra backslashes to the mapping file so that the regular
expressions will work correctly. Step 2 reads the output of Step 1 and builds the
stylesheet (which I call pvl_unicode.xsl) that will do the actual
character conversion. Step 3 applies that stylesheet to temp3.xml,
which is the troff-encoded input. temp4.xml is the final output. It has the
same structure as temp3.xml, but the troff character coding in temp3.xml
is replaced with unicode in temp4.xml.
The problem is the inefficiency of the actual character conversion (the
application of pvl_unicode.xsl to temp3.xml to produce temp4.xml).
Thank you for any advice or suggestions.
> It seems to me that the str-map template of FXSL 1.x
> should be more efficient, as it only performs a single
> pass on the string and will do all the replacements.
I haven't had occasion to use FXSL in any projects yet (although I was
very interested in and impressed by the demonstration at Extreme), so if
that proves to be an effective solution, I'll look forward to learning
more about it.
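To make the single-pass idea concrete, here is a rough Python sketch. It is not FXSL's str-map template and not the generated pvl_unicode.xsl. It assumes that the slow approach amounts to one replacement pass over the text per mapping pair, which may not match the generated stylesheet exactly. The mapping pairs and function names are invented for illustration.

```python
import re

# Hypothetical troff/Unicode pairs, invented for illustration only.
MAPPINGS = {
    r"\(oa": "\u00e5",
    r"\(ae": "\u00e6",
    r"\(o/": "\u00f8",
}

def convert_chained(text, mappings=MAPPINGS):
    # Assumed model of the slow approach: the whole text is scanned
    # once for every mapping pair.
    for troff_code, unicode_char in mappings.items():
        text = text.replace(troff_code, unicode_char)
    return text

def build_single_pass(mappings=MAPPINGS):
    # Combine all codes into one alternation, longest first, so that a
    # longer code is preferred over a shorter one sharing its prefix.
    codes = sorted(mappings, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(code) for code in codes))

    def convert(text):
        # One scan of the text; each match is looked up in the table.
        return pattern.sub(lambda m: mappings[m.group(0)], text)

    return convert
```

With a large mapping table and a 6.7MB input, the chained version rescans the text once per pair, while the single-pass version scans it once. Whether the same gain carries over to an XSLT implementation would need to be measured.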
Best,
David
djbpitt+xml@xxxxxxxx