Subject: Re: Bug in 'xsl:sort'. ( XT vs SAXON. )
From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Sat, 05 Aug 2000 19:26:24 +0100

Paul,

>I wish nobody will kill me, but I'm sure that there is
>a bug either in XT or in SAXON. And I wish somebody
>who can read the specs better than me will tell me
>who is right. XT is latest XT, Saxon is instant SAXON
>downloaded today. ( It says : SAXON 5.4 from Michael Kay of ICL )

If you go a little further on in the XSLT Recommendation, it says:

"NOTE: It is possible for two conforming XSLT processors not to sort
exactly the same. Some XSLT processors may not support some languages.
Furthermore, there may be variations possible in the sorting of any
particular language that are not specified by the attributes on
xsl:sort, for example, whether Hiragana or Katakana is sorted first in
Japanese. Future versions of XSLT may provide additional attributes to
provide control over these variations. Implementations may also use
implementation-specific namespaced attributes on xsl:sort for this.

NOTE: It is recommended that implementers consult [UNICODE TR10] for
information on internationalized sorting."

The values should be sorted "lexicographically in the culturally
correct manner for the language specified by lang", but I guess the
question arises in English (as it does in other languages) about
whether '-' is lexicographically before '0' or not.

If you follow up the UNICODE reference, there is a file that gives the
order for sorting just about every character you can think of
[http://www.unicode.org/unicode/reports/tr10/basekeys.txt].
In this file, various sorts of hyphens:

00AD ; [*020B.0020.0002.00AD] # SOFT HYPHEN
002D ; [*020C.0020.0002.002D] # HYPHEN-MINUS
FF0D ; [*020C.0020.0003.FF0D] # FULLWIDTH HYPHEN-MINUS; COMPAT
FE63 ; [*020C.0020.000F.FE63] # SMALL HYPHEN-MINUS; COMPAT
2010 ; [*020D.0020.0002.2010] # HYPHEN
2011 ; [*020D.0020.001B.2011] # NON-BREAKING HYPHEN; COMPAT
2012 ; [*020E.0020.0002.2012] # FIGURE DASH
2013 ; [*020F.0020.0002.2013] # EN DASH
FE32 ; [*020F.0020.0016.FE32] # PRESENTATION FORM FOR VERTICAL EN DASH; COMPAT
2014 ; [*0210.0020.0002.2014] # EM DASH
FE58 ; [*0210.0020.000F.FE58] # SMALL EM DASH; COMPAT

come before (i.e. should be sorted before) various forms of 0:

0030 ; [.06B9.0020.0002.0030] # DIGIT ZERO
FF10 ; [.06B9.0020.0003.FF10] # FULLWIDTH DIGIT ZERO; COMPAT
24EA ; [.06B9.0020.0006.24EA] # CIRCLED DIGIT ZERO; COMPAT
2070 ; [.06B9.0020.0014.2070] # SUPERSCRIPT ZERO; COMPAT
2080 ; [.06B9.0020.0015.2080] # SUBSCRIPT ZERO; COMPAT
0660 ; [.06B9.011C.0002.0660] # ARABIC-INDIC DIGIT ZERO
06F0 ; [.06B9.011D.0002.06F0] # EXTENDED ARABIC-INDIC DIGIT ZERO
0966 ; [.06B9.011E.0002.0966] # DEVANAGARI DIGIT ZERO
09E6 ; [.06B9.011F.0002.09E6] # BENGALI DIGIT ZERO
0A66 ; [.06B9.0121.0002.0A66] # GURMUKHI DIGIT ZERO
0AE6 ; [.06B9.0122.0002.0AE6] # GUJARATI DIGIT ZERO
0B66 ; [.06B9.0123.0002.0B66] # ORIYA DIGIT ZERO
0C66 ; [.06B9.0125.0002.0C66] # TELUGU DIGIT ZERO
0CE6 ; [.06B9.0126.0002.0CE6] # KANNADA DIGIT ZERO
0D66 ; [.06B9.0127.0002.0D66] # MALAYALAM DIGIT ZERO
0E50 ; [.06B9.0128.0002.0E50] # THAI DIGIT ZERO
0ED0 ; [.06B9.0129.0002.0ED0] # LAO DIGIT ZERO
0F20 ; [.06B9.012A.0002.0F20] # TIBETAN DIGIT ZERO
0F33 ; [.06B9.012A.0002.0F33] # TIBETAN DIGIT HALF ZERO; COMPAT
3007 ; [.06B9.012B.0002.3007] # IDEOGRAPHIC NUMBER ZERO

This would imply that '-1' should be before '0' because '-' sorts
before '0'.
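
To make that comparison concrete, here is a rough Python sketch that parses two of the entries quoted above and orders '-1' and '0' by the weights of their first character. It is only an illustration under stated assumptions: the names (BASEKEYS, parse_basekeys, first_char_weights) are made up for this example, it looks at the first character only rather than building a full sort key, and it ignores the '*' marker and any alternate weighting options. It is not how XT or SAXON sort.

```python
import re

# Two entries copied from the basekeys.txt excerpt quoted above.
BASEKEYS = """\
002D ; [*020C.0020.0002.002D] # HYPHEN-MINUS
0030 ; [.06B9.0020.0002.0030] # DIGIT ZERO
"""

# code point ; [marker, then four dot-separated hex weights] # name
ENTRY = re.compile(
    r"^([0-9A-F]{4}) ; "
    r"\[([*.])([0-9A-F]{4})\.([0-9A-F]{4})\.([0-9A-F]{4})\.([0-9A-F]{4})\]"
)


def parse_basekeys(text):
    """Map each character to its first three weights, as integers."""
    table = {}
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match is None:
            continue  # skip anything that is not a table entry
        char = chr(int(match.group(1), 16))
        # Groups 3..5 are the first, second and third weights.
        # Group 2 (the '*' or '.' marker) is deliberately ignored here.
        table[char] = tuple(int(match.group(i), 16) for i in (3, 4, 5))
    return table


TABLE = parse_basekeys(BASEKEYS)


def first_char_weights(value):
    """Simplification: compare strings by their first character only."""
    return TABLE[value[0]]


values = ["0", "-1"]
ordered = sorted(values, key=first_char_weights)
print(ordered)
```

Because the first weight of HYPHEN-MINUS (020C) is lower than that of DIGIT ZERO (06B9), '-1' lands before '0' under this reading of the table.
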
However, on
[http://www.unicode.org/unicode/reports/tr10/index.html#Alternate Weighting]
there is some extra stuff about options involving the weighting of
hyphens (& various other characters) that might contradict this but
that I can't get my head around right now.

I don't think that either SAXON or XT is 'right'. They employ
different sort orders, but from what I can gather, it's fine for them
to do so and still both be compliant. Eventually the differences
between them should be diminished through the specification of
additional attributes.

Cheers,

Jeni

Dr Jeni Tennison
Epistemics Ltd * Strelley Hall * Nottingham * NG8 6PE
tel: 0115 906 1301 * fax: 0115 906 1304 * email: jeni.tennison@xxxxxxxxxxxxxxxx

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list