Subject: Re: Bug in 'xsl:sort'. ( XT vs SAXON. )
From: Paul Tchistopolskii <paul@xxxxxxx>
Date: Sat, 05 Aug 2000 16:25:35 -0700

----- Original Message -----
From: Jeni Tennison

> If you go a little further on in the XSLT Recommendation, it says:
>
> "NOTE: It is possible for two conforming XSLT processors not to sort
> exactly the same. Some XSLT processors may not support some languages.
> Furthermore, there may be variations possible in the sorting of any
> particular language that are not specified by the attributes on xsl:sort,
> for example, whether Hiragana or Katakana is sorted first in Japanese.

This is not the case here, right? (Actually, I don't understand why
anything other than UTF-* should be supported by W3C standards, but
that's another story.)

> Future versions of XSLT may provide additional attributes to provide
> control over these variations. Implementations may also use
> implementation-specific namespaced attributes on xsl:sort for this.

This is also not the case, right?

> NOTE: It is recommended that implementers consult [UNICODE TR10] for
> information on internationalized sorting."
>
> The values should be sorted "lexicographically in the culturally correct
> manner for the language specified by lang" but I guess the question arises
> in English (as it does in other languages) about whether '-' is
> lexicographically before '0' or not.

Right. But I'm not sure the question is about 'English'. I think the
question really is 'in UTF8'?

> If you follow up the UNICODE reference, there is a file that gives the
> order for sorting just about every character you can think of
> [http://www.unicode.org/unicode/reports/tr10/basekeys.txt]. In this file,
> various sorts of hyphens:
>
> 00AD ; [*020B.0020.0002.00AD] # SOFT HYPHEN

<cut/>

> come before (i.e. should be sorted before) various forms of 0:
> 0030 ; [.06B9.0020.0002.0030] # DIGIT ZERO

<cut/>

> This would imply that '-1' should be before '0' because '-' sorts before
> '0'. However, on
> [http://www.unicode.org/unicode/reports/tr10/index.html#Alternate Weighting]
> there is some extra stuff about options involving the weighting
> of hyphens (& various other characters) that might contradict this but that
> I can't get my head around right now.

Looks like this is correct.

    String minus_one = "-1";
    String zero = "0";
    System.out.println( zero.compareTo( minus_one ) );

prints 3 (this means zero is greater than minus_one).

This is really interesting, huh? How many documents should you read to
understand what comes first, '-' or '0'?

> I don't think that either SAXON or XT is 'right'. They employ different
> sort orders,

Why? There are no special encodings or special sorting attributes. Both
engines receive the same 'lang' environment (or don't they???), so why do
they employ different sort orders?

> but from what I can gather, it's fine for them to do so and
> still both be compliant.

I still think something is strange here. They both are sorting UTF8 (?)
without any of the special cases mentioned in the W3C paper, and the
question is: "in UTF8 (?), what comes first, '-' or '0'?" Right? Is it
legal that they are giving different answers to the same question?

> Eventually the differences between them should be
> diminished through the specification of additional attributes.

Pardon, what attributes do you mean???

I now think maybe this is the bug in XT?

Rgds.Paul.

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
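
To make the "same question, different answers" point concrete, here is a minimal sketch. It is not taken from SAXON or XT, and the class name SortOrderDemo and its helper methods are made up for illustration. It contrasts the String.compareTo check quoted in the message, which compares UTF-16 code units, with a locale-aware comparison through java.text.Collator. Whether either processor works in one of these two ways is an assumption; the collator's ordering depends on the JDK's rules for the chosen locale, so no result is claimed for it here.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class; not part of SAXON, XT, or any XSLT API.
public class SortOrderDemo {

    // Raw comparison: String.compareTo works on UTF-16 code units,
    // so '-' (45) sorts before '0' (48) regardless of language.
    static int byCodeUnit(String a, String b) {
        return a.compareTo(b);
    }

    // Locale-aware comparison: the sign of the result depends on the
    // collation rules the JDK ships for the given locale.
    static int byCollator(String a, String b, Locale locale) {
        return Collator.getInstance(locale).compare(a, b);
    }

    // Sorts a copy of the input by code units.
    static String[] sortByCodeUnit(String[] values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Sorts a copy of the input with the collator for the given locale.
    static String[] sortByCollator(String[] values, Locale locale) {
        String[] copy = values.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] values = {"0", "-1", "1", "-2"};
        System.out.println("compareTo(\"0\", \"-1\") = " + byCodeUnit("0", "-1"));
        System.out.println("code-unit order: "
                + Arrays.toString(sortByCodeUnit(values)));
        System.out.println("collator order:  "
                + Arrays.toString(sortByCollator(values, Locale.ENGLISH)));
    }
}
```

If the two printed orders differ on a given JDK, that would be one plausible way for two conforming processors to disagree without either one having a bug: one sorts by character code, the other by collation rules.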