Subject: What is XSLT? ( very long )
From: Paul Tchistopolskii <paul@xxxxxxx>
Date: Tue, 08 Aug 2000 00:16:36 -0700

<DISCLAIMER>
I apologize for the style of this letter. This letter should be considered a loose translation of a short story called "The weekend of YAXSLT hacker". I apologize for being possibly off-topic, and I'll no longer write such letters to this list, but will try to keep to the 'official' style and spirit. However, because I think this letter is almost about XSLT, it may be not really wrong to post it here.
</DISCLAIMER>

<DISCLAIMER1>
The 'quotes' are not actual quotes, but how some words have been reflected in my head. For example, of course Sebastian never said these words - this is my 'internal' interpretation of his words. This is a kind of bad literature.
</DISCLAIMER1>

1. Sebastian says : "XT sucks - it has no key()". I reflect it this way because the rest of XSLT is almost implemented by XT, things like encodings are not a big deal, and I know Sebastian knows it.

2. I really don't use key() and never had to. I remember James Clark saying ( again - it is my reflection ) "yeah, people are really using key(), to my surprise". I smell something suspicious here. I *have* to understand what happens with key(). I have started the battle, asking Sebastian to provide a usecase.

3. The usecase is here. Yeah.... I can hardly read it .... What does it really do? OK, from the output I understand what it does.

4. Looks like I could split it into 2 steps. This helped me when I was rendering reports. ( See below. This is actually important. )

5. Split. It would be nice if the first step rendered it into

    <flat1>
      <year>1234</year>
      <fnm>name </fnm>
      <snm>name </snm>
      <fnm>name </fnm>
      <snm>name </snm>
      <year>456</year>
      ...
    </flat1>

just a bunch of 'lines to draw'. This means step1 should not generate the 'same' year more than once. What??? Why does this 'preceding' thing take so long? OK - let me 'filter' out the 'same year' on step 2. Works. Now I have to enclose the list into <ol> </ol> WHAT ??? Hm. I can not generate 'start-tag' and 'end-tag'.
I know that - just forgot ... I actually know that this thing should not be allowed, even though I saw some tools already allowing it, and I myself was some day thinking that 'it is handy to generate start-tag and end-tag'. But today I'm sure - XSLT is right. (a) I never hit the wall because of this restriction (b) this restriction forces better code - this means it is a good restriction. If I encounter a situation where (a) or (b) is not true, I'll of course change my mind. It is the only way to judge a programming language, I think.

Whatever - I have been in such a situation before, so I know how it should be worked around. I should put those <ol> </ol> and then in the body I have to dump out the list ... of 'what'? How can I understand how many persons belong to the 'year'? I'm not passing the info about the year 'attached' to the person. I should. I should pass the <person> and the <year>. OK. Works.

6. Sebastian says 'you can do it in one stylesheet, in 2 stylesheets, or with key'. TIMTOWTDI. Right, right. But some of the ways are ugly. Also, when I tried to put it into one stylesheet *not*piped* - it was darn slow. But wait - I can place those 2 transformations *piped* into one stylesheet with no problem. I should tell it to Sebastian later. ( See below ).

7. Wait ... what did I *really* do? Why was it possible for me to go from 'flat' to 'hierarchy'? Aha - because in the flat file which I got after step1 each record has a 'year' - so it was easy to 'take a part out of the flat list' and 'call yourself with the rest'. At this point I *should* already have seen the truth, but as any human being I was stupid enough to bypass it. ( The truth comes later ).

8. OK. This was easy because the 'flat' list on step2 was 'of simple structure'. What if I make it a *complex* structure so that it will not be easy to process it like I did? Here is the 'Flat puzzle'. Damn. I don't see an easy way to get from this 'flat' to hierarchy! I'm tired and have to sleep.

9.
Steve says : "we have tons of usecases". Great! I ask him - he should already have got such a 'flat' thing from somebody - it looks very typical. Also I have some strange feeling ... I smell the odor of the 'key()' function here. The odor is so strong ... But why ??? I came to this 'Flat puzzle' just by 'making the flat file more complex' ... Verrry strange ....

10. Dang. Steve responded and there is of course key(). I should do it without key().

11. Got some sleep. Looked at this once again. Gee - soo easy! I *already* know how the 'flat' file *should* look to make it 'hierarchical' ( test6 part 2 ). Why don't I just convert this 'flat puzzle' file to *that* flat file! Done. Works. WAIT!!!

12. What is the structure of my 'flat file which is easy to convert to hierarchy'? It is the ordered list of records where each record has a *KEY*. O HOW STUPID I AM!!!! All I'm doing is just serializing the HASHTABLE.

13. I implemented the key() functionality in XSLT itself. And of course it is slower than the built-in HASHTABLE AKA key(). ( The funny thing is that my 'implementation' is not *that* much slower, but I already know that there are hashtables of hashtables behind the scenes. Yeah - James Clark implemented his own Hashtable for XT - not using the java Hashtable. Sure he is already using hashing techniques here and there. If not, my 'hand-made hashtable' would be *darn* slower than key(). )

14. How could *my* hashtable be improved? OK, for example - this typical 'count()' thing in step2 is always doing some useless things. *I* know that the list is sorted. But this stupid 'count()' does not know this! How can I tell count() that it should 'stop after the key changed' and not look over the entire list again and again? I CAN DO THAT WITH THE ROADSIGN - but this means 'key()' again and again .... And it is not a readable construction, that key()... and sometimes it will not improve things... and some optimizations may require another 'logic'...

15. There should be no 'key()' function.
There should be only the <xsl:key element building the tricky indexes, and when the engine encounters some expression it should use those indexes *if they are applicable for this expression* ( could be signaled by new syntax of <xsl:key ). This is a very hard task, but the idea is like PRIMARY KEY in SQL. Like the 'precompiled' option of a regular expression in perl. This means <xsl:key will become what it has to be - a plain roadsign, forcing the building of particular ( maybe more complex than now ) indexes to speed up some particular 'regular expressions'. And it should be called <xsl:index, of course. This is the way to go and I think XSLT has missed it, masquerading the real problem with that 'key()' hack, like they did with document() of 2 parameters ( masquerading RTF / node-set conversion ).

16. Should I use the current key()? Why is key() not readable? key() is not readable because, even though there is in fact *one* 'regular expression', the *parts* of this 'regular expression' are placed in different places of the stylesheet! ( not the case with my view on <xsl:index ) In the current key() some parts are located in <xsl:key and some parts are in key(). This is the only place in XSLT where, to see what happens, you should jump from one place to another, composing the actual expression which will be used, and you can see this 'only in your head'.

17. OK, maybe I'm cheating myself here. I'll test it next time something smells like key(). Thanks to Steve - I now understand how key() could be used ( look at the pipe and it will show what could become the key ;-), so if I fall into any trouble with my way - I can always test the key() way. Maybe I'm still missing something ... ( I should be honest. I don't think I'm missing something, but there is always a chance. )

18. Well.... Now I should write to Sebastian that pipe or 'one stylesheet' is a mythical distinction.
If I have 'a | b' I can always write a1.xsl with the structure of

    <xsl:variable name="step1">
      <doc> transformation 1 </doc>
    </xsl:variable>

    <xsl:apply-templates select="xt:node-set($step1)" mode="transformation2"/>

    <xsl:template match="/doc" mode="transformation2">
    ...

19. Well - in perl I can do the same. Create some hash, pass it down the road... What is the *difference*? The difference is that in perl ( and any other language ) I have access 'by value' and 'by pointer' ( or by reference, but let me call it 'by pointer' ). In XSLT I have only 'access by value'.

Java tried hard to kill the distinction between 'access by pointer' and 'access by value'. They ended up with 'mostly access by pointer, but not really'. The way Java did it results in the situation that when you see foo( bar, baz ) you can not tell whether bar or baz will be modified in the code, because bar and baz could be passed down the road etc. ( With other languages you at least can *guess* that if it is 'by pointer' it is 'to be modified'. Well - very hypothetical, because it is also not true ;-) It at least gives some chance. Or you can start utilizing a special notation - but this all is very weak, I think. Anything based on 'notation' is weak, a last chance try .... )

Hmmm... XSLT looks very strong here because ... WITH XSLT WE HAVE NO SUCH PROBLEMS. XSLT's 'weakness' ( lack of updateable variables ) is actually a *very* strong feature of XSLT. I was stupid not understanding this for a *very* long time, but nobody told me the right thing! They were talking about 'side-effects', 'declarative languages' etc. But the point is that XSLT is the language which has only ACCESS BY VALUE, but no 'access by pointer', because updating a variable is just a simple case of ACCESS BY POINTER. Well, maybe somebody was saying it in these words, but I can not remember this.
The Bible is messy on this topic, explaining some mythical usecases ( even though the Bible is in fact trying to talk about NO ACCESS BY POINTER ( or by reference ) - just using other words... )

20. Why is it good *not* to have access by pointer then? We are used to access by pointer, and was Niklaus Wirth an idiot? ( The answer is - of course he was *not* ) Why are we used to access by pointer? I think it is because of this 'efficiency' thing. It is a 'roadsign' for 'more efficient' ( 'memory-saving' ) internal dataflows.

Consider the hypothetical situation when you are passing the *entire* context of the *entire* program to *every* function in some special way. A 'fast-searchable global variable', but not many 'prepared' variables. 'Database instead of variables'. Like in SQL, for example - rows have no names. ;-) Do you need access by pointer, if each of your functions has an 'efficient way to search for the knowledge' which otherwise 'was accumulated in the appropriate variable'? The answer is : it looks like it is really possible in most cases to 're-do' some things constantly.

Is it *less* efficient than accumulating knowledge in variables? Yes. For a 'simple architecture' ( no parallelism ) XSLT ( access by value only ) is *by design* less efficient than any language which allows access-by-pointer. What the hell? Why should we use it if it is less efficient? For the same reason we can live without key(). A bit ( even twice as ) inefficient, but 'clean'. XSLT semantics allows writing the cleanest possible code ( below there will be one more critical feature of XSLT which allows that. )

This is really funny. XSLT says : "use the key() roadsign to hack for speed" and on the other hand the same XSLT says : "*don't* use 'access-by-pointer' to hack for speed."

Could you please explain it again? Well...
At the moment I can show some things by example ( as I said, the Bible is actually talking about similar stuff, it is just using suspicious usecases - there should be better ones, but this requires somebody with a nice hardware background. How good parallelism will be for XSLT is questionable - this is another long topic. I'm talking about the software part AKA clean code and 'current' hardware architecture. )

21. The example is again the 'flat' -> hierarchical 'converter' step2. count() is the 'fast recalculator'. A "searcher of information" instead of a 'storer of information' ( updateable variable ). In a language with updateable variables I could ( on step 1 ) iterate over the list and then store the number of members, and then pass this information down the road. Because I can not do that, I'm passing the 'entire content down the road' and 'when I need that number of members - I'm recalculating it'. There will always be the overhead of those recalculations in XSLT; the question is 'is this overhead worth the clean code you get'? My answer is "yes".

I'm rejecting key() for the sake of clear code, so of course I'm rejecting the evil of 'access by pointer', no matter that 'access-by-pointer' gives me yet another ability for manual tuning of efficiency. I don't need malloc(), I'm OK with the garbage collector. This is all about the same. I'm betting on XSLT with the clear understanding that XSLT will be slower than any 'ordinary' language. I know that XSLT will be *more clean* in return. And also I know that XSLT is for pipes and pipes are for XSLT.

22. Why are pipes for XSLT ?

22.1 What are those 'pipes' ? First - please do not forget that pipes have almost *nothing* to do with the number of stylesheets. Having it in multiple stylesheets is just 'a bit cleaner and easier' ( For example, to dump the intermediate dataflow I can just redirect it to a file ( not inserting <xsl:copy-of select ), and it is a bit easier to make per-node validation. )
Small thing here - small thing there. Not rocket science. A 'pipe' is first of all a logical entity. 'Thinking pipes' is a very important UNIX skill which is rare in the current world of people with no math education. To love pipes you should love math. Writing pipes component - after - component is like first making a lemma, then a theorem, then another ... and then re-using a theorem. Soo cool. But this is also hard, yes. Not everybody gets the beauty of math.

Well - people are all different. I for example don't get the beauty of chemistry, and I remember some developers who were very good at chemistry in the university - they had a very special view on programming from my point of view ;-) Actually some of them were very good developers, just 'different'. Education has a huge impact, actually. Those who get no fun from math get no fun from pipes and usually simply have no skills ( even if they think they do ). Collecting complex statistics from log files is a nice task to get those skills. Not every UNIX activity helps you to 'get it'. This is the reality, sorry if this sounds strange to some of you with, say, a VMS-only background. ( Not a bad thing to remember that ugly unreliable UNIX has crushed reliable and accurate VMS. Because of pipes. )

Math is hard. Pipes are also hard. Not many people get math. Not many people get pipes. Pipes are a concept, and it is not an obvious concept.

22.2. OK, stop it, this is all simple - we know this - show how 'thinking pipes' works with XSLT.

Hm. First - have you noticed test6 and the 'Flat puzzle' ? But pipes can do more than just 'serializing a hashtable' and then 're-using the theorem'. Consider some statistical report. Let's say "Batches of checks processed on some box". You get a number of complex records of unknown height and you want to print a footer at the end of each page. This is all of course plain ASCII.
So in a 'normal' language, when outputting each new line to the printer, I'm just doing

    NLines++;
    if ( NLines == PAGE_WIDTH ) { print_footer(); }

Not that simple with XSLT ;-) I think this actually looks darn hard and almost unsolvable 'clearly', if not 'thinking pipes'. key() will not help here ;-) However, the solution is very similar to those used before. Just assume that there is another stylesheet 'down the road' out there and print the 'flat list'

    <line> content </line>
    <line> content </line>
    ....

And then in the second stylesheet 'group' the number of lines into a 'page' exactly like test6 step 2 works - just use

    select="list[ position() &gt; $PAGE_WIDTH ]"

See - the same grouping component again. And no 'keys' to worry about at all. Maybe this grouping component is a generic thing? Maybe there are more? Yes and yes.

In the presence of the 'second stylesheet' you may realize that you need some better balancing between the first stylesheet and the second etc. etc. This will result in a 'clean' dataflow between the nodes. I mean only things which should be passed will be passed ( remember - there is no 'access by pointer', no evil pointers are passed down the road ;-) Care to compare this to what happens inside the typical intranet written in Java / perl / C++ / whatever ?

XSLT really helps. 'Thinking pipes' really helps. It is good for your code, it is good for your data, it is good, because XSLT is 'closer' to UNIX pipes than any other language. Why? Because of no 'access by pointer', XSLT is 'pushing more and more content down the road, not looking back'. Pardon - but this is the way a UNIX pipe works! Both XSLT and UNIX pipes are 'looking only ahead and not looking back'. They are good for each other.

22.3. Why XSLT syntax is good for pipes.

In fact every time you write a <xsl:call-template that is non-recursive and has 'simple parameters' you should think twice. WHAT????? Yes, you should.
In the presence of the second stylesheet (transformation) you can always write

    <xsl:call-template name="foo" ... with param="bar">

in the form of

    <FOO attr="bar"/>

and then provide the <xsl:template match="FOO" in the second stylesheet. And look - which is better to read? Worth thinking about every time, actually.

23. Heck - I can do the same with perl. I can pass only hashes by value etc. What is special about XSLT ? ... Syntax ... You can not do 22.3, for example, easily switching between 'this is data - this is code'. Yes, I know, I know - you think you *can* with Text::Template and things like, say, XPathScript. The truth is that you can not. Small thing here, small thing there. XSLT is darn good. The problem was the XMLish notation, but XSLScript notation always solved this problem for me. Yes - no else and no auto-recursion. This is not a big deal with XSLScript or another preprocessor. The core of XSLT is a 'templatish dataflow language suited for piped transformations by design'. It is very healthy.

24. Forget all the crap about those access-by-pointer tricks. Think pipes and dataflows. XSLT appears to be the first language which forces clean dataflows - even if it appears they were not understanding what they really invented. This happens very often.

25. The challenge is still open. I'll be glad to see something which really needs an updateable variable. For a while I was thinking that 22.2 is the case, but I think now it is clear that it is not the case, and that

    NLines++;
    if ( NLines == PAGE_WIDTH ) { print_footer(); }

is a hack, and the piped XSLT view is *better*.

26. There should be some problems!!! Yes:

1. Extensions. If not 'thinking pipes'. If 'thinking pipes' - it could be possible to cooperate even with an 'event driven GUI'. Check Plan 9 for how to write a GUI with awk and do the same. Research required.

2. Speed. If not providing some smart way to 'prebuild' some indexes - some parts could be darn slow or become a horrible mess of key() statements.
It could be partially solved with XSLScript introducing some meta-construction(s) for autogenerating a bunch of 'key()' - but this is a mess. It is better to have it in the core. But this is questionable. Because nobody was thinking about generating key() out of XPath expressions, I doubt that the current design or even the syntax of key() will survive 'the good design'. Research required.

Rgds.Paul.

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
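
A minimal sketch of the two-step 'pipe' from items 5-12 and the page-footer variant from 22.2, written in Python rather than XSLT only so that it can be run as-is. All names here ( step1_flatten, step2_group, paginate, the 'year' / 'name' fields ) are hypothetical and not taken from test6 or the 'Flat puzzle' stylesheets. Step 1 emits an ordered flat list where each record carries its key; step 2 cuts a new group whenever the key changes; no value is updated in place along the way.

```python
from itertools import groupby


def step1_flatten(people):
    """Step 1: emit a flat list of (year, name) records, ordered by year.

    Each record carries its own key, which is what makes step 2 easy.
    sorted() is stable, so names within a year keep their input order.
    """
    records = [(p["year"], p["name"]) for p in people]
    return sorted(records, key=lambda rec: rec[0])


def step2_group(records):
    """Step 2: one pass over the sorted flat list; a new group starts
    whenever the key (the year) changes."""
    return [
        (year, [name for _, name in group])
        for year, group in groupby(records, key=lambda rec: rec[0])
    ]


def paginate(lines, page_height, footer):
    """Second-stage grouping of a flat list of lines into pages.

    Every page holds at most page_height lines and ends with the footer,
    so the first stage never has to count lines.
    """
    pages = []
    for start in range(0, len(lines), page_height):
        pages.append(lines[start:start + page_height] + [footer])
    return pages


if __name__ == "__main__":
    people = [
        {"year": 1234, "name": "a"},
        {"year": 456, "name": "b"},
        {"year": 1234, "name": "c"},
    ]
    print(step2_group(step1_flatten(people)))
    print(paginate(["l1", "l2", "l3"], 2, "-- footer --"))
```

The grouping in step2_group relies on the list being sorted, which is the same knowledge item 14 wishes count() had; paginate is the same 'take a part out of the flat list and continue with the rest' component applied to lines instead of years.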