RE: more on XSLT processor performance

> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxx
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxx]On Behalf Of Kay Michael
> 
> Compiling won't solve the memory problem. If we're going to make XSLT
> processing of such large files practical, the only way we'll do it is by
> using persistent storage rather than memory for the tree.

On Apache's Xalan-J-Dev mailing list I suggested using indexed
persistent storage.

This is the relevant bit:

If you are talking about indexed XML, I believe so too.

I have several ideas on indexing XML for XPath access, but the trouble 
is always knowing what to index.

For me, an interesting transform cycle concept is:
 1. Analyze the XSLT source and figure out what kind of (XPath) 
    selections from a source document are necessary in order to get 
    all the nodes required for the transformation;
 2. Pre-parse the document, indexing only the parts found to be 
    relevant in step 1. One should end up with index information much 
    smaller than the full XML source - small enough to fit in memory;
 3. Use an "XLocator" that knows how to use this index to perform the 
    XSLT transformation.
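
As a rough illustration of step 1, here is a minimal Python sketch that
collects the XPath expressions and patterns a stylesheet uses. It is
only an assumption about how such an analysis could start: the function
name, the sample stylesheet and the choice to look only at the select,
match and test attributes are mine, and it ignores attribute value
templates, xsl:key and imported stylesheets.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
# Simplification: only these three attributes are inspected.
XPATH_ATTRS = ("select", "match", "test")

def collect_xpath_expressions(stylesheet_text):
    """Return the sorted XPath expressions/patterns found on XSLT instructions."""
    root = ET.fromstring(stylesheet_text)
    found = set()
    for elem in root.iter():
        # Skip literal result elements; only XSLT-namespace elements count here.
        if not elem.tag.startswith("{" + XSL_NS + "}"):
            continue
        for attr in XPATH_ATTRS:
            value = elem.get(attr)
            if value is not None:
                found.add(value)
    return sorted(found)

# Hypothetical stylesheet, used only to exercise the function.
STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each select="catalog/item">
      <xsl:value-of select="title"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>"""

print(collect_xpath_expressions(STYLESHEET))
# prints ['/', 'catalog/item', 'title']
```

Working out which source nodes those expressions can actually reach is
the hard part, and this sketch does not attempt it.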

Example of "parts found to be relevant": if you find that the XSLT 
only causes the selection of some elements from the XML source, then 
only the locations of those elements should be indexed.
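
To make "only the location of those elements" concrete, here is a
hedged Python sketch that records the byte offset of each start tag for
a chosen set of element names, using the standard expat bindings. The
function name and the sample document are assumptions of mine. Note
that this still runs a real parser over the whole input; it just avoids
building a tree.

```python
import xml.parsers.expat

def index_element_offsets(chunks, wanted_names):
    """Return {name: [byte offset of each start tag]} for the wanted names only."""
    index = {name: [] for name in wanted_names}
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        if name in index:
            # During the callback, CurrentByteIndex is the offset of the
            # '<' that opened this start tag.
            index[name].append(parser.CurrentByteIndex)

    parser.StartElementHandler = on_start
    for chunk in chunks:          # feed the input piece by piece
        parser.Parse(chunk, False)
    parser.Parse(b"", True)       # signal end of input
    return index

# Hypothetical document: only <title> matters to the (imagined) stylesheet.
DOC = (b"<catalog>"
       b"<item><title>A</title><blurb>long text...</blurb></item>"
       b"<item><title>B</title><blurb>more text...</blurb></item>"
       b"</catalog>")

offsets = index_element_offsets([DOC[:40], DOC[40:]], {"title"})
for off in offsets["title"]:
    print(off, DOC[off:off + 7])
```

The index holds one integer per relevant element, whatever the size of
the elements that were skipped.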

If you use this idea to transform an XML stream, you need to save that
XML (or maybe only the relevant parts of it) to temporary disk storage
and build the index information. Only then do you proceed to generate
the output stream.
(This is for the most generic cases. I am not considering that some
cases could be handled on the fly, as already mentioned on this list.)
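
A minimal Python sketch of that stream case, under my own assumptions:
the stream is copied to a temporary file while the start offsets of one
element name are indexed, and only afterwards are the indexed elements
re-read by seeking. All names here are hypothetical. The re-read parses
a fragment, so it assumes UTF-8 input with no DTD-defined entities.

```python
import io
import tempfile
import xml.parsers.expat

class _ElementClosed(Exception):
    """Raised internally to stop parsing once the indexed element has ended."""

def spool_and_index(stream, wanted, chunk_size=8192):
    """Copy `stream` to a temp file, recording start offsets of `wanted` elements."""
    spool = tempfile.TemporaryFile()
    offsets = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        if name == wanted:
            offsets.append(parser.CurrentByteIndex)

    parser.StartElementHandler = on_start
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        spool.write(chunk)            # save the XML to disk...
        parser.Parse(chunk, False)    # ...and index it in the same pass
    parser.Parse(b"", True)
    return spool, offsets

def text_at(spool, offset, chunk_size=8192):
    """Re-read the single element starting at `offset`; return its text content."""
    parts = []
    depth = 0
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        nonlocal depth
        depth += 1

    def on_end(name):
        nonlocal depth
        depth -= 1
        if depth == 0:
            raise _ElementClosed      # the element we seeked to is complete

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = parts.append
    spool.seek(offset)
    try:
        while True:
            chunk = spool.read(chunk_size)
            if not chunk:
                break
            parser.Parse(chunk, False)
    except _ElementClosed:
        pass
    return "".join(parts)

# Hypothetical incoming stream.
SOURCE = io.BytesIO(
    b"<catalog>"
    b"<item><title>A</title><blurb>long text...</blurb></item>"
    b"<item><title>B</title><blurb>more text...</blurb></item>"
    b"</catalog>")

spool, offsets = spool_and_index(SOURCE, "title")
titles = [text_at(spool, off) for off in offsets]   # output phase starts here
spool.close()
print(titles)
# prints ['A', 'B']
```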


In cases where one has an XSLT stylesheet that gets a small amount of
data from a very big XML file, this approach can be faster than trying
to build a DOM:
 - A full pass is always necessary, but then you only re-read a small
   amount of data (thanks to the indexing);
 - Even during the full pass, full parsing of the file can be avoided;
 - Creating an index can require much less processing than creating 
   a DOM;
 - Since the index requires less memory, virtual memory use is 
   avoided (less disk swapping).
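
One way to read "full parsing can be avoided" is a plain byte scan for
the start tags of interest. The Python sketch below is a deliberately
naive illustration of that, not a recommendation: the function name and
sample data are mine, it assumes an ASCII-compatible encoding, and it
will misfire on tag-like text inside comments or CDATA sections.

```python
import re

def scan_start_tags(data, name):
    """Return byte offsets where a start tag for `name` appears to begin."""
    # The lookahead stops "<title" from matching a longer name like "<titles".
    pattern = re.compile(b"<" + re.escape(name) + rb"(?=[\s/>])")
    return [m.start() for m in pattern.finditer(data)]

# Hypothetical data with a decoy element whose name merely starts with "title".
DATA = (b"<catalog><titles/>"
        b"<item><title>A</title></item>"
        b"<item><title lang='en'>B</title></item>"
        b"</catalog>")

hits = scan_start_tags(DATA, b"title")
print(len(hits))
# prints 2
```

The same function accepts an mmap object in place of a bytes string, so
the file does not have to be loaded into memory first.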


I know my language is not formally correct, but...
...does this make sense?


Have fun,
Paulo Gaspar


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
