Subject: Re: [xsl] Tree Comparing Algorithm
From: "Vasu Chakkera vasucv@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 3 Feb 2020 20:10:04 -0000
Thanks, both. Martin's solution sort of worked, but it only gave me 21 children, although I had around 21,000 nodes in the XML. I am not sure to what depth the comparison is happening.

Vasu

On Mon, 3 Feb 2020 at 12:16, Michael Kay mike@xxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> The only facility in XSLT 3.0 that allows streaming of two input files
> "in parallel" is xsl:merge, and as Martin points out, that's rather
> specialised and not really suited to your requirements.
>
> In Saxon, streaming is in most cases done in push mode (where the parser
> owns the control loop and sends events to the XSLT processor). You can't
> have two parallel control loops except with multi-threading, so the
> opportunities for streaming multiple files are limited (with xsl:merge,
> Saxon does indeed use multi-threading).
>
> At first sight, I don't see an XSLT-based answer to this one.
>
> Except, perhaps: you could do a streamed transformation of each input
> document into an XML representation of an event stream, like
>
>   <startElement name="folder" path="" hash=""/>
>   <startElement name="folder" path="" hash=""/>
>   <endElement name="folder"/>
>
> etc.,
>
> and then attempt to do an xsl:merge of the two event streams.
>
> Michael Kay
> Saxonica
>
> On 3 Feb 2020, at 13:47, Vasu Chakkera vasucv@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi All,
> I am planning to write an XML tree-comparing XSLT using streaming.
> The XML trees look something like this:
>
>   <root path="" mhash="">
>     <folder path="" mhash="">
>       <folder path="" mhash="">
>         <leaf path="" mhash="">
>         </leaf>
>       </folder>
>     </folder>
>   </root>
>
> There will be two such XML files to compare. These two XMLs are generated
> before and after moving a folder from source to destination. Source and
> destination could be two different OSes.
> This is essentially the serialized Merkle tree output of a folder
> structure.
> The idea is to run a Merkle tree comparator that will pick out the
> nodes that did not match. The rules are as follows.
>
> 1. If the root node in both trees matches, then there is no
> difference in the entire tree (because of how the Merkle tree is generated).
> 2. If the root node hash does not match, we go to the child container and
> compare the hash of the child container in both XML files. (The XML
> folder structure will be identical with respect to the hash, but the
> folder path may be different because of the Linux and Windows path
> conventions. Otherwise the folder structure is meant to be the same.)
> 3. If the hash of a folder is the same in both trees, the entire
> tree under that folder is ignored.
> 4. If the hash of a folder is not the same in both trees, then
> the tree is traversed further and step 3 is repeated.
> 5. The XSLT keeps writing out the nodes whose hashes do not match
> between the source and target XML files.
>
> So at the end of the processing, a comparator tree should be serialized
> that contains the nodes that have a non-matching leaf node.
> Looking at the serialized tree, we can determine which files got messed
> up during the transfer from source to target.
>
> I am able to do this using non-streaming XSLT, but with streaming, since
> we need to stream two trees at a time and compare their nodes, I am
> not very sure how to proceed.
> I am able to do manipulations on one XML file with streaming. I tried a few
> tricks but did not get anywhere (I am not very comfortable copying my
> code scribblings here).
>
> I need streaming because the XML files may be big.
> If someone has done something similar, or can point me to an intelligent way
> to do this, I will be thankful.
>
> Vasu

-- 
Vasu Chakkera
NodeLogic Limited
Oxford
www.node-logic.com
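For comparison outside XSLT: the limitation Michael Kay describes comes from push-mode streaming, where the parser owns the control loop. A pull parser lets a single program advance two documents in step. The following is a minimal Python sketch of that idea. It is not Saxon, not XSLT, and not the poster's code. It flattens each tree into start/end events, roughly like the event representation Kay suggests, and walks both streams in lockstep while applying rules 1 to 5: an equal mhash skips the subtree in both streams, and an unequal mhash records the node and compares its children. It assumes both trees have the same shape in document order (rule 2 says only the path values differ) and that hashes are stored in an attribute named mhash. The function names (merkle_events, skip_subtree, compare_merkle) and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, List, Tuple, Union

Source = Union[str, BinaryIO]  # file name or binary file object


def merkle_events(source: Source) -> Iterator[tuple]:
    """Pull-parse one tree into a flat event stream (hypothetical helper).

    Yields ("start", tag, path, mhash, depth) and ("end", tag, depth).
    """
    depth = 0
    for kind, elem in ET.iterparse(source, events=("start", "end")):
        if kind == "start":
            # Attributes are already available on "start" events.
            yield ("start", elem.tag, elem.get("path", ""),
                   elem.get("mhash", "").strip(), depth)
            depth += 1
        else:
            depth -= 1
            yield ("end", elem.tag, depth)
            elem.clear()  # drop children, attributes and text of the finished element


def skip_subtree(events: Iterator[tuple], depth: int) -> None:
    """Consume events up to and including the "end" of the element at `depth`."""
    for event in events:
        if event[0] == "end" and event[2] == depth:
            return


def compare_merkle(before: Source, after: Source) -> List[Tuple[int, str, str, str]]:
    """Walk both trees in lockstep; return (depth, tag, path_before, path_after)
    for every element whose mhash differs. Assumes both trees have the same shape."""
    ev_a, ev_b = merkle_events(before), merkle_events(after)
    mismatches = []
    for a in ev_a:
        b = next(ev_b, None)
        # Nodes are matched by position, not by path (paths differ between OSes).
        if b is None or a[:2] != b[:2]:
            raise ValueError(f"trees differ in shape: {a} vs {b}")
        if a[0] == "end":
            continue
        _, tag, path_a, hash_a, depth = a
        path_b, hash_b = b[2], b[3]
        if hash_a == hash_b:
            # Rules 1 and 3: equal hashes mean identical subtrees, so skip them
            # in both streams (they are still parsed, just not compared).
            skip_subtree(ev_a, depth)
            skip_subtree(ev_b, depth)
        else:
            # Rules 4 and 5: record the node, then keep descending into its children.
            mismatches.append((depth, tag, path_a, path_b))
    if next(ev_b, None) is not None:
        raise ValueError("second tree has extra nodes")
    return mismatches


# Hypothetical sample data: folder "b" and its leaf changed during the transfer.
BEFORE = """<root path="/src" mhash="r1">
  <folder path="/src/a" mhash="f1"><leaf path="/src/a/1.txt" mhash="h1"/></folder>
  <folder path="/src/b" mhash="f2"><leaf path="/src/b/2.txt" mhash="h2"/></folder>
</root>"""
AFTER = r"""<root path="D:\dst" mhash="r9">
  <folder path="D:\dst\a" mhash="f1"><leaf path="D:\dst\a\1.txt" mhash="h1"/></folder>
  <folder path="D:\dst\b" mhash="f9"><leaf path="D:\dst\b\2.txt" mhash="h9"/></folder>
</root>"""

if __name__ == "__main__":
    for row in compare_merkle(io.BytesIO(BEFORE.encode()), io.BytesIO(AFTER.encode())):
        print(row)
```

Each mismatch carries its depth, so the list can be turned back into the comparator tree the original post asks for. Memory use is not strictly constant: elem.clear() empties each finished element, but the emptied element stays in its parent's child list until the parent itself finishes.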