Subject: Re: [xsl] Comparing documents: what of P is a subset of D?
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Fri, 28 Feb 2014 11:57:03 +0100

@Michael: your answer triggered a thought process that outlined the way to
a solution I'm able to implement. I don't know whether this is of any
interest to others, but it's a nice little exercise for a training course,
illustrating modes, keys and a second input document.

Problem: Given two XML files conforming to the same XML schema, find all
leaf nodes (text() and @*) in one document ("Patch") that have an identical
value at the same iXPath in the other document ("Data"). An iXPath is an
XPath built from element and attribute names, with a predicate [@_ix eq n]
on every element that carries @_ix (i.e., on repeating elements).

Solution outline: Process the Patch document, creating a set of nodes
<p2v @path @value> that map iXPaths to values, with a key based on @path.
Then process the Data document analogously, looking up each iXPath in the
key and comparing the values where an entry is found.

Below is the code, very likely not perfect ;-)

(Note that the output would be much more readable if an iXPath could be
truncated at the point where the whole subtree is identical in the sense
defined above.)

Thanks
W

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:wl="http://members.inode.at/w.laun"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text" />
<xsl:strip-space elements = '*' />

<xsl:param name="patchfile" as="xs:string"/>
<xsl:variable name="patch" select="document($patchfile)" />

<xsl:key name = "path2value" match = "p2v" use = "@path"/>

<!-- pass over patch file -->
<xsl:variable name="map" as="document-node()">
  <xsl:document>
    <map>
      <xsl:for-each select = "$patch">
        <xsl:apply-templates select = "*" mode="indexing">
          <xsl:with-param name = "path" select = "''" />
        </xsl:apply-templates>
      </xsl:for-each>
    </map>
  </xsl:document>
</xsl:variable>

<xsl:template match="*" mode="indexing">
  <xsl:param name = "path" as = "xs:string" />
  <xsl:apply-templates select = "*|@*|text()" mode="indexing">
    <xsl:with-param name = "path"
                    select = "concat( $path, '/', local-name() )"/>
  </xsl:apply-templates>
</xsl:template>

<xsl:template match="*[@_ix]" mode="indexing">
  <xsl:param name = "path" as = "xs:string" />
  <xsl:apply-templates select = "*|@*|text()" mode="indexing">
    <xsl:with-param name = "path"
                    select = "concat( $path, '/', local-name(), '[', @_ix, ']' )"/>
  </xsl:apply-templates>
</xsl:template>

<xsl:template match="@*" mode="indexing">
  <xsl:param name = "path" as = "xs:string" />
  <xsl:variable name = "fp" select = "concat( $path, '/', local-name() )"/>
  <p2v path = "{$fp}" value = "{.}"/>
</xsl:template>

<xsl:template match="@_ix" mode="indexing"/>

<xsl:template match="text()" mode="indexing">
  <xsl:param name = "path" as = "xs:string" />
  <p2v path = "{$path}" value = "{.}"/>
</xsl:template>

<!-- Pass over DB data file -->
<xsl:template match = "/">
  <xsl:apply-templates mode="comparing">
    <xsl:with-param name = "path" select = "''" />
  </xsl:apply-templates>
</xsl:template>

<xsl:template match="*" mode="comparing">
  <xsl:param name = "path" as = "xs:string" />
  <xsl:apply-templates select = "*|@*|text()" mode="comparing">
    <xsl:with-param name = "path"
                    select = "concat( $path, '/', local-name() )"/>
  </xsl:apply-templates>
</xsl:template>

<xsl:template match="*[@_ix]" mode="comparing">
  <xsl:param name = "path" as = "xs:string" />
  <xsl:apply-templates select = "*|@*|text()" mode="comparing">
    <xsl:with-param name = "path"
                    select = "concat( $path, '/', local-name(), '[', @_ix, ']' )"/>
  </xsl:apply-templates>
</xsl:template>

<xsl:template match="@*" mode="comparing">
  <xsl:param name = "path" as = "xs:string" />
  <xsl:variable name = "fp" select = "concat( $path, '/', local-name() )"/>
  <xsl:variable name = "pval"
                select = "key( 'path2value', $fp, $map/map )/@value"/>
  <xsl:if test = "$pval eq .">
    <xsl:value-of select = "concat( $fp, ' ... ', $pval)"/><xsl:text>
</xsl:text>
  </xsl:if>
</xsl:template>

<xsl:template match="@_ix" mode="comparing"/>

<xsl:template match="text()" mode="comparing">
  <xsl:param name = "path" as = "xs:string" />
  <xsl:variable name = "pval"
                select = "key( 'path2value', $path, $map/map )/@value"/>
  <xsl:if test = "$pval eq .">
    <xsl:value-of select = "concat( $path, ' ... ', $pval)"/><xsl:text>
</xsl:text>
  </xsl:if>
</xsl:template>

</xsl:stylesheet>


On 27/02/2014, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> I'm not sure I've completely understood your "equality" relation that
> underpins the intersection. Perhaps it's based on equality of the function
>
> string-join(ancestor-or-self::*/@_ix, '|')
>
> let's call this function $f, and we can use this as a parameter to the rest
> of the solution.
>
> we then need to do
>
> doc('d.xml')//fc[some $e in doc('p.xml') satisfies $f($e) eq $f(.)] !
> path(.)
>
> where path(.) is a function you can write to display the path to the
> selected fc element.
>
> The only remaining problem is that this is O(n*m) where n and m are the
> sizes of D and P.
> For a more efficient solution, define a key on P.XML that
> indexes each element on the value of the function $f, and replace the
> predicate by a call on key().
>
> The above uses XPath 3.0, but it can probably be expressed in XPath 2.0
> easily enough at the cost of hard-coding the equality function.
>
> Michael Kay
> Saxonica
>
>
> On 27 Feb 2014, at 10:25, Wolfgang Laun <wolfgang.laun@xxxxxxxxx> wrote:
>
>> <cca><!-- a D XML -->
>>   <rela _ix='0' fa='0' fb='1'>
>>     <fc _ix='1' fc_fa='X1' fc_fb='1'/>
>>     <fc _ix='2' fc_fa='X2' fc_fb='2'/>
>>   </rela>
>>   <rela _ix='1' fa='10' fb='11'>
>>     <fc _ix='1' fc_fa='Y1' fc_fb='11'/>
>>     <fc _ix='2' fc_fa='Y2' fc_fb='12'/>
>>   </rela>
>>   <rela _ix='5' fa='50' fb='51'>
>>     <fc _ix='1' fc_fa='A1' fc_fb='51'/>
>>     <fc _ix='2' fc_fa='A2' fc_fb='52'/>
>>   </rela>
>>   <relb>...</relb>
>>   <relc>...</relc>
>> </cca>
>>
>> <cca><!-- a P XML -->
>>   <rela _ix='1' fa='10'>
>>     <fc _ix='1' fc_fa='Y1' fc_fb='99'/>
>>   </rela>
>>   <rela _ix='5' fa='50' fb='51'>
>>     <fc _ix='1' fc_fb='51' fc_fc='123'/>
>>     <fc _ix='2' fc_fa='A2' fc_fb='52' fc_fc='456'/>
>>   </rela>
>> </cca>
>>
>> Expected output:
>>
>> /cca/rela(1)/fa 10
>> /cca/rela(1)/fc(1)/fc_fa Y1
>> /cca/rela(5)/fa 50
>> /cca/rela(5)/fb 51
>> /cca/rela(5)/fc(1)/fc_fb 51
>> /cca/rela(5)/fc(2)/fc_fa A2
>> /cca/rela(5)/fc(2)/fc_fb 52
>>
>> Note that parentheses enclose values of @_ix.
>>
>> -W
>>
>> On 27/02/2014, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>>> It would be easier to understand the problem with some example data.
>>>
>>> Michael Kay
>>> Saxonica
>>>
>>> On 27 Feb 2014, at 08:05, Wolfgang Laun <wolfgang.laun@xxxxxxxxx> wrote:
>>>
>>>> The data model for a set of similarly (but not identically) built XML
>>>> documents is: a collection of arrays of records, which may contain
>>>> (recursively) arrays, records and scalars. (The terms "array" and
>>>> "record" are used in their "classic" meaning as, e.g., in Pascal.)
>>>> Document structures are fairly stable, but they do change over time.
>>>> Array elements are identified (indexed) by @_ix, not by position.
>>>> Record fields can be elements or attributes (when they are scalar).
>>>> Order is undefined, since XPaths plus @_ix's pinpoint each node.
>>>>
>>>> One XML document D contains a full population for such a data set
>>>> (O(1MB)). A second XML document P contains "patches", i.e., each node
>>>> appearing in P is expected to be in D as well.
>>>>
>>>> If S(P) is the sequence of nodes (annotated with their XPaths) in P
>>>> and S(D) the one with nodes from D, how can I determine S(P) intersect
>>>> S(D) (except all @_ix, whose values are bound to be identical)? Of
>>>> course, I don't want the common set of *data items* - I want the XML
>>>> paths of those common data items.
>>>>
>>>> A solution (in XSLT 2.0) should not need individual adaptation for each
>>>> kind of data set.
>>>>
>>>> I'm confident that I can create text files for D and P containing one
>>>> line <path> <value> for each node and run diff (after sort).
>>>>
>>>> Any better ideas?
>>>>
>>>> Cheers
>>>> Wolfgang
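
The solution outline at the top of this message (map every iXPath in P to
its value, then walk D and look each iXPath up) can also be tried outside
XSLT. What follows is a minimal Python sketch of that idea, not a
translation of the stylesheet. The names leaf_values and common_leaves are
made up for illustration, the sample strings are a cut-down version of the
quoted D and P examples (rela 0, relb and relc are left out), and the sketch
assumes documents without namespaces and without mixed content.

```python
import xml.etree.ElementTree as ET


def leaf_values(root):
    """Map each iXPath to the value of the attribute or text found there."""
    found = {}

    def walk(elem, parent_path):
        # Repeating elements are identified by @_ix, not by position.
        ix = elem.get("_ix")
        path = f"{parent_path}/{elem.tag}" + (f"[{ix}]" if ix is not None else "")
        for name, value in elem.attrib.items():
            if name != "_ix":  # @_ix is part of the path, not a leaf
                found[f"{path}/{name}"] = value
        # Assumption: whitespace-only text is ignored, no mixed content.
        if elem.text and elem.text.strip():
            found[path] = elem.text
        for child in elem:
            walk(child, path)

    walk(root, "")
    return found


def common_leaves(patch_root, data_root):
    """Return sorted (iXPath, value) pairs that are identical in both trees."""
    patch = leaf_values(patch_root)
    data = leaf_values(data_root)
    return sorted((p, v) for p, v in data.items() if patch.get(p) == v)


# Cut-down sample data taken from the quoted example in this thread.
DATA = """<cca>
  <rela _ix='1' fa='10' fb='11'>
    <fc _ix='1' fc_fa='Y1' fc_fb='11'/>
    <fc _ix='2' fc_fa='Y2' fc_fb='12'/>
  </rela>
  <rela _ix='5' fa='50' fb='51'>
    <fc _ix='1' fc_fa='A1' fc_fb='51'/>
    <fc _ix='2' fc_fa='A2' fc_fb='52'/>
  </rela>
</cca>"""

PATCH = """<cca>
  <rela _ix='1' fa='10'>
    <fc _ix='1' fc_fa='Y1' fc_fb='99'/>
  </rela>
  <rela _ix='5' fa='50' fb='51'>
    <fc _ix='1' fc_fb='51' fc_fc='123'/>
    <fc _ix='2' fc_fa='A2' fc_fb='52' fc_fc='456'/>
  </rela>
</cca>"""

result = common_leaves(ET.fromstring(PATCH), ET.fromstring(DATA))
for path, value in result:
    print(path, "...", value)
```

With this sample the sketch reports seven matching leaves, among them
/cca/rela[5]/fb with the value 51. The dictionary plays the role of the
xsl:key on p2v/@path, so each lookup is a hash access rather than a scan
of P.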