You are viewing a plain text version of this content. The canonical link for it is here.

Posted to j-users@xalan.apache.org by Simon Kitching <si...@ecnetwork.co.nz> on 2002/11/27 01:53:07 UTC

how best can tags be grouped?

Hi all,

I hope that someone here can help me out with the following issue.

I have input which is something like this:

<tags-in>
  <tag ref="1" value="1"/>
  <tag ref="1" value="2"/>
  <tag ref="2" value="3"/>
  <tag ref="2" value="4"/>
</tags-in>

and need to generate something like this:

<tags-out>
  <group ref="1">
    <item value="1"/>
    <item value="2"/>
  </group>
  <group ref="2">
    <item value="3"/>
    <item value="4"/>
  </group>
</tags-out>

In other words, I need to identify each *unique* value of "ref", and
then process the set of <tag> nodes with the same ref value as a group.

I have an implementation which uses "preceding", but this algorithm is
order n-cubed :-(. And my real input may have quite a large number of
<tag> nodes. See the end of this email for the current algorithm..

Does anyone have a good suggestion for a better solution? I would like
to find a standard-compatible solution, but a xalan-specific one would
do. Maybe there's some trick that can be used with exslt extensions
"map" or "closure"?

One thought was to use <xsl:key>, which will nicely build a "hashtable"
with each entry being the set of matching nodes. However, the <xsl:key>
functionality doesn't seem to provide any way to iterate over the set of
unique keys. 

In lieu of any other solution, I will probably implement a custom
extension to duplicate <xsl:key> (or extend it) with the additional
functionality to get the set of keys. It would then look something like:

<xsl:key nme="tag-by-ref" match="tag" use="@ref"/>

<xsl:for-each select="keyset-for-key('tag-by-ref')">
  <group ref="{.}">
    <xsl:for-each select="key('tag-by-ref', .)">
      <item value='{@value}'/>
    </xsl:for-each>
  </group>
</xsl:for-each>

Any better solutions than implementing a custom extension would be
welcome. I feel I may be missing something obvious...

If the best solution is a custom extension, does anyone think such a
thing might be useful in the standard xalan extension library? I'm not
sure how many people might face a problem like this, but wouldn't be
surprised to find this fairly often; it is the sort of thing that can
happen when a database is exported as xml [which is basically what is
happening here].

Any help will be much appreciated!

Regards,

Simon

PS: the current solution is (roughly):
<xsl:for-each select="tag[not(@ref=preceding::@ref)]">
 <xsl:for-each select="../tag[@ref=./@ref]">
   ....
 </xsl:for-each>
</xsl:for-each>

or in words:
 for each tag  [order N]
   if the set of previous nodes with a matching tag is empty: [order
N-squared]
     // the current tag's ref hasn't been encountered before
     scan the set of tags, and for each one with a ref matching the
current tag [N-cubed]
       // process

Re: how best can tags be grouped?

Posted by Simon Kitching <si...@ecnetwork.co.nz>.

Thanks David!

That URL's right on topic. Nice to know I'm not alone :-)

To summarize for those who are interested...

(a)
XSLT 2.0 will have an <xsl:for-each-group> tag to address this sort of
issue.
(b)
The SAXON processor has a custom <saxon:group> tag for this.
(c)
Xalan doesn't have anything to address this directly. However the
"Muenchian method" can solve this with *reasonable* performance :-
certainly good enough for me for the moment.

The "Muenchian method" for solving this issue in XSLT 1.x without any
custom extensions is basically:
(1) use xsl:key 
(2) use
  <xsl:for-each 
    select= 
     "tag[
       generate-id(@ref) = 
         generate-id(key('tag-by-ref', keyfield)[1])]">
    <!-- we now have a key to index into the xsl:key -->
    <xsl:for-each select="key(@ref)">
      ...

Part two scans the list of nodes, finding the first node that has a
specific @ref value by testing whether the node is the first node in the
nodelist returned by the key() method for its ref.

This isn't as good as being able to get the set of keys directly, but is
close enough for me.

Thanks again, David.

Regards,

Simon

On Wed, 2002-11-27 at 14:18, David N Bertoni/Cambridge/IBM wrote:
> 
> 
> This is a grouping problem, so it sounds like a job for the Muenchian
> method:
> 
>    http://www.dpawson.co.uk/xsl/sect2/N4486.html
> 
> Dave
> 
> 
> 
>                                                                                                                                         
>                       Simon Kitching                                                                                                    
>                       <simon@ecnetwork         To:      xalan-j-users@xml.apache.org                                                    
>                       .co.nz>                  cc:      (bcc: David N Bertoni/Cambridge/IBM)                                            
>                                                Subject: how best can tags be grouped?                                                   
>                       11/26/2002 04:53                                                                                                  
>                       PM                                                                                                                
>                                                                                                                                         
> 
> 
> 
> Hi all,
> 
> I hope that someone here can help me out with the following issue.
> 
> I have input which is something like this:
> 
> <tags-in>
>   <tag ref="1" value="1"/>
>   <tag ref="1" value="2"/>
>   <tag ref="2" value="3"/>
>   <tag ref="2" value="4"/>
> </tags-in>
> 
> and need to generate something like this:
> 
> <tags-out>
>   <group ref="1">
>     <item value="1"/>
>     <item value="2"/>
>   </group>
>   <group ref="2">
>     <item value="3"/>
>     <item value="4"/>
>   </group>
> </tags-out>
> 
> In other words, I need to identify each *unique* value of "ref", and
> then process the set of <tag> nodes with the same ref value as a group.
> 
> I have an implementation which uses "preceding", but this algorithm is
> order n-cubed :-(. And my real input may have quite a large number of
> <tag> nodes. See the end of this email for the current algorithm..
> 
> Does anyone have a good suggestion for a better solution? I would like
> to find a standard-compatible solution, but a xalan-specific one would
> do. Maybe there's some trick that can be used with exslt extensions
> "map" or "closure"?
> 
> One thought was to use <xsl:key>, which will nicely build a "hashtable"
> with each entry being the set of matching nodes. However, the <xsl:key>
> functionality doesn't seem to provide any way to iterate over the set of
> unique keys.
> 
> In lieu of any other solution, I will probably implement a custom
> extension to duplicate <xsl:key> (or extend it) with the additional
> functionality to get the set of keys. It would then look something like:
> 
> <xsl:key nme="tag-by-ref" match="tag" use="@ref"/>
> 
> <xsl:for-each select="keyset-for-key('tag-by-ref')">
>   <group ref="{.}">
>     <xsl:for-each select="key('tag-by-ref', .)">
>       <item value='{@value}'/>
>     </xsl:for-each>
>   </group>
> </xsl:for-each>
> 
> Any better solutions than implementing a custom extension would be
> welcome. I feel I may be missing something obvious...
> 
> If the best solution is a custom extension, does anyone think such a
> thing might be useful in the standard xalan extension library? I'm not
> sure how many people might face a problem like this, but wouldn't be
> surprised to find this fairly often; it is the sort of thing that can
> happen when a database is exported as xml [which is basically what is
> happening here].
> 
> Any help will be much appreciated!
> 
> Regards,
> 
> Simon
> 
> PS: the current solution is (roughly):
> <xsl:for-each select="tag[not(@ref=preceding::@ref)]">
>  <xsl:for-each select="../tag[@ref=./@ref]">
>    ....
>  </xsl:for-each>
> </xsl:for-each>
> 
> or in words:
>  for each tag  [order N]
>    if the set of previous nodes with a matching tag is empty: [order
> N-squared]
>      // the current tag's ref hasn't been encountered before
>      scan the set of tags, and for each one with a ref matching the
> current tag [N-cubed]
>        // process
> 
> 
> 
> 
> 
-- 
Simon Kitching <si...@ecnetwork.co.nz>

Re: how best can tags be grouped?

Posted by David N Bertoni/Cambridge/IBM <da...@us.ibm.com>.




This is a grouping problem, so it sounds like a job for the Muenchian
method:

   http://www.dpawson.co.uk/xsl/sect2/N4486.html

Dave



                                                                                                                                        
                      Simon Kitching                                                                                                    
                      <simon@ecnetwork         To:      xalan-j-users@xml.apache.org                                                    
                      .co.nz>                  cc:      (bcc: David N Bertoni/Cambridge/IBM)                                            
                                               Subject: how best can tags be grouped?                                                   
                      11/26/2002 04:53                                                                                                  
                      PM                                                                                                                
                                                                                                                                        



Hi all,

I hope that someone here can help me out with the following issue.

I have input which is something like this:

<tags-in>
  <tag ref="1" value="1"/>
  <tag ref="1" value="2"/>
  <tag ref="2" value="3"/>
  <tag ref="2" value="4"/>
</tags-in>

and need to generate something like this:

<tags-out>
  <group ref="1">
    <item value="1"/>
    <item value="2"/>
  </group>
  <group ref="2">
    <item value="3"/>
    <item value="4"/>
  </group>
</tags-out>

In other words, I need to identify each *unique* value of "ref", and
then process the set of <tag> nodes with the same ref value as a group.

I have an implementation which uses "preceding", but this algorithm is
order n-cubed :-(. And my real input may have quite a large number of
<tag> nodes. See the end of this email for the current algorithm..

Does anyone have a good suggestion for a better solution? I would like
to find a standard-compatible solution, but a xalan-specific one would
do. Maybe there's some trick that can be used with exslt extensions
"map" or "closure"?

One thought was to use <xsl:key>, which will nicely build a "hashtable"
with each entry being the set of matching nodes. However, the <xsl:key>
functionality doesn't seem to provide any way to iterate over the set of
unique keys.

In lieu of any other solution, I will probably implement a custom
extension to duplicate <xsl:key> (or extend it) with the additional
functionality to get the set of keys. It would then look something like:

<xsl:key nme="tag-by-ref" match="tag" use="@ref"/>

<xsl:for-each select="keyset-for-key('tag-by-ref')">
  <group ref="{.}">
    <xsl:for-each select="key('tag-by-ref', .)">
      <item value='{@value}'/>
    </xsl:for-each>
  </group>
</xsl:for-each>

Any better solutions than implementing a custom extension would be
welcome. I feel I may be missing something obvious...

If the best solution is a custom extension, does anyone think such a
thing might be useful in the standard xalan extension library? I'm not
sure how many people might face a problem like this, but wouldn't be
surprised to find this fairly often; it is the sort of thing that can
happen when a database is exported as xml [which is basically what is
happening here].

Any help will be much appreciated!

Regards,

Simon

PS: the current solution is (roughly):
<xsl:for-each select="tag[not(@ref=preceding::@ref)]">
 <xsl:for-each select="../tag[@ref=./@ref]">
   ....
 </xsl:for-each>
</xsl:for-each>

or in words:
 for each tag  [order N]
   if the set of previous nodes with a matching tag is empty: [order
N-squared]
     // the current tag's ref hasn't been encountered before
     scan the set of tags, and for each one with a ref matching the
current tag [N-cubed]
       // process