Posted to dev@xalan.apache.org by Sc...@lotus.com on 2000/03/17 22:48:04 UTC

Grouping and Summaries in XSLT

Over time I've been asked a lot about how to do grouping and summaries in
XSLT.  It came up again the other day, so I thought I would share the
results.  This is really a subject for the xsl list, but I thought it would
be useful to post it to the xalan list and the cocoon list as well.

You can do anything with XSLT, if you're warped enough.  That said, XSLT is
not particularly good at grouping at this time (this is one of the things
that will go into XSLT V2).

The only technique I know of to group without knowing the selection
criteria of the groups ahead of time is to recurse into a named template
for each element, passing in a string.  At each step you test with the
contains() function whether the selection criterion already exists within
the string, and then add it to the string via the concat() function for the
next call.  It's ugly and a bit slow, but it works.  (This would be easy if
you had for loops and assignable variables in XSLT, but such are the design
constraints of the language...)
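
As a minimal one-level sketch of that technique (not the stylesheet from
this post), the following prints each distinct fqhn/@Value once.  The
template name distinct-fqhn and its parameters rows and seen are made up for
illustration.  It also wraps every value in '#' delimiters before calling
contains(), which is an added assumption to keep one name from matching
inside a longer one, and it assumes '#' never occurs in a value.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: walks $rows one at a time and prints each
       fqhn/@Value the first time it is seen -->
  <xsl:template name="distinct-fqhn">
    <xsl:param name="rows"/>
    <!-- '#'-delimited list of the values seen so far -->
    <xsl:param name="seen" select="'#'"/>

    <xsl:if test="$rows">
      <xsl:variable name="value" select="string($rows[1]/fqhn/@Value)"/>

      <!-- first occurrence: the delimited value is not yet in the list -->
      <xsl:if test="not(contains($seen, concat('#', $value, '#')))">
        <xsl:value-of select="$value"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>

      <!-- recurse on the remaining rows with the value appended -->
      <xsl:call-template name="distinct-fqhn">
        <xsl:with-param name="rows" select="$rows[position() &gt; 1]"/>
        <xsl:with-param name="seen" select="concat($seen, $value, '#')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <xsl:template match="/">
    <xsl:call-template name="distinct-fqhn">
      <xsl:with-param name="rows" select="TABLE/ROW"/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>
```

The string parameter stands in for the assignable variable that XSLT does
not have: each recursive call receives a longer copy of it.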

In the sample below, the user wanted to group the rows by server, and then
provide summaries for each month's memory consumption.   The transformation
is a useful example of how to do two-level grouping using the above
technique.  It would be nice to parameterise the templates and simply put
them in a library, but that would be difficult, I think.  I'm sure the
named templates could be improved in a couple of ways for better
performance.  I'll be interested to see what ideas folks come up with.
BTW, you could also use extensions to do this, but the transformation below
is totally interoperable.

The input data:
<TABLE >
   <ROW>
       <fqhn  Value="pebbles"/>
       <mem_pct_used  Value="54"/>
       <date Value="20/02/00"/>
   </ROW>
   <ROW>
       <fqhn  Value="pebbles"/>
       <mem_pct_used  Value="11"/>
       <date  Value="21/02/00"/>
   </ROW>
   <ROW >
       <fqhn  Value="bambam"/>
       <mem_pct_used  Value="27"/>
       <date  Value="10/03/00"/>
   </ROW>
   <ROW >
       <fqhn  Value="pebbles"/>
       <mem_pct_used  Value="22"/>
       <date  Value="10/03/00"/>
   </ROW>
</TABLE>

And a stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">

  <xsl:output indent="yes"/>

  <!-- Named recursive template that finds groups by fqhn and month -->
  <xsl:template name="group-by-fqhn-and-month">
    <xsl:param name="found-names"/>
    <xsl:param name="row-position"/>
    <xsl:param name="fqhn"/>

    <!-- if the string does not contain this row's month, we found a first
         match for that month on this server -->
    <xsl:for-each select="ROW[$row-position][fqhn/@Value=$fqhn]">
      <xsl:variable name="date-value"
                    select="substring(date/@Value, 4, 2)"/>
      <xsl:if test="not(contains($found-names, $date-value))">
        <xsl:variable name="date-group"
                      select="/TABLE/ROW[fqhn/@Value=$fqhn and
                              substring(date/@Value, 4, 2)=$date-value]"/>
        <Server_Performance_Info Month="{$date-value}/00"
                                 Sum="{sum($date-group/mem_pct_used/@Value)}"/>
      </xsl:if>
    </xsl:for-each>

    <!-- recurse to the next position, passing in this row's month
         concatenated to the found-names variable -->
    <xsl:if test="ROW[$row-position+1]">
      <xsl:variable name="month"
          select="substring(ROW[$row-position][fqhn/@Value=$fqhn]/date/@Value,
                            4, 2)"/>
      <xsl:call-template name="group-by-fqhn-and-month">
        <xsl:with-param name="fqhn" select="$fqhn"/>
        <xsl:with-param name="row-position" select="$row-position+1"/>
        <xsl:with-param name="found-names"
                        select="concat($found-names, $month, '#')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>


  <!-- Named recursive template that finds groups by fqhn -->
  <xsl:template name="group-by-fqhn">
    <xsl:param name="found-names"/>
    <xsl:param name="row-position"/>

    <!-- if the string does not contain our fqhn/@Value, we found a first
         match -->
    <xsl:for-each select="ROW[$row-position]">
      <xsl:if test="not(contains($found-names, fqhn/@Value))">
        <Server>
          <xsl:variable name="fqhn-value" select="fqhn/@Value"/>
          <Server_Configuration_Info  Server_id="{$fqhn-value}"/>
          <!-- now emit the per-month summaries for this server -->
          <xsl:for-each select="/TABLE">
            <xsl:call-template name="group-by-fqhn-and-month">
              <xsl:with-param name="fqhn" select="$fqhn-value"/>
              <xsl:with-param name="row-position" select="1"/>
              <xsl:with-param name="found-names" select="''"/>
            </xsl:call-template>
          </xsl:for-each>
        </Server>
      </xsl:if>
    </xsl:for-each>

    <!-- recurse to the next position, passing in the fqhn/@Value
         concatenated to the found-names variable -->
    <xsl:if test="ROW[$row-position+1]">
      <xsl:call-template name="group-by-fqhn">
        <xsl:with-param name="row-position" select="$row-position+1"/>
        <xsl:with-param name="found-names"
                        select="concat($found-names,
                                ROW[$row-position]/fqhn/@Value, '#')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <xsl:template match="/">
    <RESULT>
    <xsl:for-each select="TABLE">
      <xsl:call-template name="group-by-fqhn">
        <xsl:with-param name="row-position" select="1"/>
        <xsl:with-param name="found-names" select="''"/>
      </xsl:call-template>
    </xsl:for-each>
    </RESULT>
  </xsl:template>

</xsl:stylesheet>

will give you:

<?xml version="1.0" encoding="UTF-8"?>
<RESULT>
    <Server>
        <Server_Configuration_Info Server_id="pebbles"/>
        <Server_Performance_Info Sum="65" Month="02/00"/>
        <Server_Performance_Info Sum="22" Month="03/00"/>
    </Server>
    <Server>
        <Server_Configuration_Info Server_id="bambam"/>
        <Server_Performance_Info Sum="27" Month="03/00"/>
    </Server>
</RESULT>

-scott




Re: Grouping and Summaries in XSLT

Posted by John Prevost <pr...@maya.com>.
Scott_Boag@lotus.com writes:

> The only technique I know of to group without knowing the selection
> criteria of the groups ahead of time, is to recurse into a named template
> for each element passing in a string, add the found selection criteria to
> the string via the concat function, and then test with the contains()
> function if the selection criteria exists within the string.  It's ugly
> and a bit slow, but it works.  (This would be easy if you had for loops and
> assignable variables in XSLT, but such are the design constraints of the
> language...)

There's a much much much easier way to do this.  It's a little odd to
the imperative-language trained mind, but it grows on you.  (Like a
fungus.)  Key idea: find a set of nodes which contains one item for
each "group" you want.  The item should allow you to get the group
it's associated with.  The easiest choice is usually "the first node
in the group", which translates (when you're working with groups that
aren't pre-determined) to "every node for which no previous node has
the same 'grouping property'".  I've done this with your example
below:

> In the sample below, the user wanted to group the rows by server, and then
> provide summaries for each month's memory consumption.   The transformation
> is a useful example of how to do two-level grouping using the above
> technique.  It would be nice to parameterise the templates and simply put
> them in a library, but that would be difficult, I think.  I'm sure the
> named templates could be improved in a couple of ways for better
> performance.  I'll be interested to see what ideas folks come up with.
> BTW, you could also use extensions to do this, but the transformation below
> is totally interoperable.
> 
> The input data:
> <TABLE >
>    <ROW>
>        <fqhn  Value="pebbles"/>
>        <mem_pct_used  Value="54"/>
>        <date Value="20/02/00"/>
>    </ROW>
>    <ROW>
>        <fqhn  Value="pebbles"/>
>        <mem_pct_used  Value="11"/>
>        <date  Value="21/02/00"/>
>    </ROW>
>    <ROW >
>        <fqhn  Value="bambam"/>
>        <mem_pct_used  Value="27"/>
>        <date  Value="10/03/00"/>
>    </ROW>
>    <ROW >
>        <fqhn  Value="pebbles"/>
>        <mem_pct_used  Value="22"/>
>        <date  Value="10/03/00"/>
>    </ROW>
> </TABLE>

 <xsl:template match="TABLE">
  <RESULT>

   <xsl:for-each select="ROW
                           [not(fqhn/@Value =
                                preceding-sibling::ROW/fqhn/@Value)]">
    <!-- Find the first occurrence of each server name -->

    <Server>

     <xsl:variable name="server" select="fqhn/@Value"/>
     <!-- Remember it because we're lazy -->

     <Server_Configuration_Info Server_id="{$server}"/>

     <xsl:for-each select="../ROW[fqhn/@Value = $server]">
      <!-- walk every row of the current server -->

      <xsl:variable name="month" select="substring(date/@Value, 4, 2)"/>
      <!-- remember it because we're lazy -->

      <xsl:if test="not(preceding-sibling::ROW
                           [fqhn/@Value = $server]
                           [substring(date/@Value, 4, 2) = $month])">
       <!-- keep only the first occurrence of each month in the current
            server; testing here compares against every earlier row,
            since substring() of a node-set only looks at its first node -->

       <Server_Performance_Info Month="{$month}/00"
         Sum="{sum(../ROW[fqhn/@Value = $server]
                         [substring(date/@Value, 4, 2) = $month]
                         /mem_pct_used/@Value)}"/>
       <!-- output the sum over things with the same month and server -->

      </xsl:if>

     </xsl:for-each>
    </Server>
   </xsl:for-each>
  </RESULT>
 </xsl:template>

 
> will give you:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <RESULT>
>     <Server>
>         <Server_Configuration_Info Server_id="pebbles"/>
>         <Server_Performance_Info Sum="65" Month="02/00"/>
>         <Server_Performance_Info Sum="22" Month="03/00"/>
>     </Server>
>     <Server>
>         <Server_Configuration_Info Server_id="bambam"/>
>         <Server_Performance_Info Sum="27" Month="03/00"/>
>     </Server>
> </RESULT>

Mine too.  (I tested it with Xalan and Cocoon while writing this mail.)

This can also be used to, in a really heinous way, do things like turn
data organized in "columns" at the top level into data organized in
"rows" (suitable for use in HTML.)  That is:

<row>
 <column>1A</column>
 <column>1B</column>
</row>
<row>
 <column>2A</column>
 <column>2B</column>
 <column>2C</column>
</row>

becomes

<column>
 <row>1A</row>
 <row>2A</row>
</column>
<column>
 <row>1B</row>
 <row>2B</row>
</column>
<column>
 <row/>
 <row>2C</row>
</column>

or vice-versa.

(I'll be happy to share the code with people if they think their
sanity will survive it.  I'll give you a hint: the first step is to
find the longest row.)
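
A minimal sketch of how that hint might play out in XSLT 1.0, not
necessarily the code alluded to above.  It assumes the rows sit inside a
wrapper element, here given the hypothetical name "table", and it finds the
longest row by sorting on the number of cells.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- "table" is an assumed wrapper element around the <row> elements -->
  <xsl:template match="table">
    <table>
      <xsl:variable name="rows" select="row"/>

      <!-- step 1: find the longest row by sorting on cell count -->
      <xsl:for-each select="row">
        <xsl:sort select="count(column)" data-type="number"
                  order="descending"/>
        <xsl:if test="position() = 1">

          <!-- step 2: one output <column> per cell of the longest row -->
          <xsl:for-each select="column">
            <xsl:variable name="i" select="position()"/>
            <column>
              <!-- step 3: take the i-th cell of every input row; a short
                   row has no such cell and yields an empty <row/> -->
              <xsl:for-each select="$rows">
                <row><xsl:value-of select="column[$i]"/></row>
              </xsl:for-each>
            </column>
          </xsl:for-each>

        </xsl:if>
      </xsl:for-each>
    </table>
  </xsl:template>

</xsl:stylesheet>
```

The longest row is only used as a counter: its cells supply the positions
1..n that a for loop would otherwise provide.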


In any case--this needs to be fixed some day.  Neither Scott's method
nor mine is terribly appealing to the well-balanced soul.
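
For comparison, here is a minimal sketch of a third variant, often called
Muenchian grouping.  It relies on the same "first node in each group" idea,
but finds that node with xsl:key and generate-id() instead of scanning the
earlier siblings.  The key names rows-by-server and rows-by-server-month are
made up for this illustration, and the sketch assumes the '|' separator
never occurs in a server name.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- hypothetical key names: index rows by server, and by server+month -->
  <xsl:key name="rows-by-server" match="ROW" use="fqhn/@Value"/>
  <xsl:key name="rows-by-server-month" match="ROW"
           use="concat(fqhn/@Value, '|', substring(date/@Value, 4, 2))"/>

  <xsl:template match="TABLE">
    <RESULT>
      <!-- a row starts a server group if it is the first row in its key -->
      <xsl:for-each select="ROW[generate-id() =
                  generate-id(key('rows-by-server', fqhn/@Value)[1])]">
        <xsl:variable name="server" select="fqhn/@Value"/>
        <Server>
          <Server_Configuration_Info Server_id="{$server}"/>

          <!-- within the server, keep the first row of each month -->
          <xsl:for-each select="key('rows-by-server', $server)
                  [generate-id() = generate-id(
                     key('rows-by-server-month',
                         concat($server, '|',
                                substring(date/@Value, 4, 2)))[1])]">
            <xsl:variable name="month"
                          select="substring(date/@Value, 4, 2)"/>
            <!-- sum over all rows sharing this server and month -->
            <Server_Performance_Info Month="{$month}/00"
              Sum="{sum(key('rows-by-server-month',
                            concat($server, '|', $month))
                        /mem_pct_used/@Value)}"/>
          </xsl:for-each>
        </Server>
      </xsl:for-each>
    </RESULT>
  </xsl:template>

</xsl:stylesheet>
```

The keys let the processor look each group up directly, which may matter on
larger inputs, though the select expressions are no prettier.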

John.

