You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-user@db.apache.org by John English <jo...@gmail.com> on 2012/11/21 15:14:11 UTC

Pivoting tables?

I have a situation where I have a table listing users and products and 
associated values:

   USER   PRODUCT   VALUE
   abc    xyz       3
   def    ghi       5
   def    xyz       7

and I want to pivot this to display it with a column for each product 
like so:

   USER   ghi  xyz
   abc         3
   def    5    7

This means that the columns I have depend on the product list, which 
changes pretty regularly (at least at certain times) and they also 
depend on which department you're visiting (each has a different product 
list). At the moment I use a temporary table:

     Object lock = null;
     synchronized (state.tempTables) { // "state" is from the HttpSession
       if (state.tempTables.get("products") == null) {
         state.tempTables.put("products",new Object());
       }
       lock = state.tempTables.get("products");
     }
     synchronized (lock) {
       // start transaction
       // drop the temporary table if it exists
       // create the temporary table
       // select rows from the real table
       while (res.next()) {
         // insert into temporary table
       }
       // commit transaction
       // display the temporary table
     }

This is ugly and slow, but I've been unable to come up with a better 
way. The table is dropped at the start rather than at the end because 
the user might choose to download it as CSV, so it's left in existence 
after it's displayed in case it's needed for this purpose.

I thought about using a table function, but again the column list is 
fixed when the function is defined.

Does anyone have any ideas what else I could try? Or is there anything 
in the pipeline for a future version that might be relevant?

TIA,
-- 
John English

Re: AW: AW: Pivoting tables?

Posted by John English <jo...@gmail.com>.

On 04/12/2012 20:46, John English wrote:
> I'm going to try using a user-defined function to format the name and thus avoid
> the extra GROUP BY elements (which are my immediate suspect for the performance
> hit), but does anyone have any other suggestions for improving on this solution?

Just to provide an ending to this story: The performance hit turned out to be 
the join on the users table (where the username, a varchar(15), is for 
historical reasons the primary key, and it's much too late to do anything about 
it). A function to look up the username and format it is about 10x faster than 
the join, so now it takes between 100ms and 700ms to display a sample table as 
opposed to 6500ms to 8500ms with a join (where the variation is presumably 
caused by server latency).

Again, thanks to Malte for pointing me at the solution, and to Rick for 
providing food for future thought. And now I need to look for other places where 
I can avoid a join on the users table...
-- 
John English

Re: AW: AW: Pivoting tables?

Posted by John English <jo...@gmail.com>.

On 28/11/2012 16:30, Malte.Kempff@de.equens.com wrote:
> Hi John, It sounds to me that your first solution could work flexible enough,
> while the other might have lecks of flexibility, haven't they? I am not sure
> how much you like to studdy and translate it in derby and Java procedures in
> the example of the page from last answer.

I've finally got my head around the various options, and rather than wait for 
10.10 so I can experiment with user-defined aggregates, I've cobbled together 
something based on what you suggested. I programmatically build a query and use 
it as the table to be displayed by my monstrous showTable() method:

   "(SELECT username,SUM(" + total + ") AS total" + values +
   " FROM (SELECT username" + cases + " FROM products WHERE dept=?) AS s" +
   " GROUP BY username) AS t"

where the variables look like this when the items have IDs 53, 75 and 82:

   cases:  ",CASE WHEN itemid=53 THEN cost ELSE NULL END AS c53" +
           ",CASE WHEN itemid=53 THEN discounted ELSE 0 END AS d53" +
           ",CASE WHEN itemid=75 THEN cost ELSE NULL END AS c75" +
           ",CASE WHEN itemid=75 THEN discounted ELSE 0 END AS d75" +
           ",CASE WHEN itemid=82 THEN cost ELSE NULL END AS c82" +
           ",CASE WHEN itemid=82 THEN discounted ELSE 0 END AS d82"

   total:  "d53+d75+d82"

   values: ",SUM(c53) AS v53,SUM(c75) AS v75,SUM(c82) AS v82"

This speeds things up by a factor of somewhere between 20x and 50x compared to 
the temporary table approach, which is absolutely wonderful! (The code is still 
totally opaque, but you can't have everything...)

However, if I try to format the username nicely by pulling the user's surname 
and initials out of the users table, I end up with this:

   "(SELECT surname||','||initials,username,SUM(" + total + ") AS total" + values +
   " FROM (SELECT username" + cases + " FROM products WHERE dept=?) AS s,users" +
   " WHERE s.username=users.username  GROUP BY users.username,surname,initials) 
AS t"

which slows things down again by about 15x. In the temporary table approach I 
was putting the formatted name into the table as I built it, so there wasn't 
much extra overhead.

I'm going to try using a user-defined function to format the name and thus avoid 
the extra GROUP BY elements (which are my immediate suspect for the performance 
hit), but does anyone have any other suggestions for improving on this solution?

Thanks for all the help!
-- 
John English

AW: AW: Pivoting tables?

Posted by Ma...@de.equens.com.

Hi John,
It sounds to me that your first solution could work flexible enough, while the other might have lecks of flexibility, haven't they?
I am not sure how much you like to studdy and translate it in derby and Java procedures in the example of the page from last answer.
http://www.simple-talk.com/sql/t-sql-programming/creating-cross-tab-queries-and-pivot-tables-in-sql/
http://www.simple-talk.com/code/CrossTab/sys_CrossTab.txt

Malte

-----Ursprüngliche Nachricht-----
Von: John English [mailto:john.foreign@gmail.com] 
Gesendet: Mittwoch, 28. November 2012 11:29
An: derby-user@db.apache.org
Betreff: Re: AW: Pivoting tables?

On 21/11/2012 16:40, Malte.Kempff@de.equens.com wrote:
> Hi John,
> I think a smart SQL-Statement schould do.
> OK that would probably a bit inflxible, since you have a lot of changes concerning users and products.
> But you could may be write a SQL-Generating routine that takes care of it.

I've been playing around a bit with this; it helps, but doesn't quite hit the spot for me. So here I am again...

The problem for me is not so much gathering the results as presenting them. I have a monstrous method to display a table (or view), which builds the SQL query and appends WHERE and ORDER BY clauses as needed. 
This works fine, for cases where I can define a table or a view of a table, which means a fixed set of columns. I use temporary tables in the cases where the table has to be generated dynamically (when I want to pivot rows into columns), but as I said before, this is slow and ugly.

So apart from a temporary table, I thought about these possible solutions:

1) Create a real table every time a department is created and then use triggers to add and drop columns as products are added and removed, and to copy values when the "main" table is updated. The set of departments changes relatively slowly, but it does change; the products go through flurries of activity. I'm not sure how practical this is.

2) Use a function to return a virtual table. The problem here is that the table format is fixed when the function is declared. I could perhaps declare functions that return tables with maybe 100 columns and then hope that no-one ever needs 101 products, but this isn't very satisfactory either.

3) I've just been reading about procedures that return result sets, but then the problem is that I can't decorate the query (a CALL rather than a SELECT) with WHERE and ORDER BY clauses.

So for now, I'm still stuck.

Your suggestions and advice gratefully appreciated!
--
John English

Re: AW: Pivoting tables?

Posted by Rick Hillegas <ri...@oracle.com>.

On 11/29/12 1:23 AM, John English wrote:
> On 28/11/2012 23:31, Rick Hillegas wrote:
>> Hi John,
>>
>> You may be able to press user-defined aggregates into service here. They
>> are a new feature available in the development trunk and slated for
>> release 10.10 next year. To demonstrate how to do this, I'm enclosing an
>> ij script plus two Java classes which support the script.
>
> Hmm, looks interesting!
>
>> Hope this helps,
>
> Me too, but I need to spend some time digesting your example...
A couple notes so that the examples make more sense:

1) A pivot table is a lot like a grouped aggregate. Rows with a common 
key end up being grouped together into a single composite row.

2) More information on user-defined aggregates can be found in the 
section titled "Programming user-defined aggregates" here: 
http://db.apache.org/derby/docs/dev/devguide/

3) This user-defined aggregate operates on two user-defined types. The 
types probably deserve better names.

4) The first type (NamedInt) represents what will end up being a column 
in the result row. The NamedInt binds a column name to an int value.

5) The second type (HashMap) is just a bag of these columns. It ends up 
being the contents of the pivoted row.

6) Some user-defined functions have to be created in order to create and 
inspect the user-defined types.

7) The NamedIntAggregator is passed a stream of columns, which it puts 
into the result row.

8) The query at the end of the ij script puts it all together: The inner 
query constructs the result rows, but they are represented as HashMaps. 
The outer query unpacks the HashMaps into displayable rows.

Hope this is useful,
-Rick

>
> Many thanks for this,

Re: AW: Pivoting tables?

Posted by John English <jo...@gmail.com>.

On 28/11/2012 23:31, Rick Hillegas wrote:
> Hi John,
>
> You may be able to press user-defined aggregates into service here. They
> are a new feature available in the development trunk and slated for
> release 10.10 next year. To demonstrate how to do this, I'm enclosing an
> ij script plus two Java classes which support the script.

Hmm, looks interesting!

> Hope this helps,

Me too, but I need to spend some time digesting your example...

Many thanks for this,
-- 
John English

Re: AW: Pivoting tables?

Posted by Rick Hillegas <ri...@oracle.com>.

Hi John,

You may be able to press user-defined aggregates into service here. They 
are a new feature available in the development trunk and slated for 
release 10.10 next year. To demonstrate how to do this, I'm enclosing an 
ij script plus two Java classes which support the script.

Hope this helps,
-Rick

---------------

First the script:

connect 'jdbc:derby:memory:db;create=true';

create table products( userName varchar( 10 ), product varchar( 10 ), 
quantity int );
insert into products values ( 'me', 'abc', 3 ), ( 'you', 'abc', 5 ), ( 
'you', 'def', 7 );
select * from products;

create type HashMap external name 'java.util.HashMap' language java;
create type NamedInt external name 'NamedInt' language java;

create function nameInt( n varchar( 10 ), v int ) returns NamedInt
language java parameter style java no sql
external name 'NamedInt.nameInt';

create function getInt( n varchar( 10 ), h HashMap ) returns int
language java parameter style java no sql
external name 'NamedInt.getInt';

create derby aggregate nia for NamedInt returns HashMap external name 
'NamedIntAggregator';

-- pivot the table
select userName,
     getInt( 'abc', nia ) "abc",
     getInt( 'def', nia ) "def"
from
(
     select userName, nia( nameInt( product, quantity ) ) nia
     from products
     group by userName
) s
;

Now the two classes:

public class NamedInt   implements java.io.Serializable
{
     public  final   String  name;
     public  final   int     value;
     private NamedInt( String n, int v ) { name = n; value = v; }
     public  static  NamedInt    nameInt( String n, int v ) { return new 
NamedInt( n, v ); }
     public  static  Integer getInt( String n, 
java.util.HashMap<String,Integer> h ) { return  h.get( n ); }
}

... and

import java.util.HashMap;
public  class   NamedIntAggregator    implements  
org.apache.derby.agg.Aggregator<NamedInt,HashMap,NamedIntAggregator>
{
     private HashMap<String,Integer>  _accumulator;
     public  NamedIntAggregator() {}
     public  void    init() {  _accumulator = new 
HashMap<String,Integer>(); }
     public  void    accumulate( NamedInt ni ) { _accumulator.put( 
ni.name, ni.value ); }
     public  void    merge( NamedIntAggregator otherAggregator ) { 
_accumulator.putAll( otherAggregator._accumulator ); }
     public  HashMap terminate() { return _accumulator; }
}

Re: AW: Pivoting tables?

Posted by John English <jo...@gmail.com>.

On 21/11/2012 16:40, Malte.Kempff@de.equens.com wrote:
> Hi John,
> I think a smart SQL-Statement schould do.
> OK that would probably a bit inflxible, since you have a lot of changes concerning users and products.
> But you could may be write a SQL-Generating routine that takes care of it.

I've been playing around a bit with this; it helps, but doesn't quite 
hit the spot for me. So here I am again...

The problem for me is not so much gathering the results as presenting 
them. I have a monstrous method to display a table (or view), which 
builds the SQL query and appends WHERE and ORDER BY clauses as needed. 
This works fine, for cases where I can define a table or a view of a 
table, which means a fixed set of columns. I use temporary tables in
the cases where the table has to be generated dynamically (when I want
to pivot rows into columns), but as I said before, this is slow and ugly.

So apart from a temporary table, I thought about these possible solutions:

1) Create a real table every time a department is created and then use 
triggers to add and drop columns as products are added and removed, and 
to copy values when the "main" table is updated. The set of departments 
changes relatively slowly, but it does change; the products go through 
flurries of activity. I'm not sure how practical this is.

2) Use a function to return a virtual table. The problem here is that 
the table format is fixed when the function is declared. I could perhaps 
declare functions that return tables with maybe 100 columns and then 
hope that no-one ever needs 101 products, but this isn't very 
satisfactory either.

3) I've just been reading about procedures that return result sets, but 
then the problem is that I can't decorate the query (a CALL rather than 
a SELECT) with WHERE and ORDER BY clauses.

So for now, I'm still stuck.

Your suggestions and advice gratefully appreciated!
-- 
John English

Re: AW: Pivoting tables?

Posted by John English <jo...@gmail.com>.

On 21/11/2012 16:40, Malte.Kempff@de.equens.com wrote:
> Hi John,
> I think a smart SQL-Statement schould do.
> OK that would probably a bit inflxible, since you have a lot of changes concerning users and products.
> But you could may be write a SQL-Generating routine that takes care of it.

I have to do that anyway since the query to diplay the table changes 
with the table layout, and all the temporary table stuff involves 
creating lots of SQL dynamically.

> I googled a bit and found examples (for other database-systems) but a bit modified it should work also for derby.
> Maybe with derby's case expression it could work.
> Here an example function.
> [...snip...]
> Found in http://www.simple-talk.com/sql/t-sql-programming/creating-cross-tab-queries-and-pivot-tables-in-sql/

Thanks, it looks promising; I'll have a long think about this!
-- 
John English

AW: Pivoting tables?

Posted by Ma...@de.equens.com.

 
Hi John,
I think a smart SQL-Statement schould do.
OK that would probably a bit inflxible, since you have a lot of changes concerning users and products.
But you could may be write a SQL-Generating routine that takes care of it.

I googled a bit and found examples (for other database-systems) but a bit modified it should work also for derby.
Maybe with derby's case expression it could work.
Here an example 
 function.
SELECT      YEAR(ord.OrderDate) YEAR, 
            SUM(CASE prod.CategoryID WHEN 1 THEN
                    det.UnitPrice * det.Quantity ELSE 0 END) Beverages,
            SUM(CASE prod.CategoryID WHEN 2 THEN
                    det.UnitPrice * det.Quantity ELSE 0 END) Condiments,
            SUM(CASE prod.CategoryID WHEN 3 THEN
                    det.UnitPrice * det.Quantity ELSE 0 END) Confections,
            SUM(CASE prod.CategoryID WHEN 4 THEN
                    det.UnitPrice * det.Quantity ELSE 0 END) [Dairy Products],
            SUM(CASE prod.CategoryID WHEN 5 THEN
                    det.UnitPrice * det.Quantity ELSE 0 END) [Grains/Cereals],
            SUM(CASE prod.CategoryID WHEN 6 THEN   
                    det.UnitPrice * det.Quantity ELSE 0 END) [Meat/Poultry],
            SUM(CASE prod.CategoryID WHEN 7 THEN
                    det.UnitPrice * det.Quantity ELSE 0 END) Produce,
            SUM(CASE prod.CategoryID WHEN 8 THEN
                    det.UnitPrice * det.Quantity ELSE 0 END) Seafood

FROM        Orders ord

INNER JOIN  [Order Details] det
ON          det.OrderID = ord.OrderID

INNER JOIN  Products prod
ON          prod.ProductID = det.ProductID

GROUP BY    YEAR(ord.OrderDate)

ORDER BY    YEAR(ord.OrderDate)

Found in http://www.simple-talk.com/sql/t-sql-programming/creating-cross-tab-queries-and-pivot-tables-in-sql/





-----Ursprüngliche Nachricht-----
Von: John English [mailto:john.foreign@gmail.com] 
Gesendet: Mittwoch, 21. November 2012 15:14
An: Derby Discussion
Betreff: Pivoting tables?

I have a situation where I have a table listing users and products and associated values:

   USER   PRODUCT   VALUE
   abc    xyz       3
   def    ghi       5
   def    xyz       7

and I want to pivot this to display it with a column for each product like so:

   USER   ghi  xyz
   abc         3
   def    5    7

This means that the columns I have depend on the product list, which changes pretty regularly (at least at certain times) and they also depend on which department you're visiting (each has a different product list). At the moment I use a temporary table:

     Object lock = null;
     synchronized (state.tempTables) { // "state" is from the HttpSession
       if (state.tempTables.get("products") == null) {
         state.tempTables.put("products",new Object());
       }
       lock = state.tempTables.get("products");
     }
     synchronized (lock) {
       // start transaction
       // drop the temporary table if it exists
       // create the temporary table
       // select rows from the real table
       while (res.next()) {
         // insert into temporary table
       }
       // commit transaction
       // display the temporary table
     }

This is ugly and slow, but I've been unable to come up with a better way. The table is dropped at the start rather than at the end because the user might choose to download it as CSV, so it's left in existence after it's displayed in case it's needed for this purpose.

I thought about using a table function, but again the column list is fixed when the function is defined.

Does anyone have any ideas what else I could try? Or is there anything in the pipeline for a future version that might be relevant?

TIA,
--
John English