Posted to dev@manifoldcf.apache.org by Karl Wright <da...@gmail.com> on 2010/09/18 12:28:07 UTC

Derby SQL ideas needed

Hi Folks,

For two of the report queries, ACF uses the following PostgreSQL
construct, which sadly seems to have no Derby equivalent:

SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
FROM (...) t3

In PostgreSQL, this returns the FIRST entire row for each distinct
idbucket value.  If Derby had a "FIRST()" aggregate function, it
would be equivalent to:

SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
AS endtime FROM (...) t3 GROUP BY t3.bucket

Unfortunately, Derby has no such aggregate function.  Furthermore, it
would not be ideal if I were to do the work myself in ACF, because
this is a resultset that needs to be paged through with offset and
length, for presentation to the user and sorting, so it gets wrapped
in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
that does that part.
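(For illustration only: the first-row-per-bucket behavior being
emulated can be sketched in plain Python.  The rows and the
itertools.groupby usage here are invented stand-ins, not ACF code.)

```python
# Minimal sketch of what DISTINCT ON (idbucket) returns: the first row
# encountered for each distinct bucket, given input already ordered so
# that the desired row comes first within each bucket.
from itertools import groupby

# (bucket, activitycount, windowstart, windowend), pre-sorted by
# bucket ASC, activitycount DESC -- mirroring the inner ORDER BY.
rows = [
    ("a", 30, 5, 10),
    ("a", 10, 0, 5),
    ("b", 20, 0, 5),
    ("b", 5, 5, 10),
]

# Take the first row of each bucket group.
first_per_bucket = [next(grp) for _, grp in groupby(rows, key=lambda r: r[0])]
print(first_per_bucket)  # [('a', 30, 5, 10), ('b', 20, 0, 5)]
```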

Does anyone have any ideas and/or Derby contacts?  I'd really like the
quick-start example to have a functional set of reports.

Karl

Re: Derby SQL ideas needed

Posted by Karl Wright <da...@gmail.com>.
The Derby table-result function syntax requires all output columns to
be declared as part of the function definition, and, more importantly,
it does not seem to allow calls back into Derby itself to get results.
So this does not seem to be a viable option.

Back to square one, I guess.  Derby doesn't seem to allow any way to
declare aggregate functions either, so I couldn't declare a FIRST()
aggregate function as proposed below.  Simple arithmetic functions
seem like they would work, but that's not helpful here.

Karl



On Sat, Sep 18, 2010 at 6:45 AM, Karl Wright <da...@gmail.com> wrote:
> For what it's worth, defining a Derby function seems like the only way
> to do it.  These seem to call arbitrary java that can accept a query
> as an argument and return a resultset as the result.  But in order to
> write such a thing I will need the ability to call Derby at a java
> level, I think, rather than through JDBC.  Still looking for a good
> example from somebody who has done something similar.
>
> Karl
>
> On Sat, Sep 18, 2010 at 6:28 AM, Karl Wright <da...@gmail.com> wrote:
>> Hi Folks,
>>
>> For two of the report queries, ACF uses the following Postgresql
>> construct, which sadly seems to have no Derby equivalent:
>>
>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
>> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
>> FROM (...) t3
>>
>> In Postgresql, what this does is to return the FIRST entire row
>> matching each distinct idbucket result.  If Derby had a "FIRST()"
>> aggregate function, it would be the equivalent of:
>>
>> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
>> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
>> AS endtime FROM (...) t3 GROUP BY t3.bucket
>>
>> Unfortunately, Derby has no such aggregate function.  Furthermore, it
>> would not be ideal if I were to do the work myself in ACF, because
>> this is a resultset that needs to be paged through with offset and
>> length, for presentation to the user and sorting, so it gets wrapped
>> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
>> that does that part.
>>
>> Does anyone have any ideas and/or Derby contacts?  I'd really like the
>> quick-start example to have a functional set of reports.
>>
>> Karl
>>
>

Re: Derby SQL ideas needed

Posted by Karl Wright <da...@gmail.com>.
For what it's worth, defining a Derby function seems like the only way
to do it.  These seem to call arbitrary java that can accept a query
as an argument and return a resultset as the result.  But in order to
write such a thing I will need the ability to call Derby at a java
level, I think, rather than through JDBC.  Still looking for a good
example from somebody who has done something similar.

Karl

On Sat, Sep 18, 2010 at 6:28 AM, Karl Wright <da...@gmail.com> wrote:
> Hi Folks,
>
> For two of the report queries, ACF uses the following Postgresql
> construct, which sadly seems to have no Derby equivalent:
>
> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
> FROM (...) t3
>
> In Postgresql, what this does is to return the FIRST entire row
> matching each distinct idbucket result.  If Derby had a "FIRST()"
> aggregate function, it would be the equivalent of:
>
> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
> AS endtime FROM (...) t3 GROUP BY t3.bucket
>
> Unfortunately, Derby has no such aggregate function.  Furthermore, it
> would not be ideal if I were to do the work myself in ACF, because
> this is a resultset that needs to be paged through with offset and
> length, for presentation to the user and sorting, so it gets wrapped
> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
> that does that part.
>
> Does anyone have any ideas and/or Derby contacts?  I'd really like the
> quick-start example to have a functional set of reports.
>
> Karl
>

Re: Derby SQL ideas needed

Posted by Alexey Serba <as...@gmail.com>.
You can also try ORDER BY bytecount DESC LIMIT 1 instead of the
aggregate function max(), i.e.:

SELECT
    t1.bucket, t1.bytecount, t1.windowstart, t1.windowend
FROM
    (xxx) t1
WHERE
    t1.bytecount=( SELECT t2.bytecount FROM (xxx) t2 WHERE t2.bucket =
t1.bucket ORDER BY t2.bytecount DESC LIMIT 1 )
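(A runnable sketch of the pattern above, using sqlite3 purely as a
stand-in engine; Derby has no LIMIT clause, so it would likely need
FETCH FIRST 1 ROWS ONLY instead.  Table name and data are invented.)

```python
# Greatest-row-per-group via a correlated ORDER BY ... LIMIT 1 subquery.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (bucket TEXT, bytecount INTEGER,
                windowstart INTEGER, windowend INTEGER);
INSERT INTO t VALUES ('a', 1, 0, 5), ('a', 9, 5, 10), ('b', 4, 0, 5);
""")

# Keep each row whose bytecount equals the largest bytecount in its bucket.
rows = conn.execute("""
SELECT t1.bucket, t1.bytecount, t1.windowstart, t1.windowend
FROM t t1
WHERE t1.bytecount = (SELECT t2.bytecount FROM t t2
                      WHERE t2.bucket = t1.bucket
                      ORDER BY t2.bytecount DESC LIMIT 1)
ORDER BY t1.bucket
""").fetchall()
print(rows)  # [('a', 9, 5, 10), ('b', 4, 0, 5)]
```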

On Sun, Sep 19, 2010 at 9:07 PM, Karl Wright <da...@gmail.com> wrote:
> Looking at your proposal:
>
> SELECT
>   bucket, primary_key, windowstart, etc
> FROM
>   table AS t1
> WHERE
>   windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE
> bucket = t1.bucket )
>
> ... we'd be looking actually for something more like this:
>
>
> SELECT
>   t1.bucket, t1.bytecount, t1.windowstart, t1.windowend
> FROM
>   (xxx) t1
> WHERE
>   t1.bytecount=( SELECT max(t2.bytecount) FROM (xxx) t2 WHERE
> t2.bucket = t1.bucket )
>
> ... although I've never seen the =(SELECT...) structure before.
>
> Karl
>
>
> On Sun, Sep 19, 2010 at 12:48 PM, Karl Wright <da...@gmail.com> wrote:
>> Here you go:
>>
>>    // The query we will generate here looks like this:
>>    // SELECT *
>>    //   FROM
>>    //     (SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket,
>> t3.bytecount AS bytecount,
>>    //                               t3.windowstart AS starttime,
>> t3.windowend AS endtime
>>    //        FROM (SELECT * FROM (SELECT t0.bucket AS bucket,
>> t0.starttime AS windowstart, t0.starttime + <interval> AS windowend,
>>    //                   SUM(t1.datasize * ((case when t0.starttime +
>> <interval> < t1.endtime then t0.starttime + <interval> else t1.endtime
>> end) -
>>    //                     (case when t0.starttime>t1.starttime then
>> t0.starttime else t1.starttime end))
>>    //                      / (t1.endtime - t1.starttime)) AS bytecount
>>    //                   FROM (SELECT DISTINCT substring(entityid from
>> '<bucketregexp>') AS bucket, starttime FROM repohistory WHERE
>> <criteria>) t0, repohistory t1
>>    //                   WHERE t0.bucket=substring(t1.entityid from
>> '<bucket_regexp>')
>>    //                      AND t1.starttime < t0.starttime +
>> <interval> AND t1.endtime > t0.starttime
>>    //                      AND <criteria on t1>
>>    //                          GROUP BY bucket,windowstart,windowend
>>    //              UNION SELECT t0a.bucket AS bucket, t0a.endtime -
>> <interval> AS windowstart, t0a.endtime AS windowend,
>>    //                   SUM(t1a.datasize * ((case when t0a.endtime <
>> t1a.endtime then t0a.endtime else t1a.endtime end) -
>>    //                     (case when t0a.endtime - <interval> >
>> t1a.starttime then t0a.endtime - <interval> else t1a.starttime end))
>>    //                      / (t1a.endtime - t1a.starttime)) AS bytecount
>>    //                   FROM (SELECT DISTINCT substring(entityid from
>> '<bucketregexp>') AS bucket, endtime FROM repohistory WHERE
>> <criteria>) t0a, repohistory t1a
>>    //                   WHERE t0a.bucket=substring(t1a.entityid from
>> '<bucket_regexp>')
>>    //                      AND (t1a.starttime < t0a.endtime AND
>> t1a.endtime > t0a.endtime - <interval>
>>    //                      AND <criteria on t1a>
>>    //                          GROUP BY bucket,windowstart,windowend) t2
>>    //                              ORDER BY bucket ASC,bytecount
>> DESC) t3) t4 ORDER BY xxx LIMIT yyy OFFSET zzz;
>>
>> I have low confidence that ANY planner would be able to locate the
>> common part of a 2x larger query and not do it twice.
>>
>> Karl
>>
>>
>>
>> On Sun, Sep 19, 2010 at 12:05 PM, Alexey Serba <as...@gmail.com> wrote:
>>>> The other thing is that we cannot afford to use the same "table"
>>>> twice, as it is actually an extremely expensive query in its own
>>>> right, with multiple joins, select distinct's, etc. under the covers.
>>> Even if you create indexes on bucket and activitycount columns? It
>>> might be that the query plans for these two queries (with "distinct
>>> on" hack and subquery max/subquery order limit/join) would be the
>>> same.
>>>
>>>> I'd be happy to post it but it may shock you. ;-)
>>> The way I indent SQL queries should say that I'm not afraid of
>>> multipage queries :)
>>>
>>>>
>>>> Karl
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Sep 19, 2010 at 11:32 AM, Alexey Serba <as...@gmail.com> wrote:
>>>>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM (...) t3
>>>>> Do you have primary key in your t3 table?
>>>>>
>>>>>> In Postgresql, what this does is to return the FIRST entire row matching each distinct idbucket result.
>>>>> FIRST based on which sort?
>>>>>
>>>>> Lets say you want to return FIRST row based on t3.windowstart column
>>>>> and you have primary key in t3 table. Then I believe your query can be
>>>>> rewritten in the following ways:
>>>>>
>>>>> 1. Using subqueries
>>>>> SELECT
>>>>>    bucket, primary_key, windowstart, etc
>>>>> FROM
>>>>>    table AS t1
>>>>> WHERE
>>>>>    windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE
>>>>> bucket = t1.bucket )
>>>>>
>>>>> 2. Using joins instead of subqueries ( in case Derby doesn't support
>>>>> subqueries - not sure about that )
>>>>> SELECT
>>>>>    t1.bucket, t1.primary_key, windowstart, etc
>>>>> FROM
>>>>>    table AS t1
>>>>>    LEFT OUTER JOIN table AS t2 ON ( t1.bucket = t2.bucket AND
>>>>> t2.windowstart > t1.windowstart )
>>>>> WHERE
>>>>>    t2.primary_key IS NULL
>>>>>
>>>>> HTH,
>>>>> Alex
>>>>>
>>>>> On Sat, Sep 18, 2010 at 2:28 PM, Karl Wright <da...@gmail.com> wrote:
>>>>>> Hi Folks,
>>>>>>
>>>>>> For two of the report queries, ACF uses the following Postgresql
>>>>>> construct, which sadly seems to have no Derby equivalent:
>>>>>>
>>>>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
>>>>>> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
>>>>>> FROM (...) t3
>>>>>>
>>>>>> In Postgresql, what this does is to return the FIRST entire row
>>>>>> matching each distinct idbucket result.  If Derby had a "FIRST()"
>>>>>> aggregate function, it would be the equivalent of:
>>>>>>
>>>>>> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
>>>>>> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
>>>>>> AS endtime FROM (...) t3 GROUP BY t3.bucket
>>>>>>
>>>>>> Unfortunately, Derby has no such aggregate function.  Furthermore, it
>>>>>> would not be ideal if I were to do the work myself in ACF, because
>>>>>> this is a resultset that needs to be paged through with offset and
>>>>>> length, for presentation to the user and sorting, so it gets wrapped
>>>>>> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
>>>>>> that does that part.
>>>>>>
>>>>>> Does anyone have any ideas and/or Derby contacts?  I'd really like the
>>>>>> quick-start example to have a functional set of reports.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Derby SQL ideas needed

Posted by Karl Wright <da...@gmail.com>.
Looking at your proposal:

SELECT
   bucket, primary_key, windowstart, etc
FROM
   table AS t1
WHERE
   windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE
bucket = t1.bucket )

... we'd be looking actually for something more like this:


SELECT
   t1.bucket, t1.bytecount, t1.windowstart, t1.windowend
FROM
   (xxx) t1
WHERE
   t1.bytecount=( SELECT max(t2.bytecount) FROM (xxx) t2 WHERE
t2.bucket = t1.bucket )

... although I've never seen the =(SELECT...) structure before.
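(For reference, the join-based variant Alexey also suggested in this
thread can be sketched the same way: an anti-join keeps only rows with
no larger bytecount in the same bucket.  sqlite3 stands in for Derby
here, and the schema and data are invented.)

```python
# Greatest-row-per-group via LEFT OUTER JOIN ... IS NULL (anti-join).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, bucket TEXT, bytecount INTEGER);
INSERT INTO t VALUES (1, 'a', 10), (2, 'a', 30), (3, 'b', 20);
""")

# A row survives only if no other row in its bucket has a larger
# bytecount, i.e. the join finds no match and t2.id is NULL.
rows = conn.execute("""
SELECT t1.bucket, t1.bytecount
FROM t t1
LEFT OUTER JOIN t t2
  ON t1.bucket = t2.bucket AND t2.bytecount > t1.bytecount
WHERE t2.id IS NULL
ORDER BY t1.bucket
""").fetchall()
print(rows)  # [('a', 30), ('b', 20)]
```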

Karl


On Sun, Sep 19, 2010 at 12:48 PM, Karl Wright <da...@gmail.com> wrote:
> Here you go:
>
>    // The query we will generate here looks like this:
>    // SELECT *
>    //   FROM
>    //     (SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket,
> t3.bytecount AS bytecount,
>    //                               t3.windowstart AS starttime,
> t3.windowend AS endtime
>    //        FROM (SELECT * FROM (SELECT t0.bucket AS bucket,
> t0.starttime AS windowstart, t0.starttime + <interval> AS windowend,
>    //                   SUM(t1.datasize * ((case when t0.starttime +
> <interval> < t1.endtime then t0.starttime + <interval> else t1.endtime
> end) -
>    //                     (case when t0.starttime>t1.starttime then
> t0.starttime else t1.starttime end))
>    //                      / (t1.endtime - t1.starttime)) AS bytecount
>    //                   FROM (SELECT DISTINCT substring(entityid from
> '<bucketregexp>') AS bucket, starttime FROM repohistory WHERE
> <criteria>) t0, repohistory t1
>    //                   WHERE t0.bucket=substring(t1.entityid from
> '<bucket_regexp>')
>    //                      AND t1.starttime < t0.starttime +
> <interval> AND t1.endtime > t0.starttime
>    //                      AND <criteria on t1>
>    //                          GROUP BY bucket,windowstart,windowend
>    //              UNION SELECT t0a.bucket AS bucket, t0a.endtime -
> <interval> AS windowstart, t0a.endtime AS windowend,
>    //                   SUM(t1a.datasize * ((case when t0a.endtime <
> t1a.endtime then t0a.endtime else t1a.endtime end) -
>    //                     (case when t0a.endtime - <interval> >
> t1a.starttime then t0a.endtime - <interval> else t1a.starttime end))
>    //                      / (t1a.endtime - t1a.starttime)) AS bytecount
>    //                   FROM (SELECT DISTINCT substring(entityid from
> '<bucketregexp>') AS bucket, endtime FROM repohistory WHERE
> <criteria>) t0a, repohistory t1a
>    //                   WHERE t0a.bucket=substring(t1a.entityid from
> '<bucket_regexp>')
>    //                      AND (t1a.starttime < t0a.endtime AND
> t1a.endtime > t0a.endtime - <interval>
>    //                      AND <criteria on t1a>
>    //                          GROUP BY bucket,windowstart,windowend) t2
>    //                              ORDER BY bucket ASC,bytecount
> DESC) t3) t4 ORDER BY xxx LIMIT yyy OFFSET zzz;
>
> I have low confidence that ANY planner would be able to locate the
> common part of a 2x larger query and not do it twice.
>
> Karl
>
>
>
> On Sun, Sep 19, 2010 at 12:05 PM, Alexey Serba <as...@gmail.com> wrote:
>>> The other thing is that we cannot afford to use the same "table"
>>> twice, as it is actually an extremely expensive query in its own
>>> right, with multiple joins, select distinct's, etc. under the covers.
>> Even if you create indexes on bucket and activitycount columns? It
>> might be that the query plans for these two queries (with "distinct
>> on" hack and subquery max/subquery order limit/join) would be the
>> same.
>>
>>> I'd be happy to post it but it may shock you. ;-)
>> The way I indent SQL queries should say that I'm not afraid of
>> multipage queries :)
>>
>>>
>>> Karl
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Sep 19, 2010 at 11:32 AM, Alexey Serba <as...@gmail.com> wrote:
>>>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM (...) t3
>>>> Do you have primary key in your t3 table?
>>>>
>>>>> In Postgresql, what this does is to return the FIRST entire row matching each distinct idbucket result.
>>>> FIRST based on which sort?
>>>>
>>>> Lets say you want to return FIRST row based on t3.windowstart column
>>>> and you have primary key in t3 table. Then I believe your query can be
>>>> rewritten in the following ways:
>>>>
>>>> 1. Using subqueries
>>>> SELECT
>>>>    bucket, primary_key, windowstart, etc
>>>> FROM
>>>>    table AS t1
>>>> WHERE
>>>>    windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE
>>>> bucket = t1.bucket )
>>>>
>>>> 2. Using joins instead of subqueries ( in case Derby doesn't support
>>>> subqueries - not sure about that )
>>>> SELECT
>>>>    t1.bucket, t1.primary_key, windowstart, etc
>>>> FROM
>>>>    table AS t1
>>>>    LEFT OUTER JOIN table AS t2 ON ( t1.bucket = t2.bucket AND
>>>> t2.windowstart > t1.windowstart )
>>>> WHERE
>>>>    t2.primary_key IS NULL
>>>>
>>>> HTH,
>>>> Alex
>>>>
>>>> On Sat, Sep 18, 2010 at 2:28 PM, Karl Wright <da...@gmail.com> wrote:
>>>>> Hi Folks,
>>>>>
>>>>> For two of the report queries, ACF uses the following Postgresql
>>>>> construct, which sadly seems to have no Derby equivalent:
>>>>>
>>>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
>>>>> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
>>>>> FROM (...) t3
>>>>>
>>>>> In Postgresql, what this does is to return the FIRST entire row
>>>>> matching each distinct idbucket result.  If Derby had a "FIRST()"
>>>>> aggregate function, it would be the equivalent of:
>>>>>
>>>>> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
>>>>> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
>>>>> AS endtime FROM (...) t3 GROUP BY t3.bucket
>>>>>
>>>>> Unfortunately, Derby has no such aggregate function.  Furthermore, it
>>>>> would not be ideal if I were to do the work myself in ACF, because
>>>>> this is a resultset that needs to be paged through with offset and
>>>>> length, for presentation to the user and sorting, so it gets wrapped
>>>>> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
>>>>> that does that part.
>>>>>
>>>>> Does anyone have any ideas and/or Derby contacts?  I'd really like the
>>>>> quick-start example to have a functional set of reports.
>>>>>
>>>>> Karl
>>>>>
>>>>
>>>
>>
>

Re: Derby SQL ideas needed

Posted by Karl Wright <da...@gmail.com>.
Yes.  This is for the Max Activity and Max Bandwidth reports.
Karl

On Sun, Sep 19, 2010 at 2:13 PM, Alexey Serba <as...@gmail.com> wrote:
> And all of this is only with single table repohistory, right? Is this
> some kind of complex analytics/stats?
>
> On Sun, Sep 19, 2010 at 8:48 PM, Karl Wright <da...@gmail.com> wrote:
>> Here you go:
>>
>>    // The query we will generate here looks like this:
>>    // SELECT *
>>    //   FROM
>>    //     (SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket,
>> t3.bytecount AS bytecount,
>>    //                               t3.windowstart AS starttime,
>> t3.windowend AS endtime
>>    //        FROM (SELECT * FROM (SELECT t0.bucket AS bucket,
>> t0.starttime AS windowstart, t0.starttime + <interval> AS windowend,
>>    //                   SUM(t1.datasize * ((case when t0.starttime +
>> <interval> < t1.endtime then t0.starttime + <interval> else t1.endtime
>> end) -
>>    //                     (case when t0.starttime>t1.starttime then
>> t0.starttime else t1.starttime end))
>>    //                      / (t1.endtime - t1.starttime)) AS bytecount
>>    //                   FROM (SELECT DISTINCT substring(entityid from
>> '<bucketregexp>') AS bucket, starttime FROM repohistory WHERE
>> <criteria>) t0, repohistory t1
>>    //                   WHERE t0.bucket=substring(t1.entityid from
>> '<bucket_regexp>')
>>    //                      AND t1.starttime < t0.starttime +
>> <interval> AND t1.endtime > t0.starttime
>>    //                      AND <criteria on t1>
>>    //                          GROUP BY bucket,windowstart,windowend
>>    //              UNION SELECT t0a.bucket AS bucket, t0a.endtime -
>> <interval> AS windowstart, t0a.endtime AS windowend,
>>    //                   SUM(t1a.datasize * ((case when t0a.endtime <
>> t1a.endtime then t0a.endtime else t1a.endtime end) -
>>    //                     (case when t0a.endtime - <interval> >
>> t1a.starttime then t0a.endtime - <interval> else t1a.starttime end))
>>    //                      / (t1a.endtime - t1a.starttime)) AS bytecount
>>    //                   FROM (SELECT DISTINCT substring(entityid from
>> '<bucketregexp>') AS bucket, endtime FROM repohistory WHERE
>> <criteria>) t0a, repohistory t1a
>>    //                   WHERE t0a.bucket=substring(t1a.entityid from
>> '<bucket_regexp>')
>>    //                      AND (t1a.starttime < t0a.endtime AND
>> t1a.endtime > t0a.endtime - <interval>
>>    //                      AND <criteria on t1a>
>>    //                          GROUP BY bucket,windowstart,windowend) t2
>>    //                              ORDER BY bucket ASC,bytecount
>> DESC) t3) t4 ORDER BY xxx LIMIT yyy OFFSET zzz;
>>
>> I have low confidence that ANY planner would be able to locate the
>> common part of a 2x larger query and not do it twice.
>>
>> Karl
>>
>>
>>
>> On Sun, Sep 19, 2010 at 12:05 PM, Alexey Serba <as...@gmail.com> wrote:
>>>> The other thing is that we cannot afford to use the same "table"
>>>> twice, as it is actually an extremely expensive query in its own
>>>> right, with multiple joins, select distinct's, etc. under the covers.
>>> Even if you create indexes on bucket and activitycount columns? It
>>> might be that the query plans for these two queries (with "distinct
>>> on" hack and subquery max/subquery order limit/join) would be the
>>> same.
>>>
>>>> I'd be happy to post it but it may shock you. ;-)
>>> The way I indent SQL queries should say that I'm not afraid of
>>> multipage queries :)
>>>
>>>>
>>>> Karl
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Sep 19, 2010 at 11:32 AM, Alexey Serba <as...@gmail.com> wrote:
>>>>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM (...) t3
>>>>> Do you have primary key in your t3 table?
>>>>>
>>>>>> In Postgresql, what this does is to return the FIRST entire row matching each distinct idbucket result.
>>>>> FIRST based on which sort?
>>>>>
>>>>> Lets say you want to return FIRST row based on t3.windowstart column
>>>>> and you have primary key in t3 table. Then I believe your query can be
>>>>> rewritten in the following ways:
>>>>>
>>>>> 1. Using subqueries
>>>>> SELECT
>>>>>    bucket, primary_key, windowstart, etc
>>>>> FROM
>>>>>    table AS t1
>>>>> WHERE
>>>>>    windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE
>>>>> bucket = t1.bucket )
>>>>>
>>>>> 2. Using joins instead of subqueries ( in case Derby doesn't support
>>>>> subqueries - not sure about that )
>>>>> SELECT
>>>>>    t1.bucket, t1.primary_key, windowstart, etc
>>>>> FROM
>>>>>    table AS t1
>>>>>    LEFT OUTER JOIN table AS t2 ON ( t1.bucket = t2.bucket AND
>>>>> t2.windowstart > t1.windowstart )
>>>>> WHERE
>>>>>    t2.primary_key IS NULL
>>>>>
>>>>> HTH,
>>>>> Alex
>>>>>
>>>>> On Sat, Sep 18, 2010 at 2:28 PM, Karl Wright <da...@gmail.com> wrote:
>>>>>> Hi Folks,
>>>>>>
>>>>>> For two of the report queries, ACF uses the following Postgresql
>>>>>> construct, which sadly seems to have no Derby equivalent:
>>>>>>
>>>>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
>>>>>> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
>>>>>> FROM (...) t3
>>>>>>
>>>>>> In Postgresql, what this does is to return the FIRST entire row
>>>>>> matching each distinct idbucket result.  If Derby had a "FIRST()"
>>>>>> aggregate function, it would be the equivalent of:
>>>>>>
>>>>>> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
>>>>>> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
>>>>>> AS endtime FROM (...) t3 GROUP BY t3.bucket
>>>>>>
>>>>>> Unfortunately, Derby has no such aggregate function.  Furthermore, it
>>>>>> would not be ideal if I were to do the work myself in ACF, because
>>>>>> this is a resultset that needs to be paged through with offset and
>>>>>> length, for presentation to the user and sorting, so it gets wrapped
>>>>>> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
>>>>>> that does that part.
>>>>>>
>>>>>> Does anyone have any ideas and/or Derby contacts?  I'd really like the
>>>>>> quick-start example to have a functional set of reports.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Derby SQL ideas needed

Posted by Alexey Serba <as...@gmail.com>.
And all of this involves only the single table repohistory, right?  Is
this some kind of complex analytics/stats?

On Sun, Sep 19, 2010 at 8:48 PM, Karl Wright <da...@gmail.com> wrote:
> Here you go:
>
>    // The query we will generate here looks like this:
>    // SELECT *
>    //   FROM
>    //     (SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket,
> t3.bytecount AS bytecount,
>    //                               t3.windowstart AS starttime,
> t3.windowend AS endtime
>    //        FROM (SELECT * FROM (SELECT t0.bucket AS bucket,
> t0.starttime AS windowstart, t0.starttime + <interval> AS windowend,
>    //                   SUM(t1.datasize * ((case when t0.starttime +
> <interval> < t1.endtime then t0.starttime + <interval> else t1.endtime
> end) -
>    //                     (case when t0.starttime>t1.starttime then
> t0.starttime else t1.starttime end))
>    //                      / (t1.endtime - t1.starttime)) AS bytecount
>    //                   FROM (SELECT DISTINCT substring(entityid from
> '<bucketregexp>') AS bucket, starttime FROM repohistory WHERE
> <criteria>) t0, repohistory t1
>    //                   WHERE t0.bucket=substring(t1.entityid from
> '<bucket_regexp>')
>    //                      AND t1.starttime < t0.starttime +
> <interval> AND t1.endtime > t0.starttime
>    //                      AND <criteria on t1>
>    //                          GROUP BY bucket,windowstart,windowend
>    //              UNION SELECT t0a.bucket AS bucket, t0a.endtime -
> <interval> AS windowstart, t0a.endtime AS windowend,
>    //                   SUM(t1a.datasize * ((case when t0a.endtime <
> t1a.endtime then t0a.endtime else t1a.endtime end) -
>    //                     (case when t0a.endtime - <interval> >
> t1a.starttime then t0a.endtime - <interval> else t1a.starttime end))
>    //                      / (t1a.endtime - t1a.starttime)) AS bytecount
>    //                   FROM (SELECT DISTINCT substring(entityid from
> '<bucketregexp>') AS bucket, endtime FROM repohistory WHERE
> <criteria>) t0a, repohistory t1a
>    //                   WHERE t0a.bucket=substring(t1a.entityid from
> '<bucket_regexp>')
>    //                      AND (t1a.starttime < t0a.endtime AND
> t1a.endtime > t0a.endtime - <interval>
>    //                      AND <criteria on t1a>
>    //                          GROUP BY bucket,windowstart,windowend) t2
>    //                              ORDER BY bucket ASC,bytecount
> DESC) t3) t4 ORDER BY xxx LIMIT yyy OFFSET zzz;
>
> I have low confidence that ANY planner would be able to locate the
> common part of a 2x larger query and not do it twice.
>
> Karl
>
>
>
> On Sun, Sep 19, 2010 at 12:05 PM, Alexey Serba <as...@gmail.com> wrote:
>>> The other thing is that we cannot afford to use the same "table"
>>> twice, as it is actually an extremely expensive query in its own
>>> right, with multiple joins, select distinct's, etc. under the covers.
>> Even if you create indexes on bucket and activitycount columns? It
>> might be that the query plans for these two queries (with "distinct
>> on" hack and subquery max/subquery order limit/join) would be the
>> same.
>>
>>> I'd be happy to post it but it may shock you. ;-)
>> The way I indent SQL queries should say that I'm not afraid of
>> multipage queries :)
>>
>>>
>>> Karl
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Sep 19, 2010 at 11:32 AM, Alexey Serba <as...@gmail.com> wrote:
>>>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM (...) t3
>>>> Do you have primary key in your t3 table?
>>>>
>>>>> In Postgresql, what this does is to return the FIRST entire row matching each distinct idbucket result.
>>>> FIRST based on which sort?
>>>>
>>>> Lets say you want to return FIRST row based on t3.windowstart column
>>>> and you have primary key in t3 table. Then I believe your query can be
>>>> rewritten in the following ways:
>>>>
>>>> 1. Using subqueries
>>>> SELECT
>>>>    bucket, primary_key, windowstart, etc
>>>> FROM
>>>>    table AS t1
>>>> WHERE
>>>>    windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE
>>>> bucket = t1.bucket )
>>>>
>>>> 2. Using joins instead of subqueries ( in case Derby doesn't support
>>>> subqueries - not sure about that )
>>>> SELECT
>>>>    t1.bucket, t1.primary_key, windowstart, etc
>>>> FROM
>>>>    table AS t1
>>>>    LEFT OUTER JOIN table AS t2 ON ( t1.bucket = t2.bucket AND
>>>> t2.windowstart > t1.windowstart )
>>>> WHERE
>>>>    t2.primary_key IS NULL
>>>>
>>>> HTH,
>>>> Alex
>>>>
>>>> On Sat, Sep 18, 2010 at 2:28 PM, Karl Wright <da...@gmail.com> wrote:
>>>>> Hi Folks,
>>>>>
>>>>> For two of the report queries, ACF uses the following Postgresql
>>>>> construct, which sadly seems to have no Derby equivalent:
>>>>>
>>>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
>>>>> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
>>>>> FROM (...) t3
>>>>>
>>>>> In Postgresql, what this does is to return the FIRST entire row
>>>>> matching each distinct idbucket result.  If Derby had a "FIRST()"
>>>>> aggregate function, it would be the equivalent of:
>>>>>
>>>>> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
>>>>> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
>>>>> AS endtime FROM (...) t3 GROUP BY t3.bucket
>>>>>
>>>>> Unfortunately, Derby has no such aggregate function.  Furthermore, it
>>>>> would not be ideal if I were to do the work myself in ACF, because
>>>>> this is a resultset that needs to be paged through with offset and
>>>>> length, for presentation to the user and sorting, so it gets wrapped
>>>>> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
>>>>> that does that part.
>>>>>
>>>>> Does anyone have any ideas and/or Derby contacts?  I'd really like the
>>>>> quick-start example to have a functional set of reports.
>>>>>
>>>>> Karl
>>>>>
>>>>
>>>
>>
>

Re: Derby SQL ideas needed

Posted by Karl Wright <da...@gmail.com>.
Here you go:

    // The query we will generate here looks like this:
    // SELECT *
    //   FROM
    //     (SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.bytecount AS bytecount,
    //                                    t3.windowstart AS starttime, t3.windowend AS endtime
    //        FROM (SELECT * FROM (SELECT t0.bucket AS bucket, t0.starttime AS windowstart, t0.starttime + <interval> AS windowend,
    //                   SUM(t1.datasize * ((case when t0.starttime + <interval> < t1.endtime then t0.starttime + <interval> else t1.endtime end) -
    //                     (case when t0.starttime > t1.starttime then t0.starttime else t1.starttime end))
    //                      / (t1.endtime - t1.starttime)) AS bytecount
    //                   FROM (SELECT DISTINCT substring(entityid from '<bucketregexp>') AS bucket, starttime FROM repohistory WHERE <criteria>) t0, repohistory t1
    //                   WHERE t0.bucket = substring(t1.entityid from '<bucketregexp>')
    //                      AND t1.starttime < t0.starttime + <interval> AND t1.endtime > t0.starttime
    //                      AND <criteria on t1>
    //                          GROUP BY bucket, windowstart, windowend
    //              UNION SELECT t0a.bucket AS bucket, t0a.endtime - <interval> AS windowstart, t0a.endtime AS windowend,
    //                   SUM(t1a.datasize * ((case when t0a.endtime < t1a.endtime then t0a.endtime else t1a.endtime end) -
    //                     (case when t0a.endtime - <interval> > t1a.starttime then t0a.endtime - <interval> else t1a.starttime end))
    //                      / (t1a.endtime - t1a.starttime)) AS bytecount
    //                   FROM (SELECT DISTINCT substring(entityid from '<bucketregexp>') AS bucket, endtime FROM repohistory WHERE <criteria>) t0a, repohistory t1a
    //                   WHERE t0a.bucket = substring(t1a.entityid from '<bucketregexp>')
    //                      AND t1a.starttime < t0a.endtime AND t1a.endtime > t0a.endtime - <interval>
    //                      AND <criteria on t1a>
    //                          GROUP BY bucket, windowstart, windowend) t2
    //                              ORDER BY bucket ASC, bytecount DESC) t3) t4 ORDER BY xxx LIMIT yyy OFFSET zzz;

I have low confidence that ANY planner would be able to recognize the
common subexpression in a query twice that size and avoid evaluating it
twice.

Karl
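
One way around both the missing DISTINCT ON and the cost of running the
expensive inner query twice is to materialize that inner result once (for
example into a temporary table; Derby has DECLARE GLOBAL TEMPORARY TABLE for
this) and then pick the max row per bucket with a correlated subquery, which
is plain SQL-92 and should be accepted by Derby as well. A minimal sketch,
using SQLite as a stand-in engine and an invented, simplified schema (not
the real ACF tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t3 (bucket TEXT, bytecount INTEGER, "
            "windowstart INTEGER, windowend INTEGER)")
cur.executemany("INSERT INTO t3 VALUES (?,?,?,?)", [
    ("a", 10, 0, 100), ("a", 30, 100, 200),
    ("b", 5, 0, 100), ("b", 7, 50, 150),
])

# Step 1: evaluate the expensive query exactly once, into a temp table.
# (In Derby this would be a DECLARE GLOBAL TEMPORARY TABLE plus INSERT.)
cur.execute("CREATE TEMP TABLE t3m AS SELECT * FROM t3")

# Step 2: greatest-per-group via a correlated subquery over the cheap
# materialized table: keep the row with the highest bytecount per bucket.
cur.execute("""
    SELECT bucket, bytecount, windowstart, windowend
      FROM t3m AS o
     WHERE bytecount = (SELECT MAX(bytecount) FROM t3m AS i
                         WHERE i.bucket = o.bucket)
     ORDER BY bucket
""")
rows = cur.fetchall()
print(rows)  # -> [('a', 30, 100, 200), ('b', 7, 50, 150)]
```

The outer ORDER BY / OFFSET / LIMIT wrapper can then be applied over this
result without re-running the expensive part.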



On Sun, Sep 19, 2010 at 12:05 PM, Alexey Serba <as...@gmail.com> wrote:
>> The other thing is that we cannot afford to use the same "table"
>> twice, as it is actually an extremely expensive query in its own
>> right, with multiple joins, select distinct's, etc. under the covers.
> Even if you create indexes on bucket and activitycount columns? It
> might be that the query plans for these two queries (with "distinct
> on" hack and subquery max/subquery order limit/join) would be the
> same.
>
>> I'd be happy to post it but it may shock you. ;-)
> The way I indent SQL queries should say that I'm not afraid of
> multipage queries :)
>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Sun, Sep 19, 2010 at 11:32 AM, Alexey Serba <as...@gmail.com> wrote:
>>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM (...) t3
>>> Do you have primary key in your t3 table?
>>>
>>>> In Postgresql, what this does is to return the FIRST entire row matching each distinct idbucket result.
>>> FIRST based on which sort?
>>>
>>> Lets say you want to return FIRST row based on t3.windowstart column
>>> and you have primary key in t3 table. Then I believe your query can be
>>> rewritten in the following ways:
>>>
>>> 1. Using subqueries
>>> SELECT
>>>    bucket, primary_key, windowstart, etc
>>> FROM
>>>    table AS t1
>>> WHERE
>>>    windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE
>>> bucket = t1.bucket )
>>>
>>> 2. Using joins instead of subqueries ( in case Derby doesn't support
>>> subqueries - not sure about that )
>>> SELECT
>>>    t1.bucket, t1.primary_key, windowstart, etc
>>> FROM
>>>    table AS t1
>>>    LEFT OUTER JOIN table AS t2 ON ( t1.bucket = t2.bucket AND
>>> t2.windowstart > t1.windowstart )
>>> WHERE
>>>    t2.primary_key IS NULL
>>>
>>> HTH,
>>> Alex
>>>
>>> On Sat, Sep 18, 2010 at 2:28 PM, Karl Wright <da...@gmail.com> wrote:
>>>> Hi Folks,
>>>>
>>>> For two of the report queries, ACF uses the following Postgresql
>>>> construct, which sadly seems to have no Derby equivalent:
>>>>
>>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
>>>> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
>>>> FROM (...) t3
>>>>
>>>> In Postgresql, what this does is to return the FIRST entire row
>>>> matching each distinct idbucket result.  If Derby had a "FIRST()"
>>>> aggregate function, it would be the equivalent of:
>>>>
>>>> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
>>>> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
>>>> AS endtime FROM (...) t3 GROUP BY t3.bucket
>>>>
>>>> Unfortunately, Derby has no such aggregate function.  Furthermore, it
>>>> would not be ideal if I were to do the work myself in ACF, because
>>>> this is a resultset that needs to be paged through with offset and
>>>> length, for presentation to the user and sorting, so it gets wrapped
>>>> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
>>>> that does that part.
>>>>
>>>> Does anyone have any ideas and/or Derby contacts?  I'd really like the
>>>> quick-start example to have a functional set of reports.
>>>>
>>>> Karl
>>>>
>>>
>>
>

Re: Derby SQL ideas needed

Posted by Alexey Serba <as...@gmail.com>.
> The other thing is that we cannot afford to use the same "table"
> twice, as it is actually an extremely expensive query in its own
> right, with multiple joins, select distinct's, etc. under the covers.
Even if you create indexes on bucket and activitycount columns? It
might be that the query plans for these two queries (with "distinct
on" hack and subquery max/subquery order limit/join) would be the
same.

> I'd be happy to post it but it may shock you. ;-)
The way I indent SQL queries should say that I'm not afraid of
multipage queries :)

>
> Karl
>
>
>
>
>
> On Sun, Sep 19, 2010 at 11:32 AM, Alexey Serba <as...@gmail.com> wrote:
>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM (...) t3
>> Do you have primary key in your t3 table?
>>
>>> In Postgresql, what this does is to return the FIRST entire row matching each distinct idbucket result.
>> FIRST based on which sort?
>>
>> Lets say you want to return FIRST row based on t3.windowstart column
>> and you have primary key in t3 table. Then I believe your query can be
>> rewritten in the following ways:
>>
>> 1. Using subqueries
>> SELECT
>>    bucket, primary_key, windowstart, etc
>> FROM
>>    table AS t1
>> WHERE
>>    windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE
>> bucket = t1.bucket )
>>
>> 2. Using joins instead of subqueries ( in case Derby doesn't support
>> subqueries - not sure about that )
>> SELECT
>>    t1.bucket, t1.primary_key, windowstart, etc
>> FROM
>>    table AS t1
>>    LEFT OUTER JOIN table AS t2 ON ( t1.bucket = t2.bucket AND
>> t2.windowstart > t1.windowstart )
>> WHERE
>>    t2.primary_key IS NULL
>>
>> HTH,
>> Alex
>>
>> On Sat, Sep 18, 2010 at 2:28 PM, Karl Wright <da...@gmail.com> wrote:
>>> Hi Folks,
>>>
>>> For two of the report queries, ACF uses the following Postgresql
>>> construct, which sadly seems to have no Derby equivalent:
>>>
>>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
>>> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
>>> FROM (...) t3
>>>
>>> In Postgresql, what this does is to return the FIRST entire row
>>> matching each distinct idbucket result.  If Derby had a "FIRST()"
>>> aggregate function, it would be the equivalent of:
>>>
>>> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
>>> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
>>> AS endtime FROM (...) t3 GROUP BY t3.bucket
>>>
>>> Unfortunately, Derby has no such aggregate function.  Furthermore, it
>>> would not be ideal if I were to do the work myself in ACF, because
>>> this is a resultset that needs to be paged through with offset and
>>> length, for presentation to the user and sorting, so it gets wrapped
>>> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
>>> that does that part.
>>>
>>> Does anyone have any ideas and/or Derby contacts?  I'd really like the
>>> quick-start example to have a functional set of reports.
>>>
>>> Karl
>>>
>>
>

Re: Derby SQL ideas needed

Posted by Karl Wright <da...@gmail.com>.
"FIRST based on which sort?"?

First based on the existing sort, which is crucial, because the sort
is by bucket ASC, activitycount DESC.  I'm looking for the row with
the highest activitycount, per bucket.

The other thing is that we cannot afford to use the same "table"
twice, as it is actually an extremely expensive query in its own
right, with multiple joins, select distinct's, etc. under the covers.
I'd be happy to post it but it may shock you. ;-)

Karl





On Sun, Sep 19, 2010 at 11:32 AM, Alexey Serba <as...@gmail.com> wrote:
>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM (...) t3
> Do you have primary key in your t3 table?
>
>> In Postgresql, what this does is to return the FIRST entire row matching each distinct idbucket result.
> FIRST based on which sort?
>
> Lets say you want to return FIRST row based on t3.windowstart column
> and you have primary key in t3 table. Then I believe your query can be
> rewritten in the following ways:
>
> 1. Using subqueries
> SELECT
>    bucket, primary_key, windowstart, etc
> FROM
>    table AS t1
> WHERE
>    windowstart=( SELECT max(windowstart) FROM table AS t2 WHERE
> bucket = t1.bucket )
>
> 2. Using joins instead of subqueries ( in case Derby doesn't support
> subqueries - not sure about that )
> SELECT
>    t1.bucket, t1.primary_key, windowstart, etc
> FROM
>    table AS t1
>    LEFT OUTER JOIN table AS t2 ON ( t1.bucket = t2.bucket AND
> t2.windowstart > t1.windowstart )
> WHERE
>    t2.primary_key IS NULL
>
> HTH,
> Alex
>
> On Sat, Sep 18, 2010 at 2:28 PM, Karl Wright <da...@gmail.com> wrote:
>> Hi Folks,
>>
>> For two of the report queries, ACF uses the following Postgresql
>> construct, which sadly seems to have no Derby equivalent:
>>
>> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
>> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
>> FROM (...) t3
>>
>> In Postgresql, what this does is to return the FIRST entire row
>> matching each distinct idbucket result.  If Derby had a "FIRST()"
>> aggregate function, it would be the equivalent of:
>>
>> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
>> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
>> AS endtime FROM (...) t3 GROUP BY t3.bucket
>>
>> Unfortunately, Derby has no such aggregate function.  Furthermore, it
>> would not be ideal if I were to do the work myself in ACF, because
>> this is a resultset that needs to be paged through with offset and
>> length, for presentation to the user and sorting, so it gets wrapped
>> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
>> that does that part.
>>
>> Does anyone have any ideas and/or Derby contacts?  I'd really like the
>> quick-start example to have a functional set of reports.
>>
>> Karl
>>
>

Re: Derby SQL ideas needed

Posted by Alexey Serba <as...@gmail.com>.
> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime FROM (...) t3
Do you have primary key in your t3 table?

> In Postgresql, what this does is to return the FIRST entire row matching each distinct idbucket result.
FIRST based on which sort?

Let's say you want to return the FIRST row based on the t3.windowstart
column and you have a primary key in the t3 table. Then I believe your
query can be rewritten in the following ways:

1. Using a correlated subquery
SELECT
    bucket, primary_key, windowstart, etc
FROM
    table AS t1
WHERE
    windowstart = ( SELECT max(windowstart) FROM table AS t2
                    WHERE t2.bucket = t1.bucket )

2. Using a self-join instead of a subquery ( in case Derby doesn't
support correlated subqueries - not sure about that )
SELECT
    t1.bucket, t1.primary_key, windowstart, etc
FROM
    table AS t1
    LEFT OUTER JOIN table AS t2
        ON ( t1.bucket = t2.bucket AND t2.windowstart > t1.windowstart )
WHERE
    t2.primary_key IS NULL

HTH,
Alex
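
Both rewrites can be checked against a toy table. A sketch using SQLite in
place of Derby, with an assumed pk/bucket/windowstart schema (hypothetical,
not the real ACF tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, "
            "bucket TEXT, windowstart INTEGER)")
cur.executemany("INSERT INTO t VALUES (?,?,?)", [
    (1, "a", 10), (2, "a", 20), (3, "b", 5), (4, "b", 30),
])

# Rewrite 1: correlated subquery picking the max windowstart per bucket.
q1 = """
    SELECT bucket, pk, windowstart FROM t AS t1
     WHERE windowstart = (SELECT MAX(windowstart) FROM t AS t2
                           WHERE t2.bucket = t1.bucket)
     ORDER BY bucket
"""
# Rewrite 2: self anti-join -- keep only rows for which no "later" row
# exists in the same bucket (the join finds no match, so t2.pk IS NULL).
q2 = """
    SELECT t1.bucket, t1.pk, t1.windowstart
      FROM t AS t1
      LEFT OUTER JOIN t AS t2
        ON t1.bucket = t2.bucket AND t2.windowstart > t1.windowstart
     WHERE t2.pk IS NULL
     ORDER BY t1.bucket
"""
r1 = cur.execute(q1).fetchall()
r2 = cur.execute(q2).fetchall()
print(r1)  # -> [('a', 2, 20), ('b', 4, 30)]
print(r2)  # same rows from the anti-join form
```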

On Sat, Sep 18, 2010 at 2:28 PM, Karl Wright <da...@gmail.com> wrote:
> Hi Folks,
>
> For two of the report queries, ACF uses the following Postgresql
> construct, which sadly seems to have no Derby equivalent:
>
> SELECT DISTINCT ON (idbucket) t3.bucket AS idbucket, t3.activitycount
> AS activitycount, t3.windowstart AS starttime, t3.windowend AS endtime
> FROM (...) t3
>
> In Postgresql, what this does is to return the FIRST entire row
> matching each distinct idbucket result.  If Derby had a "FIRST()"
> aggregate function, it would be the equivalent of:
>
> SELECT t3.bucket AS idbucket, FIRST(t3.activitycount) AS
> activitycount, FIRST(t3.windowstart) AS starttime, FIRST(t3.windowend)
> AS endtime FROM (...) t3 GROUP BY t3.bucket
>
> Unfortunately, Derby has no such aggregate function.  Furthermore, it
> would not be ideal if I were to do the work myself in ACF, because
> this is a resultset that needs to be paged through with offset and
> length, for presentation to the user and sorting, so it gets wrapped
> in another SELECT ... FROM (...) ORDER BY ... OFFSET ... LIMIT ...
> that does that part.
>
> Does anyone have any ideas and/or Derby contacts?  I'd really like the
> quick-start example to have a functional set of reports.
>
> Karl
>