You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by Yip Ng <yi...@gmail.com> on 2006/11/01 00:00:21 UTC

Re: Functions in GROUP BY expressions? (related to DERBY-883)

On 10/31/06, Army <qo...@gmail.com> wrote:
>
> Many thanks to all who took the time to provide clarifications in this
> thread.
> At this point I think the only question I have remaining is with respect
> to
> something Dan wrote in one of his replies, namely:
>
> > the grouping by functions does not work because at the moment Derby
> > assumes all functions are non-deterministic.
>
> Note the use of the word "all".
>


I think what Dan meant was that all *user defined functions* defined via
CREATE FUNCTION statement are assumed to be non-deterministic since users
cannot declare a function as deterministic in the current implementation of
Derby, so implicitly, they are non-deterministic.

For built-in functions such as SUBSTR, LTRIM and RTRIM, I believe they are
deterministic since they always return the same result from successive
invocations with identical input(s)
and they work as expected with group by expression.

For SIN, COS, TAN functions,  they are also considered deterministics.
However, they do not work currently in Derby with group by expression as
they are mapped to  JavaToSQLValueNode and such nodes always return false in
isEquivalent() method, so I think this is an implementation restriction for
those built-in functions.

Perhaps it is a good idea to have documentation that describes the function
determinism of all built-in functions in the reference manual.


Regards,
Yip Ng

Re: Functions in GROUP BY expressions? (related to DERBY-883)

Posted by Daniel John Debrunner <dj...@apache.org>.
Yip Ng wrote:

> I think what Dan meant was that all *user defined functions* defined via 
> CREATE FUNCTION statement are assumed to be non-deterministic since 
> users cannot declare a function as deterministic in the current 
> implementation of Derby, so implicitly, they are non-deterministic.

Yes.

[snip]

> For SIN, COS, TAN functions,  they are also considered deterministics.

SIN, COS etc. are implemented using the same mechanism as user defined 
functions.

> However, they do not work currently in Derby with group by expression as 
> they are mapped to  JavaToSQLValueNode and such nodes always return 
> false in isEquivalent() method, so I think this is an implementation 
> restriction for those built-in functions.

Right, ideally these builtin functions should be able to declare 
themselves as deterministic or not and the isEquivalent() method factor 
that into its decision. I'd like to see this handled the same way for 
the builtin Java SQL external functions (SIN, COS etc.) as for the user 
defined functions, rather than special casing the builtin ones.

Dan.



Re: Functions in GROUP BY expressions? (related to DERBY-883)

Posted by Manish Khettry <ma...@gmail.com>.
Actually, when I was implementing this functionality I thought that
comparing java nodes for equivalence was not possible or perhaps hard to do.
I could be wrong on this, so someone who knows this stuff better can comment
on this.

At any rate, I find it strange that one of these query works but not the
other. I think right now as things stand, both should fail. If there is
consensus that we should allow java value nodes in group by expressions,
then we should add isEquivalence methods to all the JavaValueNodes.

m


On 11/15/06, Oystein Grovlen - Sun Norway <Oy...@sun.com> wrote:
>
> Manish Khettry wrote:
> > Hmm strange. I am not sure why this
> >
> >    select avg(i) from t group by sin(i)
> >
> > works but not this,
> >
> >   select avg(i), sin(i) from test group by sin(i);
> >
> > I think its a bug-- both should be disallowed.
>
> Why should they be disallowed?  I agree that in most cases grouping by
> non-deterministic functions do not make sense, but I am not sure that
> is a reason for disallowing it.  Also, a function that is generally
> non-deterministic may be deterministic in a given context.
>
> Theoretically, I do not see why equivalence and non-deterministic
> functions are an issue for GROUP BY.  I do not think an implementation
> need to execute the function twice if it appears both in the group-by
> clause and in the select list.  The value returned when determining
> group could be used for the select list also.
>
> --
> Øystein
>

Re: Functions in GROUP BY expressions? (related to DERBY-883)

Posted by Oystein Grovlen - Sun Norway <Oy...@Sun.COM>.
Manish Khettry wrote:
 > Hmm strange. I am not sure why this
 >
 >    select avg(i) from t group by sin(i)
 >
 > works but not this,
 >
 >   select avg(i), sin(i) from test group by sin(i);
 >
 > I think its a bug-- both should be disallowed.

Why should they be disallowed?  I agree that in most cases grouping by
non-deterministic functions do not make sense, but I am not sure that
is a reason for disallowing it.  Also, a function that is generally
non-deterministic may be deterministic in a given context.

Theoretically, I do not see why equivalence and non-deterministic
functions are an issue for GROUP BY.  I do not think an implementation
need to execute the function twice if it appears both in the group-by
clause and in the select list.  The value returned when determining
group could be used for the select list also.

-- 
Øystein

Re: Functions in GROUP BY expressions? (related to DERBY-883)

Posted by Manish Khettry <ma...@gmail.com>.
Hmm strange. I am not sure why this

   select avg(i) from t group by sin(i)

works but not this,

  select avg(i), sin(i) from test group by sin(i);

I think its a bug-- both should be disallowed.

m

On 11/14/06, Oystein Grovlen - Sun Norway <Oy...@sun.com> wrote:
>
> Yip Ng wrote:
>
> > For SIN, COS, TAN functions,  they are also considered deterministics.
> > However, they do not work currently in Derby with group by expression as
> > they are mapped to  JavaToSQLValueNode and such nodes always return
> > false in isEquivalent() method, so I think this is an implementation
> > restriction for those built-in functions.
>
> Just some late follow-up here.  As far as I can tell, it is not strictly
> correct to say that non-deterministic functions does not work with group
> by expressions.  What does not work is to use a non-deterministic
> function in the select list of a statement when grouping on the same
> function.  However, the following query works:
>
>         select avg(i) from t group by sin(i)
>
> I assume that it is the check for equivalence between the select list
> and the group by list that requires deterministic functions.  (This may
> have been clear to others when this was discussed, but at least not to
> me.)
>
> --
> Øystein
>

Re: Functions in GROUP BY expressions? (related to DERBY-883)

Posted by Yip Ng <yi...@gmail.com>.
On 11/15/06, Oystein Grovlen - Sun Norway <Oy...@sun.com> wrote:

>I thought we had established that SIN was non-deterministic.  Are you
> saying that SIN is deterministic in some contexts, but not in others?

If we are talking about the current implementation of Derby, then all user
defined functions and built-ins are non-deterministic.  As explained in
my previous comments, Derby does not have a way to declare user functions
deterministically, so they are all treated in non-deterministic manner
implicitly.

What I was trying to convey was most of the built-in functions in Derby are
*supposed*
to be deterministic (aggregates and SIN, COS, TAN, SUBSTR, etc.) since given
the
same argument(s), they return the same result.  And so far, I have been
using
function determinisim to determine what is allowed in the group by
expression in
general and not limited to Derby.  This is what I would like to describe in
the
group by expression documentation when Derby allows functions to be declared

determinisically in the future.

But for now, from a documentation perspective, the Derby ref manual should
mention
what is allowed in the group by expression in 10.2 due to implementation
restrictions.


Regards,
Yip Ng

Re: Functions in GROUP BY expressions? (related to DERBY-883)

Posted by Oystein Grovlen - Sun Norway <Oy...@sun.com>.
Yip Ng wrote:
 > On 11/14/06, *Oystein Grovlen - Sun Norway* <Oystein.Grovlen@sun.com
 > <ma...@sun.com>> wrote:
 >
 >
 >     Just some late follow-up here.  As far as I can tell, it is not 
strictly
 >     correct to say that non-deterministic functions does not work 
with group
 >     by expressions.  What does not work is to use a non-deterministic
 >     function in the select list of a statement when grouping on the same
 >     function.  However, the following query works:
 >
 >             select avg(i) from t group by sin(i)
 >
 >     I assume that it is the check for equivalence between the select list
 >     and the group by list that requires deterministic functions.
 >
 >
 >
 > In your example, AVG and (its other aggregate friends) and SIN are
 > deterministic, so

I thought we had established that SIN was non-deterministic.  Are you
saying that SIN is deterministic in some contexts, but not in others?

 > select avg(i) from t group by sin(i) works expectedly.  I was referring
 > to Army's example
 > where the sin(i) is in the select list

That was exactly my point.  It is putting the non-deterministic
functions both in the select list and the group-by clause that is the
problem, not non-deterministic functions in the group-by clause, per
se.

 > On a side note, I think it is awkward and perhaps wrong to group by with
 > non-deterministic functions as the
 > grouping will have unpredictable result, so they shouldn't be allow in
 > the group by clause in my opinion.

If someone wants to group records in random ways, why should we
prevent them from doing so?

 > For a quick reference, I created a non-deterministic function named f_nd
 > in DB2 and attempt to group by
 > with this function and as I expected, it throws an error.  e.g.:
 >
 > db2 => select avg(i) from t1 group by f_nd(i)
 > SQL0583N  The use of routine "S1.F_ND" is invalid because it is not
 > deterministic or has an external action.  SQLSTATE=42845

The DB2 reference manual shows an example og how to get around this
restriction:

SELECT RANDID ,AVG(EDLEVEL ),AVG(SALARY )
   FROM ( SELECT EDLEVEL,SALARY,INTEGER(RAND()*5) AS RANDID
            FROM EMPLOYEE
        ) AS EMPRAND
GROUP BY RANDID

This made me try the following version of Army's original query on
Derby:

SELECT sinus
   FROM ( SELECT sin(i) AS sinus
            FROM t
        ) AS sinust
GROUP BY sinus;

This works.  It is not clear to me why we should force people to write
people more complex queries than necessary to achieve what they want.

-- 
Øystein Grøvlen

Re: Functions in GROUP BY expressions? (related to DERBY-883)

Posted by Yip Ng <yi...@gmail.com>.
On 11/14/06, Oystein Grovlen - Sun Norway <Oy...@sun.com> wrote:
>
>
> Just some late follow-up here.  As far as I can tell, it is not strictly
> correct to say that non-deterministic functions does not work with group
> by expressions.  What does not work is to use a non-deterministic
> function in the select list of a statement when grouping on the same
> function.  However, the following query works:
>
>         select avg(i) from t group by sin(i)
>
> I assume that it is the check for equivalence between the select list
> and the group by list that requires deterministic functions.



In your example, AVG and (its other aggregate friends) and SIN are
deterministic, so
select avg(i) from t group by sin(i) works expectedly.  I was referring to
Army's example
where the sin(i) is in the select list

<snip>
ij> select sin(i) from t1 group by sin(i);
ERROR 42Y30: The SELECT list of a grouped query contains at least one
invalid
expression. If a SELECT list has a GROUP BY, the list may only contain valid
grouping expressions and valid aggregate expressions.
</snip>

and that the current implementation does not allow this due to
JavaToSQLValueNode's

behavior in isEquivalent() method.  Hope I clear this up. =)

On a side note, I think it is awkward and perhaps wrong to group by with
non-deterministic functions as the
grouping will have unpredictable result, so they shouldn't be allow in the
group by clause in my opinion.
For a quick reference, I created a non-deterministic function named f_nd in
DB2 and attempt to group by
with this function and as I expected, it throws an error.  e.g.:

db2 => select avg(i) from t1 group by f_nd(i)
SQL0583N  The use of routine "S1.F_ND" is invalid because it is not
deterministic or has an external action.  SQLSTATE=42845


Regards,
Yip Ng

Re: Functions in GROUP BY expressions? (related to DERBY-883)

Posted by Oystein Grovlen - Sun Norway <Oy...@Sun.COM>.
Yip Ng wrote:

> For SIN, COS, TAN functions,  they are also considered deterministics.  
> However, they do not work currently in Derby with group by expression as 
> they are mapped to  JavaToSQLValueNode and such nodes always return 
> false in isEquivalent() method, so I think this is an implementation 
> restriction for those built-in functions.

Just some late follow-up here.  As far as I can tell, it is not strictly 
correct to say that non-deterministic functions does not work with group 
by expressions.  What does not work is to use a non-deterministic 
function in the select list of a statement when grouping on the same 
function.  However, the following query works:

	select avg(i) from t group by sin(i)

I assume that it is the check for equivalence between the select list 
and the group by list that requires deterministic functions.  (This may 
have been clear to others when this was discussed, but at least not to me.)

-- 
Øystein