You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Manish Khettry (JIRA)" <de...@db.apache.org> on 2006/07/17 06:53:17 UTC

[jira] Updated: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.

     [ http://issues.apache.org/jira/browse/DERBY-883?page=all ]

Manish Khettry updated DERBY-883:
---------------------------------

    Derby Info: [Patch Available]

Functionally, this patch supports expressions in the group by list. An expression
in the group by list can also be used in the select list. For example if the grouping
expression is c1+c2, then a valid expression in the select list would be 
c1+c2+1. The expression matching facility is not smart enough to allow c1+1+c2.

This patch does not allow the use of grouping expressions in the having clause
or order by clause. These are restrictions for now which should be removed
eventually. For more information on group by expressions:

http://www.pdc.kth.se/doc/SP/manuals/db2-5.0/html/db2s0/db2s0176.htm#HDRGRPBY

Another change has been to allow the use of duplicate grouping expressions: i.e
select a, sum(b) from test group by a,a;

Error messages have been changed. The most noticeable change is the use
of the more general sql exception: LANG_INVALID_GROUPED_SELECT_LIST.

VerifyAggregateExpressionsVisitor

o disallows java nodes. This should take care of functions in java. 
o the skipChildren method doesn't traverse a subtree that contains
  any grouping expression.
o the error messages are more appropriate-- LANG_INVALID_GROUPED_SELECT_LIST.

SelectNode

o calls preprocess on the group by list. This is needed because expressions 
can get rewritten in the select list but not in the grouping list causing
problems when the result set is generated.

GroupByNode

o changes to init() take care of the case where gropuing expressions is
not a column reference.

o the rewrite logic is a bit different now. Earlier, we would change
each unaggregate ColumnReference in the select list and point it to the
GroupByColumn. Now we replace each group by expression that we find
in the projection list with a virtual column node that effectively points
to a result column in the result set doing the group by. In addition
the original routine which did the rewrite is now split up into two separate
and smaller routines: addUnAggColumns and addAggregateColumns.

GroupByColumn

o now uses a ValueNode instead of a ColumnReference. Unused methods zapped.

GroupByDiff

o verifyUniqueGroupingColumn discarded. This was a restriction earlier and
it does not make sense.

o findGroupingColumn now does the hard work of finding a group by column
for a given expression.

sqlgrammer

o use additiveExpression instead of columnReference.

GroupByExpressionTest.java

o JUnit test case for this functionality.







> Enhance GROUP BY clause to support expressions instead of just column references.
> ---------------------------------------------------------------------------------
>
>                 Key: DERBY-883
>                 URL: http://issues.apache.org/jira/browse/DERBY-883
>             Project: Derby
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 10.1.2.1
>         Environment: JDK 1.5.0_05
>            Reporter: Lluis Turro
>         Assigned To: Manish Khettry
>         Attachments: 883.patch.txt
>
>
> This select would return an error syntax on finding "(" after month if group by clause:
> select idissue, month(creation), year(creation), count(distinct idissue)
> where 
>   ....
> group by idissue, month(creation), year(creation)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Re: [jira] Updated: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.

Posted by Daniel John Debrunner <dj...@apache.org>.
Army wrote:


> **** Nit-picking: Changes that you don't *have* to do but that might
> help in cleaning up the patch.
> 
> A. Unnecessary white-space changes?

+1 to avoiding such changes.


> In SelectNode.java:
> 
> -              VerifyAggregateExpressionsVisitor visitor =
> -                  new VerifyAggregateExpressionsVisitor(groupByList);
> +            VerifyAggregateExpressionsVisitor visitor =
> +                new VerifyAggregateExpressionsVisitor(groupByList);
>              resultColumns.accept(visitor);


White space changes like this make it extra work on the reviewers. Is it
really just white space change or is there some minor change in there
that they need to be concerned about. Looking at such changes makes me
go cross-eyed as I try to determine if the lines are the same or not.

Thanks,
Dan.



Re: [jira] Updated: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.

Posted by Manish Khettry <ma...@gmail.com>.
On 7/26/06, Army <qo...@gmail.com> wrote:
>
> Hi Manish,
>
> Thanks for addressing my comments.  The patch is looking better
> now--especially
> with the whitespace issues in version 4.   Some follow-up comments below.
>
> Manish Khettry wrote:
> >> 1) TernarnyOpNode.isEquivalent (): I think we need to check "receiver"
> >> equivalence as well, in addition to left and right operands.
> >
> > yes, you are right on this one. The new patch fixes this.
>
> Thanks for fixing this.  If possible, could you add a test case to your
> JUnit
> test to demonstrate?  Could just be the query that I posted in my previous
> comments.


I have added a couple of queries to make sure that
TernaryOpNode#isEquivalent is tested.


> Actually the existing error message no longer makes sense with group by
> > expressions. For example, the line of code that you point to, also
> catches
> > errors like this:
> > select c1+1,  avg(c2) from test group by c1+10;
> > Does the existing error message make sense for this query?
>
> Not with this query, no.  But two things to note here:
>
> 1) Before your patch, if you replace "group by c1+10" with "group by c2"
> you'll
> get error 42Y36 saying that column reference C1 is invalid, even though
> it's
> actually part of an expression ("c1+1")--so that type of behavior *does*
> already
> exist, even if it doesn't really "make sense".  If you change the SQLSTATE
> and
> error message for this case, the result is a change (albeit a pretty small
> one)
> in customer apps, which is really the issue here.
>
> 2) The key expression in my previous comment was "in cases like this",
> where
> "like this" meant queries where the existing error message *does* make
> sense.
>
> For example, with the following query it makes sense to include the column
> reference because we know what it is:
>
> ij> select c1 from test group by c2;
> ERROR 42Y36: Column reference 'C1' is invalid.  For a SELECT list with a
> GROUP
> BY, the list may only contain grouping columns and valid aggregate
> expressions.
>
> And that's what we'll get before your patch.  But after your patch we a)
> get a
> different SQLSTATE and b) lose the column reference name:
>
> ij> select c1 from test group by c2;
> ERROR 42Y30: The SELECT list of a grouped query contains at least one
> invalid
> expression. If a SELECT list has a GROUP BY, the list may only contain
> valid
> grouping expressions and valid aggregate expressions.
>
> So now we've changed the way an existing query behaves (by returning a
> different
> SQLSTATE)--which (I think?) means we need to mark this as "Existing
> Application
> Impact"--*and* we're now providing the user with less information.


I see your point but I would prefer to keep it this way, first because the
error message in question 42Y36-- even though it gives more information, is
misleading; specifically this phrase: "the list may only contain grouping
columns .....". I also don't like error handling for a subset of the cases
taking recourse to instanceof. I'll throw it up to the community to weigh
in.

I can see that it would be useful to point out which expression is invalid
as opposed to saying ".. atleast one invalid expression....". If the
community wants this, then I'd like to be done with this patch and file
another issue to specifically improve the error handling of this case.

If it's possible to figure out when we have a simple column reference (c1)
> verses an expression (c1+1) and to throw a corresponding error, I think
> that
> would be ideal.  But if you decide that it's not possible or not worth it
> to
> preserve the error for queries with simple column references, then feel
> free to
> leave this as it is.  If that's your decision, though, I think you should
> ask
> derby-dev if a different SQLSTATE for the same query counts as "Existing
> Application Impact", and if so, mark this issue as appropriate.
>
> >> + return (operator.equals(other.operator)
> >> +        || (operand == other.operand)
> >> +        || ((operand != null) && operand.isEquivalent(other)));
> >
> > If you look at bindUnaryOperator for UnaryOperatorNode, you will see
> that
> > it is possible for operand to be null; not so in BinaryOperatorNode.
>
> So the assumption here is that if (operand == other.operand) then both
> operands
> are null?  If so, a simple comment saying that could be helpful.


OK. I think the code speaks clearly for itself (the second part of the ||
checks for != null before calling isEquivalent) but I've added the comment.
I like to add comments if they really clarify something that the code may
not. Usually the code changes and the comments don't and thats worse than no
comments at all. Those XP guys were onto something!

> In GroupByList.java:
> <snip code snippet>
> >
> > That is a good question-- this bit of code add invisible column
> references
> > when a column reference in the group by list, is not in the projection
> > list;
> > i.e.
> >
> > select c2, sum(c3) from test group by c1;
> >
> > will add c1 to the projection list as a generated column. I'm not sure
> if
> > you want to do the same kind of rewrite for expressions. I thought this
> > wasn't worth doing but I'm not sure anymore.
>
> I just tried this query with and without your patch and it fails in both
> cases
> with error 42Y36/42Y30.  So I guess I'm a little confused as to why Derby
> would
> add "c1" to the projection list and what good it does us?



Sorry, wrong query, try

select c2, sum(c3) from test group by c1,c2;

If you're not sure whether or not it's worth adding equivalent logic for
> expressions, do you plan to follow-up with this later?  If not, a comment
> explaining what you have found (or what your uncertainties are) might help
> other
> developers down the road (esp. if a bug comes up in this area later).


I'll add a comment. My preference would be to leave as is and evaluate it
later.


> > I disagree with you here-- when a contributor has spent a considerable
> > amount of time understanding and making a lot of changes to a class, it
> > makes sense to remove dead code at the same time. If you don't, stuff
> like
> > this will likely get ignored or slip through the cracks, imho.
>
> I didn't mean to say that the changes were not useful nor that they
> shouldn't be
> committed; just that it might make it easier to track "cleanup" vs
> "functional"
> changes by filing a separate Jira issue to remove the dead code and
> posting an
> isolated patch to that issue.  The changes will still go in--no slipping
> through
> the cracks--but they will go in as part of their *own* Jira commit, which
> seems
> cleaner to me.
>
> In any event, that was just a suggestion.  No need to let this prevent the
>
> commit of the patch.


OK, I see your point although the overhead of a new jira issue, code reviews
and such is a bit high, imho.

> 7) There are a couple of places where you have commented lines out instead
> >
> > yes, you're right. I've removed it.
>
> There are still a couple of these lingering:


I got rid of both of them.

>> A. Unnecessary white-space changes?
> >
> > Atleast some of these were done to improve existing indentation. Next
> time,
> > I'll resist the urge to improve readability.
>
> Thanks for cleaning a bunch of these whitespace diffs up in the version
> four
> patch.  That said, there are still quite a few white-space changes
> left--did you
> intentionally leave those in?  Ex.
>
> GroupByNode.java:
>
> @@ -470,10 +489,10 @@
>                         ** bottom project restrict.
>                         */
>                         newRC = (ResultColumn) getNodeFactory().getNode(
> -
> C_NodeTypes.RESULT_COLUMN,
> -
> "##aggregate result",
> -
> aggregate.getNewNullResultExpression(),
> -
> getContextManager());
> +                                       C_NodeTypes.RESULT_COLUMN,
> +                                       "##aggregate result",
> +
> aggregate.getNewNullResultExpression (),
> +                                       getContextManager());
>                         newRC.markGenerated();
>                         newRC.bindResultColumnToExpression();
>                         bottomRCL.addElement (newRC);


Maybe one or two slipped by-- this one does reduce the indentation and its
possible it was going past the right margin and I decided to fix it. Some of
the others really served no purpose and I got rid of them.

1 -- SubstituteExpressionVisitor.java:
>
>   * The copyright says 1998, 2004; I think this is supposed to just say
> 2006
> since that's when the file was created.


OK.

2 -- I see that in version 4 of the patch you updated UnaryOperatorNode to
> have
> the following logic in isEquivalent():
>
> +               UnaryOperatorNode other = (UnaryOperatorNode)o;
> +               return (operator.equals(other.operator) &&
> +                       ((operand == other.operand)||
> +                        ((operand != null) && operand.isEquivalent(
> other.operand))));
>
> Did you become aware of this problem because of some test(s) that you were
> running?  If so, it'd be great if you could add the actual test case(s) to
> your
> JUnit test for regression purposes.  The more tests there are to check for
> these
> kinds of subtleties, the better...


Not tests, just visual inspection. I was going over the email exchange when
it hit me. I thought about tests and the only one case where the operand is
null is count(*)-- since isEquivalent will only be called for grouping
expressions, we can never exercise this particular branch of the
conditional.

And finally:
>
> > Manish Khettry updated DERBY-883:
> > ---------------------------------
> >
> >     Attachment: 883.patch4.txt
> >
> > I found a problem in one of my changes. Also got rid of a lot of
> > white space only diffs.
>
> In the future it would be helpful if you could say what changed between
> versions
> of the patch, so that reviewers don't have to scour the patches to try to
> figure
> out what changed.  I managed to find out what "a problem" meant in this
> case,
> but it's generally a good idea to explicitly state the problem/changes in
> the
> comments.  Makes my life as a reviewer easier :)


OK. I will attach a  new patch shortly incorporating your comments and the
NPE that Yip found. Also, just as an fyi-- I will be away all of next week,
so if you have additional comments, I can only get to them, the second week
of august.

m

Re: [jira] Updated: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.

Posted by Army <qo...@gmail.com>.
Hi Manish,

Thanks for addressing my comments.  The patch is looking better now--especially 
with the whitespace issues in version 4.   Some follow-up comments below.

Manish Khettry wrote:
 >> 1) TernarnyOpNode.isEquivalent(): I think we need to check "receiver"
 >> equivalence as well, in addition to left and right operands.
 >
 > yes, you are right on this one. The new patch fixes this.

Thanks for fixing this.  If possible, could you add a test case to your JUnit 
test to demonstrate?  Could just be the query that I posted in my previous comments.

> Actually the existing error message no longer makes sense with group by
> expressions. For example, the line of code that you point to, also catches
> errors like this:
> select c1+1,  avg(c2) from test group by c1+10;
> Does the existing error message make sense for this query?

Not with this query, no.  But two things to note here:

1) Before your patch, if you replace "group by c1+10" with "group by c2" you'll 
get error 42Y36 saying that column reference C1 is invalid, even though it's 
actually part of an expression ("c1+1")--so that type of behavior *does* already 
exist, even if it doesn't really "make sense".  If you change the SQLSTATE and 
error message for this case, the result is a change (albeit a pretty small one) 
in customer apps, which is really the issue here.

2) The key expression in my previous comment was "in cases like this", where 
"like this" meant queries where the existing error message *does* make sense.

For example, with the following query it makes sense to include the column 
reference because we know what it is:

ij> select c1 from test group by c2;
ERROR 42Y36: Column reference 'C1' is invalid.  For a SELECT list with a GROUP 
BY, the list may only contain grouping columns and valid aggregate expressions.

And that's what we'll get before your patch.  But after your patch we a) get a 
different SQLSTATE and b) lose the column reference name:

ij> select c1 from test group by c2;
ERROR 42Y30: The SELECT list of a grouped query contains at least one invalid 
expression. If a SELECT list has a GROUP BY, the list may only contain valid 
grouping expressions and valid aggregate expressions.

So now we've changed the way an existing query behaves (by returning a different 
SQLSTATE)--which (I think?) means we need to mark this as "Existing Application 
Impact"--*and* we're now providing the user with less information.

If it's possible to figure out when we have a simple column reference (c1) 
verses an expression (c1+1) and to throw a corresponding error, I think that 
would be ideal.  But if you decide that it's not possible or not worth it to 
preserve the error for queries with simple column references, then feel free to 
leave this as it is.  If that's your decision, though, I think you should ask 
derby-dev if a different SQLSTATE for the same query counts as "Existing 
Application Impact", and if so, mark this issue as appropriate.

>> + return (operator.equals(other.operator)
>> +        || (operand == other.operand)
>> +        || ((operand != null) && operand.isEquivalent(other)));
> 
> If you look at bindUnaryOperator for UnaryOperatorNode, you will see that
> it is possible for operand to be null; not so in BinaryOperatorNode.

So the assumption here is that if (operand == other.operand) then both operands 
are null?  If so, a simple comment saying that could be helpful.

> In GroupByList.java:
<snip code snippet>
> 
> That is a good question-- this bit of code add invisible column references
> when a column reference in the group by list, is not in the projection 
> list;
> i.e.
> 
> select c2, sum(c3) from test group by c1;
> 
> will add c1 to the projection list as a generated column. I'm not sure if
> you want to do the same kind of rewrite for expressions. I thought this
> wasn't worth doing but I'm not sure anymore.

I just tried this query with and without your patch and it fails in both cases 
with error 42Y36/42Y30.  So I guess I'm a little confused as to why Derby would 
add "c1" to the projection list and what good it does us?

If you're not sure whether or not it's worth adding equivalent logic for 
expressions, do you plan to follow-up with this later?  If not, a comment 
explaining what you have found (or what your uncertainties are) might help other 
developers down the road (esp. if a bug comes up in this area later).

> Most if not all of the routines which are deleted were made redundant by
> this fix. For example, getColumnReference in GroupByColumn is not only
> unused it makes no sense with group by expressions. I think its good to get
> rid of this code but if you want, I'd be perfectly willing to add dead code
> back in the source tree.

I guess it wasn't clear to me how all of the deleted methods were made redundant 
by *this* fix.  In your comment when you posted the patch you just said "unused 
methods zapped", which sounded to me like you just happened to notice that the 
methods were unused and therefore deleted them.  That kind of spontaneous 
"cleanup" is, like the whitespace changes, something that should be 
posted/committed as a separate issue.  If, however, the methods are relevant and 
useful before your changes but redundant and deletable after your changes, then 
it's fine to have them in with this patch.

> I disagree with you here-- when a contributor has spent a considerable
> amount of time understanding and making a lot of changes to a class, it
> makes sense to remove dead code at the same time. If you don't, stuff like
> this will likely get ignored or slip through the cracks, imho.

I didn't mean to say that the changes were not useful nor that they shouldn't be 
committed; just that it might make it easier to track "cleanup" vs "functional" 
changes by filing a separate Jira issue to remove the dead code and posting an 
isolated patch to that issue.  The changes will still go in--no slipping through 
the cracks--but they will go in as part of their *own* Jira commit, which seems 
cleaner to me.

In any event, that was just a suggestion.  No need to let this prevent the 
commit of the patch.

> 7) There are a couple of places where you have commented lines out instead
> 
> yes, you're right. I've removed it.

There are still a couple of these lingering:

GroupByColumn.java:

  	public String getColumnName()
  	{
-		return colRef.getColumnName();
+		//return colRef.getColumnName();
+		return columnExpression.getColumnName();

ColumnReference.java:

+	protected boolean isEquivalent(ValueNode o) throws StandardException
+	{
+		if (!isSameNodeType(o)) {
+			return false;
+		}
+		ColumnReference other = (ColumnReference)o;
+		return (tableNumber == other.tableNumber
+				&& columnName.equals(other.getColumnName()));
+		//return source.isEquivalent(other.source);
+	}

>> A. Unnecessary white-space changes?
> 
> Atleast some of these were done to improve existing indentation. Next time,
> I'll resist the urge to improve readability.

Thanks for cleaning a bunch of these whitespace diffs up in the version four 
patch.  That said, there are still quite a few white-space changes left--did you 
intentionally leave those in?  Ex.

GroupByNode.java:

@@ -470,10 +489,10 @@
  			** bottom project restrict.
  			*/
  			newRC = (ResultColumn) getNodeFactory().getNode(
-										C_NodeTypes.RESULT_COLUMN,
-										"##aggregate result",
-										aggregate.getNewNullResultExpression(),
-										getContextManager());
+					C_NodeTypes.RESULT_COLUMN,
+					"##aggregate result",
+					aggregate.getNewNullResultExpression(),
+					getContextManager());
  			newRC.markGenerated();
  			newRC.bindResultColumnToExpression();
  			bottomRCL.addElement(newRC);

>> B. Lines longer than 80 chars:
> 
> I try hard to to keep things within 80 chars with tab stop set to 4 chars.
> Do you see any changes where this is not the case?

Again, this is a fairly minor thing--you don't have to waste too much time on 
this.  When I look at the patch I still seem some lines that are longer than 80 
chars with a 4-char tab, but as you said this can be editor-dependent.  So long 
as you're aware of the issue and trying to respect it, I'm not going to complain.

Some other things to note:

1 -- SubstituteExpressionVisitor.java:

  * The copyright says 1998, 2004; I think this is supposed to just say 2006 
since that's when the file was created.

2 -- I see that in version 4 of the patch you updated UnaryOperatorNode to have 
the following logic in isEquivalent():

+    		UnaryOperatorNode other = (UnaryOperatorNode)o;
+    		return (operator.equals(other.operator) &&
+			((operand == other.operand)||
+			 ((operand != null) && operand.isEquivalent(other.operand))));

Did you become aware of this problem because of some test(s) that you were 
running?  If so, it'd be great if you could add the actual test case(s) to your 
JUnit test for regression purposes.  The more tests there are to check for these 
kinds of subtleties, the better...

And finally:

 > Manish Khettry updated DERBY-883:
 > ---------------------------------
 >
 >     Attachment: 883.patch4.txt
 >
 > I found a problem in one of my changes. Also got rid of a lot of
 > white space only diffs.

In the future it would be helpful if you could say what changed between versions 
of the patch, so that reviewers don't have to scour the patches to try to figure 
out what changed.  I managed to find out what "a problem" meant in this case, 
but it's generally a good idea to explicitly state the problem/changes in the 
comments.  Makes my life as a reviewer easier :)

Thanks for continuing to improve these patches,
Army


Re: [jira] Updated: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.

Posted by Manish Khettry <ma...@gmail.com>.
On 7/24/06, Army <qo...@gmail.com> wrote:
>
> Hi Manish,
>
> I have to be honest and say that I haven't taken the time to fully
> understand
> the "guts" of this patch--namely, the part where you rewrite the group-by
> tree
> in GroupByNode.addAggregateColumns().  But I did do some review of the
> overall
> patch and I have the following comments.  If you can address these I think
> the
> resultant patch will be a lot easier to read (and will also be smaller :)
> so
> that I or anyone else reviewing can focus more on the "guts" of what this
> patch
> is really doing...
>
> ------------------------------------------------------------------
>
> 0) Patch doesn't apply cleanly with latest codeline--there appear to be
> conflicts with the JUnit test changes and also one conflict in
> sqlgrammar.jj.  I
> manually resolved these conflicts in my local codeline so that I could run
> some
> tests/view the code, but I think it'd be appropriate to post an updated
> patch
> (sync'd with the latest codeline) to ease the review process for others.


I have updated the bug with a new patch after resolving the conflict.

1) TernarnyOpNode.isEquivalent(): I think we need to check "receiver"
> equivalence as well, in addition to left and right operands.  For example,
> I did
> the following and I received an error as expected:



yes, you are right on this one. The new patch fixes this.



2) Aren't we losing information with this change?
>
> In VerifyAggregateExpressionsVisitor.java:
>
> -                               throw
> StandardException.newException(
> SQLState.LANG_INVALID_COL_REF_GROUPED_SELECT_LIST,
> cr.getSQLColumnName());
> +                               throw
> StandardException.newException(SQLState.LANG_INVALID_GROUPED_SELECT_LIST);
>
> Before your changes we see the column reference name:
>
> ij> select substr(vc, 3, 4) from s1 group by vc2;
> ERROR 42Y36: Column reference 'VC' is invalid.  For a SELECT list with a
> GROUP
> BY, the list may only contain grouping columns and valid aggregate
> expressions.
>
> But after the patch is applied, we lose the name of the invalid column
> reference:
>
> ij> select substr(vc, 3, 4) from s1 group by vc2;
> ERROR 42Y30: The SELECT list of a grouped query contains at least one
> invalid
> expression. If a SELECT list has a GROUP BY, the list may only contain
> valid
> grouping expressions and valid aggregate expressions.
>
> Generally it seems like a good idea to provide the user with as much
> information
> as possible for error conditions like this.  Is it possible to preserve
> the
> result column name in cases like this (i.e. cases where we do actually
> know what
> the invalid column reference is)?


Actually the existing error message no longer makes sense with group by
expressions. For example, the line of code that you point to, also catches
errors like this:
 select c1+1,  avg(c2) from test group by c1+10;
Does the existing error message make sense for this query?

3) In UnaryOperatorNode.java the check for equivalence of the operand has
> three
> parts to it:
>
> +       return (operator.equals(other.operator)
> +                       || (operand == other.operand)
> +                       || ((operand != null) && operand.isEquivalent
> (other)));
>
> But for other classes--such as BinaryOperatorNode.java--the check is only
> a
> single call to "isEquivalent" for each operand:
>
> +               return methodName.equals(other.methodName)
> +                      && leftOperand.isEquivalent(other.leftOperand)
> +                      && rightOperand.isEquivalent(other.rightOperand);
>
> Can you explain why the operand for UnaryOperatorNode has three checks
> (.equals(), "==", and .isEquivalent()) while the operands for
> BinaryOperatorNode
> only have a single check (.isEquivalent())?  Some comments to describe the
> difference might be useful in these classes.


If you look at bindUnaryOperator for UnaryOperatorNode, you will see that
it is possible for operand to be null; not so in BinaryOperatorNode.

3) I think there are several places in this patch where you have comments
> explaining "what" by not explaining "why"--and I think the latter would be
> helpful.  For example:
>
> In VerifyAggregateExpressionsVisitor.java:
>
> Why do we disallow expressions involving native java computation in the
> following code?
>
> +    } else if (node instanceof JavaToSQLValueNode)
> +    {
> +      // disallow any expression which involves native java computation.
> +      throw StandardException.newException( (groupByList == null) ?
> +          SQLState.LANG_INVALID_NON_GROUPED_SELECT_LIST :
> +          SQLState.LANG_INVALID_GROUPED_SELECT_LIST);


It is not possible to compare java expressions for equivalence. I will
update the comments.

Why do we not visit children under subqueries in the following code?
>
>         /**
> -        * Don't visit children under an aggregate
> +        * Don't visit children under an aggregate, subquery or any node
> which
> +        * is equivalent to any of the group by expressions.
>
> In SelectNode.java:


There is no code change here-- I just updated the comment to reflect
existing behavior. Honestly, I do't know why we do not visit subqueries.

Why do we need to add a call to preprocess in the following code?
>
> +    if (groupByList != null)
> +    {
> +      groupByList.preprocess(numTables, fromList, whereSubquerys,
> wherePredicates);
> +               }
> +
>
> For this one you can probably just use the comments that you already
> included in
> your patch description, namely: "This is needed because expressions can
> get
> rewritten in the select list but not in the grouping list causing problems
> when
> the result set is generated."  It'd be great if this explanation was
> included in
> the actual code.


Makes sense. done.

In GroupByNode.java:
>
> Why do we always say that the source is not in sorted order if we have an
> expression instead of a ColumnReference?
>
> -        crs[index] = gc.getColumnReference();
> +        if (gc.getColumnExpression() instanceof ColumnReference)
> +        {
> +          crs[index] = (ColumnReference)gc.getColumnExpression();
> +        }
> +        else
> +        {
> +          isInSortedOrder = false;
> +          break;
> +        }


If you look at how isInSortedOrder is used or look at the javadoc for
ResultSet#isOrderedOn, you will see that this optimization applies only for
column references.


Also, in the comment you posted to DERBY-883 with the patch you explained
> how
> the rewrite is different now:
>
> > o the rewrite logic is a bit different now. Earlier, we would change
> > each unaggregate ColumnReference in the select list and point it to the
> > GroupByColumn. Now we replace each group by expression that we find
> > in the projection list with a virtual column node that effectively
> points
> > to a result column in the result set doing the group by. In addition
> > the original routine which did the rewrite is now split up into two
> separate
> > and smaller routines: addUnAggColumns and addAggregateColumns.
>
> It would be great if this explanation could be added to the relevant parts
> in
> the code and/or if the comments for that section could be updated to match
> the
> new behavior.  This will make it easier for people to understand what is
> going
> on when they read the code several years from now.


OK,  done.

In GroupByList.java:
>
> Why don't we need to add a ResultColumn/ColumnReference pair to
> SelectNode's RCL
> when the grouping column is an expression (as opposed to a
> ColumnReference), as
> in the following code?
>
> @@ -185,7 +174,8 @@
>      /* If no match found in the SELECT list, then add a matching
>       * ResultColumn/ColumnReference pair to the SelectNode's RCL.
>       */
> -       if (! matchFound)
> +       if (! matchFound &&
> +           groupingCol.getColumnExpression() instanceof ColumnReference)
>         {
>                 ResultColumn newRC;


That is a good question-- this bit of code add invisible column references
when a column reference in the group by list, is not in the projection list;
i.e.

select c2, sum(c3) from test group by c1;

will add c1 to the projection list as a generated column. I'm not sure if
you want to do the same kind of rewrite for expressions. I thought this
wasn't worth doing but I'm not sure anymore.

These are just some examples; I think there are others too.  If you can look
> again at the patch and try to add comments explaining *why* we do or do
> not need
> to do things like this, that would be a huge help to those reading the
> code now
> and in the future.  These don't have to be extensive--even just a line or
> two
> can be very helpful.
>
> 4) In GroupByColumn and GroupByList you have deleted a bunch of methods
> that
> apparently are unused.  While this is a useful thing to do, I don't know
> if we
> want to do that as part of the DERBY-883 patch--it makes the diff bigger
> and
> doesn't add to the patch functionally.  Instead you could file a new Jira
> entry
> and attach a separate patch that removes the unnecessary methods to that
> Jira
> issue, to be reviewed/committed separately.  Or you could create a
> follow-up
> patch for this issue that just separates out the changes for removing
> these methods.


Most if not all of the routines which are deleted were made redundant by
this fix. For example, getColumnReference in GroupByColumn is not only
unused it makes no sense with group by expressions. I think its good to get
rid of this code but if you want, I'd be perfectly willing to add dead code
back in the source tree.

I disagree with you here-- when a contributor has spent a considerable
amount of time understanding and making a lot of changes to a class, it
makes sense to remove dead code at the same time. If you don't, stuff like
this will likely get ignored or slip through the cracks, imho.


5) Extraneous diff in QueryTreeNode.java?  Are any of these changes actually
> required or are they just there for debugging?  If they are an intentional
> part
> of the patch then a) comments explaining these changes would be helpful,
> and b)
> it might be good to actually delete the commented out lines altogether,
> instead
> of leaving them in to clutter the code.


I've removed this from the new patch.

6) New methods could use javadoc explaining in more detail what they do:
>
> GroupByNode.addUnAggColumns()
> GroupByNode.addAggregateColumns()


OK, done.

7) There are a couple of places where you have commented lines out instead
> of
> deleting them.  If there's a reason you left these in, then comments
> explaining
> that would be great; otherwise I think it's cleaner to just delete the
> lines
> altogether.  For example:
>
> In GroupByColumn.java:
>
> @@ -48,7 +48,8 @@
>          */
>         public void init(Object colRef)
>         {
> -               this.colRef = (ColumnReference) colRef;
> +               //this.colRef = (ColumnReference) colRef;
> +               this.columnExpression = (ValueNode)colRef;
>         }
>
>         /**


yes, you're right. I've removed it.

8) I think the diff for master/subquery.out has some stuff from a failed
> merge
> in it:


yes, this is an old diff hanging around; removed it.



> A. Unnecessary white-space changes?


Atleast some of these were done to improve existing indentation. Next time,
I'll resist the urge to improve readability.


> B. Lines longer than 80 chars:


I try hard to to keep things within 80 chars with tab stop set to 4 chars.
Do you see any changes where this is not the case?

Thanks for taking the time to go through these changes. I've attached a new
patch which incorporates most, if not all of your feedback. Other minor
changes from the first patch-- moved the junit test case to a different
package, and added more javadoc comments to the new method in
BaseJDBCTestCase and made the indentation conform to the one in this file
which does not seem to use tabs.

Thanks,
Manish

Re: [jira] Updated: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.

Posted by Army <qo...@gmail.com>.
Manish Khettry wrote:
> On the point of 80 chars per line [...]

As I mentioned, this was just me being picky.  Instead of battling out the tab 
question, feel free to leave the lines as they are if they look good in your 
editor.  This isn't something that should block/hinder your patch, just a little 
thing I noticed when reviewing and thought I'd bring up...*shrug*

Army


Re: [jira] Updated: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.

Posted by Manish Khettry <ma...@yahoo.com>.
Army,

Thanks for the feeback. I appreciate it. You've
identified atleast one (perhaps 2) bugs with my code
changes. 

On the point of 80 chars per line, I can't help but
ask: does a tab count as 4 chars or 8?
The conventional wisdom from an earlier discussion was
that you set your editor to treat a tab as 4 chars
which I did and the offending lines that you point to
were well within the 80 char margin on my editor.

m

--- Army <qo...@gmail.com> wrote:

> Hi Manish,
> 
> I have to be honest and say that I haven't taken the
> time to fully understand 
> the "guts" of this patch--namely, the part where you
> rewrite the group-by tree 
> in GroupByNode.addAggregateColumns().  But I did do
> some review of the overall 
> patch and I have the following comments.  If you can
> address these I think the 
> resultant patch will be a lot easier to read (and
> will also be smaller :) so 
> that I or anyone else reviewing can focus more on
> the "guts" of what this patch 
> is really doing...
> 
>
------------------------------------------------------------------
> 
> 0) Patch doesn't apply cleanly with latest
> codeline--there appear to be 
> conflicts with the JUnit test changes and also one
> conflict in sqlgrammar.jj.  I 
> manually resolved these conflicts in my local
> codeline so that I could run some 
> tests/view the code, but I think it'd be appropriate
> to post an updated patch 
> (sync'd with the latest codeline) to ease the review
> process for others.
> 
> 1) TernarnyOpNode.isEquivalent(): I think we need to
> check "receiver" 
> equivalence as well, in addition to left and right
> operands.  For example, I did 
> the following and I received an error as expected:
> 
> ij> create table s1 (vc varchar(20), vc2
> varchar(10));
> 0 rows inserted/updated/deleted
> 
> ij> insert into s1 values ('allo', 'bye'), ('inigo',
> 'montoya'), ('six','fingers');
> 3 rows inserted/updated/deleted
> 
> ij> select substr(vc, 3, 4) from s1 group by vc2;
> ERROR 42Y30: The SELECT list of a grouped query
> contains at least one invalid 
> expression. If a SELECT list has a GROUP BY, the
> list may only contain valid 
> grouping expressions and valid aggregate
> expressions.
> 
> I then I did the following, expecting to see the
> same error--but this actually 
> works:
> 
> ij> select substr(vc, 3, 4) from s1 group by
> substr(vc2, 3, 4);
> 1
> ----
> igo
> lo
> x
> 
> 3 rows selected
> 
> I think this is because
> TernaryOperatorNode.isEquivalent() doesn't compare 
> "receiver" and thus thinks that vc and vc2 are the
> same, hence allowing the 
> group by.  Is this supposed to work or is it
> supposed to throw an error?  My 
> understanding is that it should fail, but I could of
> course be wrong...
> 
> 2) Aren't we losing information with this change?
> 
> In VerifyAggregateExpressionsVisitor.java:
> 
> -				throw
>
StandardException.newException(SQLState.LANG_INVALID_COL_REF_GROUPED_SELECT_LIST,
> 
> cr.getSQLColumnName());
> +				throw
>
StandardException.newException(SQLState.LANG_INVALID_GROUPED_SELECT_LIST);
> 
> Before your changes we see the column reference
> name:
> 
> ij> select substr(vc, 3, 4) from s1 group by vc2;
> ERROR 42Y36: Column reference 'VC' is invalid.  For
> a SELECT list with a GROUP 
> BY, the list may only contain grouping columns and
> valid aggregate expressions.
> 
> But after the patch is applied, we lose the name of
> the invalid column reference:
> 
> ij> select substr(vc, 3, 4) from s1 group by vc2;
> ERROR 42Y30: The SELECT list of a grouped query
> contains at least one invalid 
> expression. If a SELECT list has a GROUP BY, the
> list may only contain valid 
> grouping expressions and valid aggregate
> expressions.
> 
> Generally it seems like a good idea to provide the
> user with as much information 
> as possible for error conditions like this.  Is it
> possible to preserve the 
> result column name in cases like this (i.e. cases
> where we do actually know what 
> the invalid column reference is)?
> 
> 3) In UnaryOperatorNode.java the check for
> equivalence of the operand has three 
> parts to it:
> 
> +    	return (operator.equals(other.operator)
> +    			|| (operand == other.operand)
> +    			|| ((operand != null) &&
> operand.isEquivalent(other)));
> 
> But for other classes--such as
> BinaryOperatorNode.java--the check is only a 
> single call to "isEquivalent" for each operand:
> 
> +        	return methodName.equals(other.methodName)
> +        	       &&
> leftOperand.isEquivalent(other.leftOperand)
> +        	       &&
> rightOperand.isEquivalent(other.rightOperand);
> 
> Can you explain why the operand for
> UnaryOperatorNode has three checks 
> (.equals(), "==", and .isEquivalent()) while the
> operands for BinaryOperatorNode 
> only have a single check (.isEquivalent())?  Some
> comments to describe the 
> difference might be useful in these classes.
> 
> 3) I think there are several places in this patch
> where you have comments 
> explaining "what" by not explaining "why"--and I
> think the latter would be 
> helpful.  For example:
> 
> In VerifyAggregateExpressionsVisitor.java:
> 
> Why do we disallow expressions involving native java
> computation in the 
> following code?
> 
> +    } else if (node instanceof JavaToSQLValueNode)
> +    {
> +      // disallow any expression which involves
> native java computation.
> +      throw StandardException.newException(
> (groupByList == null) ?
> +         
> SQLState.LANG_INVALID_NON_GROUPED_SELECT_LIST :
> +         
> SQLState.LANG_INVALID_GROUPED_SELECT_LIST);
> 
> Why do we not visit children under subqueries in the
> following code?
> 
>   	/**
> -	 * Don't visit children under an aggregate
> +	 * Don't visit children under an aggregate,
> subquery or any node which
> +	 * is equivalent to any of the group by
> expressions.
> 
> In SelectNode.java:
> 
> Why do we need to add a call to preprocess in the
> following code?
> 
> +    if (groupByList != null)
> +    {
> +      groupByList.preprocess(numTables, fromList,
> whereSubquerys, wherePredicates);
> +		}
> +		
> 
> For this one you can probably just use the comments
> that you already included in 
> your patch description, namely: "This is needed
> because expressions can get 
> rewritten in the select list but not in the grouping
> list causing problems when 
> the result set is generated."  It'd be great if this
> explanation was included in 
> the actual code.
> 
> In GroupByNode.java:
> 
> Why do we always say that the source is not in
> sorted order if we have an 
> expression instead of a ColumnReference?
> 
> 
=== message truncated ===


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

Re: [jira] Updated: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.

Posted by Army <qo...@gmail.com>.
Hi Manish,

I have to be honest and say that I haven't taken the time to fully understand 
the "guts" of this patch--namely, the part where you rewrite the group-by tree 
in GroupByNode.addAggregateColumns().  But I did do some review of the overall 
patch and I have the following comments.  If you can address these I think the 
resultant patch will be a lot easier to read (and will also be smaller :) so 
that I or anyone else reviewing can focus more on the "guts" of what this patch 
is really doing...

------------------------------------------------------------------

0) Patch doesn't apply cleanly with latest codeline--there appear to be 
conflicts with the JUnit test changes and also one conflict in sqlgrammar.jj.  I 
manually resolved these conflicts in my local codeline so that I could run some 
tests/view the code, but I think it'd be appropriate to post an updated patch 
(sync'd with the latest codeline) to ease the review process for others.

1) TernarnyOpNode.isEquivalent(): I think we need to check "receiver" 
equivalence as well, in addition to left and right operands.  For example, I did 
the following and I received an error as expected:

ij> create table s1 (vc varchar(20), vc2 varchar(10));
0 rows inserted/updated/deleted

ij> insert into s1 values ('allo', 'bye'), ('inigo', 'montoya'), ('six','fingers');
3 rows inserted/updated/deleted

ij> select substr(vc, 3, 4) from s1 group by vc2;
ERROR 42Y30: The SELECT list of a grouped query contains at least one invalid 
expression. If a SELECT list has a GROUP BY, the list may only contain valid 
grouping expressions and valid aggregate expressions.

I then I did the following, expecting to see the same error--but this actually 
works:

ij> select substr(vc, 3, 4) from s1 group by substr(vc2, 3, 4);
1
----
igo
lo
x

3 rows selected

I think this is because TernaryOperatorNode.isEquivalent() doesn't compare 
"receiver" and thus thinks that vc and vc2 are the same, hence allowing the 
group by.  Is this supposed to work or is it supposed to throw an error?  My 
understanding is that it should fail, but I could of course be wrong...

2) Aren't we losing information with this change?

In VerifyAggregateExpressionsVisitor.java:

-				throw
StandardException.newException(SQLState.LANG_INVALID_COL_REF_GROUPED_SELECT_LIST, 
cr.getSQLColumnName());
+				throw
StandardException.newException(SQLState.LANG_INVALID_GROUPED_SELECT_LIST);

Before your changes we see the column reference name:

ij> select substr(vc, 3, 4) from s1 group by vc2;
ERROR 42Y36: Column reference 'VC' is invalid.  For a SELECT list with a GROUP 
BY, the list may only contain grouping columns and valid aggregate expressions.

But after the patch is applied, we lose the name of the invalid column reference:

ij> select substr(vc, 3, 4) from s1 group by vc2;
ERROR 42Y30: The SELECT list of a grouped query contains at least one invalid 
expression. If a SELECT list has a GROUP BY, the list may only contain valid 
grouping expressions and valid aggregate expressions.

Generally it seems like a good idea to provide the user with as much information 
as possible for error conditions like this.  Is it possible to preserve the 
result column name in cases like this (i.e. cases where we do actually know what 
the invalid column reference is)?

3) In UnaryOperatorNode.java the check for equivalence of the operand has three 
parts to it:

+    	return (operator.equals(other.operator)
+    			|| (operand == other.operand)
+    			|| ((operand != null) && operand.isEquivalent(other)));

But for other classes--such as BinaryOperatorNode.java--the check is only a 
single call to "isEquivalent" for each operand:

+        	return methodName.equals(other.methodName)
+        	       && leftOperand.isEquivalent(other.leftOperand)
+        	       && rightOperand.isEquivalent(other.rightOperand);

Can you explain why the operand for UnaryOperatorNode has three checks 
(.equals(), "==", and .isEquivalent()) while the operands for BinaryOperatorNode 
only have a single check (.isEquivalent())?  Some comments to describe the 
difference might be useful in these classes.

3) I think there are several places in this patch where you have comments 
explaining "what" by not explaining "why"--and I think the latter would be 
helpful.  For example:

In VerifyAggregateExpressionsVisitor.java:

Why do we disallow expressions involving native java computation in the 
following code?

+    } else if (node instanceof JavaToSQLValueNode)
+    {
+      // disallow any expression which involves native java computation.
+      throw StandardException.newException( (groupByList == null) ?
+          SQLState.LANG_INVALID_NON_GROUPED_SELECT_LIST :
+          SQLState.LANG_INVALID_GROUPED_SELECT_LIST);

Why do we not visit children under subqueries in the following code?

  	/**
-	 * Don't visit children under an aggregate
+	 * Don't visit children under an aggregate, subquery or any node which
+	 * is equivalent to any of the group by expressions.

In SelectNode.java:

Why do we need to add a call to preprocess in the following code?

+    if (groupByList != null)
+    {
+      groupByList.preprocess(numTables, fromList, whereSubquerys, wherePredicates);
+		}
+		

For this one you can probably just use the comments that you already included in 
your patch description, namely: "This is needed because expressions can get 
rewritten in the select list but not in the grouping list causing problems when 
the result set is generated."  It'd be great if this explanation was included in 
the actual code.

In GroupByNode.java:

Why do we always say that the source is not in sorted order if we have an 
expression instead of a ColumnReference?

-        crs[index] = gc.getColumnReference();
+        if (gc.getColumnExpression() instanceof ColumnReference)
+        {
+          crs[index] = (ColumnReference)gc.getColumnExpression();
+        }
+        else
+        {
+          isInSortedOrder = false;
+          break;
+        }

Also, in the comment you posted to DERBY-883 with the patch you explained how 
the rewrite is different now:

> o the rewrite logic is a bit different now. Earlier, we would change
> each unaggregate ColumnReference in the select list and point it to the
> GroupByColumn. Now we replace each group by expression that we find
> in the projection list with a virtual column node that effectively points
> to a result column in the result set doing the group by. In addition
> the original routine which did the rewrite is now split up into two separate
> and smaller routines: addUnAggColumns and addAggregateColumns.

It would be great if this explanation could be added to the relevant parts in 
the code and/or if the comments for that section could be updated to match the 
new behavior.  This will make it easier for people to understand what is going 
on when they read the code several years from now.

In GroupByList.java:

Why don't we need to add a ResultColumn/ColumnReference pair to SelectNode's RCL 
when the grouping column is an expression (as opposed to a ColumnReference), as 
in the following code?

@@ -185,7 +174,8 @@
     /* If no match found in the SELECT list, then add a matching
      * ResultColumn/ColumnReference pair to the SelectNode's RCL.
      */
-	if (! matchFound)
+	if (! matchFound &&
+	    groupingCol.getColumnExpression() instanceof ColumnReference)
  	{
  		ResultColumn newRC;

These are just some examples; I think there are others too.  If you can look 
again at the patch and try to add comments explaining *why* we do or do not need 
to do things like this, that would be a huge help to those reading the code now 
and in the future.  These don't have to be extensive--even just a line or two 
can be very helpful.

4) In GroupByColumn and GroupByList you have deleted a bunch of methods that 
apparently are unused.  While this is a useful thing to do, I don't know if we 
want to do that as part of the DERBY-883 patch--it makes the diff bigger and 
doesn't add to the patch functionally.  Instead you could file a new Jira entry 
and attach a separate patch that removes the unnecessary methods to that Jira 
issue, to be reviewed/committed separately.  Or you could create a follow-up 
patch for this issue that just separates out the changes for removing these methods.

5) Extraneous diff in QueryTreeNode.java?  Are any of these changes actually 
required or are they just there for debugging?  If they are an intentional part 
of the patch then a) comments explaining these changes would be helpful, and b) 
it might be good to actually delete the commented out lines altogether, instead 
of leaving them in to clutter the code.

6) New methods could use javadoc explaining in more detail what they do:

GroupByNode.addUnAggColumns()
GroupByNode.addAggregateColumns()

7) There are a couple of places where you have commented lines out instead of 
deleting them.  If there's a reason you left these in, then comments explaining 
that would be great; otherwise I think it's cleaner to just delete the lines 
altogether.  For example:

In GroupByColumn.java:

@@ -48,7 +48,8 @@
  	 */
  	public void init(Object colRef)
  	{
-		this.colRef = (ColumnReference) colRef;
+		//this.colRef = (ColumnReference) colRef;
+		this.columnExpression = (ValueNode)colRef;
  	}

  	/**

8) I think the diff for master/subquery.out has some stuff from a failed merge 
in it:

Index: java/testing/org/apache/derbyTesting/functionTests/master/subquery.out
===================================================================
--- java/testing/org/apache/derbyTesting/functionTests/master/subquery.out
(revision 422486)
+++ java/testing/org/apache/derbyTesting/functionTests/master/subquery.out
(working copy)
@@ -1016,8 +1016,14 @@
  					Number of rows visited=9
  					Scan type=heap
  					start position:
+<<<<<<< .mine
  null					stop position:
+null
+					scan qualifiers:
+=======
+null					stop position:
  null					scan qualifiers:
+>>>>>>> .r422486
  None
  					next qualifiers:
  None

**** Nit-picking: Changes that you don't *have* to do but that might help in 
cleaning up the patch.

A. Unnecessary white-space changes?

In SelectNode.java:

-  			VerifyAggregateExpressionsVisitor visitor =
-  				new VerifyAggregateExpressionsVisitor(groupByList);
+			VerifyAggregateExpressionsVisitor visitor =
+				new VerifyAggregateExpressionsVisitor(groupByList);
  			resultColumns.accept(visitor);
-		}
-
+        }
+
+
+		

In GroupByNode.java:

-										C_NodeTypes.RESULT_COLUMN_LIST,
-										getContextManager()),
-					((FromTable) childResult).getTableNumber());
+							C_NodeTypes.RESULT_COLUMN_LIST,
+							getContextManager()),
+				((FromTable) childResult).getTableNumber());

...

-		** For each aggregate
-		*/
+		 ** For each aggregate
+		 */

...

-
+			
  			/*
-			** AGG RESULT: Set the aggregate result to null in the
-			** bottom project restrict.
-			*/
+			 ** AGG RESULT: Set the aggregate result to null in the
+			 ** bottom project restrict.
+			 */

... etc.

There are quite a few white-space changes in this patch that make it hard to 
figure out what has actually changed.

B. Lines longer than 80 chars:

In SelectNode.java:

@@ -965,6 +956,11 @@
  								   wherePredicates);
  		}

+		if (groupByList != null)
+		{
+			groupByList.preprocess(numTables, fromList, whereSubquerys, wherePredicates);
+		}
+		

In VerifyAggregateExpressionsVisitor.java:

+		return ((node instanceof AggregateNode) ||
+				(node instanceof SubqueryNode) ||
+				(node instanceof ValueNode &&
+						groupByList != null
+						&& groupByList.findGroupingColumn((ValueNode)node) != null));

I know, this whole 80 character thing is a bit picky--but that seems to be the 
rule adopted by most developers in the Derby code, so I'm just bringing it up...

------------------------------------------------------------------

I know that's a lot of feedback and very little of it has to do with the actual 
correctness of the code--so my apologies if this is a bit annoying.  But I do 
think that some additional comments and an effort to reduce the extra diffs will 
make this a cleaner patch and will help developers in the long and short-term 
who might find themselves reading this code.

Functionally, this is a great patch--an excellent feature enhancement for Derby. 
  Thank you so much for taking the time to figure out all of the details--and 
for writing such extensive tests, as well.  I hope this feedback doesn't 
discourage you from continuing to work on this or other Derby changes: I'm just 
posting feedback based on my own opinions.  If you disagree with anything I've 
written here, please feel free to speak up!

Thanks again for all of the work on this,
Army