You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@openjpa.apache.org by "Daniel Lee (JIRA)" <ji...@apache.org> on 2007/08/10 03:43:42 UTC

[jira] Created: (OPENJPA-312) derby fails with duplicate primary key(s) in group by list

derby fails with duplicate primary key(s) in group by list
----------------------------------------------------------

                 Key: OPENJPA-312
                 URL: https://issues.apache.org/jira/browse/OPENJPA-312
             Project: OpenJPA
          Issue Type: Bug
          Components: sql
    Affects Versions: 1.0.0
            Reporter: Daniel Lee
            Priority: Minor
             Fix For: 1.0.0


derby fails with duplicate primary key(s) in group by list

With query "select o.customer, avg(o.amount) from Order o group by o.customer" the push-down query contains duplicate columns in the group by clause.  This is okay when DB2 and other DB that tolerate the duplicates but Derby returns error.

Of course, we can ask fix on Derby but we can also easy fix in OpenJPA to avoid duplicates in the group by list.  Please refer to the following for the error result and the attach patch for the fix.

Output from running the query that generate duplicate in the group by list:
6429  demo  TRACE  [main] openjpa.Query - Executing query: select o.customer, avg(o.amount) from Order o group by o.customer
6639  demo  TRACE  [main] openjpa.jdbc.SQL - <t 1094861122, conn 1639735740> executing prepstmnt 1405375428 SELECT t1.countryCode, t1.id, t1.version, t1.city, t1.state, t1.street, t1.zip, t1.creditRating, t1.name, AVG(t0.amount) FROM Order t0 INNER JOIN Customer t1 ON t0.customer_countryCode = t1.countryCode AND t0.customer_id = t1.id GROUP BY t1.countryCode, t1.id, t1.version, t1.countryCode, t1.id, t1.city, t1.state, t1.street, t1.zip, t1.countryCode, t1.id, t1.creditRating, t1.name 



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Re: [jira] Commented: (OPENJPA-312) derby fails with duplicate primary key(s) in group by list

Posted by Daniel Lee <ts...@gmail.com>.

Per Kevin's question/suggestions and problem found by David, a new patch
is attached here for review.

Thanks to Kavin for the comments.  The code using sql or sqlGetSQL() is
based on the type of the arguments for the column(s).  sqlGetSQL() call is
needed to get the String of the column if the column is passed in as a
SQLBuffer.

For making the suggested code working, I have to add the checking of the
existence of the column into the logic as the following:

        getJoins(joins, true);
        if (_grouping == null) {
            _grouping = new SQLBuffer(_dict);
            if (_grouping.getSQL().indexOf(sql) < 0)
                _grouping.append(sql);
       }
       else if (_grouping.getSQL().indexOf(sql) < 0)
            _grouping.append(", " + sql);
This may not be cheaper than testing if (_grouping == null)...  I have
modified the code to avoid overlooking any substring of a column as an
existing column and attach the patch here.  Please let me know if you have
further concerns.  Thanks.

Here is part of the changes...

    public void groupBy(SQLBuffer sql, Joins joins) {
        getJoins(joins, true);
        if (!columnInGrouping(sql.getSQL())) {
            if (_grouping == null)
                _grouping = new SQLBuffer(_dict);
            else
                _grouping.append(", ");
            _grouping.append(sql.getSQL());
        }
    }
    private boolean columnInGrouping(String col) {
        if (_grouping == null)
            return false;

        StringTokenizer st = new StringTokenizer(_grouping.getSQL(),
                " ,\t\n\r\f");
        while (st.hasMoreTokens()) {
            if (StringUtils.equals(st.nextToken(), col))
                return true;
        }
        return false;
    }

Thanks,
Daniel

On 8/10/07, Kevin Sutter (JIRA) <ji...@apache.org> wrote:
>
>
>    [
> https://issues.apache.org/jira/browse/OPENJPA-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12519064]
>
> Kevin Sutter commented on OPENJPA-312:
> --------------------------------------
>
> Daniel,
> The basic idea of the patch looks okay, but the code changes do not look
> consistent.  In some cases, you use "sql" and in others, you are using "
> sql.getSQL()".  I would prefer just using "sql".
>
> Also, you check for an empty _grouping after you check if the desired
> group sub-string is already present in the _grouping.  This seems
> backwards.  I know it's only being used to determine whether to use a comma
> or not.  So, instead of doing these extra empty checks, could we do
> something along the following.  I think this accomplishes the same goal.
>
>         getJoins(joins, true);
>         if (_grouping == null) {
>             _grouping = new SQLBuffer(_dict);
>             _grouping.append(sql);
>        }
>        else if (_grouping.getSQL().indexOf(sql) < 0)
>             _grouping.append(", " + sql);
>
> Thanks,
> Kevin
>
>
>
> > derby fails with duplicate primary key(s) in group by list
> > ----------------------------------------------------------
> >
> >                 Key: OPENJPA-312
> >                 URL: https://issues.apache.org/jira/browse/OPENJPA-312
> >             Project: OpenJPA
> >          Issue Type: Bug
> >          Components: sql
> >    Affects Versions: 1.0.0
> >            Reporter: Daniel Lee
> >            Assignee: Daniel Lee
> >            Priority: Minor
> >             Fix For: 1.0.0
> >
> >         Attachments: OPENJPA-312.patch
> >
> >
> > derby fails with duplicate primary key(s) in group by list
> > With query "select o.customer, avg(o.amount) from Order o group by
> o.customer" the push-down query contains duplicate columns in the group by
> clause.  This is okay when DB2 and other DB that tolerate the duplicates but
> Derby returns error.
> > Of course, we can ask fix on Derby but we can also easy fix in OpenJPA
> to avoid duplicates in the group by list.  Please refer to the following for
> the error result and the attach patch for the fix.
> > Output from running the query that generate duplicate in the group by
> list:
> > 6429  demo  TRACE  [main] openjpa.Query - Executing query: select
> o.customer, avg(o.amount) from Order o group by o.customer
> > 6639  demo  TRACE  [main] openjpa.jdbc.SQL - <t 1094861122, conn
> 1639735740> executing prepstmnt 1405375428 SELECT t1.countryCode, t1.id,
> t1.version, t1.city, t1.state, t1.street, t1.zip, t1.creditRating, t1.name,
> AVG(t0.amount) FROM Order t0 INNER JOIN Customer t1 ON
> t0.customer_countryCode = t1.countryCode AND t0.customer_id = t1.id GROUP
> BY t1.countryCode, t1.id, t1.version, t1.countryCode, t1.id, t1.city,
> t1.state, t1.street, t1.zip, t1.countryCode, t1.id, t1.creditRating,
> t1.name
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>

Re: [jira] Commented: (OPENJPA-312) derby fails with duplicate primary key(s) in group by list

Posted by Kevin Sutter <kw...@gmail.com>.

Daniel,

On 8/13/07, Daniel Lee <ts...@gmail.com> wrote:
>
> The problem with indexOf() is with a column name as a substring of an
> existing ones.  For example, BC is part of ABCD and using indexOf will
> overlook and miss the column BC if we don't check the leading and tailing
> characters of the target column name.  Tokenizer is then chosen for this
> reason.


Good reasons for not using indexOf().  Nice catch.  But, I still don't like
the use of StringTokenizer.   Maybe my information is out of date, but this
seems like a lot of overkill for this particular usage.  I noticed that we
have other uses of StringTokenizer in the OpenJPA code, but these uses are
for isolated, post-processing of strings.  In your proposed usage,
StringTokenizer would be called every time a new column is added to the
order by grouping.  Your simple example would mean 13 instantiations and
associated method invocations.  This seems excessive.

A better fundamental change should be keeping a List of column objects just
> like rderBY which keeps more context (semantic information) for better
> query
> processing but the fixing cost and risk is much higher and may not worth
> it
> on processing group by.


Looking at the code, this doesn't seem like too bad of an idea.  Actually,
if you just made it part of the SQLBuffer class, then it could optionally be
used by anybody using the SQLBuffer class.  Not just for the order by
grouping.  Or, you could just use a separate _groupingMap to keep track of
the tokens in your _grouping string.  This seems relatively straightforward
and a lot less expensive.

Thanks,
Kevin

Thanks,
> Daniel
>
>
> On 8/13/07, Kevin Sutter (JIRA) <ji...@apache.org> wrote:
> >
> >
> >    [
> >
> https://issues.apache.org/jira/browse/OPENJPA-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12519456
> ]
> >
> > Kevin Sutter commented on OPENJPA-312:
> > --------------------------------------
> >
> > Daniel,
> > The code is much cleaner now.  Thanks.  But, I don't think the
> performance
> > of StringTokenizer is worth any advantages it might provide.  Is there
> some
> > reason why the previous mechanism of just using indexOf() wasn't
> > sufficient?  I like the idea of isolating this check to a single boolean
> > method like you have, but I would like to see a better performing
> > implementation.
> >
> > Thanks,
> > Kevin
> >
> > > derby fails with duplicate primary key(s) in group by list
> > > ----------------------------------------------------------
> > >
> > >                 Key: OPENJPA-312
> > >                 URL: https://issues.apache.org/jira/browse/OPENJPA-312
> > >             Project: OpenJPA
> > >          Issue Type: Bug
> > >          Components: sql
> > >    Affects Versions: 1.0.0
> > >            Reporter: Daniel Lee
> > >            Assignee: Daniel Lee
> > >            Priority: Minor
> > >             Fix For: 1.0.0
> > >
> > >         Attachments: OPENJPA-312.patch, OPENJPA-312.patch
> > >
> > >
> > > derby fails with duplicate primary key(s) in group by list
> > > With query "select o.customer, avg(o.amount) from Order o group by
> > o.customer" the push-down query contains duplicate columns in the group
> by
> > clause.  This is okay when DB2 and other DB that tolerate the duplicates
> but
> > Derby returns error.
> > > Of course, we can ask fix on Derby but we can also easy fix in OpenJPA
> > to avoid duplicates in the group by list.  Please refer to the following
> for
> > the error result and the attach patch for the fix.
> > > Output from running the query that generate duplicate in the group by
> > list:
> > > 6429  demo  TRACE  [main] openjpa.Query - Executing query: select
> > o.customer, avg(o.amount) from Order o group by o.customer
> > > 6639  demo  TRACE  [main] openjpa.jdbc.SQL - <t 1094861122, conn
> > 1639735740> executing prepstmnt 1405375428 SELECT t1.countryCode, t1.id,
> > t1.version, t1.city, t1.state, t1.street, t1.zip, t1.creditRating,
> t1.name,
> > AVG(t0.amount) FROM Order t0 INNER JOIN Customer t1 ON
> > t0.customer_countryCode = t1.countryCode AND t0.customer_id = t1.idGROUP
> > BY t1.countryCode, t1.id, t1.version, t1.countryCode, t1.id, t1.city,
> > t1.state, t1.street, t1.zip, t1.countryCode, t1.id, t1.creditRating,
> > t1.name
> >
> > --
> > This message is automatically generated by JIRA.
> > -
> > You can reply to this email to add a comment to the issue online.
> >
> >
>