You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@openjpa.apache.org by Michael Dick <mi...@gmail.com> on 2009/10/28 23:03:09 UTC

How should we handle the JPQL DISTINCT keyword

Hi all,

Currently we handle the JPQL DISTINCT keyword as a 1:1 mapping to the SQL
DISTINCT keyword. So the following
JPQL:  "SELECT DISTINCT b FROM Book b WHERE b.title = 'Gone Sailing'"
may result in the following SQL
SQL : "SELECT DISTINCT t0.id, t0.dueDate, t0.title, . . . FROM Book t0 WHERE
t0.title = 'Gone Sailing'"

This works fine for most queries, but when I was looking into OPENJPA-894 I
noticed a problem with some relationships and the JOIN FETCH clause. A JOIN
FETCH looking like this :
JPQL: "SELECT b from Book b JOIN FETCH b.subjects WHERE b.title = 'Gone
Sailing'"
Has two effects :
   1. It eagerly loads b.subjects
   2. It returns (potentially) multiple references to the same book. One
reference to Book(id=1) for every subject associated with Book(id=1).

The resulting SQL may look like this (in this case Book is MxM with Subject)
:
SQL "SELECT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID, t2.oid, t2.name FROM
LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid = t1.BOOKS_OID INNER
JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE t0.title = 'Gone
Fishing'"

Lets say that the Book with title = 'Gone Sailing' has two subjects :
Outdoors and Sportsman. In that case the SQL will return two rows that look
something like this :

OID    DUEDATE    TITLE          BOOKS_OID  OID  NAME
----   ---------- ---------      ---------- ---- -----------
3      2009-11-11 Gone Fishing   3          12   Outdoors
3      2009-11-11 Gone Fishing   3          13   Sportsman

The fix for OPENJPA-894 generates a result list with two references to the
same Book(id=3).

If you only wanted eager fetching of b.subjects, and didn't want duplicates
a good first guess would be to add the DISTINCT keyword (I'm finally getting
back to the subject)

JPQL : "SELECT DISTINCT b from Book b JOIN FETCH b.subjects WHERE b.title =
'Gone Sailing'"
SQL :  "SELECT DISTINCT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID, t2.oid,
t2.name FROM LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid =
t1.BOOKS_OID INNER JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE
t0.title = 'Gone Fishing'"

We'll still get the same two rows because the SQL DISTINCT keyword is
applied to all permutations of the columns - not just the oid (and if it did
only apply to the oid we wouldn't be eagerly loading b.subjects anyway). As
a result I think we'll have to use a SetBackedList (or otherwise enforce
distinct results after getting rows from SQL) as our ResultList.

I have a patch that does this, and resolves the use case I described above
(it's in the TestLibService unit test), but I'm open to any other ideas for
ways to resolve the problem.

Thanks,
-mike

Re: How should we handle the JPQL DISTINCT keyword

Posted by Michael Dick <mi...@gmail.com>.
Hi Pinaki,

Thanks for reading through my verbose problem description and for your
comments!

This particular issue turned up when working with a WebSphere customer and
they have been reluctant to use vendor specific options like FetchPlans in
the past. I can propose FetchPlans as a solution again, but I want to have a
vendor neutral answer ready if they push back.

As a developer I appreciate the flexibility of FetchPlans, but I've found
that using them in this manner results in a additional SQL statements. With
the simple example I posted above FetchPlans generate 4 SQL statements and
take slightly longer than the JOIN FETCH's 2 statements. One developer's
laptop does not make a valid benchmark though :-)

-mike

On Wed, Oct 28, 2009 at 9:11 PM, Pinaki Poddar <pp...@apache.org> wrote:

>
> Hi Mike,
>  Very good description of the problem.
>  JOIN FETCH is one place where the assumption that a object-oriented query
> is same as a row-based query shows its strain. DISTINCT in JPQL referred to
> b, while DISTINCT in SQL referred to the row it selects -- and b is not a
> row but the root of an object graph!
>
>  SetBackedList is indeed a good idea.
>
>  Other option is to drop the JOIN FETCH clauses altogether from part of the
> query. But to add them to the FetchPlan. Then query
> JPQL: "SELECT b from Book b JOIN FETCH b.subjects WHERE b.title = 'Gone
> Sailing'"
> is effectively
>  "SELECT b from Book b WHERE b.title = 'Gone Sailing'"
>  fetch.add(Book.class, "subjects");
>
>  Do not how much trouble to tweak OpenJPA that way though!
>
>
>
>
> Michael Dick wrote:
> >
> > Hi all,
> >
> > Currently we handle the JPQL DISTINCT keyword as a 1:1 mapping to the SQL
> > DISTINCT keyword. So the following
> > JPQL:  "SELECT DISTINCT b FROM Book b WHERE b.title = 'Gone Sailing'"
> > may result in the following SQL
> > SQL : "SELECT DISTINCT t0.id, t0.dueDate, t0.title, . . . FROM Book t0
> > WHERE
> > t0.title = 'Gone Sailing'"
> >
> > This works fine for most queries, but when I was looking into OPENJPA-894
> > I
> > noticed a problem with some relationships and the JOIN FETCH clause. A
> > JOIN
> > FETCH looking like this :
> > JPQL: "SELECT b from Book b JOIN FETCH b.subjects WHERE b.title = 'Gone
> > Sailing'"
> > Has two effects :
> >    1. It eagerly loads b.subjects
> >    2. It returns (potentially) multiple references to the same book. One
> > reference to Book(id=1) for every subject associated with Book(id=1).
> >
> > The resulting SQL may look like this (in this case Book is MxM with
> > Subject)
> > :
> > SQL "SELECT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID, t2.oid, t2.name
> > FROM
> > LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid = t1.BOOKS_OID
> INNER
> > JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE t0.title = 'Gone
> > Fishing'"
> >
> > Lets say that the Book with title = 'Gone Sailing' has two subjects :
> > Outdoors and Sportsman. In that case the SQL will return two rows that
> > look
> > something like this :
> >
> > OID    DUEDATE    TITLE          BOOKS_OID  OID  NAME
> > ----   ---------- ---------      ---------- ---- -----------
> > 3      2009-11-11 Gone Fishing   3          12   Outdoors
> > 3      2009-11-11 Gone Fishing   3          13   Sportsman
> >
> > The fix for OPENJPA-894 generates a result list with two references to
> the
> > same Book(id=3).
> >
> > If you only wanted eager fetching of b.subjects, and didn't want
> > duplicates
> > a good first guess would be to add the DISTINCT keyword (I'm finally
> > getting
> > back to the subject)
> >
> > JPQL : "SELECT DISTINCT b from Book b JOIN FETCH b.subjects WHERE b.title
> > =
> > 'Gone Sailing'"
> > SQL :  "SELECT DISTINCT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID,
> > t2.oid,
> > t2.name FROM LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid =
> > t1.BOOKS_OID INNER JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE
> > t0.title = 'Gone Fishing'"
> >
> > We'll still get the same two rows because the SQL DISTINCT keyword is
> > applied to all permutations of the columns - not just the oid (and if it
> > did
> > only apply to the oid we wouldn't be eagerly loading b.subjects anyway).
> > As
> > a result I think we'll have to use a SetBackedList (or otherwise enforce
> > distinct results after getting rows from SQL) as our ResultList.
> >
> > I have a patch that does this, and resolves the use case I described
> above
> > (it's in the TestLibService unit test), but I'm open to any other ideas
> > for
> > ways to resolve the problem.
> >
> > Thanks,
> > -mike
> >
> >
>
>
> -----
> Pinaki
> --
> View this message in context:
> http://n2.nabble.com/How-should-we-handle-the-JPQL-DISTINCT-keyword-tp3908400p3909427.html
> Sent from the OpenJPA Developers mailing list archive at Nabble.com.
>

Re: How should we handle the JPQL DISTINCT keyword

Posted by Pinaki Poddar <pp...@apache.org>.
Hi Mike,
  Very good description of the problem. 
  JOIN FETCH is one place where the assumption that a object-oriented query
is same as a row-based query shows its strain. DISTINCT in JPQL referred to
b, while DISTINCT in SQL referred to the row it selects -- and b is not a
row but the root of an object graph!

  SetBackedList is indeed a good idea. 

  Other option is to drop the JOIN FETCH clauses altogether from part of the
query. But to add them to the FetchPlan. Then query   
JPQL: "SELECT b from Book b JOIN FETCH b.subjects WHERE b.title = 'Gone
Sailing'"
is effectively
  "SELECT b from Book b WHERE b.title = 'Gone Sailing'"
  fetch.add(Book.class, "subjects");

  Do not how much trouble to tweak OpenJPA that way though!




Michael Dick wrote:
> 
> Hi all,
> 
> Currently we handle the JPQL DISTINCT keyword as a 1:1 mapping to the SQL
> DISTINCT keyword. So the following
> JPQL:  "SELECT DISTINCT b FROM Book b WHERE b.title = 'Gone Sailing'"
> may result in the following SQL
> SQL : "SELECT DISTINCT t0.id, t0.dueDate, t0.title, . . . FROM Book t0
> WHERE
> t0.title = 'Gone Sailing'"
> 
> This works fine for most queries, but when I was looking into OPENJPA-894
> I
> noticed a problem with some relationships and the JOIN FETCH clause. A
> JOIN
> FETCH looking like this :
> JPQL: "SELECT b from Book b JOIN FETCH b.subjects WHERE b.title = 'Gone
> Sailing'"
> Has two effects :
>    1. It eagerly loads b.subjects
>    2. It returns (potentially) multiple references to the same book. One
> reference to Book(id=1) for every subject associated with Book(id=1).
> 
> The resulting SQL may look like this (in this case Book is MxM with
> Subject)
> :
> SQL "SELECT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID, t2.oid, t2.name
> FROM
> LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid = t1.BOOKS_OID INNER
> JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE t0.title = 'Gone
> Fishing'"
> 
> Lets say that the Book with title = 'Gone Sailing' has two subjects :
> Outdoors and Sportsman. In that case the SQL will return two rows that
> look
> something like this :
> 
> OID    DUEDATE    TITLE          BOOKS_OID  OID  NAME
> ----   ---------- ---------      ---------- ---- -----------
> 3      2009-11-11 Gone Fishing   3          12   Outdoors
> 3      2009-11-11 Gone Fishing   3          13   Sportsman
> 
> The fix for OPENJPA-894 generates a result list with two references to the
> same Book(id=3).
> 
> If you only wanted eager fetching of b.subjects, and didn't want
> duplicates
> a good first guess would be to add the DISTINCT keyword (I'm finally
> getting
> back to the subject)
> 
> JPQL : "SELECT DISTINCT b from Book b JOIN FETCH b.subjects WHERE b.title
> =
> 'Gone Sailing'"
> SQL :  "SELECT DISTINCT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID,
> t2.oid,
> t2.name FROM LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid =
> t1.BOOKS_OID INNER JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE
> t0.title = 'Gone Fishing'"
> 
> We'll still get the same two rows because the SQL DISTINCT keyword is
> applied to all permutations of the columns - not just the oid (and if it
> did
> only apply to the oid we wouldn't be eagerly loading b.subjects anyway).
> As
> a result I think we'll have to use a SetBackedList (or otherwise enforce
> distinct results after getting rows from SQL) as our ResultList.
> 
> I have a patch that does this, and resolves the use case I described above
> (it's in the TestLibService unit test), but I'm open to any other ideas
> for
> ways to resolve the problem.
> 
> Thanks,
> -mike
> 
> 


-----
Pinaki 
-- 
View this message in context: http://n2.nabble.com/How-should-we-handle-the-JPQL-DISTINCT-keyword-tp3908400p3909427.html
Sent from the OpenJPA Developers mailing list archive at Nabble.com.