You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@openjpa.apache.org by Michael Dick <mi...@gmail.com> on 2009/10/28 23:03:09 UTC
How should we handle the JPQL DISTINCT keyword
Hi all,
Currently we handle the JPQL DISTINCT keyword as a 1:1 mapping to the SQL
DISTINCT keyword. So the following
JPQL: "SELECT DISTINCT b FROM Book b WHERE b.title = 'Gone Sailing'"
may result in the following SQL
SQL : "SELECT DISTINCT t0.id, t0.dueDate, t0.title, . . . FROM Book t0 WHERE
t0.title = 'Gone Sailing'"
This works fine for most queries, but when I was looking into OPENJPA-894 I
noticed a problem with some relationships and the JOIN FETCH clause. A JOIN
FETCH looking like this :
JPQL: "SELECT b from Book b JOIN FETCH b.subjects WHERE b.title = 'Gone
Sailing'"
Has two effects :
1. It eagerly loads b.subjects
2. It returns (potentially) multiple references to the same book. One
reference to Book(id=1) for every subject associated with Book(id=1).
The resulting SQL may look like this (in this case Book is MxM with Subject)
:
SQL "SELECT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID, t2.oid, t2.name FROM
LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid = t1.BOOKS_OID INNER
JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE t0.title = 'Gone
Fishing'"
Lets say that the Book with title = 'Gone Sailing' has two subjects :
Outdoors and Sportsman. In that case the SQL will return two rows that look
something like this :
OID DUEDATE TITLE BOOKS_OID OID NAME
---- ---------- --------- ---------- ---- -----------
3 2009-11-11 Gone Fishing 3 12 Outdoors
3 2009-11-11 Gone Fishing 3 13 Sportsman
The fix for OPENJPA-894 generates a result list with two references to the
same Book(id=3).
If you only wanted eager fetching of b.subjects, and didn't want duplicates
a good first guess would be to add the DISTINCT keyword (I'm finally getting
back to the subject)
JPQL : "SELECT DISTINCT b from Book b JOIN FETCH b.subjects WHERE b.title =
'Gone Sailing'"
SQL : "SELECT DISTINCT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID, t2.oid,
t2.name FROM LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid =
t1.BOOKS_OID INNER JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE
t0.title = 'Gone Fishing'"
We'll still get the same two rows because the SQL DISTINCT keyword is
applied to all permutations of the columns - not just the oid (and if it did
only apply to the oid we wouldn't be eagerly loading b.subjects anyway). As
a result I think we'll have to use a SetBackedList (or otherwise enforce
distinct results after getting rows from SQL) as our ResultList.
I have a patch that does this, and resolves the use case I described above
(it's in the TestLibService unit test), but I'm open to any other ideas for
ways to resolve the problem.
Thanks,
-mike
Re: How should we handle the JPQL DISTINCT keyword
Posted by Michael Dick <mi...@gmail.com>.
Hi Pinaki,
Thanks for reading through my verbose problem description and for your
comments!
This particular issue turned up when working with a WebSphere customer and
they have been reluctant to use vendor specific options like FetchPlans in
the past. I can propose FetchPlans as a solution again, but I want to have a
vendor neutral answer ready if they push back.
As a developer I appreciate the flexibility of FetchPlans, but I've found
that using them in this manner results in a additional SQL statements. With
the simple example I posted above FetchPlans generate 4 SQL statements and
take slightly longer than the JOIN FETCH's 2 statements. One developer's
laptop does not make a valid benchmark though :-)
-mike
On Wed, Oct 28, 2009 at 9:11 PM, Pinaki Poddar <pp...@apache.org> wrote:
>
> Hi Mike,
> Very good description of the problem.
> JOIN FETCH is one place where the assumption that a object-oriented query
> is same as a row-based query shows its strain. DISTINCT in JPQL referred to
> b, while DISTINCT in SQL referred to the row it selects -- and b is not a
> row but the root of an object graph!
>
> SetBackedList is indeed a good idea.
>
> Other option is to drop the JOIN FETCH clauses altogether from part of the
> query. But to add them to the FetchPlan. Then query
> JPQL: "SELECT b from Book b JOIN FETCH b.subjects WHERE b.title = 'Gone
> Sailing'"
> is effectively
> "SELECT b from Book b WHERE b.title = 'Gone Sailing'"
> fetch.add(Book.class, "subjects");
>
> Do not how much trouble to tweak OpenJPA that way though!
>
>
>
>
> Michael Dick wrote:
> >
> > Hi all,
> >
> > Currently we handle the JPQL DISTINCT keyword as a 1:1 mapping to the SQL
> > DISTINCT keyword. So the following
> > JPQL: "SELECT DISTINCT b FROM Book b WHERE b.title = 'Gone Sailing'"
> > may result in the following SQL
> > SQL : "SELECT DISTINCT t0.id, t0.dueDate, t0.title, . . . FROM Book t0
> > WHERE
> > t0.title = 'Gone Sailing'"
> >
> > This works fine for most queries, but when I was looking into OPENJPA-894
> > I
> > noticed a problem with some relationships and the JOIN FETCH clause. A
> > JOIN
> > FETCH looking like this :
> > JPQL: "SELECT b from Book b JOIN FETCH b.subjects WHERE b.title = 'Gone
> > Sailing'"
> > Has two effects :
> > 1. It eagerly loads b.subjects
> > 2. It returns (potentially) multiple references to the same book. One
> > reference to Book(id=1) for every subject associated with Book(id=1).
> >
> > The resulting SQL may look like this (in this case Book is MxM with
> > Subject)
> > :
> > SQL "SELECT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID, t2.oid, t2.name
> > FROM
> > LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid = t1.BOOKS_OID
> INNER
> > JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE t0.title = 'Gone
> > Fishing'"
> >
> > Lets say that the Book with title = 'Gone Sailing' has two subjects :
> > Outdoors and Sportsman. In that case the SQL will return two rows that
> > look
> > something like this :
> >
> > OID DUEDATE TITLE BOOKS_OID OID NAME
> > ---- ---------- --------- ---------- ---- -----------
> > 3 2009-11-11 Gone Fishing 3 12 Outdoors
> > 3 2009-11-11 Gone Fishing 3 13 Sportsman
> >
> > The fix for OPENJPA-894 generates a result list with two references to
> the
> > same Book(id=3).
> >
> > If you only wanted eager fetching of b.subjects, and didn't want
> > duplicates
> > a good first guess would be to add the DISTINCT keyword (I'm finally
> > getting
> > back to the subject)
> >
> > JPQL : "SELECT DISTINCT b from Book b JOIN FETCH b.subjects WHERE b.title
> > =
> > 'Gone Sailing'"
> > SQL : "SELECT DISTINCT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID,
> > t2.oid,
> > t2.name FROM LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid =
> > t1.BOOKS_OID INNER JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE
> > t0.title = 'Gone Fishing'"
> >
> > We'll still get the same two rows because the SQL DISTINCT keyword is
> > applied to all permutations of the columns - not just the oid (and if it
> > did
> > only apply to the oid we wouldn't be eagerly loading b.subjects anyway).
> > As
> > a result I think we'll have to use a SetBackedList (or otherwise enforce
> > distinct results after getting rows from SQL) as our ResultList.
> >
> > I have a patch that does this, and resolves the use case I described
> above
> > (it's in the TestLibService unit test), but I'm open to any other ideas
> > for
> > ways to resolve the problem.
> >
> > Thanks,
> > -mike
> >
> >
>
>
> -----
> Pinaki
> --
> View this message in context:
> http://n2.nabble.com/How-should-we-handle-the-JPQL-DISTINCT-keyword-tp3908400p3909427.html
> Sent from the OpenJPA Developers mailing list archive at Nabble.com.
>
Re: How should we handle the JPQL DISTINCT keyword
Posted by Pinaki Poddar <pp...@apache.org>.
Hi Mike,
Very good description of the problem.
JOIN FETCH is one place where the assumption that a object-oriented query
is same as a row-based query shows its strain. DISTINCT in JPQL referred to
b, while DISTINCT in SQL referred to the row it selects -- and b is not a
row but the root of an object graph!
SetBackedList is indeed a good idea.
Other option is to drop the JOIN FETCH clauses altogether from part of the
query. But to add them to the FetchPlan. Then query
JPQL: "SELECT b from Book b JOIN FETCH b.subjects WHERE b.title = 'Gone
Sailing'"
is effectively
"SELECT b from Book b WHERE b.title = 'Gone Sailing'"
fetch.add(Book.class, "subjects");
Do not how much trouble to tweak OpenJPA that way though!
Michael Dick wrote:
>
> Hi all,
>
> Currently we handle the JPQL DISTINCT keyword as a 1:1 mapping to the SQL
> DISTINCT keyword. So the following
> JPQL: "SELECT DISTINCT b FROM Book b WHERE b.title = 'Gone Sailing'"
> may result in the following SQL
> SQL : "SELECT DISTINCT t0.id, t0.dueDate, t0.title, . . . FROM Book t0
> WHERE
> t0.title = 'Gone Sailing'"
>
> This works fine for most queries, but when I was looking into OPENJPA-894
> I
> noticed a problem with some relationships and the JOIN FETCH clause. A
> JOIN
> FETCH looking like this :
> JPQL: "SELECT b from Book b JOIN FETCH b.subjects WHERE b.title = 'Gone
> Sailing'"
> Has two effects :
> 1. It eagerly loads b.subjects
> 2. It returns (potentially) multiple references to the same book. One
> reference to Book(id=1) for every subject associated with Book(id=1).
>
> The resulting SQL may look like this (in this case Book is MxM with
> Subject)
> :
> SQL "SELECT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID, t2.oid, t2.name
> FROM
> LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid = t1.BOOKS_OID INNER
> JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE t0.title = 'Gone
> Fishing'"
>
> Lets say that the Book with title = 'Gone Sailing' has two subjects :
> Outdoors and Sportsman. In that case the SQL will return two rows that
> look
> something like this :
>
> OID DUEDATE TITLE BOOKS_OID OID NAME
> ---- ---------- --------- ---------- ---- -----------
> 3 2009-11-11 Gone Fishing 3 12 Outdoors
> 3 2009-11-11 Gone Fishing 3 13 Sportsman
>
> The fix for OPENJPA-894 generates a result list with two references to the
> same Book(id=3).
>
> If you only wanted eager fetching of b.subjects, and didn't want
> duplicates
> a good first guess would be to add the DISTINCT keyword (I'm finally
> getting
> back to the subject)
>
> JPQL : "SELECT DISTINCT b from Book b JOIN FETCH b.subjects WHERE b.title
> =
> 'Gone Sailing'"
> SQL : "SELECT DISTINCT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID,
> t2.oid,
> t2.name FROM LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid =
> t1.BOOKS_OID INNER JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE
> t0.title = 'Gone Fishing'"
>
> We'll still get the same two rows because the SQL DISTINCT keyword is
> applied to all permutations of the columns - not just the oid (and if it
> did
> only apply to the oid we wouldn't be eagerly loading b.subjects anyway).
> As
> a result I think we'll have to use a SetBackedList (or otherwise enforce
> distinct results after getting rows from SQL) as our ResultList.
>
> I have a patch that does this, and resolves the use case I described above
> (it's in the TestLibService unit test), but I'm open to any other ideas
> for
> ways to resolve the problem.
>
> Thanks,
> -mike
>
>
-----
Pinaki
--
View this message in context: http://n2.nabble.com/How-should-we-handle-the-JPQL-DISTINCT-keyword-tp3908400p3909427.html
Sent from the OpenJPA Developers mailing list archive at Nabble.com.