You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cayenne.apache.org by Andrus Adamchik <an...@objectstyle.org> on 2012/01/30 09:37:54 UTC

Third prefetch kind - DISJOINT_BY_ID

I keep encountering a common scenario where neither JOINT or DISJOINT prefetch strategies are adequate - queries with fetch limit. It is very common in the application to display X most recent entries from a table with millions of rows, and then drill down to the object details. E.g. assume 2 entities - "Order" and "LineItem", with orders having multiple line items. We want 10 most recent orders, with line items prefetched, so you'd so something like this:

  SelectQuery q = new SelectQuery(Order.class);
  q.addPrefetch("lineItems");
  q.setFetchLimit(10);

"Disjoint" prefetch in this situation would fetch 10 orders and ALL LineItems in DB. 

"Joint" prefetch will fetch anywhere between 1 and 10 orders, depending on how many line items the first 10 orders have, i.e. fetch limit is applied to to-many join, not to the query root. And this is certainly not what we want. 

Now Cayenne already has something that can solve the problem:

 q.setPageSize(10); // same as fetch limit

Paginated query is the most optimal way to prefetch here. Whenever a result list is accessed, Cayenne would execute 2 IN () queries - one for the Orders, another one - for the LineItems. Both queries are matching on a set of Order PKs and are pretty efficient, and only return the objects that we care about.

The problem with this solution is that it is counterintuitive to the user (why should I set "pageSize" to make my prefetches work) and adds one extra query (the IN query resolving the root object list). Would be cool to turn it into a separate type of prefetch.  Something like "disjoint by id"?

Andrus




Re: Third prefetch kind - DISJOINT_BY_ID

Posted by Andrus Adamchik <an...@objectstyle.org>.
On Jan 30, 2012, at 3:14 PM, Andrus Adamchik wrote:

>  Ordering of related (prefetched or otherwise) objects is controlled by the user.

It should say "is NOT controlled" of course.


Re: Third prefetch kind - DISJOINT_BY_ID

Posted by Andrus Adamchik <an...@objectstyle.org>.
Yes I envision the new strategy to be used for relatively small result sets. So I am not suggesting we replace DISJOINT with this behind the scenes. I'd like to leave it up to the users to pick (or maybe we can auto-choose based on the size of the main result set).

BTW paginated queries internally have a control limiting the size of the WHERE clause. It is hardcoded inside IncrementalFaultList and the value is 10000. So any extra large IN query will be broken to chunks.

> How do these solutions work when the user tries to order the results on something across a to-one join?

Root query works the same with or without a disjoint prefetch, even if some joins overlap. Ordering of related (prefetched or otherwise) objects is controlled by the user.

Andrus

On Jan 30, 2012, at 2:54 PM, Aristedes Maniatis wrote:
> On 30/01/12 10:43 PM, Andrus Adamchik wrote:
>>> And does DISJOINT map to:
>>> 
>>>  SELECT * FROM ARTIST WHERE ARTIST_NAME LIKE 'X%' FETCH LIMIT 10;
>>>  SELECT * FROM PAINTING WHERE true;
>> 
>> Effectively yes. With some extra nastiness like joins inherited from the parent query.
>> 
>>> Are you proposing this new join type be:
>>> 
>>>  SELECT * FROM ARTIST WHERE ARTIST_NAME LIKE 'X%' FETCH LIMIT 10;
>>>  SELECT * FROM PAINTING WHERE ARTIST_ID IN (123, 124, ....)
>> 
>> Yes.
>> 
>> Andrus
>> 
> 
> Some databases don't like the IN clause to be too big. 1024 entries seems to be in my head about when we hit an error with MS SQL in the past.
> 
> How do these solutions work when the user tries to order the results on something across a to-one join?
> 
> 
> 
> 
> 
> Ari
> 
> 
> 
> -- 
> -------------------------->
> Aristedes Maniatis
> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
> 


Re: Third prefetch kind - DISJOINT_BY_ID

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 30/01/12 10:43 PM, Andrus Adamchik wrote:
>> And does DISJOINT map to:
>>
>>   SELECT * FROM ARTIST WHERE ARTIST_NAME LIKE 'X%' FETCH LIMIT 10;
>>   SELECT * FROM PAINTING WHERE true;
>
> Effectively yes. With some extra nastiness like joins inherited from the parent query.
>
>> Are you proposing this new join type be:
>>
>>   SELECT * FROM ARTIST WHERE ARTIST_NAME LIKE 'X%' FETCH LIMIT 10;
>>   SELECT * FROM PAINTING WHERE ARTIST_ID IN (123, 124, ....)
>
> Yes.
>
> Andrus
>

Some databases don't like the IN clause to be too big. 1024 entries seems to be in my head about when we hit an error with MS SQL in the past.

How do these solutions work when the user tries to order the results on something across a to-one join?





Ari



-- 
-------------------------->
Aristedes Maniatis
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A

Re: Third prefetch kind - DISJOINT_BY_ID

Posted by Andrus Adamchik <an...@objectstyle.org>.
> And does DISJOINT map to:
> 
>  SELECT * FROM ARTIST WHERE ARTIST_NAME LIKE 'X%' FETCH LIMIT 10;
>  SELECT * FROM PAINTING WHERE true;

Effectively yes. With some extra nastiness like joins inherited from the parent query.

> Are you proposing this new join type be:
> 
>  SELECT * FROM ARTIST WHERE ARTIST_NAME LIKE 'X%' FETCH LIMIT 10;
>  SELECT * FROM PAINTING WHERE ARTIST_ID IN (123, 124, ....)

Yes.

Andrus




On Jan 30, 2012, at 1:12 PM, Aristedes Maniatis wrote:

> On 30/01/12 8:21 PM, Andrus Adamchik wrote:
>> 
>> On Jan 30, 2012, at 12:13 PM, Aristedes Maniatis wrote:
>> 
>>> Isn't this just a bug in the old DISJOINT approach rather than a new type of prefetch? What is the use-case for fetching all related records in the database when you have a fetch limit?
>> 
>> This is not a bug, rather a limitation of the algorithm. DISJOINT approach is based on the root query qualifier transposed for the related entity. It has no way of knowing which root objects will get below the fetch limit.
>> 
>> Root:
>> 
>>  SELECT * FROM ARTIST WHERE ARTIST_NAME LIKE 'X%' FETCH LIMIT 10;
>> 
>> Prefetch
>> 
>>  SELECT PAINTING.* FROM PAINTING JOIN ARTIST ON (..) WHERE ARTIST.ARTIST_NAME LIKE 'X%' FETCH LIMIT ????
>> 
>> Andrus
> 
> And does DISJOINT map to:
> 
>  SELECT * FROM ARTIST WHERE ARTIST_NAME LIKE 'X%' FETCH LIMIT 10;
>  SELECT * FROM PAINTING WHERE true;
> 
> 
> Are you proposing this new join type be:
> 
>  SELECT * FROM ARTIST WHERE ARTIST_NAME LIKE 'X%' FETCH LIMIT 10;
>  SELECT * FROM PAINTING WHERE ARTIST_ID IN (123, 124, ....)
> 
> 
> 
> Ari
> 
> 
> -- 
> -------------------------->
> Aristedes Maniatis
> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
> 


Re: Third prefetch kind - DISJOINT_BY_ID

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 30/01/12 8:21 PM, Andrus Adamchik wrote:
>
> On Jan 30, 2012, at 12:13 PM, Aristedes Maniatis wrote:
>
>> Isn't this just a bug in the old DISJOINT approach rather than a new type of prefetch? What is the use-case for fetching all related records in the database when you have a fetch limit?
>
> This is not a bug, rather a limitation of the algorithm. DISJOINT approach is based on the root query qualifier transposed for the related entity. It has no way of knowing which root objects will get below the fetch limit.
>
> Root:
>
>   SELECT * FROM ARTIST WHERE ARTIST_NAME LIKE 'X%' FETCH LIMIT 10;
>
> Prefetch
>
>   SELECT PAINTING.* FROM PAINTING JOIN ARTIST ON (..) WHERE ARTIST.ARTIST_NAME LIKE 'X%' FETCH LIMIT ????
>
> Andrus

And does DISJOINT map to:

   SELECT * FROM ARTIST WHERE ARTIST_NAME LIKE 'X%' FETCH LIMIT 10;
   SELECT * FROM PAINTING WHERE true;


Are you proposing this new join type be:

   SELECT * FROM ARTIST WHERE ARTIST_NAME LIKE 'X%' FETCH LIMIT 10;
   SELECT * FROM PAINTING WHERE ARTIST_ID IN (123, 124, ....)



Ari


-- 
-------------------------->
Aristedes Maniatis
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A

Re: Third prefetch kind - DISJOINT_BY_ID

Posted by Andrus Adamchik <an...@objectstyle.org>.
On Jan 30, 2012, at 12:13 PM, Aristedes Maniatis wrote:

> Isn't this just a bug in the old DISJOINT approach rather than a new type of prefetch? What is the use-case for fetching all related records in the database when you have a fetch limit?

This is not a bug, rather a limitation of the algorithm. DISJOINT approach is based on the root query qualifier transposed for the related entity. It has no way of knowing which root objects will get below the fetch limit.

Root:

 SELECT * FROM ARTIST WHERE ARTIST_NAME LIKE 'X%' FETCH LIMIT 10;

Prefetch

 SELECT PAINTING.* FROM PAINTING JOIN ARTIST ON (..) WHERE ARTIST.ARTIST_NAME LIKE 'X%' FETCH LIMIT ????

Andrus


Re: Third prefetch kind - DISJOINT_BY_ID

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On Mon Jan 30 19:37:54 2012, Andrus Adamchik wrote:
> I keep encountering a common scenario where neither JOINT or DISJOINT prefetch strategies are adequate - queries with fetch limit. It is very common in the application to display X most recent entries from a table with millions of rows, and then drill down to the object details. E.g. assume 2 entities - "Order" and "LineItem", with orders having multiple line items. We want 10 most recent orders, with line items prefetched, so you'd so something like this:
>
>    SelectQuery q = new SelectQuery(Order.class);
>    q.addPrefetch("lineItems");
>    q.setFetchLimit(10);
>
> "Disjoint" prefetch in this situation would fetch 10 orders and ALL LineItems in DB.
>
> "Joint" prefetch will fetch anywhere between 1 and 10 orders, depending on how many line items the first 10 orders have, i.e. fetch limit is applied to to-many join, not to the query root. And this is certainly not what we want.
>
> Now Cayenne already has something that can solve the problem:
>
>   q.setPageSize(10); // same as fetch limit
>
> Paginated query is the most optimal way to prefetch here. Whenever a result list is accessed, Cayenne would execute 2 IN () queries - one for the Orders, another one - for the LineItems. Both queries are matching on a set of Order PKs and are pretty efficient, and only return the objects that we care about.
>
> The problem with this solution is that it is counterintuitive to the user (why should I set "pageSize" to make my prefetches work) and adds one extra query (the IN query resolving the root object list). Would be cool to turn it into a separate type of prefetch.  Something like "disjoint by id"?
>
> Andrus
>
>
>

 Isn't this just a bug in the old DISJOINT approach rather than a new 
type of prefetch? What is the use-case for fetching all related records 
in the database when you have a fetch limit?

I don't really know since we have found pagination so much more useful 
and powerful than fetchLimit.

Ari

-- 
-------------------------->
Aristedes Maniatis
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A