Posted to user@cayenne.apache.org by "Alexander Lamb (dev)" <al...@mac.com> on 2007/11/14 14:11:00 UTC

Batch faulting with Cayenne 3

Hello list,

One thing is killing the performance of our application: resolving
individual to-one faults in lists.

For example, we can have 200 roles, each referring to a person.

When we loop through the roles, every call to role.getPerson() makes
a round trip to the database.

In the EOF days, it was possible to define a batch faulting strategy
for an entity. For example, we would say "batch fault 20 for person",
and the first time a to-one fault from role to person fired, EOF would
look in the data context for up to 19 more such faults, build a single
SQL statement, fetch the person objects in one go, and resolve up to
20 faults.
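The batching described above can be illustrated with a small, hypothetical Java sketch. It is not EOF or Cayenne code, and the PERSON table and ID column are assumed names. The foreign keys of the pending person faults are de-duplicated (several roles may share a person), split into groups of at most 20, and each group becomes one parameterized IN query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration of batch faulting: the foreign keys of unresolved
// role -> person faults are grouped into batches, and each batch is resolved
// with a single IN query. PERSON and ID are assumed names.
public class BatchFaultSql {

    /** Removes duplicate ids (several roles may share a person), then splits them into batches. */
    public static List<List<Integer>> batches(List<Integer> pendingIds, int batchSize) {
        List<Integer> distinct = new ArrayList<Integer>(new LinkedHashSet<Integer>(pendingIds));
        List<List<Integer>> result = new ArrayList<>();
        for (int start = 0; start < distinct.size(); start += batchSize) {
            int end = Math.min(start + batchSize, distinct.size());
            result.add(new ArrayList<>(distinct.subList(start, end)));
        }
        return result;
    }

    /** Builds a parameterized SELECT with one placeholder per id in the batch. */
    public static String selectFor(List<Integer> batch) {
        String placeholders = String.join(", ", Collections.nCopies(batch.size(), "?"));
        return "SELECT * FROM PERSON WHERE ID IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        List<Integer> pending = new ArrayList<>();
        for (int id = 1; id <= 200; id++) {
            pending.add(id); // 200 roles, each pointing at a different person
        }
        List<List<Integer>> groups = batches(pending, 20);
        // Prints: 10 queries instead of 200
        System.out.println(groups.size() + " queries instead of " + pending.size());
    }
}
```

With a batch size of 20, the 200 to-one faults from the example cost 10 queries instead of 200. Deciding which pending faults go into each batch is a separate question, and it is discussed further down the thread.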

Is this feature available somewhere in Cayenne 3.0M2, or is it planned
for the near future?

If not, is there some kind of callback or hook which would allow us to
do the same thing?

Thanks,

Alex

Re: Batch faulting with Cayenne 3: small add-on to correct problem

Posted by Kevin Menard <km...@servprise.com>.
Interesting.  This is a situation I'd prefer to never occur anyway.  IMO,
ToManyList should work like a Set with a defined iteration order.  This
wouldn't necessarily preclude duplicates because the uniqueness would be
based on the key.

The trade-off is the overhead of managing those semantics.  That said, I've
done it locally (by subclassing CayenneDataObject) and haven't noticed a
significant performance hit.

I believe you'll be able to work directly with Sets in Cayenne 3.0.  This
would probably be a better approach for you, provided that insertion order
does not matter much.
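Kevin's suggestion of a to-many collection that behaves like a Set with a defined iteration order, where uniqueness is based on the key, can be sketched in plain Java with a LinkedHashMap keyed by primary key. This is a hypothetical illustration: Role and its int id are stand-ins for a persistent object and its key, and this is not how Cayenne 3.0 implements Set-valued relationships.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a to-many collection with Set semantics and a defined
// iteration order: uniqueness is decided by the primary key, and iteration follows
// insertion order. Role and its int id are stand-ins, not Cayenne classes.
public class KeyedToMany {

    public static final class Role {
        public final int id;
        public final String name;

        public Role(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // LinkedHashMap keeps insertion order; the map key decides uniqueness.
    private final Map<Integer, Role> byKey = new LinkedHashMap<>();

    /** Adds the role unless one with the same key is already present; returns true if added. */
    public boolean add(Role role) {
        return byKey.putIfAbsent(role.id, role) == null;
    }

    public void addAll(Collection<Role> roles) {
        for (Role role : roles) {
            add(role);
        }
    }

    public int size() {
        return byKey.size();
    }

    /** Returns a snapshot of the roles in insertion order. */
    public List<Role> asList() {
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        KeyedToMany roles = new KeyedToMany();
        List<Role> fetched = Arrays.asList(new Role(1, "admin"), new Role(2, "editor"));
        roles.addAll(fetched);
        roles.addAll(fetched);                // same objects again: ignored
        roles.add(new Role(1, "admin copy")); // different instance, same key: ignored
        System.out.println(roles.size());     // prints 2
    }
}
```

Keying by the primary key is what turns the second addAll() into a no-op. A real implementation would also have to decide how to key new objects that do not have a permanent key yet.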

-- 
Kevin Menard
Servprise International, Inc.
Remote reboot & power control for network equipment
www.servprise.com              +1 508.892.3823 x308


On 11/20/07 3:48 AM, "Alexander Lamb (dev)" <al...@mac.com> wrote:

> Hello,
> 
> In my code example, there is actually a bug.
> Indeed, if I do:
> 
>     List<Role> roles = getObjectContext().performQuery(query);
>     ToManyList tml = new ToManyList(this,"roles");
>     tml.addAll(roles);
>     writePropertyDirectly("roles",tml);
> 
> 
> I will end up with twice the number of roles... Indeed, doing "addAll"
> will first fire the to-many which was just created before adding the
> objects...
> The only way to avoid that initial fetch is to do:
> 
> tml.setObjectList(roles)
> 
> then continue with the writePropertyDirectly.
> 
> Alex


Re: Batch faulting with Cayenne 3: small add-on to correct problem

Posted by "Alexander Lamb (dev)" <al...@mac.com>.
Hello,

There is actually a bug in my code example. If I do:

>     List<Role> roles = getObjectContext().performQuery(query);
>     ToManyList tml = new ToManyList(this, "roles");
>     tml.addAll(roles);
>     writePropertyDirectly("roles", tml);

I end up with twice the number of roles: calling addAll() first fires
the to-many fault that was just created (fetching the roles from the
database) and only then adds the objects.
The only way to avoid that initial fetch is to do:

tml.setObjectList(roles)

and then continue with the writePropertyDirectly() call.
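The doubling can be reproduced with a simplified, hypothetical model of a lazily resolved to-many list. LazyToManyList below is not Cayenne's ToManyList. It only mimics the behavior described in this message: touching an unresolved list, including through addAll(), runs the fetch first, while setObjectList() installs an already fetched list without a query. invalidate() mirrors the refault step from the 14 Nov message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified lazily resolved to-many list. It is NOT Cayenne's
// ToManyList; it only models the behavior described above: touching an
// unresolved list (including addAll) fetches its contents first.
public class LazyToManyList<T> {

    private final Supplier<List<T>> fetcher; // stands in for the database query
    private List<T> objects;                 // null while the list is still a fault
    private int fetchCount;

    public LazyToManyList(Supplier<List<T>> fetcher) {
        this.fetcher = fetcher;
    }

    // Resolves the fault on first access by running the "query".
    private List<T> resolved() {
        if (objects == null) {
            objects = new ArrayList<>(fetcher.get());
            fetchCount++;
        }
        return objects;
    }

    /** Resolves first, then appends: rows already fetched end up in the list twice. */
    public void addAll(Collection<T> items) {
        resolved().addAll(items);
    }

    /** Installs an already fetched list without running the query. */
    public void setObjectList(List<T> items) {
        objects = new ArrayList<>(items);
    }

    /** Turns the list back into a fault so the next access refetches. */
    public void invalidate() {
        objects = null;
    }

    public int size() {
        return resolved().size();
    }

    public int fetchCount() {
        return fetchCount;
    }

    public static void main(String[] args) {
        List<String> dbRoles = Arrays.asList("r1", "r2", "r3");

        LazyToManyList<String> viaAddAll = new LazyToManyList<>(() -> dbRoles);
        viaAddAll.addAll(dbRoles);
        System.out.println(viaAddAll.size() + " roles, " + viaAddAll.fetchCount() + " fetch"); // 6 roles, 1 fetch

        LazyToManyList<String> viaSet = new LazyToManyList<>(() -> dbRoles);
        viaSet.setObjectList(dbRoles);
        System.out.println(viaSet.size() + " roles, " + viaSet.fetchCount() + " fetches"); // 3 roles, 0 fetches
    }
}
```

In this model, only the setObjectList() path keeps the fetch count at zero, which matches the fix described above.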

Alex


On 15 Nov 07, at 10:45, Alexander Lamb (dev) wrote:

>
>> The code below should work. Another way is to use multi-step
>> prefetching on a root of to-many relationship.
>
> Well, that's good news, so we shall probably implement my piece of
> code for most relationships. However, I am afraid I didn't
> understand your second sentence :-(

--
Alexander Lamb
alamb@mac.com




Re: Batch faulting with Cayenne 3

Posted by "Alexander Lamb (dev)" <al...@mac.com>.

> The code below should work. Another way is to use multi-step  
> prefetching on a root of to-many relationship.

Well, that's good news, so we will probably implement my piece of
code for most relationships. However, I am afraid I didn't understand
your second sentence :-(
>
>
>> Now, it would still be cool if we could have batch faulting for the  
>> odd places where we didn't set up the prefetching.
>
> While I used batch faulting in WebObjects days and found it quite
> useful, I could never understand how to make it work predictably
> (i.e. fault the objects that I care about). I wouldn't object though
> to somebody (or even myself) implementing it at the framework level
> if whoever that is could explain to me the algorithm used to select
> which objects to fault. IIRC EOF builds internal "fault chains".
> I wonder how much overhead this would incur in Cayenne.
I don't think there is a particular order in which the faults are
fired. Actually, that is not a problem, since within a few queries
all outstanding faults will gradually be fired. My guess is that upon
firing the first fault of a to-one registered for batch faulting, you
simply (OK, not that simple :-) look at all objects of the same class
in the DataContext and take the first X of them, in no particular
order, to be fired, X being the batch size. Of course, this means that
a to-one which needs to be batch faulted has to be flagged somewhere,
so you quickly know which objects to take into account.
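The selection step guessed at above can be sketched as a hypothetical registry of pending to-one faults. BatchFaultRegistry and the fetchByIds callback are illustrative names, not Cayenne or EOF API. Firing one fault resolves it together with up to batchSize - 1 other pending faults in a single fetch. The other faults are picked in registration order, which is just one arbitrary choice, in line with the guess that any X will do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of batch fault selection. The context keeps the target ids
// of unresolved to-one faults for a relationship flagged for batch faulting;
// firing one fault resolves it together with up to batchSize - 1 other pending
// faults in a single fetch, picked in registration order.
public class BatchFaultRegistry {

    private final int batchSize;
    private final Function<List<Integer>, Map<Integer, String>> fetchByIds; // one query per call
    private final Set<Integer> pending = new LinkedHashSet<>();
    private final Map<Integer, String> resolved = new HashMap<>();
    private int queries;

    public BatchFaultRegistry(int batchSize, Function<List<Integer>, Map<Integer, String>> fetchByIds) {
        this.batchSize = batchSize;
        this.fetchByIds = fetchByIds;
    }

    /** Records an unresolved to-one fault pointing at targetId. */
    public void registerFault(int targetId) {
        if (!resolved.containsKey(targetId)) {
            pending.add(targetId);
        }
    }

    /** Resolves the fault for targetId, piggybacking other pending faults on the same query. */
    public String fire(int targetId) {
        String cached = resolved.get(targetId);
        if (cached != null) {
            return cached; // already resolved by an earlier batch
        }
        List<Integer> batch = new ArrayList<>();
        batch.add(targetId);
        pending.remove(targetId);
        for (Integer id : pending) {
            if (batch.size() == batchSize) {
                break;
            }
            batch.add(id);
        }
        pending.removeAll(batch);
        resolved.putAll(fetchByIds.apply(batch));
        queries++;
        return resolved.get(targetId);
    }

    public int queries() {
        return queries;
    }

    public static void main(String[] args) {
        BatchFaultRegistry registry = new BatchFaultRegistry(20, ids -> {
            Map<Integer, String> rows = new HashMap<>();
            for (int id : ids) {
                rows.put(id, "person-" + id); // pretend row fetched from the database
            }
            return rows;
        });
        for (int id = 1; id <= 200; id++) {
            registry.registerFault(id);
        }
        for (int id = 1; id <= 200; id++) {
            registry.fire(id);
        }
        System.out.println(registry.queries() + " queries for 200 faults"); // 10 queries for 200 faults
    }
}
```

The registry is the "flagged somewhere" part: without a cheap list of pending faults per relationship, each firing would have to scan every object in the context.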
>
> BTW relationship prefetching policies can be specified per JPA spec  
> (and hence will be implemented in Cayenne). However my understanding  
> is that JPA specifies a different kind of prefetch - which  
> attributes/relationships to resolve eagerly when an object is fetched.

I am afraid I don't know anything about JPA, but does it mean that the
Modeler will in the future offer some kind of interface for specifying
those prefetches?

Thanks!

Alex



Re: Batch faulting with Cayenne 3

Posted by Andrus Adamchik <an...@objectstyle.org>.
The code below should work. Another way is to use multi-step  
prefetching on a root of to-many relationship.

> Now, it would still be cool if we could have batch faulting for the  
> odd places where we didn't set up the prefetching.

While I used batch faulting in my WebObjects days and found it quite
useful, I could never understand how to make it work predictably
(i.e., fault the objects that I actually care about). I wouldn't object,
though, to somebody (or even myself) implementing it at the framework
level if whoever does it could explain to me the algorithm used to
select which objects to fault. IIRC, EOF builds internal "fault chains".
I wonder how much overhead this would incur in Cayenne.

BTW, relationship prefetching policies can be specified per the JPA spec
(and hence will be implemented in Cayenne). However, my understanding
is that JPA specifies a different kind of prefetch: which
attributes/relationships to resolve eagerly when an object is fetched.

Andrus



On Nov 14, 2007, at 9:33 AM, Alexander Lamb (dev) wrote:

> Well, yes it is possible up to a point.
>
> Usually it is through the "to-many" relationship I get my objects.  
> Some other times it might be through a custom query meaning I have  
> to do it each time.
>
> However, as I said in the second email I sent about prefetching,  
> the solution is maybe the following:
>
> @SuppressWarnings("unchecked")
> public List<Role> getRoles() {
>     if (org.apache.cayenne.Fault.class.isInstance(this.readPropertyDirectly("roles"))) {
>         Expression exp = ExpressionFactory.matchExp("person", this);
>         SelectQuery query = new SelectQuery(Role.class, exp);
>         query.addPrefetch("profile");
>         query.addPrefetch("person");
>         List<Role> roles = getObjectContext().performQuery(query);
>         ToManyList tml = new ToManyList(this, "roles");
>         tml.addAll(roles);
>         writePropertyDirectly("roles", tml);
>     }
>     return super.getRoles();
> }
>
> The advantage of this is that it does the prefetch but also sets  
> correctly the "to-many" relationship, meaning it will not refetch  
> everything if I do an addToRoles or removeFromRoles.
>
> If I want to refault the relationship, I do:
>
>     if (org.apache.cayenne.access.ToManyList.class.isInstance(this.readPropertyDirectly("roles"))) {
>         ((org.apache.cayenne.access.ToManyList) getRoles()).invalidate();
>     }
>
> Is this the correct way of doing it?
>
> If so, could there be a way to add this in a generic way to the model?
>
> Now, it would still be cool if we could have batch faulting for the  
> odd places where we didn't set up the prefetching.
>
> Alex


Re: Batch faulting with Cayenne 3

Posted by "Alexander Lamb (dev)" <al...@mac.com>.
Well, yes, it is possible up to a point.

Usually I get my objects through the "to-many" relationship. Other
times it might be through a custom query, meaning I would have to add
the prefetch each time.

However, as I said in my second email about prefetching, the solution
may be the following:

@SuppressWarnings("unchecked")
public List<Role> getRoles() {
    if (org.apache.cayenne.Fault.class.isInstance(this.readPropertyDirectly("roles"))) {
        Expression exp = ExpressionFactory.matchExp("person", this);
        SelectQuery query = new SelectQuery(Role.class, exp);
        query.addPrefetch("profile");
        query.addPrefetch("person");
        List<Role> roles = getObjectContext().performQuery(query);
        ToManyList tml = new ToManyList(this, "roles");
        tml.addAll(roles);
        writePropertyDirectly("roles", tml);
    }
    return super.getRoles();
}

The advantage is that it does the prefetch and also correctly sets the
"to-many" relationship, so it will not refetch everything if I call
addToRoles() or removeFromRoles().

If I want to refault the relationship, I do:

if (org.apache.cayenne.access.ToManyList.class.isInstance(this.readPropertyDirectly("roles"))) {
    ((org.apache.cayenne.access.ToManyList) getRoles()).invalidate();
}

Is this the correct way of doing it?

If so, could there be a way to add this in a generic way to the model?

Now, it would still be cool if we could have batch faulting for the  
odd places where we didn't set up the prefetching.

Alex

On 14 Nov 07, at 14:45, Andrus Adamchik wrote:

> Can you use prefetching instead? You got a list of users via some
> sort of query - just add prefetch to that query.
>
> Andrus
>


Re: Batch faulting with Cayenne 3

Posted by Andrus Adamchik <an...@objectstyle.org>.
Can you use prefetching instead? You got a list of users via some
sort of query - just add a prefetch to that query.

Andrus
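Why adding a prefetch helps can be pictured with a hypothetical in-memory Java model that only counts round trips for the persons. In Cayenne itself the prefetch is requested on the query, for example with query.addPrefetch("person") as in the getRoles() code in this thread. The model does not reproduce the SQL Cayenne actually generates, and PrefetchVsFaulting and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory comparison: resolving each role's person on demand
// (one query per role) versus a prefetch-style approach that loads all
// referenced persons with one extra query and connects them in memory.
// The query that fetches the roles themselves is the same in both cases and is not counted.
public class PrefetchVsFaulting {

    public static final class Role {
        public final int personId;
        public String person; // null until resolved

        public Role(int personId) {
            this.personId = personId;
        }
    }

    private final Map<Integer, String> personTable; // stands in for the PERSON table
    private int queries;

    public PrefetchVsFaulting(Map<Integer, String> personTable) {
        this.personTable = personTable;
    }

    /** Fault-by-fault: every role.getPerson() costs one round trip. */
    public void resolveOneByOne(List<Role> roles) {
        for (Role role : roles) {
            queries++;
            role.person = personTable.get(role.personId);
        }
    }

    /** Prefetch-style: one query for all distinct person ids, then wire up in memory. */
    public void resolveWithPrefetch(List<Role> roles) {
        Set<Integer> ids = new LinkedHashSet<>();
        for (Role role : roles) {
            ids.add(role.personId);
        }
        queries++; // e.g. a single SELECT ... WHERE ID IN (...)
        Map<Integer, String> fetched = new HashMap<>();
        for (Integer id : ids) {
            fetched.put(id, personTable.get(id));
        }
        for (Role role : roles) {
            role.person = fetched.get(role.personId);
        }
    }

    public int queries() {
        return queries;
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new HashMap<>();
        List<Role> lazyRoles = new ArrayList<>();
        List<Role> prefetchedRoles = new ArrayList<>();
        for (int id = 1; id <= 200; id++) {
            table.put(id, "person-" + id);
            lazyRoles.add(new Role(id));
            prefetchedRoles.add(new Role(id));
        }
        PrefetchVsFaulting naive = new PrefetchVsFaulting(table);
        naive.resolveOneByOne(lazyRoles);
        PrefetchVsFaulting prefetch = new PrefetchVsFaulting(table);
        prefetch.resolveWithPrefetch(prefetchedRoles);
        System.out.println(naive.queries() + " vs " + prefetch.queries()); // 200 vs 1
    }
}
```

Unlike batch faulting, the prefetch has to be declared on each query up front, which is why the thread still asks for batch faulting "for the odd places where we didn't set up the prefetching".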

