Posted to dev@openjpa.apache.org by Daniel Lee <ts...@gmail.com> on 2007/05/24 00:55:54 UTC

missing getAll(List keys) method?

Is the data cache missing a getAll(List keys) method?

When fetching objects with eager "to-many" relationships, the code calls
get(Object key) multiple times (once for each object in the
relationship).  For example, it makes one get() call for each order placed
by the customer being fetched, which means 100 calls for a customer
with 100 orders.  Performance could be greatly improved if we had a
getAll(List keys) method that returns all of the orders in one call.  This is
especially important in a distributed environment.

Is there a way (a new plug-in) to avoid the multiple trips for a single
relationship, or can we implement code to improve performance in
this area?
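
To make the cost difference concrete, here is a minimal sketch of the two access patterns for the 100-order example. CountingCache and its methods are hypothetical stand-ins written for this illustration; they are not the OpenJPA DataCache API, and each call simply counts as one trip to the cache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GetAllDemo {
    // Hypothetical stand-in for a remote data cache; NOT the OpenJPA DataCache API.
    static class CountingCache {
        private final Map<Object, Object> store = new HashMap<>();
        int roundTrips = 0; // each call below models one trip to the cache

        void put(Object key, Object value) {
            store.put(key, value);
        }

        // Single-key lookup: one trip per key.
        Object get(Object key) {
            roundTrips++;
            return store.get(key);
        }

        // Proposed batch lookup: one trip for the whole key list.
        // Results are positional; a key that is not cached yields null.
        List<Object> getAll(List<?> keys) {
            roundTrips++;
            List<Object> values = new ArrayList<>(keys.size());
            for (Object key : keys) {
                values.add(store.get(key));
            }
            return values;
        }
    }

    // Fills a new cache with n orders and appends their keys to keysOut.
    static CountingCache cacheWithOrders(int n, List<String> keysOut) {
        CountingCache cache = new CountingCache();
        for (int i = 0; i < n; i++) {
            String key = "order-" + i;
            cache.put(key, "Order#" + i);
            keysOut.add(key);
        }
        return cache;
    }

    // Current pattern: one get() per order in the relationship.
    static int tripsWithGet(int n) {
        List<String> keys = new ArrayList<>();
        CountingCache cache = cacheWithOrders(n, keys);
        for (String key : keys) {
            cache.get(key);
        }
        return cache.roundTrips;
    }

    // Proposed pattern: a single getAll() for the whole relationship.
    static int tripsWithGetAll(int n) {
        List<String> keys = new ArrayList<>();
        CountingCache cache = cacheWithOrders(n, keys);
        cache.getAll(keys);
        return cache.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("get():    " + tripsWithGet(100) + " trips");
        System.out.println("getAll(): " + tripsWithGetAll(100) + " trips");
    }
}
```

In a single JVM the difference is small, but when every trip crosses the network the per-key pattern pays the latency once per order.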

Many thanks.
Daniel

Re: missing getAll(List keys) method?

Posted by Daniel Lee <ts...@gmail.com>.
I can come up with a sample implementation and provide a comparison of
the performance measurements.

Daniel

On 5/30/07, Marc Prud'hommeaux <mp...@bea.com> wrote:
>
>
> I personally think it sounds like a good idea that has a lot of
> potential for performance improvement.
>
> Perhaps someone could come up with a sample implementation that adds
> the API and a default implementation in the DataCacheImpl and compare
> the performance in the scenario mentioned below? That would help
> establish a concrete justification for enhancing the DataCache
> interface.

Re: missing getAll(List keys) method?

Posted by Marc Prud'hommeaux <mp...@bea.com>.
I personally think it sounds like a good idea that has a lot of  
potential for performance improvement.

Perhaps someone could come up with a sample implementation that adds  
the API and a default implementation in the DataCacheImpl and compare  
the performance in the scenario mentioned below? That would help  
establish a concrete justification for enhancing the DataCache  
interface.
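
A rough sketch of what such a default implementation might look like follows. SimpleCache, AbstractSimpleCache and MapCache are hypothetical names invented for this illustration; they are not the real DataCache interface or DataCacheImpl. The default getAll() just loops over get(), so it is correct for any cache but saves nothing by itself; the gain would come from a plug-in overriding it with a true batch lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DefaultGetAllSketch {
    // Hypothetical, trimmed-down cache contract; not the real DataCache interface.
    interface SimpleCache {
        Object get(Object key);

        List<Object> getAll(List<?> keys);
    }

    // Base class with a default getAll() built on get().
    // A distributed plug-in would override getAll() to fetch in one trip.
    abstract static class AbstractSimpleCache implements SimpleCache {
        @Override
        public List<Object> getAll(List<?> keys) {
            List<Object> values = new ArrayList<>(keys.size());
            for (Object key : keys) {
                values.add(get(key)); // null marks a key that is not cached
            }
            return values;
        }
    }

    // In-memory cache that simply inherits the default getAll().
    static class MapCache extends AbstractSimpleCache {
        private final Map<Object, Object> map = new HashMap<>();

        void put(Object key, Object value) {
            map.put(key, value);
        }

        @Override
        public Object get(Object key) {
            return map.get(key);
        }
    }

    // Looks up two cached keys and one missing key in a single getAll() call.
    static List<Object> demo() {
        MapCache cache = new MapCache();
        cache.put("order-1", "A");
        cache.put("order-2", "B");
        return cache.getAll(Arrays.asList("order-1", "missing", "order-2"));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Returning results positionally, with null for a miss, is one possible contract; returning a Map keyed by id would be another reasonable choice.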



On May 30, 2007, at 12:48 PM, Kevin Sutter wrote:

> Marc,
> What are your views on this request?  Since you seem to be  
> intimately familiar with the data cache API, do you see a problem  
> with introducing this additional get method?  Either from an  
> expectation viewpoint or an implementation viewpoint?  Thanks.
>
> Kevin

--
Marc Prud'hommeaux
BEA Systems, Inc.



Re: missing getAll(List keys) method?

Posted by Kevin Sutter <kw...@gmail.com>.
Marc,
What are your views on this request?  Since you seem to be intimately
familiar with the data cache API, do you see a problem with introducing this
additional get method?  Either from an expectation viewpoint or an
implementation viewpoint?  Thanks.

Kevin

On 5/29/07, Daniel Lee <ts...@gmail.com> wrote:
>
> Hi Craig,
>
> The API under discussion (getAll) is for fetching objects that are already
> cached in the DataCache.  From what I understand, OpenJPA executes the
> following code when loading (via find()) a customer that exists in the
> DataCache.  It loads not only the customer but also the objects in any eager
> (direct or indirect) relationships.  In the earlier example (a customer with
> 100 orders, where each order has different products), the direct
> relationships are all orders placed by the customer and the indirect
> relationships are all products in those orders.

Re: missing getAll(List keys) method?

Posted by Daniel Lee <ts...@gmail.com>.
Hi Craig,

The API under discussion (getAll) is for fetching objects that are already
cached in the DataCache.  From what I understand, OpenJPA executes the
following code when loading (via find()) a customer that exists in the
DataCache.  It loads not only the customer but also the objects in any eager
(direct or indirect) relationships.  In the earlier example (a customer with
100 orders, where each order has different products), the direct
relationships are all orders placed by the customer and the indirect
relationships are all products in those orders.

   1. BrokerImpl.find() calls DataCacheStoreManager.initialize() to
   initialize a new state manager for an object (for example, a customer
   with 100 orders).
   2. initialize() then issues a get() to the DataCache to see whether the
   data (customer) is already cached.  After successfully getting the
   customer (data != null) from the DataCache,
   DataCachePCData.load(sm, fetch, edata) is invoked to load all the eager
   relationships (orders in the example) of the object (customer).
   3. PCDataImpl.load() loops through the relationship fields and calls
   loadField() for each relationship that is not yet loaded.  In this
   example, that is the eager one-to-many relationship from the customer
   to its orders.
   4. loadField() calls toField(), which is defined in AbstractPCData.
   5. toField() LOOPS through all elements (orders) and invokes
   toNestedField() for each element.  That is 100 toNestedField() calls for
   the 100 orders in the example.
   6. toNestedField() calls toRelationField(sm, vmd, data, fetch,
   context), which actually calls find() and so recurses back to step 1
   to load a single order.  This ends up calling get() on the DataCache 100
   times for the 100 orders, and can possibly enter another loop for
   loading all the products in each order, and so on.
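
The call pattern in steps 1 to 6 can be modeled with a small simulation that counts get() calls. This is only a sketch under simplifying assumptions: CachedData, TracingCache and find() are hypothetical names that collapse the real call chain into one recursive method, every object is assumed to be cached, and each order is assumed to have its own distinct products.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EagerLoadTrace {
    // Hypothetical cached record: only the ids of its eagerly loaded relations.
    static class CachedData {
        final List<String> eagerIds = new ArrayList<>();
    }

    // Hypothetical cache that counts get() calls.
    static class TracingCache {
        final Map<String, CachedData> data = new HashMap<>();
        int getCalls = 0;

        CachedData get(String id) {
            getCalls++;
            return data.get(id);
        }
    }

    // Steps 1-2: find() -> initialize() -> get().
    // Steps 3-6: loop over each related id and recurse into find().
    static void find(TracingCache cache, String id) {
        CachedData cached = cache.get(id);
        if (cached == null) {
            return; // a cache miss would go to the database; not modeled here
        }
        for (String relatedId : cached.eagerIds) {
            find(cache, relatedId);
        }
    }

    // One customer with `orders` orders, each with its own `productsPerOrder` products.
    static TracingCache build(int orders, int productsPerOrder) {
        TracingCache cache = new TracingCache();
        CachedData customer = new CachedData();
        cache.data.put("customer", customer);
        for (int o = 0; o < orders; o++) {
            String orderId = "order-" + o;
            CachedData order = new CachedData();
            customer.eagerIds.add(orderId);
            cache.data.put(orderId, order);
            for (int p = 0; p < productsPerOrder; p++) {
                String productId = "product-" + o + "-" + p;
                order.eagerIds.add(productId);
                cache.data.put(productId, new CachedData());
            }
        }
        return cache;
    }

    // Number of get() calls triggered by a single find() of the customer.
    static int getCallsForFind(int orders, int productsPerOrder) {
        TracingCache cache = build(orders, productsPerOrder);
        find(cache, "customer");
        return cache.getCalls;
    }

    public static void main(String[] args) {
        System.out.println("orders only:        " + getCallsForFind(100, 0));
        System.out.println("3 products / order: " + getCallsForFind(100, 3));
    }
}
```

With 100 orders the model makes 1 + 100 calls; adding 3 eager products per order raises that to 1 + 100 + 300.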

Because of the loop in step 5 above, a single "find(customerA)" statement
actually triggers 100 DataCache.get() calls for its orders, and possibly
hundreds or thousands more get() calls for the products ordered by the
customer.  As I understand it, this is a performance hit.

If we had a getAll(List keys) method that returns a list of objects from the
DataCache, we could change the logic to call the following new methods, getting
all elements (orders/products) of one relationship in a single call to
getAll() instead of calling get() a hundred times for 100 orders.

   - toNestedFields() - called by toFields without the loop
   - toRelationFields() - called by toNestedFields(); calls findAll()
   - findAll() - needs to be able to initialize a List of state managers
   (sm) and call initializeAll()
   - initializeAll() - calls getAll() instead of get(), then iterates over
   the result to call load()
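
The batched flow proposed in the list above could be sketched as follows. As before, this is a simplified model and not OpenJPA code: BatchCache and CachedData are hypothetical, and the findAll()/initializeAll()/getAll() chain is collapsed into one recursive method that issues a single getAll() per relationship.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BatchedEagerLoad {
    // Hypothetical cached record: only the ids of its eagerly loaded relations.
    static class CachedData {
        final List<String> eagerIds = new ArrayList<>();
    }

    // Hypothetical cache that counts getAll() calls.
    static class BatchCache {
        final Map<String, CachedData> data = new HashMap<>();
        int getAllCalls = 0;

        List<CachedData> getAll(List<String> ids) {
            getAllCalls++;
            List<CachedData> values = new ArrayList<>(ids.size());
            for (String id : ids) {
                values.add(data.get(id));
            }
            return values;
        }
    }

    // findAll() -> initializeAll() -> getAll(), then iterate the result to load.
    static void findAll(BatchCache cache, List<String> ids) {
        if (ids.isEmpty()) {
            return; // nothing to fetch for an empty relationship
        }
        List<CachedData> loaded = cache.getAll(ids); // one call per relationship
        for (CachedData cached : loaded) {
            if (cached != null) {
                findAll(cache, cached.eagerIds); // batch this element's relations
            }
        }
    }

    // One customer with `orders` orders, each with its own `productsPerOrder` products.
    static BatchCache build(int orders, int productsPerOrder) {
        BatchCache cache = new BatchCache();
        CachedData customer = new CachedData();
        cache.data.put("customer", customer);
        for (int o = 0; o < orders; o++) {
            String orderId = "order-" + o;
            CachedData order = new CachedData();
            customer.eagerIds.add(orderId);
            cache.data.put(orderId, order);
            for (int p = 0; p < productsPerOrder; p++) {
                String productId = "product-" + o + "-" + p;
                order.eagerIds.add(productId);
                cache.data.put(productId, new CachedData());
            }
        }
        return cache;
    }

    // Number of getAll() calls triggered by loading the customer.
    static int getAllCallsForFind(int orders, int productsPerOrder) {
        BatchCache cache = build(orders, productsPerOrder);
        findAll(cache, Collections.singletonList("customer"));
        return cache.getAllCalls;
    }

    public static void main(String[] args) {
        System.out.println("orders only:        " + getAllCallsForFind(100, 0));
        System.out.println("3 products / order: " + getAllCallsForFind(100, 3));
    }
}
```

In this model the 100 orders cost one call instead of 100. The products are still fetched with one getAll() per order, because batching is per relationship; collapsing those as well would need batching across parents, which goes beyond the proposal here.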

This is more like doing a batch fetch from the DataCache.  There should be
a significant performance improvement, especially in a distributed
environment, where communication/serialization is known to be the
bottleneck of the whole process.  This implementation could also potentially
provide much better performance for third-party DataCache plug-ins that
provide an optimized getAll().

I hope this makes the issue clearer this time.  Please let me know
if you have further questions or other concerns.  Many thanks.

Daniel

On 5/24/07, Craig L Russell <Cr...@sun.com> wrote:

> Hi Daniel,
>
> Seems that this algorithm could be improved to use the broker's findAll
> mechanism when an instance is not found in the cache. The instances that
> are not found could then be fetched more efficiently than the current code
> does.
>
> Craig

Re: missing getAll(List keys) method?

Posted by Craig L Russell <Cr...@Sun.COM>.
Hi Daniel,

On May 24, 2007, at 11:59 AM, Daniel Lee wrote:

> Hi Craig,
>
> I think findAll() is different.  It is a client-level API, whereas the
> getAll() discussed here is for internal fetches from the data cache.
>
> In the example, suppose an application issues findAll() for a list of
> customers.  Internally, for each customer with orders, it loads the
> "eager" relationship (orders) from the data cache, if the orders are
> already cached, by calling map.get(orderId) for each order placed by the
> customer.  It then loads the items related to each order by calling
> map.get(itemId) for each item, if the relationship to Order is declared
> as eager.  This is potentially a performance bottleneck, and findAll()
> does not avoid it.

Seems that this algorithm could be improved to use the broker's findAll
mechanism when an instance is not found in the cache. The instances that
are not found could then be fetched more efficiently than the current code
does.
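
One way to read this suggestion is sketched below: look every id up in the cache first, collect the misses, and fetch all of the misses in one batch. The names loadAll() and storeFindAll() are hypothetical and written only for this illustration; storeFindAll() stands in for a batched lookup such as the broker's findAll and is not its actual signature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class MissFallbackSketch {
    static int storeCalls = 0; // counts batch trips to the simulated database

    // Hypothetical stand-in for a batched store lookup.
    static Map<String, String> storeFindAll(List<String> ids) {
        storeCalls++;
        Map<String, String> rows = new LinkedHashMap<>();
        for (String id : ids) {
            rows.put(id, "row:" + id);
        }
        return rows;
    }

    // Serves hits from the cache, then loads all misses with ONE batch call.
    // The result lists cache hits first, followed by the fetched misses.
    static Map<String, String> loadAll(Map<String, String> cache, List<String> ids,
            Function<List<String>, Map<String, String>> batchLoader) {
        Map<String, String> result = new LinkedHashMap<>();
        List<String> misses = new ArrayList<>();
        for (String id : ids) {
            String hit = cache.get(id);
            if (hit != null) {
                result.put(id, hit);
            } else {
                misses.add(id);
            }
        }
        if (!misses.isEmpty()) {
            Map<String, String> fetched = batchLoader.apply(misses);
            cache.putAll(fetched); // populate the cache for later lookups
            result.putAll(fetched);
        }
        return result;
    }

    // One cached order and two uncached ones, loaded together.
    static Map<String, String> demo() {
        storeCalls = 0;
        Map<String, String> cache = new HashMap<>();
        cache.put("order-1", "cached:order-1");
        return loadAll(cache, Arrays.asList("order-1", "order-2", "order-3"),
                MissFallbackSketch::storeFindAll);
    }

    public static void main(String[] args) {
        System.out.println(demo());
        System.out.println("store calls: " + storeCalls);
    }
}
```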

Craig

Craig Russell
Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
408 276-5638 mailto:Craig.Russell@sun.com
P.S. A good JDO? O, Gasp!


Re: missing getAll(List keys) method?

Posted by Daniel Lee <ts...@gmail.com>.
Hi Craig,

I think findAll() is different.  It is a client-level API, whereas the
getAll() discussed here is for internal fetches from the data cache.

In the example, suppose an application issues findAll() for a list of
customers.  Internally, for each customer with orders, it loads the
"eager" relationship (orders) from the data cache, if the orders are
already cached, by calling map.get(orderId) for each order placed by the
customer.  It then loads the items related to each order by calling
map.get(itemId) for each item, if the relationship to Order is declared
as eager.  This is potentially a performance bottleneck, and findAll()
does not avoid it.

Thanks.
Daniel


On 5/23/07, Craig L Russell <Cr...@sun.com> wrote:
>
> Hi Daniel,
>
> Take a look at the findAll(Collection oids) method of
> OpenJPAEntityManager. This should do a better job than N get(Object
> key) calls.
>
> Craig

Re: missing getAll(List keys) method?

Posted by Craig L Russell <Cr...@Sun.COM>.
Hi Daniel,

Take a look at the findAll(Collection oids) method of
OpenJPAEntityManager. This should do a better job than N get(Object
key) calls.

Craig

On May 23, 2007, at 3:55 PM, Daniel Lee wrote:

> Is the data cache missing a getAll(List keys) method?
>
> When fetching objects with eager "to-many" relationships, the code calls
> get(Object key) multiple times (once for each object in the
> relationship).  For example, it makes one get() call for each order
> placed by the customer being fetched, which means 100 calls for a
> customer with 100 orders.  Performance could be greatly improved if we
> had a getAll(List keys) method that returns all of the orders in one
> call.  This is especially important in a distributed environment.
>
> Is there a way (a new plug-in) to avoid the multiple trips for a single
> relationship, or can we implement code to improve performance in this
> area?
>
> Many thanks.
> Daniel

Craig Russell
Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
408 276-5638 mailto:Craig.Russell@sun.com
P.S. A good JDO? O, Gasp!