Posted to dev@ofbiz.apache.org by Adam Heath <do...@brainfood.com> on 2010/12/06 04:45:40 UTC

findFoo(cache=false), without modification, in a transaction, issue warning

I've noticed several places in the code that issue a findOne (or other
find method) without using a cache, but then don't do any
modifications to the returned value.  Such call sites should really
use a caching variant of the delegator.

What I'd like to work on is a feature, used only during development,
that would record any value looked up without caching and not
later modified, and print a warning, with an exception (for the stack
trace), to allow a caching variant to be added.  Would others find
this useful?

Obviously, it would need to be transaction-based, and would sometimes
produce false positives.  Additionally, if no transaction is active, then
that automatically means a caching variant should be used.

Re: findFoo(cache=false), without modification, in a transaction, issue warning

Posted by David E Jones <de...@me.com>.
On Dec 5, 2010, at 9:26 PM, Adrian Crum wrote:

> --- On Sun, 12/5/10, Adam Heath <do...@brainfood.com> wrote:
>> Again, why are people pushing back on this?  It hurts
>> no one, and
>> could find cases where caching actually would help. 
>> Really, what is
>> the harm?
> 
> I can only speak for myself. I push back on entity caching stuff because I feel it is redundant. Database systems cache data. Those database systems run on top of operating systems that cache data. It seems to me that it's redundant to cache entity data at the application level. I'm not an expert on the subject, I'm just suggesting that caching optimization should be done at a lower level. Entity caching seems to me to be a form of premature optimization.

There is a lot of value in caching on the app server for frequently used or performance-sensitive data where stale data isn't such a concern.

For most caches, the whole point is to keep the data as close to where it is used as possible, so that it is cheaper/faster to get. From that perspective, caches built into hard disks don't make much sense. However, hard disk caches really serve a different purpose, such as read-ahead buffering, as opposed to caching data that has already been read just in case it is read again (which operating systems do for certain things, databases for other things, and app servers for others, as they should... and hard disk caches shouldn't).

-David


Re: findFoo(cache=false), without modification, in a transaction, issue warning

Posted by Adrian Crum <ad...@yahoo.com>.
--- On Sun, 12/5/10, Adam Heath <do...@brainfood.com> wrote:
> Again, why are people pushing back on this?  It hurts
> no one, and
> could find cases where caching actually would help. 
> Really, what is
> the harm?

I can only speak for myself. I push back on entity caching stuff because I feel it is redundant. Database systems cache data. Those database systems run on top of operating systems that cache data. It seems to me that it's redundant to cache entity data at the application level. I'm not an expert on the subject, I'm just suggesting that caching optimization should be done at a lower level. Entity caching seems to me to be a form of premature optimization.

-Adrian


Re: findFoo(cache=false), without modification, in a transaction, issue warning

Posted by Adam Heath <do...@brainfood.com>.
David E Jones wrote:
> On Dec 5, 2010, at 8:47 PM, Adam Heath wrote:
> 
>> David E Jones wrote:
>>> On Dec 5, 2010, at 8:11 PM, Adam Heath wrote:
>>>
>>>> David E Jones wrote:
>>>>> There are other reasons why you might not want to cache something. Quite a few of them, even without trying to be too creative.
>>>>>
>>>>> For example: memory use. Why cache things like contact mech or order information that isn't likely to be read very many times, but if cached would take up memory until expired and cleared (and would only be cleared if something is cleaning up expired entries) or until it makes it to the end of the LRU queue or whatever, and the whole time requiring additional resources to keep track of and making more legitimate caching less efficient.
>>>>>
>>>>> Another example: freshness. If you want to be sure the data is good, get it from the db. If it won't cause big problems to have stale data, then the cache is okay.
>>>>>
>>>>> Am I the only one who thinks caching everything by default (as Marc suggested) or popping out warning messages (as Adam suggests here) is a little crazy? I haven't seen anyone else speak up, so just wondering...
>>>> I've got a change queued that has deprecated getRelatedOne (as
>>>> suggested earlier).  While doing that, I also decided to go through and
>>>> fix all the remaining findByPrimaryKey calls, replacing them with findOne.
>>>> During that change, I noticed several FooWorker/FooHelper classes that should
>>>> really be using caching variants.
>>> Yes, I don't doubt that there is code in OFBiz that could leverage caching more effectively.
>>>
>>>> A read-only fetch should be cached.  A read-only web request should
>>>> never hit the database.  
>>> Where do these rules come from? How did they become rules?
>>>
>>>> If it's a question of memory, and a
>>>> particular install is throwing away cache entries too quickly, then
>>>> that install needs more memory.
>>> I guess my points weren't very good if that's all the rebuttal you thought was needed.
>>>
>>> How about this case: running a little ecommerce site with 100,000 customers. The typical customer visits the site around once per week, though some customers rarely visit the site, maybe a few times per year. A customer logs in and looks at their customer profile, which is 100% read-only (no data is changed as the page loads). They may look at the profile page again if they decide to edit something, but generally will not otherwise. Should all of the data on the profile be cached so that on repeated views we can follow your two rules above (read-only data cached, read-only web request never hits the database (I assume after the first hit, of course))?
>>>
>>> So, should all of the customer's contact, order, contact/mailing list, etc. information be cached? How much memory are you thinking of... something roughly the same size as the database? Or is the whole point of this to try to argue for in-memory databases... yeah... I see where you're going...
>>>
>>> And, BTW, this isn't some weird case... for most data in an ERP or ecommerce system this IS the case. In other words, it has a low reads/time-period rate, and a low read to update ratio, and therefore caching is a waste of system resources and more likely to cause issues with bad/stale data (especially across multiple app servers... unless you want to keep all caches perfectly in sync all the time and take the performance hit from that goal, especially making sure other caches are updated before you commit, ie involve the caches in the 2-phase commit so that they are all updated before the data can be committed to the db).
>> Where in the universe does 'should' mean always?  I never said all
>> read-only things must be cached.
>>
>> There are definite times during a production install where parts of
>> the system perform exceptionally slow.  My suggested change would make
>> it simpler to identify where those slow points are, finding values
>> that are being looked up repeatedly, and changing them to be cached.
>>
>> We currently have a client experiencing timeouts when submitting
>> orders (OFBiz takes more than 2 minutes to process one).  Yes, we could
>> increase the default timeout; but that has its own problems.  If we
>> could identify the hot points, with an easy-to-use tool, then everyone
>> would benefit.  Where is the harm in that?
> 
> If it's not really the problem, then maybe it would do more harm than good, like massively increasing log message volume without helping you get valuable information.
> 
> I actually just recently worked with a client who had a big performance problem on add to cart and it turned out that the problem was caching too much stuff which resulted in less important data pushing important data out of memory as the heap memory started getting full.
> 
> There's always a trade-off...
> 
>> Are you suggesting that because sometimes you don't want to cache some
>> things, that we shouldn't cache anything?
> 
> Is that what it sounded like I wrote, or is this just a random thought?
> 
>> <segue>
>> Lately, I've gotten very bad responses to suggestions of new
>> features (not just from you, David).  Someone is wanting to do work,
>> not asking anyone else to implement something for them.  Yet, the
>> responses to such suggestions are filled with vitriol.  If someone is
>> willing to do work, to improve OFBiz, then go ahead and let them.
>> </segue>
> 
> Just a total guess here, but maybe because different isn't always better? Getting into more detail:
> 
> Unfortunately OFBiz has suffered a lot from "improvements" that make more things worse than they make better. And yes, there absolutely should be a pretty good burden for contributors to convince others that changes are indeed an improvement, and in fact IMO there should be a MUCH MUCH higher burden of proof. Still, that's not the model this project runs under, so what you're seeing is people reacting to the incredible burden of proof required to keep new features out since any committer can throw stuff in whenever...

Maybe you missed where I said that this feature would not be turned on
by default.  It would have 0 impact during all runs, until such time
as a problem is detected, and an admin turns it on.  How could that
cause problems?

The feature would be used to find possible problems.  I never said it
would be used to force all issued warnings to go away, by adding
caching everywhere.  This feature would make it simpler to find the
poor performance points.  It won't make it easy, as those involved
would still have to know what is going on.

Again, why are people pushing back on this?  It hurts no one, and
could find cases where caching actually would help.  Really, what is
the harm?


Re: findFoo(cache=false), without modification, in a transaction, issue warning

Posted by David E Jones <de...@me.com>.
On Dec 5, 2010, at 8:47 PM, Adam Heath wrote:

> David E Jones wrote:
>> On Dec 5, 2010, at 8:11 PM, Adam Heath wrote:
>> 
>>> David E Jones wrote:
>>>> There are other reasons why you might not want to cache something. Quite a few of them, even without trying to be too creative.
>>>> 
>>>> For example: memory use. Why cache things like contact mech or order information that isn't likely to be read very many times, but if cached would take up memory until expired and cleared (and would only be cleared if something is cleaning up expired entries) or until it makes it to the end of the LRU queue or whatever, and the whole time requiring additional resources to keep track of and making more legitimate caching less efficient.
>>>> 
>>>> Another example: freshness. If you want to be sure the data is good, get it from the db. If it won't cause big problems to have stale data, then the cache is okay.
>>>> 
>>>> Am I the only one who thinks caching everything by default (as Marc suggested) or popping out warning messages (as Adam suggests here) is a little crazy? I haven't seen anyone else speak up, so just wondering...
>>> I've got a change queued that has deprecated getRelatedOne (as
>>> suggested earlier).  While doing that, I also decided to go through and
>>> fix all the remaining findByPrimaryKey calls, replacing them with findOne.
>>> During that change, I noticed several FooWorker/FooHelper classes that should
>>> really be using caching variants.
>> 
>> Yes, I don't doubt that there is code in OFBiz that could leverage caching more effectively.
>> 
>>> A read-only fetch should be cached.  A read-only web request should
>>> never hit the database.  
>> 
>> Where do these rules come from? How did they become rules?
>> 
>>> If it's a question of memory, and a
>>> particular install is throwing away cache entries too quickly, then
>>> that install needs more memory.
>> 
>> I guess my points weren't very good if that's all the rebuttal you thought was needed.
>> 
>> How about this case: running a little ecommerce site with 100,000 customers. The typical customer visits the site around once per week, though some customers rarely visit the site, maybe a few times per year. A customer logs in and looks at their customer profile, which is 100% read-only (no data is changed as the page loads). They may look at the profile page again if they decide to edit something, but generally will not otherwise. Should all of the data on the profile be cached so that on repeated views we can follow your two rules above (read-only data cached, read-only web request never hits the database (I assume after the first hit, of course))?
>> 
>> So, should all of the customer's contact, order, contact/mailing list, etc. information be cached? How much memory are you thinking of... something roughly the same size as the database? Or is the whole point of this to try to argue for in-memory databases... yeah... I see where you're going...
>> 
>> And, BTW, this isn't some weird case... for most data in an ERP or ecommerce system this IS the case. In other words, it has a low reads/time-period rate, and a low read to update ratio, and therefore caching is a waste of system resources and more likely to cause issues with bad/stale data (especially across multiple app servers... unless you want to keep all caches perfectly in sync all the time and take the performance hit from that goal, especially making sure other caches are updated before you commit, ie involve the caches in the 2-phase commit so that they are all updated before the data can be committed to the db).
> 
> Where in the universe does 'should' mean always?  I never said all
> read-only things must be cached.
> 
> There are definite times during a production install where parts of
> the system perform exceptionally slow.  My suggested change would make
> it simpler to identify where those slow points are, finding values
> that are being looked up repeatedly, and changing them to be cached.
> 
> We currently have a client experiencing timeouts when submitting
> orders (OFBiz takes more than 2 minutes to process one).  Yes, we could
> increase the default timeout; but that has its own problems.  If we
> could identify the hot points, with an easy-to-use tool, then everyone
> would benefit.  Where is the harm in that?

If it's not really the problem, then maybe it would do more harm than good, like massively increasing log message volume without helping you get valuable information.

I actually just recently worked with a client who had a big performance problem on add to cart and it turned out that the problem was caching too much stuff which resulted in less important data pushing important data out of memory as the heap memory started getting full.

There's always a trade-off...

> Are you suggesting that because sometimes you don't want to cache some
> things, that we shouldn't cache anything?

Is that what it sounded like I wrote, or is this just a random thought?

> <segue>
> Lately, I've gotten very bad responses to suggestions of new
> features (not just from you, David).  Someone is wanting to do work,
> not asking anyone else to implement something for them.  Yet, the
> responses to such suggestions are filled with vitriol.  If someone is
> willing to do work, to improve OFBiz, then go ahead and let them.
> </segue>

Just a total guess here, but maybe because different isn't always better? Getting into more detail:

Unfortunately OFBiz has suffered a lot from "improvements" that make more things worse than they make better. And yes, there absolutely should be a pretty good burden for contributors to convince others that changes are indeed an improvement, and in fact IMO there should be a MUCH MUCH higher burden of proof. Still, that's not the model this project runs under, so what you're seeing is people reacting to the incredible burden of proof required to keep new features out since any committer can throw stuff in whenever...

-David


Re: findFoo(cache=false), without modification, in a transaction, issue warning

Posted by Adam Heath <do...@brainfood.com>.
David E Jones wrote:
> On Dec 5, 2010, at 8:11 PM, Adam Heath wrote:
> 
>> David E Jones wrote:
>>> There are other reasons why you might not want to cache something. Quite a few of them, even without trying to be too creative.
>>>
>>> For example: memory use. Why cache things like contact mech or order information that isn't likely to be read very many times, but if cached would take up memory until expired and cleared (and would only be cleared if something is cleaning up expired entries) or until it makes it to the end of the LRU queue or whatever, and the whole time requiring additional resources to keep track of and making more legitimate caching less efficient.
>>>
>>> Another example: freshness. If you want to be sure the data is good, get it from the db. If it won't cause big problems to have stale data, then the cache is okay.
>>>
>>> Am I the only one who thinks caching everything by default (as Marc suggested) or popping out warning messages (as Adam suggests here) is a little crazy? I haven't seen anyone else speak up, so just wondering...
>> I've got a change queued that has deprecated getRelatedOne (as
>> suggested earlier).  While doing that, I also decided to go through and
>> fix all the remaining findByPrimaryKey calls, replacing them with findOne.
>> During that change, I noticed several FooWorker/FooHelper classes that should
>> really be using caching variants.
> 
> Yes, I don't doubt that there is code in OFBiz that could leverage caching more effectively.
> 
>> A read-only fetch should be cached.  A read-only web request should
>> never hit the database.  
> 
> Where do these rules come from? How did they become rules?
> 
>> If it's a question of memory, and a
>> particular install is throwing away cache entries too quickly, then
>> that install needs more memory.
> 
> I guess my points weren't very good if that's all the rebuttal you thought was needed.
> 
> How about this case: running a little ecommerce site with 100,000 customers. The typical customer visits the site around once per week, though some customers rarely visit the site, maybe a few times per year. A customer logs in and looks at their customer profile, which is 100% read-only (no data is changed as the page loads). They may look at the profile page again if they decide to edit something, but generally will not otherwise. Should all of the data on the profile be cached so that on repeated views we can follow your two rules above (read-only data cached, read-only web request never hits the database (I assume after the first hit, of course))?
> 
> So, should all of the customer's contact, order, contact/mailing list, etc. information be cached? How much memory are you thinking of... something roughly the same size as the database? Or is the whole point of this to try to argue for in-memory databases... yeah... I see where you're going...
> 
> And, BTW, this isn't some weird case... for most data in an ERP or ecommerce system this IS the case. In other words, it has a low reads/time-period rate, and a low read to update ratio, and therefore caching is a waste of system resources and more likely to cause issues with bad/stale data (especially across multiple app servers... unless you want to keep all caches perfectly in sync all the time and take the performance hit from that goal, especially making sure other caches are updated before you commit, ie involve the caches in the 2-phase commit so that they are all updated before the data can be committed to the db).

Where in the universe does 'should' mean always?  I never said all
read-only things must be cached.

There are definite times during a production install where parts of
the system perform exceptionally slow.  My suggested change would make
it simpler to identify where those slow points are, finding values
that are being looked up repeatedly, and changing them to be cached.

We currently have a client experiencing timeouts when submitting
orders (OFBiz takes more than 2 minutes to process one).  Yes, we could
increase the default timeout; but that has its own problems.  If we
could identify the hot points, with an easy-to-use tool, then everyone
would benefit.  Where is the harm in that?

Are you suggesting that because sometimes you don't want to cache some
things, that we shouldn't cache anything?

<segue>
Lately, I've gotten very bad responses to suggestions of new
features (not just from you, David).  Someone is wanting to do work,
not asking anyone else to implement something for them.  Yet, the
responses to such suggestions are filled with vitriol.  If someone is
willing to do work, to improve OFBiz, then go ahead and let them.
</segue>


Re: findFoo(cache=false), without modification, in a transaction, issue warning

Posted by David E Jones <de...@me.com>.
On Dec 5, 2010, at 8:11 PM, Adam Heath wrote:

> David E Jones wrote:
>> There are other reasons why you might not want to cache something. Quite a few of them, even without trying to be too creative.
>> 
>> For example: memory use. Why cache things like contact mech or order information that isn't likely to be read very many times, but if cached would take up memory until expired and cleared (and would only be cleared if something is cleaning up expired entries) or until it makes it to the end of the LRU queue or whatever, and the whole time requiring additional resources to keep track of and making more legitimate caching less efficient.
>> 
>> Another example: freshness. If you want to be sure the data is good, get it from the db. If it won't cause big problems to have stale data, then the cache is okay.
>> 
>> Am I the only one who thinks caching everything by default (as Marc suggested) or popping out warning messages (as Adam suggests here) is a little crazy? I haven't seen anyone else speak up, so just wondering...
> 
> I've got a change queued that has deprecated getRelatedOne (as
> suggested earlier).  While doing that, I also decided to go through and
> fix all the remaining findByPrimaryKey calls, replacing them with findOne.
> During that change, I noticed several FooWorker/FooHelper classes that should
> really be using caching variants.

Yes, I don't doubt that there is code in OFBiz that could leverage caching more effectively.

> A read-only fetch should be cached.  A read-only web request should
> never hit the database.  

Where do these rules come from? How did they become rules?

> If it's a question of memory, and a
> particular install is throwing away cache entries too quickly, then
> that install needs more memory.

I guess my points weren't very good if that's all the rebuttal you thought was needed.

How about this case: running a little ecommerce site with 100,000 customers. The typical customer visits the site around once per week, though some customers rarely visit the site, maybe a few times per year. A customer logs in and looks at their customer profile, which is 100% read-only (no data is changed as the page loads). They may look at the profile page again if they decide to edit something, but generally will not otherwise. Should all of the data on the profile be cached so that on repeated views we can follow your two rules above (read-only data cached, read-only web request never hits the database (I assume after the first hit, of course))?

So, should all of the customer's contact, order, contact/mailing list, etc. information be cached? How much memory are you thinking of... something roughly the same size as the database? Or is the whole point of this to try to argue for in-memory databases... yeah... I see where you're going...

And, BTW, this isn't some weird case... for most data in an ERP or ecommerce system this IS the case. In other words, it has a low reads/time-period rate, and a low read to update ratio, and therefore caching is a waste of system resources and more likely to cause issues with bad/stale data (especially across multiple app servers... unless you want to keep all caches perfectly in sync all the time and take the performance hit from that goal, especially making sure other caches are updated before you commit, ie involve the caches in the 2-phase commit so that they are all updated before the data can be committed to the db).

-David



Re: findFoo(cache=false), without modification, in a transaction, issue warning

Posted by Adam Heath <do...@brainfood.com>.
David E Jones wrote:
> There are other reasons why you might not want to cache something. Quite a few of them, even without trying to be too creative.
> 
> For example: memory use. Why cache things like contact mech or order information that isn't likely to be read very many times, but if cached would take up memory until expired and cleared (and would only be cleared if something is cleaning up expired entries) or until it makes it to the end of the LRU queue or whatever, and the whole time requiring additional resources to keep track of and making more legitimate caching less efficient.
> 
> Another example: freshness. If you want to be sure the data is good, get it from the db. If it won't cause big problems to have stale data, then the cache is okay.
> 
> Am I the only one who thinks caching everything by default (as Marc suggested) or popping out warning messages (as Adam suggests here) is a little crazy? I haven't seen anyone else speak up, so just wondering...

I've got a change queued that has deprecated getRelatedOne (as
suggested earlier).  While doing that, I also decided to go through and
fix all the remaining findByPrimaryKey calls, replacing them with findOne.
During that change, I noticed several FooWorker/FooHelper classes that should
really be using caching variants.

A read-only fetch should be cached.  A read-only web request should
never hit the database.  If it's a question of memory, and a
particular install is throwing away cache entries too quickly, then
that install needs more memory.

Re: findFoo(cache=false), without modification, in a transaction, issue warning

Posted by Marc Morin <ma...@emforium.com>.
 
> ==
> <extend-entity name="Person" absorb-condition-values="true"/>
> if (modelEntity.getAbsorbConditionValues()) {
>    for (GenericValue value: list) {
>      cache.putToCache(value)
>    }
> }
> ==
> <extend-entity name="Party">
>    <extend-field name="partyTypeId" intern-value="true"/>
> </extend-entity>
> if (modelField.getInternValue()) {
>    value = value.intern()
> }
> ==
> 
> It would be very difficult to store the same FooType once for multiple
> tenants, as the timestamp fields might be different across
> databases (depending on when the last seed install was run).  It
> might be possible to cache value instances, using a combined weak/soft
> value map.

We maintain a weak-reference map of all objects fetched from the JDBC driver.  This ensures that only a single object (for identical values) will be in the JVM's memory at a time.  We don't even try to estimate whether reuse is likely... we just put all objects into it.
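A tiny sketch of that canonicalizing map, using nothing beyond the JDK (the WeakInterner name is made up here): every fetched value is passed through intern(), so equal objects collapse to one instance while the garbage collector stays free to drop entries nobody references anymore.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Canonicalizing map in the spirit described above: holds only weak
// references, so it never keeps a value alive on its own, yet while a value
// is alive every equal fetch returns the same instance.
class WeakInterner<T> {
    private final Map<T, WeakReference<T>> pool = new WeakHashMap<>();

    synchronized T intern(T value) {
        WeakReference<T> ref = pool.get(value);
        T canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            // first (live) sighting of this value: it becomes the canonical one
            pool.put(value, new WeakReference<>(value));
            canonical = value;
        }
        return canonical;
    }
}
```

Because both key and value are weak, a cleared entry simply falls out of the WeakHashMap; no explicit eviction is needed.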

> 
> > Copy on write database objects, so developer doesn't need to
> > worry about corrupting other users of the data if they want/need
> > to modify an entity.
> 
> In some cases, you know beforehand that you will be modifying a
> generic value; in those cases, it makes no sense to store anything at
> all into the cache. This implies you'd want an api to support
> skipping cache storage, so we are right back to having a useCache flag
> in the api.

Yes, but the JVM will keep it in memory until the garbage collector decides to discard it.  An implementation using WeakReferences would keep it available until the JVM decides it is no longer reachable.

> 
> > Transactional support for the cache. (bug vs design choice in my
> > opinion).
> 
> The last time I looked for a transaction-aware map (using JTA), it
> didn't support generics.  I found one on Jakarta that was close.

We implemented our own using a Synchronization instance registered with the transaction manager, one for each open transaction.  We only really care about rollback or commit.  The implementation essentially guards the "committed" cache with a per-transaction cache.  New puts go into the per-transaction map, then are posted to the real cache on commit.  On rollback, they are discarded.  Cache gets check the per-transaction cache first and, if nothing is present, check the committed cache.

Invalidations on the cache occur on both the per-transaction cache and the committed cache.

In the end, it was not difficult; it just needed a layer between the EntityCache and the UtilCache code.
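The layering described above can be sketched roughly like this (names invented for illustration; a real version would register a javax.transaction.Synchronization and keep one overlay per transaction, whereas here commit/rollback are called directly):

```java
import java.util.HashMap;
import java.util.Map;

// Per-transaction overlay guarding a committed cache. Puts land in the
// overlay, reach the committed map only on commit, and are discarded on
// rollback; gets consult the overlay first; invalidations hit both layers.
class TxGuardedCache<K, V> {
    private final Map<K, V> committed = new HashMap<>();
    private final Map<K, V> txOverlay = new HashMap<>();

    void put(K key, V value) {
        txOverlay.put(key, value);      // visible only inside this transaction
    }

    V get(K key) {
        V v = txOverlay.get(key);       // the transaction's own writes win
        return (v != null) ? v : committed.get(key);
    }

    void invalidate(K key) {
        txOverlay.remove(key);          // must clear both layers
        committed.remove(key);
    }

    // would run from Synchronization.afterCompletion(STATUS_COMMITTED)
    void commit() {
        committed.putAll(txOverlay);
        txOverlay.clear();
    }

    // would run from Synchronization.afterCompletion(STATUS_ROLLEDBACK)
    void rollback() {
        txOverlay.clear();
    }
}
```

The nice property is that uncommitted values never leak into the shared cache, so other transactions can't read them.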



> 
> > We've turned on entity caching by "default", and would like to
> > remove the useCache from the api. Don't mind returning a
> > non-caching delegator via a delegator decorator, that way, there
> > is no API difference.
> 
> Delegator proxies are something I'd like to support. However, because
> GenericEntity/GenericPK/GenericValue are classes, it makes it hard to
> proxy in all cases, so that getDelegator() returns the correct
> instance.

With our copy-on-write work, we've essentially made GenericEntity an interface, with GenericValue extending GenericEntityDecorator.  That might make it easier to separate the delegator from the entity itself.
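A rough sketch of that shape, with minimal invented names standing in for GenericEntity/GenericValue: once the entity type is an interface, a copy-on-write decorator only has to override the mutators, so a cached instance can't be corrupted by a caller that modifies it.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal entity interface standing in for a refactored GenericEntity.
interface Entity {
    Object get(String field);
    void set(String field, Object value);
}

// Plain map-backed implementation, standing in for the concrete entity.
class MapEntity implements Entity {
    final Map<String, Object> fields = new HashMap<>();
    public Object get(String field) { return fields.get(field); }
    public void set(String field, Object value) { fields.put(field, value); }
}

// Copy-on-write decorator: reads go to the shared (cached) instance until the
// first write, which silently switches the delegate to a private copy.
class CowEntity implements Entity {
    private Entity delegate;
    private boolean copied = false;

    CowEntity(Entity shared) { this.delegate = shared; }

    public Object get(String field) { return delegate.get(field); }

    public void set(String field, Object value) {
        if (!copied) {
            // hypothetical: real code would copy fields via the entity model
            MapEntity copy = new MapEntity();
            copy.fields.putAll(((MapEntity) delegate).fields);
            delegate = copy;
            copied = true;
        }
        delegate.set(field, value);
    }
}
```

The cache can then hand out CowEntity wrappers around one shared instance and stop worrying about callers mutating cached data.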

Re: findFoo(cache=false), without modification, in a transaction, issue warning

Posted by Adam Heath <do...@brainfood.com>.
On 12/06/2010 02:51 PM, Marc Morin wrote:
> We've found that our application is not usable (according to our
> customers) without entity caching enabled.

There are similar issues with the forced creation of http sessions 
when doing read-only requests.  There are calls to 
request.getSession() without passing false and checking for null.

> Like I emailed previously, we've made extensive improvements on the
> entity cache to increase the hit rate (only store a value once, whether
> fetched by pkey or in an entity condition) and lower memory consumption:
> store objects in a dictionary, so that there is a singleton for each Object
> fetched from the db; String.intern() some Field values to make
> "PARTY".equals(..) checks faster; etc....  All these changes are required
> for multi-tenant systems where there is a rather high degree of redundant
> information in each tenant instance. (think of enumerations, types, etc...)

Some pseudo-code for the features you suggest:

==
<extend-entity name="Person" absorb-condition-values="true"/>
if (modelEntity.getAbsorbConditionValues()) {
   for (GenericValue value: list) {
     cache.putToCache(value)
   }
}
==
<extend-entity name="Party">
   <extend-field name="partyTypeId" intern-value="true"/>
</extend-entity>
if (modelField.getInternValue()) {
   value = value.intern()
}
==

It would be very difficult to store the same FooType once for multiple 
tenants, as the timestamp fields might be different across 
databases (depending on when the last seed install was run).  It 
might be possible to cache value instances, using a combined weak/soft 
value map.

> Copy on write database objects, so developer doesn't need to
> worry about corrupting other users of the data if they want/need
> to modify an entity.

In some cases, you know beforehand that you will be modifying a 
generic value; in those cases, it makes no sense to store anything at 
all into the cache.  This implies you'd want an api to support 
skipping cache storage, so we are right back to having a useCache flag 
in the api.

> Transactional support for the cache. (bug vs design choice in my
> opinion).

The last time I looked for a transaction-aware map (using JTA), it 
didn't support generics.  I found one on Jakarta that was close.

> We've turned on entity caching by "default", and would like to
> remove the useCache from the api.  Don't mind returning a
> non-caching delegator via a delegator decorator, that way, there
> is no API difference.

Delegator proxies are something I'd like to support.  However, because 
GenericEntity/GenericPK/GenericValue are classes, it is hard to proxy 
them in all cases, so that getDelegator() returns the correct instance.

The reason for this is that GenericValue extends GenericEntity.  You'd 
want a FooDelegatorGenericEntity, extended by a 
FooDelegatorGenericValue, but then the latter wouldn't extend 
GenericValue or GenericEntity, and the ABI would be broken.
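If the delegator itself were an interface, the decorator half would be simple; it is only the concrete value classes that resist proxying. A minimal sketch (hypothetical, vastly smaller than the real Delegator API):

```java
// Hypothetical one-method Delegator interface; the real API is far larger.
interface Delegator {
    Object findOne(String entityName, boolean useCache);
}

// Decorator that forces caching on, so call sites need no useCache flag.
class CachingDelegator implements Delegator {
    private final Delegator delegate;

    CachingDelegator(Delegator delegate) {
        this.delegate = delegate;
    }

    public Object findOne(String entityName, boolean useCache) {
        return delegate.findOne(entityName, true); // always cache
    }
}
```

With interfaces, java.lang.reflect.Proxy could generate such decorators; with the GenericEntity class hierarchy, it cannot.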

Re: findFoo(cache=false), without modification, in a transaction, issue warning

Posted by Marc Morin <ma...@emforium.com>.
We've found that our application is not usable (according to our customers) without entity caching enabled.

Like I emailed previously, we've made extensive improvements on the entity cache to increase the hit rate (only store value once if fetched by pkey and then in an entity condition) and lower memory consumption, store objects in a dictionary, so that there is a singleton for each object fetched from the db, String.intern() some Field values to make "PARTY".equals(..) checks faster, etc....  All these changes are required for multi-tenant systems where there is a rather high degree of redundant information in each tenant instance. (think of enumerations, types, etc...)

Copy on write database objects, so developer doesn't need to worry about corrupting other users of the data if they want/need to modify an entity.

Transactional support for the cache. (bug vs design choice in my opinion).

We've turned on entity caching by "default", and would like to remove the useCache from the api.  Don't mind returning a non-caching delegator via a delegator decorator, that way, there is no API difference.

The "don't need to cache" issue is handled by adding a WeakReference cache instance on top of SoftReference.  This makes it so that as long as the JVM has it in memory, why not re-use it?

I prefer to let the framework take care of caching (and for how long, etc...) for me.
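The weak-over-soft layering described above might be sketched like so (hypothetical classes, not the actual UtilCache): the weak layer answers as long as any caller still holds the value, and the soft layer keeps it opportunistically until the JVM needs the memory back.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical two-level reference cache: a WeakReference layer over a
// SoftReference layer, so values alive anywhere in the JVM get re-used.
class TwoLevelCache<K, V> {
    private final Map<K, WeakReference<V>> weak = new ConcurrentHashMap<>();
    private final Map<K, SoftReference<V>> soft = new ConcurrentHashMap<>();

    void put(K key, V value) {
        weak.put(key, new WeakReference<>(value));
        soft.put(key, new SoftReference<>(value));
    }

    V get(K key) {
        WeakReference<V> w = weak.get(key);
        V v = (w != null) ? w.get() : null;
        if (v != null) {
            return v; // still strongly held somewhere
        }
        SoftReference<V> s = soft.get(key);
        v = (s != null) ? s.get() : null;
        if (v != null) {
            weak.put(key, new WeakReference<>(v)); // promote back to weak layer
        }
        return v;
    }
}
```

Reclamation itself can't be demonstrated deterministically, since soft-reference clearing is up to the collector.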

Marc


Re: findFoo(cache=false), without modification, in a transaction, issue warning

Posted by Adam Heath <do...@brainfood.com>.
Scott Gray wrote:
> On 6/12/2010, at 6:21 PM, Adam Heath wrote:
> 
>> Adrian Crum wrote:
>>> I agree with David. An example is the advice he gave me during the development of temporal expressions.
>>>
>>> I wanted to cache the temporal expression Java structures in memory. David suggested getting the persisted expressions from the entity cache instead. Thinking about it, that made sense - temporal expressions should not change frequently so they are good candidates for entity caching.
>> Storing them in the entity cache *is* storing them in memory.  Are you
>> iterating the entity values, building up some complex object?  If so,
>> then use Delegator.getCache().put(String, EntityCondition, String,
>> Object).  That allows you to add a complex built-up java object into
>> the same internal cache that is storing the list of cached results
>> from a condition lookup.
> 
> Thanks for pointing this out, I wasn't aware of it and I can think of a fair few places where something like this could be put to use.  Using it in place of some of the recent fixes to caching of freemarker template objects originating from Content records is the first that comes to mind.

I'm the one who wrote this feature years and years ago.

However, there is a race condition.  It's possible that between the
time you fetch the list of generic values, and get around to storing
the complex object graph, the original list of values has been cleared.

The fix is to add some kind of concurrent api, putIfAbsent or some
such.  I haven't thought in detail about the fix, I just know the
problem could happen in theory.
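One way to close that window, if the cache were backed by a ConcurrentMap, is to fetch the values and build the derived object inside a computeIfAbsent-style loader, so lookup and publication are atomic per key. A sketch against a plain ConcurrentHashMap (not the actual Delegator cache API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical derived-object cache: the loader (which would fetch the
// generic values and build the complex object graph) runs at most once
// per key under concurrent access, so there is no gap between "fetch
// the list" and "store the derived object" for a clear() to exploit.
class DerivedObjectCache {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    Object getOrBuild(String key, Function<String, Object> loader) {
        return cache.computeIfAbsent(key, loader);
    }
}
```

A clear() concurrent with the loader still needs care in a real implementation, but the putIfAbsent/computeIfAbsent shape is the concurrent API the paragraph above is asking for.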


Re: findFoo(cache=false), without modification, in a transaction, issue warning

Posted by Scott Gray <sc...@hotwaxmedia.com>.
On 6/12/2010, at 6:21 PM, Adam Heath wrote:

> Adrian Crum wrote:
>> I agree with David. An example is the advice he gave me during the development of temporal expressions.
>> 
>> I wanted to cache the temporal expression Java structures in memory. David suggested getting the persisted expressions from the entity cache instead. Thinking about it, that made sense - temporal expressions should not change frequently so they are good candidates for entity caching.
> 
> Storing them in the entity cache *is* storing them in memory.  Are you
> iterating the entity values, building up some complex object?  If so,
> then use Delegator.getCache().put(String, EntityCondition, String,
> Object).  That allows you to add a complex built-up java object into
> the same internal cache that is storing the list of cached results
> from a condition lookup.

Thanks for pointing this out, I wasn't aware of it and I can think of a fair few places where something like this could be put to use.  Using it in place of some of the recent fixes to caching of freemarker template objects originating from Content records is the first that comes to mind.

Regards
Scott

Re: findFoo(cache=false), without modification, in a transaction, issue warning

Posted by Adrian Crum <ad...@yahoo.com>.
--- On Sun, 12/5/10, Adam Heath <do...@brainfood.com> wrote:
> Adrian Crum wrote:
> > I agree with David. An example is the advice he gave
> me during the development of temporal expressions.
> > 
> > I wanted to cache the temporal expression Java
> structures in memory. David suggested getting the persisted
> expressions from the entity cache instead. Thinking about
> it, that made sense - temporal expressions should not change
> frequently so they are good candidates for entity caching.
> 
> Storing them in the entity cache *is* storing them in
> memory.  Are you
> iterating the entity values, building up some complex
> object?  If so,
> then use Delegator.getCache().put(String, EntityCondition,
> String,
> Object).  That allows you to add a complex built-up
> java object into
> the same internal cache that is storing the list of cached
> results
> from a condition lookup.
> 
> > Entity data that changes frequently should not be
> cached - because caching it leads to concurrency issues.
> 
> Well, duh.  This feature would be used to find values
> that are being
> looked up repeatedly, not changing, and not being
> cached.  How else
> would it be used?

I admit, I confused this thread with another and went off on a tangent. Some kind of performance metric attached to the entity caching mechanism would be worthwhile - so one could use the entity cache only where needed.

-Adrian




Re: findFoo(cache=false), without modification, in a transaction, issue warning

Posted by Adam Heath <do...@brainfood.com>.
Adrian Crum wrote:
> I agree with David. An example is the advice he gave me during the development of temporal expressions.
> 
> I wanted to cache the temporal expression Java structures in memory. David suggested getting the persisted expressions from the entity cache instead. Thinking about it, that made sense - temporal expressions should not change frequently so they are good candidates for entity caching.

Storing them in the entity cache *is* storing them in memory.  Are you
iterating the entity values, building up some complex object?  If so,
then use Delegator.getCache().put(String, EntityCondition, String,
Object).  That allows you to add a complex built-up java object into
the same internal cache that is storing the list of cached results
from a condition lookup.

> Entity data that changes frequently should not be cached - because caching it leads to concurrency issues.

Well, duh.  This feature would be used to find values that are being
looked up repeatedly, not changing, and not being cached.  How else
would it be used?

Re: findFoo(cache=false), without modification, in a transaction, issue warning

Posted by Adrian Crum <ad...@yahoo.com>.
I agree with David. An example is the advice he gave me during the development of temporal expressions.

I wanted to cache the temporal expression Java structures in memory. David suggested getting the persisted expressions from the entity cache instead. Thinking about it, that made sense - temporal expressions should not change frequently so they are good candidates for entity caching.

Entity data that changes frequently should not be cached - because caching it leads to concurrency issues.

-Adrian

--- On Sun, 12/5/10, David E Jones <de...@me.com> wrote:
> There are other reasons why you might not want to cache
> something. Quite a few of them, even without trying to be
> too creative.
> 
> For example: memory use. Why cache things like contact mech
> or order information that isn't likely to be read a very
> many times, but if cached would take up memory until expired
> and cleared (and would only be cleared if something is
> cleaning up expired entries) or until it makes it to the end
> of the LRU queue or whatever, and the whole time requiring
> additional resources to keep track of and making more
> legitimate caching less efficient.
> 
> Another example: freshness. If you want to be sure the data
> is good, get it from the db. If it won't cause big problems
> to have stale data, then the cache is okay.
> 
> Am I the only one who thinks caching everything by default
> (as Marc suggested) or popping out warning messages (as Adam
> suggests here) is a little crazy? I haven't seen anyone else
> speak up, so just wondering...
> 
> -David
> 
> 
> On Dec 5, 2010, at 7:45 PM, Adam Heath wrote:
> 
> > I've noticed several places in the code that issue a
> findOne(or other
> > find method), without using a cache, but then don't do
> any
> > modifications to the returned value.  Such call
> sites should really
> > use a caching variant of the delegator.
> > 
> > What I'd like to work on, is a feature, only used
> during development,
> > that would record any value looked up that wasn't
> cached, and wasn't
> > later modified, and print a warning, with an
> exception, to allow for a
> > caching variant to be added.  Would others find
> this useful?
> > 
> > Obviously, it would need to be transaction based, and
> would sometimes
> > have false-positives.  Additionally, if no
> transaction is active, then
> > that automatically means a caching variant should be
> used.
> 
> 



Re: findFoo(cache=false), without modification, in a transaction, issue warning

Posted by David E Jones <de...@me.com>.
There are other reasons why you might not want to cache something. Quite a few of them, even without trying to be too creative.

For example: memory use. Why cache things like contact mech or order information that isn't likely to be read very many times, but if cached would take up memory until expired and cleared (and would only be cleared if something is cleaning up expired entries) or until it makes it to the end of the LRU queue or whatever, and the whole time requiring additional resources to keep track of and making more legitimate caching less efficient.

Another example: freshness. If you want to be sure the data is good, get it from the db. If it won't cause big problems to have stale data, then the cache is okay.

Am I the only one who thinks caching everything by default (as Marc suggested) or popping out warning messages (as Adam suggests here) is a little crazy? I haven't seen anyone else speak up, so just wondering...

-David


On Dec 5, 2010, at 7:45 PM, Adam Heath wrote:

> I've noticed several places in the code that issue a findOne(or other
> find method), without using a cache, but then don't do any
> modifications to the returned value.  Such call sites should really
> use a caching variant of the delegator.
> 
> What I'd like to work on, is a feature, only used during development,
> that would record any value looked up that wasn't cached, and wasn't
> later modified, and print a warning, with an exception, to allow for a
> caching variant to be added.  Would others find this useful?
> 
> Obviously, it would need to be transaction based, and would sometimes
> have false-positives.  Additionally, if no transaction is active, then
> that automatically means a caching variant should be used.
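For reference, the development-only check proposed at the top of the thread could be sketched as a per-transaction (here, per-thread) record of uncached reads; the hook points are hypothetical, not real OFBiz API:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical development-only tracker: records primary keys read
// without the cache, forgets any that are later written (those reads
// were legitimately uncached), and reports the remainder when the
// "transaction" ends -- a real version would log a warning with a
// stack trace for each suspect so a caching variant can be added.
final class UncachedReadTracker {
    private static final ThreadLocal<Set<String>> READS =
            ThreadLocal.withInitial(HashSet::new);

    static void recordUncachedRead(String pk) {
        READS.get().add(pk);
    }

    static void recordWrite(String pk) {
        READS.get().remove(pk); // modified, so the non-caching read was correct
    }

    static Set<String> onTransactionEnd() {
        Set<String> suspects = new HashSet<>(READS.get());
        READS.get().clear();
        return suspects;
    }
}
```

As Adam notes, this yields false positives (e.g. a read whose freshness genuinely matters), which is why it would only run during development.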