You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by "Hyrum K. Wright" <hy...@hyrumwright.org> on 2009/07/31 02:48:06 UTC

Remove entries cache in adm_access

Greg,
The doc string for svn_wc_entries() reads thusly:

/** Parse the `entries' file for @a adm_access and return a hash @a  
entries,
  * whose keys are (<tt>const char *</tt>) entry names and values are
  * (<tt>svn_wc_entry_t *</tt>).  The hash @a entries, and its keys and
  * values, are allocated from the pool used to open the @a adm_access
  * baton (that's how the entries caching works).  @a pool is used for
  * transient allocations.
...

We've discussed before that removing the entries cache in adm_access  
will lead to memory explosion, as the entries are re-read from disk  
into the access pool, without ever freeing memory used for stale  
entries hashes.  (Well, it will lead to *faster* memory explosion,  
since we still flush the cache under certain circumstances.)

However, as we start touching the sqlite db directly, the cache  
becomes stale quite quickly, since the entries hash isn't modified at  
the same time the the db is, has happens with  
svn_wc__entries_modify(). Since we're not using adm_access anymore,  
we've no way of knowing when to flush the cache, which means we'll  
need to flush it aggressively--so aggressively that it probably isn't  
worth keeping the cache around.

As for API consumers which are using entries and now have sub-optimal  
memory consumption, I say "tough rocks, upgrade to the new APIs".   
That might be harsh, but I think it's the reality as keeping the cache  
in sync is much more trouble than it's worth.

Thoughts?

-Hyrum

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2377195

Re: Remove entries cache in adm_access

Posted by "Hyrum K. Wright" <hy...@hyrumwright.org>.
On Jul 30, 2009, at 10:05 PM, Arfrever Frehtes Taifersar Arahesis wrote:

> 2009-07-31 04:48:06 Hyrum K. Wright napisał(a):
>> Greg,
>> The doc string for svn_wc_entries() reads thusly:
>>
>> /** Parse the `entries' file for @a adm_access and return a hash @a
>> entries,
>>  * whose keys are (<tt>const char *</tt>) entry names and values are
>>  * (<tt>svn_wc_entry_t *</tt>).  The hash @a entries, and its keys  
>> and
>>  * values, are allocated from the pool used to open the @a adm_access
>>  * baton (that's how the entries caching works).  @a pool is used for
>>  * transient allocations.
>> ...
>>
>> We've discussed before that removing the entries cache in adm_access
>> will lead to memory explosion, as the entries are re-read from disk
>> into the access pool, without ever freeing memory used for stale
>> entries hashes.  (Well, it will lead to *faster* memory explosion,
>> since we still flush the cache under certain circumstances.)
>>
>> However, as we start touching the sqlite db directly, the cache
>> becomes stale quite quickly, since the entries hash isn't modified at
>> the same time the the db is, has happens with
>> svn_wc__entries_modify(). Since we're not using adm_access anymore,
>> we've no way of knowing when to flush the cache, which means we'll
>> need to flush it aggressively--so aggressively that it probably isn't
>> worth keeping the cache around.
>>
>> As for API consumers which are using entries and now have sub-optimal
>> memory consumption, I say "tough rocks, upgrade to the new APIs".
>> That might be harsh, but I think it's the reality as keeping the  
>> cache
>> in sync is much more trouble than it's worth.
>>
>> Thoughts?
>
> If you update API consumers in our bindings, then +1.

People who care about the bindings are welcome to ensure that they are  
updated.  (That group of people does not directly include me, fwiw.)

-Hyrum

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2377199


Re: Remove entries cache in adm_access

Posted by Arfrever Frehtes Taifersar Arahesis <Ar...@GMail.Com>.
2009-07-31 04:48:06 Hyrum K. Wright napisał(a):
> Greg,
> The doc string for svn_wc_entries() reads thusly:
> 
> /** Parse the `entries' file for @a adm_access and return a hash @a  
> entries,
>   * whose keys are (<tt>const char *</tt>) entry names and values are
>   * (<tt>svn_wc_entry_t *</tt>).  The hash @a entries, and its keys and
>   * values, are allocated from the pool used to open the @a adm_access
>   * baton (that's how the entries caching works).  @a pool is used for
>   * transient allocations.
> ...
> 
> We've discussed before that removing the entries cache in adm_access  
> will lead to memory explosion, as the entries are re-read from disk  
> into the access pool, without ever freeing memory used for stale  
> entries hashes.  (Well, it will lead to *faster* memory explosion,  
> since we still flush the cache under certain circumstances.)
> 
> However, as we start touching the sqlite db directly, the cache  
> becomes stale quite quickly, since the entries hash isn't modified at  
> the same time the the db is, has happens with  
> svn_wc__entries_modify(). Since we're not using adm_access anymore,  
> we've no way of knowing when to flush the cache, which means we'll  
> need to flush it aggressively--so aggressively that it probably isn't  
> worth keeping the cache around.
> 
> As for API consumers which are using entries and now have sub-optimal  
> memory consumption, I say "tough rocks, upgrade to the new APIs".   
> That might be harsh, but I think it's the reality as keeping the cache  
> in sync is much more trouble than it's worth.
> 
> Thoughts?

If you update API consumers in our bindings, then +1.

-- 
Arfrever Frehtes Taifersar Arahesis

Re: Remove entries cache in adm_access

Posted by Philip Martin <ph...@codematters.co.uk>.
"Hyrum K. Wright" <hy...@hyrumwright.org> writes:

> As for API consumers which are using entries and now have sub-optimal  
> memory consumption, I say "tough rocks, upgrade to the new APIs".   
> That might be harsh, but I think it's the reality as keeping the cache  
> in sync is much more trouble than it's worth.

How bad is "sub-optimal"?  If it's anything like the pre-1.0 clients
before entries caching then it's not just sub-optimal is unusable on
large working copies.  I don't think "upgrade to the new APIs" is
compatible with our versioning goals.  Our stated policy
http://subversion.tigris.org/hacking.html#release-numbering
has always been that old APIs will continue to work.  If you want
to write 2.0 you don't get to call it 1.7.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2377412

Re: Remove entries cache in adm_access

Posted by "Hyrum K. Wright" <hy...@hyrumwright.org>.
On Jul 31, 2009, at 1:49 AM, Greg Stein wrote:

> Any write operation within wc_db.c should be calling  
> wc_db.c::flush_entries().
>
> As long as you don't mix/match usage style (wc_db vs entries), then
> that cached entries will remain NULL, or will contain "current"
> properly.

Except for right now in the dev process, where we are intentionally  
mixing and matching as we move to wc_db.

> ISTR that you disabled the cache once, and the cmdline performed
> *very* unacceptably w.r.t memory. We could expect the same thing to
> happen with any old-API customer. I don't think we can allow ourselves
> to do that. (granted that there are lots of APIs that are access/path
> pairs, and those will not populate the entries cache, even for old API
> users)
>
> Re: the bindings. IMO, you shouldn't be so quick to dismiss them.
> They're part of the product. That said, you aren't set up to
> build/test Windows, and bindings may be the same for you, but I think
> you ought to treat both as "I'll try not to break it, and will support
> those who *can* build/test them". (maybe that's what you said, but it
> came off much more dismissively...)

Yeah, I could have been a bit less terse.

I'm just tired of hearing "the binding are broken!  the bindings are  
broken!" but nobody actually doing anything about it.  I don't use the  
bindings; I'm not familiar with developing them or keeping them up-to- 
date.  I try not to break the bindings, and I'm happy to answer  
questions when people are working on them.  But, I don't see why  
development in our core libs should be hampered by a lack of developer  
efforts on the bindings.

Now, all that being said, I'm reminded why keeping the cache is a Good  
Thing, so I'll not bother removing it, just flushing it whenever we do  
a write in wc_db.

-Hyrum

> On Fri, Jul 31, 2009 at 04:48, Hyrum K.
> Wright<hy...@mail.utexas.edu> wrote:
>> Greg,
>> The doc string for svn_wc_entries() reads thusly:
>>
>> /** Parse the `entries' file for @a adm_access and return a hash @a  
>> entries,
>>  * whose keys are (<tt>const char *</tt>) entry names and values are
>>  * (<tt>svn_wc_entry_t *</tt>).  The hash @a entries, and its keys  
>> and
>>  * values, are allocated from the pool used to open the @a adm_access
>>  * baton (that's how the entries caching works).  @a pool is used for
>>  * transient allocations.
>> ...
>>
>> We've discussed before that removing the entries cache in  
>> adm_access will
>> lead to memory explosion, as the entries are re-read from disk into  
>> the
>> access pool, without ever freeing memory used for stale entries  
>> hashes.
>>  (Well, it will lead to *faster* memory explosion, since we still  
>> flush the
>> cache under certain circumstances.)
>>
>> However, as we start touching the sqlite db directly, the cache  
>> becomes
>> stale quite quickly, since the entries hash isn't modified at the  
>> same time
>> the the db is, has happens with svn_wc__entries_modify(). Since  
>> we're not
>> using adm_access anymore, we've no way of knowing when to flush the  
>> cache,
>> which means we'll need to flush it aggressively--so aggressively  
>> that it
>> probably isn't worth keeping the cache around.
>>
>> As for API consumers which are using entries and now have sub- 
>> optimal memory
>> consumption, I say "tough rocks, upgrade to the new APIs".  That  
>> might be
>> harsh, but I think it's the reality as keeping the cache in sync is  
>> much
>> more trouble than it's worth.
>>
>> Thoughts?
>>
>> -Hyrum
>>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2377447

Re: Remove entries cache in adm_access

Posted by Greg Stein <gs...@gmail.com>.
Any write operation within wc_db.c should be calling wc_db.c::flush_entries().

As long as you don't mix/match usage style (wc_db vs entries), then
that cached entries will remain NULL, or will contain "current"
properly.

ISTR that you disabled the cache once, and the cmdline performed
*very* unacceptably w.r.t memory. We could expect the same thing to
happen with any old-API customer. I don't think we can allow ourselves
to do that. (granted that there are lots of APIs that are access/path
pairs, and those will not populate the entries cache, even for old API
users)

Re: the bindings. IMO, you shouldn't be so quick to dismiss them.
They're part of the product. That said, you aren't set up to
build/test Windows, and bindings may be the same for you, but I think
you ought to treat both as "I'll try not to break it, and will support
those who *can* build/test them". (maybe that's what you said, but it
came off much more dismissively...)

Cheers,
-g

On Fri, Jul 31, 2009 at 04:48, Hyrum K.
Wright<hy...@mail.utexas.edu> wrote:
> Greg,
> The doc string for svn_wc_entries() reads thusly:
>
> /** Parse the `entries' file for @a adm_access and return a hash @a entries,
>  * whose keys are (<tt>const char *</tt>) entry names and values are
>  * (<tt>svn_wc_entry_t *</tt>).  The hash @a entries, and its keys and
>  * values, are allocated from the pool used to open the @a adm_access
>  * baton (that's how the entries caching works).  @a pool is used for
>  * transient allocations.
> ...
>
> We've discussed before that removing the entries cache in adm_access will
> lead to memory explosion, as the entries are re-read from disk into the
> access pool, without ever freeing memory used for stale entries hashes.
>  (Well, it will lead to *faster* memory explosion, since we still flush the
> cache under certain circumstances.)
>
> However, as we start touching the sqlite db directly, the cache becomes
> stale quite quickly, since the entries hash isn't modified at the same time
> the the db is, has happens with svn_wc__entries_modify(). Since we're not
> using adm_access anymore, we've no way of knowing when to flush the cache,
> which means we'll need to flush it aggressively--so aggressively that it
> probably isn't worth keeping the cache around.
>
> As for API consumers which are using entries and now have sub-optimal memory
> consumption, I say "tough rocks, upgrade to the new APIs".  That might be
> harsh, but I think it's the reality as keeping the cache in sync is much
> more trouble than it's worth.
>
> Thoughts?
>
> -Hyrum
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2377267