Posted to dev@subversion.apache.org by Stefan Küng <to...@gmail.com> on 2010/07/31 15:30:33 UTC

Need fast ways to get Info once WC-NG is introduced

Hi,

I think I'd best first describe what TSVN does now:
TSVN has a cache of all working copy statuses which is used by the shell 
extension to show the icon overlays. It would be way too slow to fetch 
the status every time the shell requests the overlays, so that's why we 
have that cache.

The cache itself tries to do as little as possible while still keeping 
the status of each item up to date. It gets notified by the OS whenever 
a file is changed and decides then whether to re-fetch the status with 
the SVN API or not. But even calling the status API in those cases is 
too expensive and leads to way too heavy disk access. So the cache does 
a very quick check first: it reads the modification time of the entries 
and props files inside the .svn folder, and only if that time has changed 
does it call the svn status API. If it hasn't changed and there was no 
change notification for a file inside that folder, calling the API isn't 
necessary.

To clarify this a little bit, imagine the cache gets a change 
notification for all 'entries' files in a wc because someone did a 
commit or an update.
The problem is that the cache gets such notifications even if the file 
content hasn't changed. It's enough for a file to be opened with write 
access; the notification is sent even if nothing was actually written 
to the file.
So by checking the file dates of the entries/props files the cache 
determines whether a call to the svn API is needed or not for the 
subfolders.
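
A minimal sketch of the pre-check described above, assuming the pre-1.7 layout with an `entries` file inside each `.svn` folder. The names `admin_stamp` and `MaybeChangedCache` are hypothetical and not part of TSVN or Subversion; which props file to watch is left to the caller through the `names` parameter.

```python
import os
from typing import Dict, Optional, Tuple

# One modification time per watched file; None means the file is missing.
AdminStamp = Tuple[Optional[float], ...]


def admin_stamp(folder: str, names=("entries",)) -> AdminStamp:
    """Collect the mtimes of the watched files under folder/.svn."""
    stamp = []
    for name in names:
        try:
            stamp.append(os.stat(os.path.join(folder, ".svn", name)).st_mtime)
        except OSError:
            stamp.append(None)
    return tuple(stamp)


class MaybeChangedCache:
    """Remembers the last seen stamp per folder (hypothetical helper)."""

    def __init__(self) -> None:
        self._seen: Dict[str, AdminStamp] = {}

    def may_have_changed(self, folder: str) -> bool:
        """True means 'maybe': the caller should run the real status API."""
        current = admin_stamp(folder)
        previous = self._seen.get(folder)
        self._seen[folder] = current
        # A folder never seen before always counts as 'maybe changed'.
        return previous is None or previous != current
```

A `False` result only says that the watched files kept their timestamps, so the expensive status call can be skipped for that folder.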

Now, as far as I understand it, with WC-NG and the single db design, 
there are no files in each wc folder anymore which indicate whether 
something affecting the status has changed. There will only be one 
single db file for all folders of a wc.

So my first question is: is there a very quick way to find out whether 
something status related has changed since a specific time for a 
particular wc folder? I haven't found an API so far which I could use 
for this. It doesn't have to be reliable, i.e., all I need to know is 
whether the status *may* have changed. I don't really need to know 
whether it *really* has changed, because once I get the 'maybe', I 
will call the status API and get the definite answer.

Something else I use quite a lot in TSVN and especially the cache is a 
quick check whether a folder is versioned or not, simply by checking 
whether an .svn folder exists or not. Again here I only need to know 
whether it's *maybe* versioned. If there's no .svn folder, I *know* it's 
not versioned but if there is, I call the svn APIs and would get an 
error in return if e.g. the .svn folder is empty or corrupted.
But with the single db design, there won't be .svn folders anymore 
except for the root of the wc?
So is there an (almost as) fast way to check whether a folder is 
versioned or not?


Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net

Re: Need fast ways to get Info once WC-NG is introduced

Posted by Stefan Küng <to...@gmail.com>.
On 03.08.2010 12:42, Philip Martin wrote:
> Stefan Küng<to...@gmail.com>  writes:
>
>> On 02.08.2010 12:32, Bert Huijben wrote:
>>>> So is there an (almost as) fast way to check whether a folder is
>>>> versioned or not?
>>>
>>> I think the fastest way in the current code would be to call
>>> svn_wc_read_kind() on the directory, maybe after first checking that there
>>> is some .svn in at least one of the parent directories.
>>
>> I thought about implementing a small cache for that, so that I don't
>> have to walk up the tree every time to find an .svn dir.
>> But I thought I read something about such a small cache getting
>> implemented in the svn library itself so I wanted to ask first - maybe
>> there's already an API to use that cache. Or maybe I just remember it
>> wrong.
>
> Does TSVN cache/reuse svn_client_ctx_t handles?  In 1.7 the client
> context contains an opaque wc context which in turn includes a
> database context, svn_wc__db_t.  The database context caches sqlite
> connections and has a cache mapping directory->database.

No, the TSVN cache doesn't reuse those at the moment. I might change 
that though...

> Quite when TSVN should create/destroy svn_client_ctx_t is an
> interesting question.  Reusing a long-lived context (or perhaps a
> small number, one per-thread say) is likely to make individual svn
> calls faster.  However the open database handles means that Windows
> won't be able to delete root directories.  It's not clear to me how
> or when TSVN would close those handles.

That's why I currently don't reuse SVN pools and contexts. There is a 
mechanism in place which tells the cache to release all handles, but it 
isn't very reliable. I will have to test whether I can reuse the 
contexts or whether I have to recreate them.
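
One possible shape for the trade-off discussed above, as a sketch only: keep one long-lived context per thread, and let a "release all" request invalidate them. `ContextPool`, `create` and `destroy` are hypothetical names; `create` and `destroy` stand in for whatever builds and tears down an svn_client_ctx_t, and no real Subversion call is made here.

```python
import threading
from typing import Any, Callable


class ContextPool:
    """One reusable context per thread, invalidated by release_all()."""

    def __init__(self, create: Callable[[], Any],
                 destroy: Callable[[Any], None]) -> None:
        self._create = create
        self._destroy = destroy
        self._local = threading.local()
        self._lock = threading.Lock()
        self._generation = 0

    def get(self) -> Any:
        """Return this thread's context, recreating it if it was released."""
        with self._lock:
            generation = self._generation
        entry = getattr(self._local, "entry", None)
        if entry is not None and entry[0] != generation:
            # The context predates the last release request: close it.
            self._destroy(entry[1])
            entry = None
        if entry is None:
            entry = (generation, self._create())
            self._local.entry = entry
        return entry[1]

    def release_all(self) -> None:
        """Mark every existing context as stale."""
        with self._lock:
            self._generation += 1
```

Note the limitation: a stale context is only closed the next time its owning thread calls `get()`, so open database handles are not released immediately. That is exactly the reliability problem mentioned above, and this sketch does not solve it.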

Stefan


Re: Need fast ways to get Info once WC-NG is introduced

Posted by Philip Martin <ph...@wandisco.com>.
Stefan Küng <to...@gmail.com> writes:

> On 02.08.2010 12:32, Bert Huijben wrote:
>>> So is there an (almost as) fast way to check whether a folder is
>>> versioned or not?
>>
>> I think the fastest way in the current code would be to call
>> svn_wc_read_kind() on the directory, maybe after first checking that there
>> is some .svn in at least one of the parent directories.
>
> I thought about implementing a small cache for that, so that I don't
> have to walk up the tree every time to find an .svn dir.
> But I thought I read something about such a small cache getting
> implemented in the svn library itself so I wanted to ask first - maybe
> there's already an API to use that cache. Or maybe I just remember it
> wrong.

Does TSVN cache/reuse svn_client_ctx_t handles?  In 1.7 the client
context contains an opaque wc context which in turn includes a
database context, svn_wc__db_t.  The database context caches sqlite
connections and has a cache mapping directory->database.

Quite when TSVN should create/destroy svn_client_ctx_t is an
interesting question.  Reusing a long-lived context (or perhaps a
small number, one per-thread say) is likely to make individual svn
calls faster.  However the open database handles means that Windows
won't be able to delete root directories.  It's not clear to me how
or when TSVN would close those handles.

-- 
Philip

Re: Need fast ways to get Info once WC-NG is introduced

Posted by Stefan Küng <to...@gmail.com>.
On 03.08.2010 00:31, Talden wrote:
>>> Did you try compiling Subversion with the SVN_WC__SINGLE_DB and SINGLE_DB
>>> defined in wc.h yet? (This enables the experimental single-db mode)
>>>
>>> It should give some impression on what you can expect with single-db. (I
>>> think the current status is about 40 testfailures (9 in the upgrade
>>> tests),
>>> but it almost reduces the testsuite time by 50% compared to multi-db)
>>
>> I don't like to build the TSVN nightlies with such experimental features
>> yet. Once the features get into trunk without compile switches, I will of
>> course start using them. But as long as they're not activated, I think I'll
>> stay away from those. Not just because they might be too unstable, but
>> mostly because that means the APIs still change a lot and that's just too
>> much work for me to adjust TSVN every time. There's enough work to be done
>> in TSVN itself :)
>>
>> Stefan
>
> That's a shame, those build of yours are handy - I've started some
> testing of 1.7 already using the nightly builds of
> svn/svnadmin/svnserve but I would love to test something closer to the
> end-game (particularly single db) even with known bugs.  I can
> understand though why you're not building these combinations for TSVN
> yet.
>
> I'm not aware of anyone else doing nightly win32 Subversion binaries
> so yours have been most helpful.
>
> I'm using builds from here:
>
>       http://nightlybuilds.tortoisesvn.net/latest/win32/full/

These are only built twice a week. If you want the builds that run every 
day, use the ones here instead:
http://nightlybuilds.tortoisesvn.net/latest/win32/small/
Those are only missing the language packs.

Stefan



Re: Need fast ways to get Info once WC-NG is introduced

Posted by Talden <ta...@gmail.com>.
>> Did you try compiling Subversion with the SVN_WC__SINGLE_DB and SINGLE_DB
>> defined in wc.h yet? (This enables the experimental single-db mode)
>>
>> It should give some impression on what you can expect with single-db. (I
>> think the current status is about 40 testfailures (9 in the upgrade
>> tests),
>> but it almost reduces the testsuite time by 50% compared to multi-db)
>
> I don't like to build the TSVN nightlies with such experimental features
> yet. Once the features get into trunk without compile switches, I will of
> course start using them. But as long as they're not activated, I think I'll
> stay away from those. Not just because they might be too unstable, but
> mostly because that means the APIs still change a lot and that's just too
> much work for me to adjust TSVN every time. There's enough work to be done
> in TSVN itself :)
>
> Stefan

That's a shame, those builds of yours are handy - I've started some
testing of 1.7 already using the nightly builds of
svn/svnadmin/svnserve but I would love to test something closer to the
end-game (particularly single db) even with known bugs.  I can
understand though why you're not building these combinations for TSVN
yet.

I'm not aware of anyone else doing nightly win32 Subversion binaries
so yours have been most helpful.

I'm using builds from here:

     http://nightlybuilds.tortoisesvn.net/latest/win32/full/

--
Talden

RE: Need fast ways to get Info once WC-NG is introduced

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Stefan Küng [mailto:tortoisesvn@gmail.com]
> Sent: maandag 2 augustus 2010 21:52
> To: Bert Huijben
> Cc: 'Subversion Development'
> Subject: Re: Need fast ways to get Info once WC-NG is introduced
> 
> On 02.08.2010 12:32, Bert Huijben wrote:
> 
> > I don't think there is a specific per folder check like this, but retrieving
> > specific data about just one node (instead of its folder) will be *much*
> > faster than in the old entries store. With the entries files we had to read
> > the entire file in all cases, but a real database doesn't have that
> > limitation.
> >
> > For all metadata except for pristine files we only have to open one file and
> > sqlite just seeks to the right locations to fetch the data using its
> > indexes.
> >
> > For AnkhSVN I'm thinking about splitting the status cache in two layers,
> > instead of doing a 'svn status' per folder like we do in 1.6. (I think
> > TortoiseSVN might do the same thing, but maybe it calls status with depth
> > infinity)
> 
> Yes, TSVN does the same: one 'svn st' per folder with depth immediate.
> 
> > Getting information from the working copy per individual file will be so
> > much cheaper than before, that I will look for metadata changes first (and
> > cache only a fraction of the informational details I used to cache before)
> > and only when I really need to, I will perform the pristine file comparison.
> > (I don't know yet if I will use svn_(client|wc)_status for this or by just
> > calling svn_wc_text_modified_p2() myself).
> >
> > I would imagine that TortoiseSVN's folder glyph status would be calculated
> > much faster by using a similar strategy: First check if there is a metadata
> > change or conflict somewhere in the tree (keeping track of translated
> > filesize + filedate as these will be useful in the next step).
> > (This would be +- svn_client_infoX(). This should also inform you of any
> > property changes (I don't know if it already does that; but the information
> > in our internal API's is there now))
> > If there is such a status: just set the right glyph (early out; no need to
> > check any pristine files)
> 
> So basically use svn_client_info() instead of svn_client_status(), then
> only check the status for files that don't have a defined status yet
> from that info. That seems like a good idea - a lot of work to rewrite
> the existing code, but it should be worth it.
> 
> > And only if there isn't a status perform the svn_wc_text_modified_p2() calls
> > where needed.
> 
> Would this API get renamed to svn_client_*? Or should I risk calling an
> svn_wc_ API? It's still not clear whether the svn_wc_ APIs will get made
> private as was discussed before.

Personally I don't see a problem with calling a wc api for this task. (It
has the same version guarantees as the client apis: we can't break this
before 2.0). Of course we can also add a wrapper in the client layer, but in
this case that would be just a one-to-one wrapper (you can get the wc_ctx
from the svn_client_ctx_t), and we would have to maintain both until 2.0.

If you know exactly what you need for your cache, I would prefer adding a
few helpers for that task in libsvn_client over adding exact copies of the
libsvn_wc apis.

Other system or application integrations like SCPlugin, KSvn and AnkhSVN
will face the same issues and would want to use these same helper apis.

> I thought about implementing a small cache for that, so that I don't
> have to walk up the tree every time to find an .svn dir.
> But I thought I read something about such a small cache getting
> implemented in the svn library itself so I wanted to ask first - maybe
> there's already an API to use that cache. Or maybe I just remember it
> wrong.

Yes, the wc_db api has a cache for this, but it has two issues that would
make me avoid it in TortoiseSVN:
* It sees every directory below a working copy as part of the working copy.
  (So it is just like keeping a cache of the top level databases. Probably
not the answer you were looking for)
* And it keeps an sqlite database handle open for you.

If keeping the sqlite handle open is not an issue for you, I would recommend
keeping it open as long as possible. But with that handle open you can't
just delete a working copy by removing its files.
(I have some ideas on how we might fix that in a Windows specific way on
Vista+ using oplocks, but that will take quite some research and building
our own filesystem-layer for SQLite.)


	Bert


Re: Need fast ways to get Info once WC-NG is introduced

Posted by Stefan Küng <to...@gmail.com>.
On 02.08.2010 12:32, Bert Huijben wrote:

> I don't think there is a specific per folder check like this, but retrieving
> specific data about just one node (instead of its folder) will be *much*
> faster than in the old entries store. With the entries files we had to read
> the entire file in all cases, but a real database doesn't have that
> limitation.
>
> For all metadata except for pristine files we only have to open one file and
> sqlite just seeks to the right locations to fetch the data using its
> indexes.
>
> For AnkhSVN I'm thinking about splitting the status cache in two layers,
> instead of doing a 'svn status' per folder like we do in 1.6. (I think
> TortoiseSVN might do the same thing, but maybe it calls status with depth
> infinity)

Yes, TSVN does the same: one 'svn st' per folder with depth immediate.

> Getting information from the working copy per individual file will be so
> much cheaper than before, that I will look for metadata changes first (and
> cache only a fraction of the informational details I used to cache before)
> and only when I really need to, I will perform the pristine file comparison.
> (I don't know yet if I will use svn_(client|wc)_status for this or by just
> calling svn_wc_text_modified_p2() myself).
>
> I would imagine that TortoiseSVN's folder glyph status would be calculated
> much faster by using a similar strategy: First check if there is a metadata
> change or conflict somewhere in the tree (keeping track of translated
> filesize + filedate as these will be useful in the next step).
> (This would be +- svn_client_infoX(). This should also inform you of any
> property changes (I don't know if it already does that; but the information
> in our internal API's is there now))
> If there is such a status: just set the right glyph (early out; no need to
> check any pristine files)

So basically use svn_client_info() instead of svn_client_status(), then 
only check the status for files that don't have a defined status yet 
from that info. That seems like a good idea - a lot of work to rewrite 
the existing code, but it should be worth it.
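
The two-step idea can be sketched roughly as follows. This is not Subversion code: `metadata_status` and `text_modified` are hypothetical callables standing in for a cheap metadata lookup (something like svn_client_info) and for the expensive pristine comparison (something like svn_wc_text_modified_p2), and the status strings are made up for illustration.

```python
from typing import Callable, Iterable, Optional


def folder_glyph(paths: Iterable[str],
                 metadata_status: Callable[[str], Optional[str]],
                 text_modified: Callable[[str], bool]) -> str:
    """Pick a folder glyph, touching file contents only as a last resort."""
    paths = list(paths)

    # Step 1: cheap metadata-only pass. Any defined status is an early out,
    # so no pristine file has to be read at all.
    for path in paths:
        status = metadata_status(path)
        if status is not None:
            return status

    # Step 2: only now run the expensive content comparison, and stop at
    # the first modified file.
    for path in paths:
        if text_modified(path):
            return "modified"

    return "normal"
```

This sketch returns the first metadata status it finds; a real overlay would probably rank statuses (for example conflict over added) before choosing one.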

> And only if there isn't a status perform the svn_wc_text_modified_p2() calls
> where needed.

Would this API get renamed to svn_client_*? Or should I risk calling an 
svn_wc_ API? It's still not clear whether the svn_wc_ APIs will get made 
private as was discussed before.

> Your disk cache (via its hook) knows which on-disk files changed since the
> last scan, so it can handle this much smarter than the simple algorithm in
> svn_(client|wc)_status, which is mostly optimized for running in a cold
> cache situation.
>
> Instead of just one timestamp to compare to, you have more information: the
> current on disk-time and the information that a file just changed. And only
> if the file was modified in the last run, or when its time is different
> than the stored and your previous on-disk time you have to perform the
> check.
>
>
> I think this would require some redesign on your current cache strategy (It
> certainly does for AnkhSVN), but the fact that you can now perform status
> updates per file instead of per directory by itself should open room for
> performance improvement. (I hope to solve some worse scenarios in AnkhSVN on
> directories containing a lot of files with this)

I'll start with the design soon. This will take quite a while until it 
works properly...
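
As a starting point for that design, here is one reading of the rule Bert describes, written as a small predicate. The function and parameter names are hypothetical, and the interpretation (re-check when the file was modified last time, or when its current time matches neither the time stored in the working copy metadata nor the time the cache saw previously) is an assumption about what was meant.

```python
def needs_content_check(current_mtime: float,
                        recorded_mtime: float,
                        previous_mtime: float,
                        was_modified: bool) -> bool:
    """Decide whether a file's content must be compared again.

    current_mtime:  time on disk right now
    recorded_mtime: time stored in the working copy metadata
    previous_mtime: time on disk when the cache last looked
    was_modified:   result of the last content comparison
    """
    if was_modified:
        # A modified file may have been changed again or reverted.
        return True
    # Unmodified last time: only a timestamp the cache has not seen
    # before, and that the metadata does not record, forces a new check.
    return current_mtime != recorded_mtime and current_mtime != previous_mtime
```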

>> Something else I use quite a lot in TSVN and especially the cache is a
>> quick check whether a folder is versioned or not, simply by checking
>> whether an .svn folder exists or not. Again here I only need to know
>> whether it's *maybe* versioned. If there's no .svn folder, I *know* it's
>> not versioned but if there is, I call the svn APIs and would get an
>> error in return if e.g. the .svn folder is empty or corrupted.
>> But with the single db design, there won't be .svn folders anymore
>> except for the root of the wc?
>> So is there an (almost as) fast way to check whether a folder is
>> versioned or not?
>
> I think the fastest way in the current code would be to call
> svn_wc_read_kind() on the directory, maybe after first checking that there
> is some .svn in at least one of the parent directories.

I thought about implementing a small cache for that, so that I don't 
have to walk up the tree every time to find an .svn dir.
But I thought I read something about such a small cache getting 
implemented in the svn library itself so I wanted to ask first - maybe 
there's already an API to use that cache. Or maybe I just remember it wrong.
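
Such a cache on the TSVN side might look like the sketch below. `WcRootCache` is a hypothetical name, and the code only looks for a `.svn` directory in the folder or one of its parents; like the check described earlier it answers "maybe versioned", not "versioned".

```python
import os
from typing import Dict, Optional


class WcRootCache:
    """Remembers, per directory, the nearest ancestor holding a .svn dir."""

    def __init__(self) -> None:
        self._root_of: Dict[str, Optional[str]] = {}

    def find_root(self, folder: str) -> Optional[str]:
        """Return the nearest directory at or above folder with a .svn dir."""
        current = os.path.abspath(folder)
        visited = []
        root = None
        while True:
            if current in self._root_of:
                # Cached answer: no need to walk any further up.
                root = self._root_of[current]
                break
            visited.append(current)
            if os.path.isdir(os.path.join(current, ".svn")):
                root = current
                break
            parent = os.path.dirname(current)
            if parent == current:
                break  # reached the filesystem root without finding .svn
            current = parent
        # Every directory walked through shares the same answer.
        for directory in visited:
            self._root_of[directory] = root
        return root

    def maybe_versioned(self, folder: str) -> bool:
        return self.find_root(folder) is not None

    def forget(self) -> None:
        """Drop all cached answers, e.g. after a checkout or a delete."""
        self._root_of.clear()
```

The cached answers go stale when a working copy is created or removed, so `forget()` would have to be tied to the same change notifications the status cache already receives.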

>
> The effect on single-db would be: open sqlite file (if not cached) and query
> two rows by using its primary key, via an index.
> (I think that function currently does the same queries twice; but that is on
> my TODO list).
>
>
> Did you try compiling Subversion with the SVN_WC__SINGLE_DB and SINGLE_DB
> defined in wc.h yet? (This enables the experimental single-db mode)
>
> It should give some impression on what you can expect with single-db. (I
> think the current status is about 40 testfailures (9 in the upgrade tests),
> but it almost reduces the testsuite time by 50% compared to multi-db)

I don't like to build the TSVN nightlies with such experimental features 
yet. Once the features get into trunk without compile switches, I will 
of course start using them. But as long as they're not activated, I 
think I'll stay away from those. Not just because they might be too 
unstable, but mostly because that means the APIs still change a lot and 
that's just too much work for me to adjust TSVN every time. There's 
enough work to be done in TSVN itself :)

Stefan



Re: Need fast ways to get Info once WC-NG is introduced

Posted by Stefan Küng <to...@gmail.com>.
On 02.08.2010 12:32, Bert Huijben wrote:

> I don't think there is a specific per folder check like this, but retrieving
> specific data about just one node (instead of its folder) will be *much*
> faster than in the old entries store. With the entries files we had to read
> the entire file in all cases, but a real database doesn't have that
> limitation.
>
> For all metadata except for pristine files we only have to open one file and
> sqlite just seeks to the right locations to fetch the data using its
> indexes.
>
> For AnkhSVN I'm thinking about splitting the status cache in two layers,
> instead of doing a 'svn status' per folder like we do in 1.6. (I think
> TortoiseSVN might do the same thing, but maybe it calls status with depth
> infinity)

Yes, TSVN does the same: one 'svn st' per folder with depth immediate.

> Getting information from the working copy per individual file will be so
> much cheaper than before, that I will look for metadata changes first (and
> cache only a fraction of the informational details I used to cache before)
> and only when I really need to, I will perform the pristine file comparison.
> (I don't know yet if I will use svn_(client|wc)_status for this or by just
> calling svn_wc_text_modified_p2() myself).
>
> I would imagine that TortoiseSVN's folder glyph status would be calculated
> much faster by using a similar strategy: First check if there is a metadata
> change or conflict somewhere in the tree (keeping track of translated
> filesize + filedate as these will be useful in the next step).
> (This would be +- svn_client_infoX(). This should also inform you of any
> property changes (I don't know if it already does that; but the information
> in our internal API's is there now))
> If there is such a status: just set the right glyph (early out; no need to
> check any pristine files)

So basically use svn_client_info() instead of svn_client_status(), then 
only check the status for files that don't have a defined status yet 
from that info. That seems like a good idea - a lot of work to rewrite 
the existing code, but it should be worth it.

> And only if there isn't a status perform the svn_wc_text_modified_p2() calls
> where needed.

Would this API get renamed to svn_client_*? Or should I risk calling an 
svn_wc_ API? It's still not clear whether the svn_wc_ APIs will get made 
private as was discussed before.

> Your disk cache (via its hook) knows which on-disk files changed since the
> last scan, so it can handle this much smarter than the simple algorithm in
> svn_(client|wc)_status, which is mostly optimized for running in a cold
> cache situation.
>
> Instead of just one timestamp to compare to, you have more information: the
> current on-disk time and the information that a file just changed. You only
> have to perform the check if the file was modified in the last run, or when
> its time differs from both the stored time and your previous on-disk time.
>
>
> I think this would require some redesign on your current cache strategy (It
> certainly does for AnkhSVN), but the fact that you can now perform status
> updates per file instead of per directory by itself should open room for
> performance improvement. (I hope this will solve some of the worst scenarios
> in AnkhSVN on directories containing a lot of files.)

I'll start with the design soon. This will take quite a while until it 
works properly...

>> Something else I use quite a lot in TSVN and especially the cache is a
>> quick check whether a folder is versioned or not, simply by checking
>> whether an .svn folder exists or not. Again here I only need to know
>> whether it's *maybe* versioned. If there's no .svn folder, I *know* it's
>> not versioned but if there is, I call the svn APIs and would get an
>> error in return if e.g. the .svn folder is empty or corrupted.
>> But with the single db design, there won't be .svn folders anymore
>> except for the root of the wc?
>> So is there an (almost as) fast way to check whether a folder is
>> versioned or not?
>
> I think the fastest way in the current code would be to call
> svn_wc_read_kind() on the directory, maybe after first checking that there
> is some .svn in at least one of the parent directories.

I thought about implementing a small cache for that, so that I don't 
have to walk up the tree every time to find an .svn dir.
But I thought I read something about such a small cache getting 
implemented in the svn library itself so I wanted to ask first - maybe 
there's already an API to use that cache. Or maybe I just remember it wrong.

>
> The effect on single-db would be: open sqlite file (if not cached) and query
> two rows by using its primary key, via an index.
> (I think that function currently does the same queries twice; but that is on
> my TODO list).
>
>
> Did you try compiling Subversion with the SVN_WC__SINGLE_DB and SINGLE_DB
> defined in wc.h yet? (This enables the experimental single-db mode)
>
> It should give some impression on what you can expect with single-db. (I
> think the current status is about 40 test failures (9 in the upgrade tests),
> but it reduces the test suite time by almost 50% compared to multi-db)

I don't like to build the TSVN nightlies with such experimental features 
yet. Once the features get into trunk without compile switches, I will 
of course start using them. But as long as they're not activated, I 
think I'll stay away from those. Not just because they might be too 
unstable, but mostly because that means the APIs still change a lot and 
that's just too much work for me to adjust TSVN every time. There's 
enough work to be done in TSVN itself :)

Stefan


-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net

RE: Need fast ways to get Info once WC-NG is introduced

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Stefan Küng [mailto:tortoisesvn@gmail.com]
> Sent: Saturday, 31 July 2010 17:31
> To: Subversion Development
> Subject: Need fast ways to get Info once WC-NG is introduced
> 
> Hi,
> 
> I think I'd best first describe what I do in TSVN now:
> TSVN has a cache of all working copy statuses which is used by the shell
> extension to show the icon overlays. It would be way too slow to fetch
> the status every time the shell requests the overlays, so that's why we
> have that cache.
> 
> The cache itself tries to do as little as possible while still keeping
> the status of each item up to date. It gets notified by the OS whenever
> a file is changed and decides then whether to re-fetch the status with
> the SVN API or not. But even calling the status API in those cases is
> too expensive and leads to way too heavy disk access. So the cache does
> a very quick check first: it reads the file time of the entries and
> props file inside the .svn folder - only if that time has changed it
> calls the svn status API. If it hasn't changed and there was no change
> notification for a file inside that folder, calling the API isn't
> necessary.
> 
> To clarify this a little bit, imagine the cache gets a change
> notification for all 'entries' files in a wc because someone did a
> commit or an update.
> The problem is that the cache gets such notifications even if the file
> content hasn't changed, it's enough if a file was opened with write
> access - the notification is sent even if there was no actual write to
> the file.
> So by checking the file dates of the entries/props files the cache
> determines whether a call to the svn API is needed or not for the
> subfolders.
> 
> Now, as far as I understand it, with WC-NG and the single db design,
> there are no files in each wc folder anymore which indicate whether
> something affecting the status has changed. There will only be one
> single db file for all folders of a wc.
> 
> So my first question is: is there a very quick way to find out whether
> something status related has changed since a specific time for a
> particular wc folder? I haven't found an API so far which I could use
> for this. It doesn't have to be reliable, i.e., all I need to know is
> whether it *may* be that the status has changed, I don't really need to
> know whether it *really* has changed because once I get the 'maybe', I
> will call the status API and then get the definite answer.

I don't think there is a specific per folder check like this, but retrieving
specific data about just one node (instead of its folder) will be *much*
faster than in the old entries store. With the entries files we had to read
the entire file in all cases, but a real database doesn't have that
limitation.

For all metadata except for pristine files we only have to open one file and
sqlite just seeks to the right locations to fetch the data using its
indexes.
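
To illustrate the point about indexed lookups, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are a simplified, hypothetical schema chosen for the example, not the actual working copy database layout. It only shows that fetching one node by its primary key does not require reading the rows of its siblings.

```python
import sqlite3

# Hypothetical, simplified schema: NOT the real working copy database layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE nodes (
           wc_id         INTEGER NOT NULL,
           local_relpath TEXT    NOT NULL,
           kind          TEXT    NOT NULL,
           presence      TEXT    NOT NULL,
           PRIMARY KEY (wc_id, local_relpath)
       )"""
)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?, ?)",
    [
        (1, "", "dir", "normal"),
        (1, "src", "dir", "normal"),
        (1, "src/main.c", "file", "normal"),
    ],
)

LOOKUP = "SELECT kind, presence FROM nodes WHERE wc_id = ? AND local_relpath = ?"

def read_node(conn, wc_id, local_relpath):
    """Fetch one node's row by primary key; None if the path is not recorded."""
    return conn.execute(LOOKUP, (wc_id, local_relpath)).fetchone()

# Ask SQLite how it plans to run the lookup. The last column of each plan row
# is a human-readable description of the step.
plan = conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, (1, "src/main.c")).fetchall()
plan_text = " ".join(row[-1] for row in plan)
```

Printing plan_text should show a search through the primary key index rather than a scan of the whole table; the exact wording depends on the SQLite version.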

For AnkhSVN I'm thinking about splitting the status cache in two layers,
instead of doing a 'svn status' per folder like we do in 1.6. (I think
TortoiseSVN might do the same thing, but maybe it calls status with depth
infinity)

Getting information from the working copy per individual file will be so
much cheaper than before, that I will look for metadata changes first (and
cache only a fraction of the informational details I used to cache before)
and only when I really need to, I will perform the pristine file comparison.
(I don't know yet whether I will use svn_(client|wc)_status for this or just
call svn_wc_text_modified_p2() myself).

I would imagine that TortoiseSVN's folder glyph status would be calculated
much faster by using a similar strategy: First check if there is a metadata
change or conflict somewhere in the tree (keeping track of translated
filesize + filedate as these will be useful in the next step). 
(This would be roughly svn_client_infoX(). This should also inform you of any
property changes (I don't know if it already does that, but the information
is available in our internal APIs now))
If there is such a status: just set the right glyph (early out; no need to
check any pristine files)

And only if there isn't a status perform the svn_wc_text_modified_p2() calls
where needed. 
Your disk cache (via its hook) knows which on-disk files changed since the
last scan, so it can handle this much smarter than the simple algorithm in
svn_(client|wc)_status, which is mostly optimized for running in a cold
cache situation.
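
A minimal sketch of the two-step strategy above, assuming the metadata status of each node is already known from a cheap lookup. The function folder_glyph, the status strings and the text_modified callback are hypothetical names for illustration; text_modified stands in for the expensive comparison against the pristine file.

```python
def folder_glyph(nodes, text_modified):
    """Compute a folder glyph from its nodes.

    nodes:         iterable of (path, metadata_status) pairs, where
                   metadata_status is "normal" or some other string such as
                   "conflicted", "added", "deleted" or "props-modified".
    text_modified: callable(path) -> bool, the expensive pristine comparison.
    """
    candidates = []
    metadata_changed = False
    for path, metadata_status in nodes:
        if metadata_status == "conflicted":
            # Early out: a conflict decides the glyph immediately.
            return "conflicted"
        if metadata_status != "normal":
            metadata_changed = True
        else:
            candidates.append(path)

    if metadata_changed:
        # Early out: the metadata already tells us the folder is modified,
        # so no pristine file has to be touched.
        return "modified"

    # Only now fall back to the expensive per-file text comparison.
    for path in candidates:
        if text_modified(path):
            return "modified"
    return "normal"
```

In this sketch the expensive callback is never invoked when the metadata alone decides the glyph.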

Instead of just one timestamp to compare to, you have more information: the
current on-disk time and the information that a file just changed. You only
have to perform the check if the file was modified in the last run, or when
its time differs from both the stored time and your previous on-disk time.
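
One possible reading of that rule as code. This is a sketch only: needs_text_check and its parameter names are hypothetical, and the timestamps are assumed to be plain comparable numbers.

```python
def needs_text_check(notified, disk_mtime, recorded_mtime, previous_mtime):
    """Decide whether the expensive text comparison is needed for one file.

    notified:       True if the change hook reported this file since the last scan.
    disk_mtime:     the file's current on-disk timestamp.
    recorded_mtime: the timestamp stored in the working copy metadata.
    previous_mtime: the on-disk timestamp the cache saw during its previous scan.
    """
    if notified:
        # The hook says the file was touched, so the timestamps cannot be trusted.
        return True
    # Otherwise check only when the current time matches neither the stored
    # time nor the time the cache itself saw last.
    return disk_mtime != recorded_mtime and disk_mtime != previous_mtime
```

For example, needs_text_check(False, 100, 90, 100) is False: the timestamp differs from the stored one, but the cache already examined the file at that timestamp.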


I think this would require some redesign on your current cache strategy (It
certainly does for AnkhSVN), but the fact that you can now perform status
updates per file instead of per directory by itself should open room for
performance improvement. (I hope this will solve some of the worst scenarios
in AnkhSVN on directories containing a lot of files.)

> Something else I use quite a lot in TSVN and especially the cache is a
> quick check whether a folder is versioned or not, simply by checking
> whether an .svn folder exists or not. Again here I only need to know
> whether it's *maybe* versioned. If there's no .svn folder, I *know* it's
> not versioned but if there is, I call the svn APIs and would get an
> error in return if e.g. the .svn folder is empty or corrupted.
> But with the single db design, there won't be .svn folders anymore
> except for the root of the wc?
> So is there an (almost as) fast way to check whether a folder is
> versioned or not?

I think the fastest way in the current code would be to call
svn_wc_read_kind() on the directory, maybe after first checking that there
is some .svn in at least one of the parent directories. 

The effect on single-db would be: open sqlite file (if not cached) and query
two rows by using its primary key, via an index.
(I think that function currently does the same queries twice; but that is on
my TODO list).


Did you try compiling Subversion with the SVN_WC__SINGLE_DB and SINGLE_DB
defined in wc.h yet? (This enables the experimental single-db mode)

It should give some impression on what you can expect with single-db. (I
think the current status is about 40 test failures (9 in the upgrade tests),
but it reduces the test suite time by almost 50% compared to multi-db)

	Bert 
