You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Greg Stein <gs...@gmail.com> on 2016/02/03 02:38:50 UTC

Re: Making FS and repos layer log API streamy

Very cool work!

On Sun, Jan 31, 2016 at 5:03 AM, Stefan Fuhrmann <st...@apache.org> wrote:

> Hi there,
>
> When the server needs to transmit the list of changed paths
> in a revision, its memory usage is O(#changes), i.e. practically
> unbound. The problems are:
>
> * FS and repos API require a full collection of all changes
>   Most consumers simply scan that data once. So, they can
>   just as well work on a one change at a time basis as a
>   callback.
>
> * The data types are different, so we end up with two copies.
>   A revised svn_fs_path_change_t with identical
>   svn_repo_path_change3_t can fix this. Minor adjustments
>   to the element data types further improve efficiency.
>
> I've played with a revised version of svn_fs_paths_changed
> and svn_repos_get_logs to use callbacks for the individual
> path changes. Effectively, the FS layer can now pump the
> info through the repos / authz filter into the RA layer.
>
> I'll commit the changes to /trunk over the next couple of weeks,
> mostly on weekends. There will be two-way conversion between
> old and new APIs to facilitate regression testing. The old API will
> get deprecated once all users have migrated.
>
> -- Stefan^2.
>

Re: Making FS and repos layer log API streamy

Posted by Greg Stein <gs...@gmail.com>.
On Wed, Feb 3, 2016 at 6:36 AM, Ivan Zhakov <iv...@visualsvn.com> wrote:
>...

> Also I think we should not use callbacks to deliver data from the FS
> layer. Currently FS API is passive and I think it should remain the
> same: FS API users may invoke FS function from callback and this will
> require FS implementation to be fully reentrant (to avoid problems
> like fixed in r1718167 [1])
>

Well, a callback-based interface can snapshot the state and use that as the
basis for each iteration of the callback.

An iteration interface may need to load/unpack on each invocation, and may
need to re-establish the snapshot established at the beginning of the
iteration.

I don't believe there is clearly-favored solution here. This new API solves
a clear problem with our existing API, so I say +1 for it. Sure, maybe
there are other approaches that might also work. I'll +1 that work when it
arrives too :-)

One thing that I found during some of the Ev2 work, and the ra_serf work,
is that push/pull models of control flow really affect our API and our
clients. Ideally, we'd have both, though I hate the additional complexity.
But providing clients the choice of push vs pull can be a huge win.

Cheers,
-g

Re: Making FS and repos layer log API streamy

Posted by Stefan Fuhrmann <st...@apache.org>.
On 03.02.2016 13:36, Ivan Zhakov wrote:
> On 31 January 2016 at 14:03, Stefan Fuhrmann <st...@apache.org> wrote:
>> Hi there,
>>
>> When the server needs to transmit the list of changed paths
>> in a revision, its memory usage is O(#changes), i.e. practically
>> unbound. The problems are:
>>
>> * FS and repos API require a full collection of all changes
>>    Most consumers simply scan that data once. So, they can
>>    just as well work on a one change at a time basis as a
>>    callback.
>>
>> * The data types are different, so we end up with two copies.
>>    A revised svn_fs_path_change_t with identical
>>    svn_repo_path_change3_t can fix this. Minor adjustments
>>    to the element data types further improve efficiency.
>>
>> I've played with a revised version of svn_fs_paths_changed
>> and svn_repos_get_logs to use callbacks for the individual
>> path changes. Effectively, the FS layer can now pump the
>> info through the repos / authz filter into the RA layer.
>
> Here is some feedback.
>
> Making FS API streamy is a good goal. But as far I remember repos
> layer still has to store all changed path in hashtable for
> authorization purposes.

No, it does not.  It filters path by path
(simply dropping the changes that the user
shall not see) and collects two global flags
indicating whether paths had to be hidden
for the current revision.

> Also I think we should not use callbacks to deliver data from the FS
> layer. Currently FS API is passive and I think it should remain the
> same: FS API users may invoke FS function from callback and this will
> require FS implementation to be fully reentrant (to avoid problems
> like fixed in r1718167 [1])

Good point.  Support for reentrace is a must.
It can be achieved with iterators and callbacks
alike.

> Instead of callbacks we should create svn_fs_changed_paths_t and some
> kind of iterator of it. As a nice benefit FS API user may request only
> interesting information by providing NULL for some arguments (like
> WC-NG API).

I will change the FS layer code to an iterator
because that makes the API easier to use with
our current repos layer code.  OTOH, the code
on the FS side gets a little more convoluted.

However, to ensure proper authz filtering, the
repos layer needs to present the changed paths
list through a callback.  It enables the repos
layer to guarantee that the changes get processed
before the revprops.

> I also noticed that svn_fs_nodeid_t member is removed from svn_fs_path_change3_t
>   in r1727822. Are you sure about this change?

Absolutely.  The nodeid is pointless and a bad
API in general.

> Some FSFS bugs are not a
> reason to drop this kind of information.

No FSFS user was ever able to use that information
outside a transaction.  Not providing it won't
break peoples application logic.

> We should fix nodeid in
> changed path and maybe eventually replace it with svn_fs_node_t.

We cannot deliver information that we don't have.
All existing FSFS repositories are "corrupted" in
that sense.  So, we will need to query that data
on demand.

With svn_fs_node_t in place, optionally providing
the affected node is certainly possible.  That
could be a convenience but it will most likely
not aid with application performance, even if the
repository info could be used directly.

-- Stefan^2.

Re: Making FS and repos layer log API streamy

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 31 January 2016 at 14:03, Stefan Fuhrmann <st...@apache.org> wrote:
> Hi there,
>
> When the server needs to transmit the list of changed paths
> in a revision, its memory usage is O(#changes), i.e. practically
> unbound. The problems are:
>
> * FS and repos API require a full collection of all changes
>   Most consumers simply scan that data once. So, they can
>   just as well work on a one change at a time basis as a
>   callback.
>
> * The data types are different, so we end up with two copies.
>   A revised svn_fs_path_change_t with identical
>   svn_repo_path_change3_t can fix this. Minor adjustments
>   to the element data types further improve efficiency.
>
> I've played with a revised version of svn_fs_paths_changed
> and svn_repos_get_logs to use callbacks for the individual
> path changes. Effectively, the FS layer can now pump the
> info through the repos / authz filter into the RA layer.

Here is some feedback.

Making FS API streamy is a good goal. But as far I remember repos
layer still has to store all changed path in hashtable for
authorization purposes.

Also I think we should not use callbacks to deliver data from the FS
layer. Currently FS API is passive and I think it should remain the
same: FS API users may invoke FS function from callback and this will
require FS implementation to be fully reentrant (to avoid problems
like fixed in r1718167 [1])

Instead of callbacks we should create svn_fs_changed_paths_t and some
kind of iterator of it. As a nice benefit FS API user may request only
interesting information by providing NULL for some arguments (like
WC-NG API).

I also noticed that svn_fs_nodeid_t member is removed from svn_fs_path_change3_t
 in r1727822. Are you sure about this change? Some FSFS bugs are not a
reason to drop this kind of information. We should fix nodeid in
changed path and maybe eventually replace it with svn_fs_node_t.

[1] http://svn.haxx.se/dev/archive-2015-12/0006.shtml

-- 
Ivan Zhakov