Posted to commits@subversion.apache.org by st...@apache.org on 2015/09/28 12:16:13 UTC

svn commit: r1705646 - /subversion/trunk/subversion/libsvn_fs_fs/pack.c

Author: stefan2
Date: Mon Sep 28 10:16:12 2015
New Revision: 1705646

URL: http://svn.apache.org/viewvc?rev=1705646&view=rev
Log:
Tune reorg strategy during the FSFS format7 packs such that it favors
checkout-style tree walks now.

Since r1703237, following the log history no longer requires frequent
access to directory data but mainly relies on noderev predecessor chain.
Therefore, it is no longer necessary to tightly pack directories in a
separate part of the pack file.  With this patch, they are now placed
with the file contents and can be processed by a quasi-linear scan
instead of reading from two sections per pack.

* subversion/libsvn_fs_fs/pack.c
  (compare_dir_entries_format7): Adapt reporting strategy - process dirs
                                 at the same time as files now.
  (compare_is_dir): No longer needed.
  (sort_reps): No longer distinguish between file and dir reps but only
               paths and delta chains when determining representation order.

Modified:
    subversion/trunk/subversion/libsvn_fs_fs/pack.c

Modified: subversion/trunk/subversion/libsvn_fs_fs/pack.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/pack.c?rev=1705646&r1=1705645&r2=1705646&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_fs_fs/pack.c (original)
+++ subversion/trunk/subversion/libsvn_fs_fs/pack.c Mon Sep 28 10:16:12 2015
@@ -610,9 +610,6 @@ compare_dir_entries_format7(const svn_so
   const svn_fs_dirent_t *lhs = (const svn_fs_dirent_t *) a->value;
   const svn_fs_dirent_t *rhs = (const svn_fs_dirent_t *) b->value;
 
-  if (lhs->kind != rhs->kind)
-    return lhs->kind == svn_node_dir ? -1 : 1;
-
   return strcmp(lhs->name, rhs->name);
 }
 
@@ -810,15 +807,6 @@ compare_ref_to_item(const reference_t *
   return svn_fs_fs__id_part_compare(&(*lhs_p)->from, rhs_p);
 }
 
-/* implements compare_fn_t.  Finds the DIR / FILE boundary.
- */
-static int
-compare_is_dir(const path_order_t * const * lhs_p,
-               const void *unused)
-{
-  return (*lhs_p)->is_dir ? -1 : 0;
-}
-
 /* Look for the least significant bit set in VALUE and return the smallest
  * number with the same property, i.e. the largest power of 2 that is a
  * factor in VALUE. */
@@ -966,7 +954,7 @@ sort_reps(pack_context_t *context)
 {
   apr_pool_t *temp_pool;
   const path_order_t **temp, **path_order;
-  int i, count, dir_count;
+  int i, count;
 
   /* We will later assume that there is at least one node / path.
    */
@@ -991,13 +979,8 @@ sort_reps(pack_context_t *context)
   temp = apr_pcalloc(temp_pool, count * sizeof(*temp));
   path_order = (void *)context->path_order->elts;
 
-  /* Find the boundary between DIR and FILE section. */
-  dir_count = svn_sort__bsearch_lower_bound(context->path_order, NULL,
-                     (int (*)(const void *, const void *))compare_is_dir);
-
   /* Sort those sub-sections separately. */
-  sort_reps_range(context, path_order, temp, 0, dir_count);
-  sort_reps_range(context, path_order, temp, dir_count, count);
+  sort_reps_range(context, path_order, temp, 0, count);
 
   /* We now know the final ordering. */
   for (i = 0; i < count; ++i)
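
For illustration only, the effect of the compare_dir_entries_format7 change
above can be sketched in plain C.  This is not the Subversion code: entry_t,
node_kind_t and sort_entries are hypothetical stand-ins invented for this
sketch, while the real comparator works on svn_fs_dirent_t values.  The old
comparator put all directories ahead of all files; the new one orders
entries by name only, so directories end up interleaved with files.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the node kind; not a Subversion type. */
typedef enum { NODE_FILE, NODE_DIR } node_kind_t;

/* Hypothetical stand-in for a directory entry; not svn_fs_dirent_t. */
typedef struct {
  const char *name;
  node_kind_t kind;
} entry_t;

/* Old ordering: directories before files, then by name. */
static int
compare_dirs_first(const void *a, const void *b)
{
  const entry_t *lhs = a;
  const entry_t *rhs = b;

  if (lhs->kind != rhs->kind)
    return lhs->kind == NODE_DIR ? -1 : 1;

  return strcmp(lhs->name, rhs->name);
}

/* New ordering: by name only, regardless of node kind. */
static int
compare_by_name(const void *a, const void *b)
{
  const entry_t *lhs = a;
  const entry_t *rhs = b;

  return strcmp(lhs->name, rhs->name);
}

/* Sort COUNT entries in place, using the old ordering if DIRS_FIRST
 * is non-zero and the new ordering otherwise. */
void
sort_entries(entry_t *entries, size_t count, int dirs_first)
{
  assert(entries != NULL || count == 0);
  qsort(entries, count, sizeof(*entries),
        dirs_first ? compare_dirs_first : compare_by_name);
}
```

Given the entries README (file), src (dir), Makefile (file) and doc (dir),
the old ordering yields doc, src, Makefile, README, whereas the new ordering
yields Makefile, README, doc, src under strcmp's byte-wise comparison.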



Re: svn commit: r1705646 - /subversion/trunk/subversion/libsvn_fs_fs/pack.c

Posted by Stefan Fuhrmann <st...@apache.org>.
On 28.01.2016 01:10, Johan Corveleyn wrote:
> On Wed, Jan 27, 2016 at 1:54 PM, Stefan Fuhrmann <eq...@web.de> wrote:
>> On 27.01.2016 13:08, Johan Corveleyn wrote:
>>>
>>> Does this require a format bump (and another dump/load)? Or do you
>>> plan on providing a way to "repack" (or simply unpack and pack) a
>>> packed repository? Currently there is no unpack or repack command ...
>>
>> This is no format change; the beauty of format 7 is
>> that we can learn and tweak strategies as we go without
>> a format bump.  OTOH, there is not much potential for
>> further improvement right now.  FSFS is limited by its
>> fixed sharing scheme.
>>
>> I have no *plans* for unpack/repack but the thought
>> has occurred to me.  It should not be hard to do as
>> an offline operation, maybe alongside a reshard command.
>> If you want to write an 'svnadmin unpack', I'd support
>> you in that effort.
> Interesting :-). I've put it on my radar, but not sure if I'll ever
> get to it. Life keeps getting in the way ;-).
>
> Would be a nice bite-sized task though ... if someone else wants to
> have a go, feel free. Maybe it's something to put on the
> http://subversion.apache.org/ideas.html page?
>
Good idea. Done in r1727706.

-- Stefan^2.

Re: svn commit: r1705646 - /subversion/trunk/subversion/libsvn_fs_fs/pack.c

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Jan 27, 2016 at 1:54 PM, Stefan Fuhrmann <eq...@web.de> wrote:
> On 27.01.2016 13:08, Johan Corveleyn wrote:
>>
>> On Mon, Sep 28, 2015 at 12:16 PM,  <st...@apache.org> wrote:
>>>
>>> Author: stefan2
>>> Date: Mon Sep 28 10:16:12 2015
>>> New Revision: 1705646
>>>
>>> URL: http://svn.apache.org/viewvc?rev=1705646&view=rev
>>> Log:
>>> Tune reorg strategy during the FSFS format7 packs such that it favors
>>> checkout-style tree walks now.
>>>
>>> Since r1703237, following the log history no longer requires frequent
>>> access to directory data but mainly relies on noderev predecessor chain.
>>> Therefore, it is no longer necessary to tightly pack directories in a
>>> separate part of the pack file.  With this patch, they are now placed
>>> with the file contents and can be processed by a quasi-linear scan
>>> instead of reading from two sections per pack.
>>>
>>> * subversion/libsvn_fs_fs/pack.c
>>>    (compare_dir_entries_format7): Adapt reporting strategy - process dirs
>>>                                   at the same time as files now.
>>>    (compare_is_dir): No longer needed.
>>>    (sort_reps): No longer distinguish between file and dir reps but only
>>>                 paths and delta chains when determining representation
>>> order.
>>
>>
>> Hi Stefan,
>>
>> In this post-1.9.x-branch commit (and a couple of subsequent
>> pack-related commits) you changed the pack layout for FSFS format7, to
>> make it more efficient for exports, checkouts, ...
>
>
> The benefit is not massive and very much depends on
> project vs. repository size.  20% or so for SVN.
>
>> I'm wondering, when 1.10 comes out, how will I be able to benefit from
>> this improved pack layout? Supposing I've already dump/load-ed with
>> 1.9 in FSFS7, and packed it.
>
>
> Future commits will use the new strategy.  So, speed
> will slightly go up over time.
>
>> Does this require a format bump (and another dump/load)? Or do you
>> plan on providing a way to "repack" (or simply unpack and pack) a
>> packed repository? Currently there is no unpack or repack command ...
>
>
> This is no format change; the beauty of format 7 is
> that we can learn and tweak strategies as we go without
> a format bump.  OTOH, there is not much potential for
> further improvement right now.  FSFS is limited by its
> fixed sharing scheme.
>
> I have no *plans* for unpack/repack but the thought
> has occurred to me.  It should not be hard to do as
> an offline operation, maybe alongside a reshard command.
> If you want to write an 'svnadmin unpack', I'd support
> you in that effort.

Interesting :-). I've put it on my radar, but not sure if I'll ever
get to it. Life keeps getting in the way ;-).

Would be a nice bite-sized task though ... if someone else wants to
have a go, feel free. Maybe it's something to put on the
http://subversion.apache.org/ideas.html page?

-- 
Johan

Re: svn commit: r1705646 - /subversion/trunk/subversion/libsvn_fs_fs/pack.c

Posted by Stefan Fuhrmann <eq...@web.de>.
On 27.01.2016 13:08, Johan Corveleyn wrote:
> On Mon, Sep 28, 2015 at 12:16 PM,  <st...@apache.org> wrote:
>> Author: stefan2
>> Date: Mon Sep 28 10:16:12 2015
>> New Revision: 1705646
>>
>> URL: http://svn.apache.org/viewvc?rev=1705646&view=rev
>> Log:
>> Tune reorg strategy during the FSFS format7 packs such that it favors
>> checkout-style tree walks now.
>>
>> Since r1703237, following the log history no longer requires frequent
>> access to directory data but mainly relies on noderev predecessor chain.
>> Therefore, it is no longer necessary to tightly pack directories in a
>> separate part of the pack file.  With this patch, they are now placed
>> with the file contents and can be processed by a quasi-linear scan
>> instead of reading from two sections per pack.
>>
>> * subversion/libsvn_fs_fs/pack.c
>>    (compare_dir_entries_format7): Adapt reporting strategy - process dirs
>>                                   at the same time as files now.
>>    (compare_is_dir): No longer needed.
>>    (sort_reps): No longer distinguish between file and dir reps but only
>>                 paths and delta chains when determining representation order.
>
> Hi Stefan,
>
> In this post-1.9.x-branch commit (and a couple of subsequent
> pack-related commits) you changed the pack layout for FSFS format7, to
> make it more efficient for exports, checkouts, ...

The benefit is not massive and very much depends on
project vs. repository size.  20% or so for SVN.

> I'm wondering, when 1.10 comes out, how will I be able to benefit from
> this improved pack layout? Supposing I've already dump/load-ed with
> 1.9 in FSFS7, and packed it.

Future commits will use the new strategy.  So, speed
will slightly go up over time.

> Does this require a format bump (and another dump/load)? Or do you
> plan on providing a way to "repack" (or simply unpack and pack) a
> packed repository? Currently there is no unpack or repack command ...

This is no format change; the beauty of format 7 is
that we can learn and tweak strategies as we go without
a format bump.  OTOH, there is not much potential for
further improvement right now.  FSFS is limited by its
fixed sharing scheme.

I have no *plans* for unpack/repack but the thought
has occurred to me.  It should not be hard to do as
an offline operation, maybe alongside a reshard command.
If you want to write an 'svnadmin unpack', I'd support
you in that effort.

-- Stefan^2.

Re: svn commit: r1705646 - /subversion/trunk/subversion/libsvn_fs_fs/pack.c

Posted by Johan Corveleyn <jc...@gmail.com>.
On Mon, Sep 28, 2015 at 12:16 PM,  <st...@apache.org> wrote:
> Author: stefan2
> Date: Mon Sep 28 10:16:12 2015
> New Revision: 1705646
>
> URL: http://svn.apache.org/viewvc?rev=1705646&view=rev
> Log:
> Tune reorg strategy during the FSFS format7 packs such that it favors
> checkout-style tree walks now.
>
> Since r1703237, following the log history no longer requires frequent
> access to directory data but mainly relies on noderev predecessor chain.
> Therefore, it is no longer necessary to tightly pack directories in a
> separate part of the pack file.  With this patch, they are now placed
> with the file contents and can be processed by a quasi-linear scan
> instead of reading from two sections per pack.
>
> * subversion/libsvn_fs_fs/pack.c
>   (compare_dir_entries_format7): Adapt reporting strategy - process dirs
>                                  at the same time as files now.
>   (compare_is_dir): No longer needed.
>   (sort_reps): No longer distinguish between file and dir reps but only
>                paths and delta chains when determining representation order.

Hi Stefan,

In this post-1.9.x-branch commit (and a couple of subsequent
pack-related commits) you changed the pack layout for FSFS format7, to
make it more efficient for exports, checkouts, ...

I'm wondering, when 1.10 comes out, how will I be able to benefit from
this improved pack layout? Supposing I've already dump/load-ed with
1.9 in FSFS7, and packed it.

Does this require a format bump (and another dump/load)? Or do you
plan on providing a way to "repack" (or simply unpack and pack) a
packed repository? Currently there is no unpack or repack command ...

-- 
Johan