You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@subversion.apache.org by Sérgio Basto <se...@serjux.com> on 2012/03/21 01:17:34 UTC

random sort of svn status and svn diff

Hi, I am facing a problem 
I do a svn diff | lsdiff 
I got file in random order , also happens with svn status 
How I sort diff and status , or at least don't get random results 
I am using subversion-1.6.17-5.fc16.x86_64 on Fedora 17 
I had clean my $HOME/.subversion nothing change, happens with at least 2
repos ( all that I tested )  

Example: 

#svn diff | lsdiff rsst2.config stop_prod.sh test.sh start_prod.sh
local.defs rsst.defs 
#svn diff | lsdiff local.defs rsst.defs rsst2.config stop_prod.sh
test.sh start_prod.sh 
#svn diff | lsdiff test.sh start_prod.sh local.defs rsst.defs
rsst2.config stop_prod.sh 
Thanks, 
-- 
Sérgio M. B.


Re: random sort of svn status and svn diff

Posted by Daniel Shahaf <da...@elego.de>.
Philip Martin wrote on Wed, Mar 21, 2012 at 00:51:53 +0000:
> Sérgio Basto <se...@serjux.com> writes:
> 
> > On Tue, 2012-03-20 at 19:23 -0500, Ryan Schmidt wrote: 
> >> On Mar 20, 2012, at 19:17, Sérgio Basto wrote:
> >> 
> >> > Hi, I am facing a problem 
> >> > I do a svn diff | lsdiff 
> >> > I got file in random order , also happens with svn status 
> >> 
> >> Correct, Subversion does not output these in a sorted order.
> >
> > but one month ago , svn diff have always same output , now it is
> > random , is a bug or regression.
> 
> That's the result of a change in the APR hash table implementation.

Another cause of losing the sort order might be the client switching
from ra_neon to ra_serf.

> Subversion often uses "hash order" and in the past this was always the
> same for a given set of hash keys.  The new APR implementation means
> that this order is no longer constant.

Re: random order due to APR hash change (was: random sort of svn status and svn diff)

Posted by Vincent Lefevre <vi...@vinc17.net>.
On 2012-03-29 10:07:02 -0400, Mark Phippard wrote:
> There is an issue filed for this:
> 
> http://subversion.tigris.org/issues/show_bug.cgi?id=4134

Thanks. I've updated the Debian bug information.

-- 
Vincent Lefèvre <vi...@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

Re: random order due to APR hash change (was: random sort of svn status and svn diff)

Posted by Mark Phippard <ma...@gmail.com>.
On Thu, Mar 29, 2012 at 10:01 AM, Vincent Lefevre
<vi...@vinc17.net> wrote:

>> That's the result of a change in the APR hash table implementation.
>> Subversion often uses "hash order" and in the past this was always the
>> same for a given set of hash keys.  The new APR implementation means
>> that this order is no longer constant.
>
> In Subversion 1.6.x under Debian, it also affects the properties order
> in "svn dump": http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=664867
> I don't know about Subversion 1.7.x.
>
> As I've said in my bug report:
>
> ----------------------------------------
>
> After the upgrade to libapr1 1.4.6-1, the properties in the "svn dump"
> output are now in a random order. I don't think their order is
> specified, but IMHO, they should be in a consistent order to allow
> conventional tools like diff, MD5 checking on a dump, rsync[*] and
> so on to work nicely.

There is an issue filed for this:

http://subversion.tigris.org/issues/show_bug.cgi?id=4134


-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

random order due to APR hash change (was: random sort of svn status and svn diff)

Posted by Vincent Lefevre <vi...@vinc17.net>.
On 2012-03-21 00:51:53 +0000, Philip Martin wrote:
> Sérgio Basto <se...@serjux.com> writes:
> 
> > On Tue, 2012-03-20 at 19:23 -0500, Ryan Schmidt wrote: 
> >> On Mar 20, 2012, at 19:17, Sérgio Basto wrote:
> >> 
> >> > Hi, I am facing a problem 
> >> > I do a svn diff | lsdiff 
> >> > I got file in random order , also happens with svn status 
> >> 
> >> Correct, Subversion does not output these in a sorted order.
> >
> > but one month ago , svn diff have always same output , now it is
> > random , is a bug or regression.
> 
> That's the result of a change in the APR hash table implementation.
> Subversion often uses "hash order" and in the past this was always the
> same for a given set of hash keys.  The new APR implementation means
> that this order is no longer constant.

In Subversion 1.6.x under Debian, it also affects the properties order
in "svn dump": http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=664867
I don't know about Subversion 1.7.x.

As I've said in my bug report:

----------------------------------------

After the upgrade to libapr1 1.4.6-1, the properties in the "svn dump"
output are now in a random order. I don't think their order is
specified, but IMHO, they should be in a consistent order to allow
conventional tools like diff, MD5 checking on a dump, rsync[*] and
so on to work nicely.

[*] rsync will still work, but obviously less efficiently.

----------------------------------------

I had to revert to the previous libapr1 version.

> > why the output shouldn't be sorted by name or something else ? 
> > or why svn don't have a sort option this is insane . 
> 
> It could be done but nobody has done it yet, the APR change is only a
> few weeks old.  Sorting may need to be optional as there is a cost to
> sorting that not everybody would want.

Is this cost noticeable (when done efficiently)?

-- 
Vincent Lefèvre <vi...@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

Re: random sort of svn status and svn diff

Posted by Daniel Shahaf <da...@elego.de>.
Sérgio Basto wrote on Wed, Mar 21, 2012 at 07:22:03 +0000:
> Hi, Dev list , I want to fix random sort of svn status and svn diff, on
> recent code of svn, it just put a sort somewhere, could someone point me
> the directory in the source code, where is the code of svn frontend
> commands like diff and status ? 
> 

Wrong question.  You want the library code, not the frontend code. :-)

> Thanks,
> 
> -- 
> Sérgio M. B.
> 

Re: random sort of svn status and svn diff

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Mar 21, 2012, at 02:22, Sérgio Basto wrote:
> On Wed, 2012-03-21 at 08:00 +0100, Stefan Sperling wrote: 
>> You should discuss the approach you want to take to fix this on the
>> dev@ list before putting too much work into a patch. That way, if there
>> are any concerns about the approach they can get resolved upfront,
>> before a lot of work has been done. Please start a new thread on the
>> dev@ list to discuss your approach. 
> 
> Hi, Dev list , I want to fix random sort of svn status and svn diff, on
> recent code of svn, it just put a sort somewhere, could someone point me
> the directory in the source code, where is the code of svn frontend
> commands like diff and status ? 

This is the users list, not the dev list. :)




Re: random sort of svn status and svn diff

Posted by Sérgio Basto <se...@serjux.com>.
On Wed, 2012-03-21 at 08:00 +0100, Stefan Sperling wrote: 
> On Wed, Mar 21, 2012 at 03:02:53AM +0000, Sérgio Basto wrote:
> > On Wed, 2012-03-21 at 00:51 +0000, Philip Martin wrote: 
> > > Sérgio Basto <se...@serjux.com> writes:
> > > > why the output shouldn't be sorted by name or something else ? 
> > > > or why svn don't have a sort option this is insane . 
> > > 
> > > It could be done but nobody has done it yet, the APR change is only a
> > > few weeks old.  Sorting may need to be optional as there is a cost to
> > > sorting that not everybody would want.
> > 
> > I am a volunteer to change code to do a constant sort on svn diff and
> > status and test it , I think that is more quick that do an external
> > function to do that for me . 
> 
> Great, please see
> http://subversion.apache.org/docs/community-guide/general.html#patches
> 
> You should discuss the approach you want to take to fix this on the
> dev@ list before putting too much work into a patch. That way, if there
> are any concerns about the approach they can get resolved upfront,
> before a lot of work has been done. Please start a new thread on the
> dev@ list to discuss your approach. 

Hi, Dev list , I want to fix random sort of svn status and svn diff, on
recent code of svn, it just put a sort somewhere, could someone point me
the directory in the source code, where is the code of svn frontend
commands like diff and status ? 

Thanks,

-- 
Sérgio M. B.


Re: random sort of svn status and svn diff

Posted by Sérgio Basto <se...@serjux.com>.
On Wed, 2012-03-21 at 07:25 +0000, Sérgio Basto wrote: 
> Hi, Dev list , I want to fix random sort of svn status and svn diff, on
> recent code of svn, it just put a sort somewhere, could someone point me
> the directory in the source code, where is the code of svn frontend
> commands like diff and status ? 


Sorry I though that I am forwarding whole message,
where is the rest of the thread:


> >> > Hi, I am facing a problem 
> >> > I do a svn diff | lsdiff 
> >> > I got file in random order , also happens with svn status 
> >> 
> >> Correct, Subversion does not output these in a sorted order.
> >
> > but one month ago , svn diff have always same output , now it is
> > random , is a bug or regression.
> 
> That's the result of a change in the APR hash table implementation.
> Subversion often uses "hash order" and in the past this was always the
> same for a given set of hash keys.  The new APR implementation means
> that this order is no longer constant.
> 



-- 
Sérgio M. B.


Re: random sort of svn status and svn diff

Posted by Branko Čibej <br...@apache.org>.
On 21.03.2012 18:03, Julian Foad wrote:
> Discussion of the details of APR's hash function has wandered off-topic for this thread.  Can we take that to a new thread and/or the APR dev list?
>

Yes yes, but I have to get a last word in ... it's your standard
multiplicative hash straight out of Knuth. :)

-- Brane

Re: random sort of svn status and svn diff

Posted by Julian Foad <ju...@btopenworld.com>.
Discussion of the details of APR's hash function has wandered off-topic for this thread.  Can we take that to a new thread and/or the APR dev list?

- Julian


Re: random sort of svn status and svn diff

Posted by Daniel Shahaf <da...@elego.de>.
Branko Čibej wrote on Wed, Mar 21, 2012 at 17:13:07 +0100:
> On 21.03.2012 15:17, Daniel Shahaf wrote:
> > Philip Martin wrote on Wed, Mar 21, 2012 at 13:00:01 +0000:
> >> Philip Martin <ph...@wandisco.com> writes:
> >>
> >>> So if I have
> >>>
> >>>    0, 1 ... X, Y ... N-1
> >>>
> >>> where X and Y collide I would expect changing the seed to give:
> >>>
> >>>    K, K+1 ... X, Y ... N-1, 0 ... K-1
> >>>
> >>> Is that going to remove the collision?
> >> Is it something to do with the bins?.  If X and Y were originally in the
> >> same bin the seed change might cause one to move to the next bin so X
> >> and Y no longer collide but Y and Y+1 collide?
> > The issue discussed on IRC has to do with the hash buckets, yes --- with
> > someone intentionally causing most of the hash elements to be in a small
> > number of buckets, rather than the more even distribution of the
> > 'average case' analysis.
> 
> Yup. And the randomization happens before the hashing, so if the hash
> function is good enough, a small change of the key will cause a
> considerably larger and different change in the result of the hash
> function. APR's default hash function is "good enough" in this respect;
> it's not cryptographically secure, but that's not what it's intended for.

APR's default hash function is linear in the last byte of the key.
(Incrementing key[klen-1] by X increments the hash by X.)

13:33:55       @stefan2 | hash = 33**0 * x[n-1] + 33**1 * x[n-2] + 33**2 
                        | * x[n-3] + ... + 33**(n-1) * x[0] + 33**n * seed

> 
> -- Brane
> 

Re: random sort of svn status and svn diff

Posted by Branko Čibej <br...@apache.org>.
On 21.03.2012 15:17, Daniel Shahaf wrote:
> Philip Martin wrote on Wed, Mar 21, 2012 at 13:00:01 +0000:
>> Philip Martin <ph...@wandisco.com> writes:
>>
>>> So if I have
>>>
>>>    0, 1 ... X, Y ... N-1
>>>
>>> where X and Y collide I would expect changing the seed to give:
>>>
>>>    K, K+1 ... X, Y ... N-1, 0 ... K-1
>>>
>>> Is that going to remove the collision?
>> Is it something to do with the bins?.  If X and Y were originally in the
>> same bin the seed change might cause one to move to the next bin so X
>> and Y no longer collide but Y and Y+1 collide?
> The issue discussed on IRC has to do with the hash buckets, yes --- with
> someone intentionally causing most of the hash elements to be in a small
> number of buckets, rather than the more even distribution of the
> 'average case' analysis.

Yup. And the randomization happens before the hashing, so if the hash
function is good enough, a small change of the key will cause a
considerably larger and different change in the result of the hash
function. APR's default hash function is "good enough" in this respect;
it's not cryptographically secure, but that's not what it's intended for.

-- Brane


Re: random sort of svn status and svn diff

Posted by Daniel Shahaf <da...@elego.de>.
Philip Martin wrote on Wed, Mar 21, 2012 at 13:00:01 +0000:
> Philip Martin <ph...@wandisco.com> writes:
> 
> > So if I have
> >
> >    0, 1 ... X, Y ... N-1
> >
> > where X and Y collide I would expect changing the seed to give:
> >
> >    K, K+1 ... X, Y ... N-1, 0 ... K-1
> >
> > Is that going to remove the collision?
> 
> Is it something to do with the bins?.  If X and Y were originally in the
> same bin the seed change might cause one to move to the next bin so X
> and Y no longer collide but Y and Y+1 collide?

The issue discussed on IRC has to do with the hash buckets, yes --- with
someone intentionally causing most of the hash elements to be in a small
number of buckets, rather than the more even distribution of the
'average case' analysis.

> 
> -- 
> uberSVN: Apache Subversion Made Easy
> http://www.uberSVN.com

Re: random sort of svn status and svn diff

Posted by Sérgio Basto <se...@serjux.com>.
On Wed, 2012-03-21 at 16:57 +0000, Philip Martin wrote: 
> Sérgio Basto <se...@serjux.com> writes:
> 
> > Unfortunately I don't had time before end of the month or even more ,
> > This random thing does not came in a good time . 
> > I use many svn diff to review my code  ... 
> >
> > So any ideas is appreciated. 
> 
> You can get the old behaviour by downgrading APR to a version that does
> not randomise the hash.  Or you could rebuild the current APR with a
> one-line patch to set ht->seed to 0 in apr_hash.c:apr_hash_make.

Fantastic , 
rpm -Uvh --oldpackages
http://kojipkgs.fedoraproject.org/packages/apr/1.4.5/1.fc16/x86_64/apr-1.4.5-1.fc16.x86_64.rpm  http://kojipkgs.fedoraproject.org/packages/apr/1.4.5/1.fc16/x86_64/apr-devel-1.4.5-1.fc16.x86_64.rpm 

I got svn back in business 

Thanks, 
-- 
Sérgio M. B.


Re: random sort of svn status and svn diff

Posted by Julian Foad <ju...@btopenworld.com>.
Sérgio Basto wrote:

> On Mon, 2012-03-26 at 21:12 +0200, Daniel Shahaf wrote: 
>>  Sérgio Basto wrote on Mon, Mar 26, 2012 at 20:03:00 +0100:
>>  > On Mon, 2012-03-26 at 20:33 +0200, Daniel Shahaf wrote: 
>>  > > Call svn_sort__hash() :-)
>>  > 
>>  > Could someone write some code as example , please, I hate try to
>>  > visualize .
>>  > 
>>  > we already have a svn_sort__hash() or have to do it ? 
>> 
>>  The former; it's declared in svn_sorts.h.
>> 
>>  The workflow I am assuming here is that you keep the data in a hash as
>>  it's passed around, and then sort it at the last minute.
> 
> And the code asked ?  
> Could you paste the code that you are talking about please ? 

Please try to answer your own questions before asking us.  You can find lots of examples like this:

  $ grep svn_sort__hash subversion/*/*.c
  subversion/libsvn_client/list.c:90:
    array = svn_sort__hash(tmpdirents, ...
  subversion/libsvn_client/mergeinfo.c:162:
    svn_sort__hash(result_catalog, ...
  ...

So here's an example from subversion/libsvn_client/list.c:90:

  /* Sort the hash, so we can call the callback in a "deterministic" order. */
  array = svn_sort__hash(tmpdirents, svn_sort_compare_items_lexically, pool);
  for (i = 0; i < array->nelts; ++i)
    {
      svn_sort__item_t *item = &APR_ARRAY_IDX(array, i, svn_sort__item_t);
      const char *path;
      svn_dirent_t *the_ent = item->value;

Regards,
- Julian

Re: random sort of svn status and svn diff

Posted by Sérgio Basto <se...@serjux.com>.
On Mon, 2012-03-26 at 21:12 +0200, Daniel Shahaf wrote: 
> Sérgio Basto wrote on Mon, Mar 26, 2012 at 20:03:00 +0100:
> > On Mon, 2012-03-26 at 20:33 +0200, Daniel Shahaf wrote: 
> > > Sérgio Basto wrote on Mon, Mar 26, 2012 at 19:27:42 +0100:
> > > > On Thu, 2012-03-22 at 08:23 +0000, Julian Foad wrote: 
> > > > > Sérgio Basto wrote:
> > > > > 
> > > > > > Philip Martin wrote:
> > > > > >>  Or you could rebuild the current APR with a
> > > > > >>  one-line patch to set ht->seed to 0 in apr_hash.c:apr_hash_make. 
> > > > > > 
> > > > > > I think when subversion receive the result of this apr_function , we
> > > > > > should do the sort , ie , subversion should not depend on sort or not
> > > > > > sort of the apr . 
> > > > > > 
> > > > > > So where in source code we have the return of apr function ? 
> > > > > 
> > > > > There is not just one place, there are many places, even within the diff code.  Start in subversion/svn/diff-cmd.c, at svn_cl__diff(), and follow the function calls down.  For a WC-to-WC diff, these are the main calls:
> > > > > 
> > > > >   svn_cl__diff
> > > > >     |
> > > > >     svn_client_diff[_summarize]6, svn_client_diff[_summarize]_peg2
> > > > >       |
> > > > >       svn_wc_diff6
> > > > >         |
> > > > >         svn_wc__internal_walk_status
> > > > >           |
> > > > >           get_dir_status
> > > > >             |
> > > > >             /* Walk all the children of this directory. */
> > > > >             for (hi = apr_hash_first(subpool, all_children);
> > > > >                  hi; hi = apr_hash_next(hi))
> > > > 
> > > > Thanks, can you do some draft code example ? , how we got apr_hash
> > > > sorted by name ? with a for which construct a sort_apr_hash ? 
> > > 
> > > Call svn_sort__hash() :-)
> > 
> > Could someone write some code as example , please, I hate try to
> > visualize .
> > 
> > we already have a svn_sort__hash() or have to do it ? 
> 
> The former; it's declared in svn_sorts.h.
> 
> The workflow I am assuming here is that you keep the data in a hash as
> it's passed around, and then sort it at the last minute.

And the code asked ?  
Could you paste the code that you are talking about please ? 

-- 
Sérgio M. B.


Re: random sort of svn status and svn diff

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Sérgio Basto wrote on Mon, Mar 26, 2012 at 20:03:00 +0100:
> On Mon, 2012-03-26 at 20:33 +0200, Daniel Shahaf wrote: 
> > Sérgio Basto wrote on Mon, Mar 26, 2012 at 19:27:42 +0100:
> > > On Thu, 2012-03-22 at 08:23 +0000, Julian Foad wrote: 
> > > > Sérgio Basto wrote:
> > > > 
> > > > > Philip Martin wrote:
> > > > >>  Or you could rebuild the current APR with a
> > > > >>  one-line patch to set ht->seed to 0 in apr_hash.c:apr_hash_make. 
> > > > > 
> > > > > I think when subversion receive the result of this apr_function , we
> > > > > should do the sort , ie , subversion should not depend on sort or not
> > > > > sort of the apr . 
> > > > > 
> > > > > So where in source code we have the return of apr function ? 
> > > > 
> > > > There is not just one place, there are many places, even within the diff code.  Start in subversion/svn/diff-cmd.c, at svn_cl__diff(), and follow the function calls down.  For a WC-to-WC diff, these are the main calls:
> > > > 
> > > >   svn_cl__diff
> > > >     |
> > > >     svn_client_diff[_summarize]6, svn_client_diff[_summarize]_peg2
> > > >       |
> > > >       svn_wc_diff6
> > > >         |
> > > >         svn_wc__internal_walk_status
> > > >           |
> > > >           get_dir_status
> > > >             |
> > > >             /* Walk all the children of this directory. */
> > > >             for (hi = apr_hash_first(subpool, all_children);
> > > >                  hi; hi = apr_hash_next(hi))
> > > 
> > > Thanks, can you do some draft code example ? , how we got apr_hash
> > > sorted by name ? with a for which construct a sort_apr_hash ? 
> > 
> > Call svn_sort__hash() :-)
> 
> Could someone write some code as example , please, I hate try to
> visualize .
> 
> we already have a svn_sort__hash() or have to do it ? 

The former; it's declared in svn_sorts.h.

The workflow I am assuming here is that you keep the data in a hash as
it's passed around, and then sort it at the last minute.

Re: random sort of svn status and svn diff

Posted by Sérgio Basto <se...@serjux.com>.
On Mon, 2012-03-26 at 20:33 +0200, Daniel Shahaf wrote: 
> Sérgio Basto wrote on Mon, Mar 26, 2012 at 19:27:42 +0100:
> > On Thu, 2012-03-22 at 08:23 +0000, Julian Foad wrote: 
> > > Sérgio Basto wrote:
> > > 
> > > > Philip Martin wrote:
> > > >>  Or you could rebuild the current APR with a
> > > >>  one-line patch to set ht->seed to 0 in apr_hash.c:apr_hash_make. 
> > > > 
> > > > I think when subversion receive the result of this apr_function , we
> > > > should do the sort , ie , subversion should not depend on sort or not
> > > > sort of the apr . 
> > > > 
> > > > So where in source code we have the return of apr function ? 
> > > 
> > > There is not just one place, there are many places, even within the diff code.  Start in subversion/svn/diff-cmd.c, at svn_cl__diff(), and follow the function calls down.  For a WC-to-WC diff, these are the main calls:
> > > 
> > >   svn_cl__diff
> > >     |
> > >     svn_client_diff[_summarize]6, svn_client_diff[_summarize]_peg2
> > >       |
> > >       svn_wc_diff6
> > >         |
> > >         svn_wc__internal_walk_status
> > >           |
> > >           get_dir_status
> > >             |
> > >             /* Walk all the children of this directory. */
> > >             for (hi = apr_hash_first(subpool, all_children);
> > >                  hi; hi = apr_hash_next(hi))
> > 
> > Thanks, can you do some draft code example ? , how we got apr_hash
> > sorted by name ? with a for which construct a sort_apr_hash ? 
> 
> Call svn_sort__hash() :-)

Could someone write some code as example , please, I hate try to
visualize .

we already have a svn_sort__hash() or have to do it ? 

Thanks, 
-- 
Sérgio M. B.


Re: random sort of svn status and svn diff

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Sérgio Basto wrote on Mon, Mar 26, 2012 at 19:27:42 +0100:
> On Thu, 2012-03-22 at 08:23 +0000, Julian Foad wrote: 
> > Sérgio Basto wrote:
> > 
> > > Philip Martin wrote:
> > >>  Or you could rebuild the current APR with a
> > >>  one-line patch to set ht->seed to 0 in apr_hash.c:apr_hash_make. 
> > > 
> > > I think when subversion receive the result of this apr_function , we
> > > should do the sort , ie , subversion should not depend on sort or not
> > > sort of the apr . 
> > > 
> > > So where in source code we have the return of apr function ? 
> > 
> > There is not just one place, there are many places, even within the diff code.  Start in subversion/svn/diff-cmd.c, at svn_cl__diff(), and follow the function calls down.  For a WC-to-WC diff, these are the main calls:
> > 
> >   svn_cl__diff
> >     |
> >     svn_client_diff[_summarize]6, svn_client_diff[_summarize]_peg2
> >       |
> >       svn_wc_diff6
> >         |
> >         svn_wc__internal_walk_status
> >           |
> >           get_dir_status
> >             |
> >             /* Walk all the children of this directory. */
> >             for (hi = apr_hash_first(subpool, all_children);
> >                  hi; hi = apr_hash_next(hi))
> 
> Thanks, can you do some draft code example ? , how we got apr_hash
> sorted by name ? with a for which construct a sort_apr_hash ? 

Call svn_sort__hash() :-)

Re: random sort of svn status and svn diff

Posted by Johan Corveleyn <jc...@gmail.com>.
2012/3/22 Julian Foad <ju...@btopenworld.com>:
> Sérgio Basto wrote:
>
>> Philip Martin wrote:
>>>  Or you could rebuild the current APR with a
>>>  one-line patch to set ht->seed to 0 in apr_hash.c:apr_hash_make.
>>
>> I think when subversion receive the result of this apr_function , we
>> should do the sort , ie , subversion should not depend on sort or not
>> sort of the apr .
>>
>> So where in source code we have the return of apr function ?
>
> There is not just one place, there are many places, even within the diff code.  Start in subversion/svn/diff-cmd.c, at svn_cl__diff(), and follow the function calls down.  For a WC-to-WC diff, these are the main calls:
>
>   svn_cl__diff
>     |
>     svn_client_diff[_summarize]6, svn_client_diff[_summarize]_peg2
>       |
>       svn_wc_diff6
>         |
>         svn_wc__internal_walk_status
>           |
>           get_dir_status
>             |
>             /* Walk all the children of this directory. */
>             for (hi = apr_hash_first(subpool, all_children);
>                  hi; hi = apr_hash_next(hi))
>
> This place determines the order of entries within a directory, for a WC-to-WC diff.  That's just one of the code paths involved in diff.  There are several others, depending on the kind of diff.  For diffs against the repository, it is more complex.  Also
> look for code that determines the order of property changes.

This reminds me of something: it may be coincidence, but up until now
(until 1.7.x that is) 'svnlook changed -rX' and 'svnlook diff -rX'
presented the files in the same order. I like this behavior, because
it's nice when composing post-commit emails: the list of changed files
on top, followed by the actual diff ... both following the same order.

Anyway, no biggie, all I'm saying is: +1 for introducing some stable
ordering here ...

-- 
Johan

Re: random sort of svn status and svn diff

Posted by Sérgio Basto <se...@serjux.com>.
On Thu, 2012-03-22 at 08:23 +0000, Julian Foad wrote: 
> Sérgio Basto wrote:
> 
> > Philip Martin wrote:
> >>  Or you could rebuild the current APR with a
> >>  one-line patch to set ht->seed to 0 in apr_hash.c:apr_hash_make. 
> > 
> > I think when subversion receive the result of this apr_function , we
> > should do the sort , ie , subversion should not depend on sort or not
> > sort of the apr . 
> > 
> > So where in source code we have the return of apr function ? 
> 
> There is not just one place, there are many places, even within the diff code.  Start in subversion/svn/diff-cmd.c, at svn_cl__diff(), and follow the function calls down.  For a WC-to-WC diff, these are the main calls:
> 
>   svn_cl__diff
>     |
>     svn_client_diff[_summarize]6, svn_client_diff[_summarize]_peg2
>       |
>       svn_wc_diff6
>         |
>         svn_wc__internal_walk_status
>           |
>           get_dir_status
>             |
>             /* Walk all the children of this directory. */
>             for (hi = apr_hash_first(subpool, all_children);
>                  hi; hi = apr_hash_next(hi))

Thanks, can you do some draft code example ? , how we got apr_hash
sorted by name ? with a for which construct a sort_apr_hash ? 



> This place determines the order of entries within a directory, for a WC-to-WC diff.  That's just one of the code paths involved in diff.  There are several others, depending on the kind of diff.  For diffs against the repository, it is more complex.  Also 
> look for code that determines the order of property changes.

Thanks, 
-- 
Sérgio M. B.


Re: random sort of svn status and svn diff

Posted by Julian Foad <ju...@btopenworld.com>.
Sérgio Basto wrote:

> Philip Martin wrote:
>>  Or you could rebuild the current APR with a
>>  one-line patch to set ht->seed to 0 in apr_hash.c:apr_hash_make. 
> 
> I think when subversion receive the result of this apr_function , we
> should do the sort , ie , subversion should not depend on sort or not
> sort of the apr . 
> 
> So where in source code we have the return of apr function ? 

There is not just one place, there are many places, even within the diff code.  Start in subversion/svn/diff-cmd.c, at svn_cl__diff(), and follow the function calls down.  For a WC-to-WC diff, these are the main calls:

  svn_cl__diff
    |
    svn_client_diff[_summarize]6, svn_client_diff[_summarize]_peg2
      |
      svn_wc_diff6
        |
        svn_wc__internal_walk_status
          |
          get_dir_status
            |
            /* Walk all the children of this directory. */
            for (hi = apr_hash_first(subpool, all_children);
                 hi; hi = apr_hash_next(hi))
 
This place determines the order of entries within a directory, for a WC-to-WC diff.  That's just one of the code paths involved in diff.  There are several others, depending on the kind of diff.  For diffs against the repository, it is more complex.  Also 
look for code that determines the order of property changes.

- Julian

Re: random sort of svn status and svn diff

Posted by Sérgio Basto <se...@serjux.com>.
On Wed, 2012-03-21 at 16:57 +0000, Philip Martin wrote:
> Or you could rebuild the current APR with a
> one-line patch to set ht->seed to 0 in apr_hash.c:apr_hash_make. 

I think when subversion receive the result of this apr_function , we
should do the sort , ie , subversion should not depend on sort or not
sort of the apr . 

So where in source code we have the return of apr function ? 

Thanks, 
-- 
Sérgio M. B.


Re: random sort of svn status and svn diff

Posted by Philip Martin <ph...@wandisco.com>.
Sérgio Basto <se...@serjux.com> writes:

> Unfortunately I don't had time before end of the month or even more ,
> This random thing does not came in a good time . 
> I use many svn diff to review my code  ... 
>
> So any ideas is appreciated. 

You can get the old behaviour by downgrading APR to a version that does
not randomise the hash.  Or you could rebuild the current APR with a
one-line patch to set ht->seed to 0 in apr_hash.c:apr_hash_make.

You could even use an LD_PRELOAD library to intercept apr_hash_make and
set the seed to zero at runtime.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: random sort of svn status and svn diff

Posted by Sérgio Basto <se...@serjux.com>.
On Wed, 2012-03-21 at 10:17 +0200, Daniel Shahaf wrote: 
> Sérgio Basto wrote on Wed, Mar 21, 2012 at 07:25:05 +0000:
> > On Wed, 2012-03-21 at 08:00 +0100, Stefan Sperling wrote: 
> > > On Wed, Mar 21, 2012 at 03:02:53AM +0000, Sérgio Basto wrote:
> > > > On Wed, 2012-03-21 at 00:51 +0000, Philip Martin wrote: 
> > > > > Sérgio Basto <se...@serjux.com> writes:
> > > > > > why the output shouldn't be sorted by name or something else ? 
> > > > > > or why svn don't have a sort option this is insane . 
> > > > > 
> > > > > It could be done but nobody has done it yet, the APR change is only a
> > > > > few weeks old.  Sorting may need to be optional as there is a cost to
> > > > > sorting that not everybody would want.
> > > > 
> > > > I am a volunteer to change code to do a constant sort on svn diff and
> > > > status and test it , I think that is more quick that do an external
> > > > function to do that for me . 
> > > 
> > > Great, please see
> > > http://subversion.apache.org/docs/community-guide/general.html#patches
> > > 
> > > You should discuss the approach you want to take to fix this on the
> > > dev@ list before putting too much work into a patch. That way, if there
> > > are any concerns about the approach they can get resolved upfront,
> > > before a lot of work has been done. Please start a new thread on the
> > > dev@ list to discuss your approach. 
> > 
> 
> +1
> 
> > Hi, Dev list , I want to fix random sort of svn status and svn diff, on
> > recent code of svn, it just put a sort somewhere, could someone point me
> > the directory in the source code, where is the code of svn frontend
> > commands like diff and status ? 
> 
> The frontend code lives in the directory subversion/svn/.
> 
> As I said elsethread, you want to patch the library code too, not just
> the frontend code.
> 
> Looking forward to hear what approach you're planning to take,

I am planning a quick fix ;) 
I never saw code of subversion before , though at point where we get the
list of file, do a sort .
Unfortunately I don't had time before end of the month or even more ,
This random thing does not came in a good time . 
I use many svn diff to review my code  ... 

So any ideas is appreciated. 
Thanks,
-- 
Sérgio M. B.


Re: random sort of svn status and svn diff

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> So if I have
>
>    0, 1 ... X, Y ... N-1
>
> where X and Y collide I would expect changing the seed to give:
>
>    K, K+1 ... X, Y ... N-1, 0 ... K-1
>
> Is that going to remove the collision?

Is it something to do with the bins?.  If X and Y were originally in the
same bin the seed change might cause one to move to the next bin so X
and Y no longer collide but Y and Y+1 collide?

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: random sort of svn status and svn diff

Posted by Philip Martin <ph...@wandisco.com>.
Branko Čibej <br...@apache.org> writes:

> On 21.03.2012 10:29, Philip Martin wrote:
>> In passing I note that the way APR "randomises" the hash is not all
>> that random. It just adds a fixed prefix to the keys passed to the
>> default hash function. I thought the point of the APR change was to
>> make it harder for the user to predict hash collisions and I do wonder
>> whether this has been achieved. If two strings S1 and S2 collide then
>> does adding a constant prefix stop the collision?
>
> Yes, it does; however, since the prefix is constant per hash table, it
> won't change the total number of collisions.
>
> The point of the randomization is that collisions aren't predictable
> amongst different hash table instances, which makes collision-based DoS
> attacks a bit harder. Which was the whole point of the exercise.

I'm surprised, are you sure?  If I put N elements into the hash and
iterate over then using apr_hash_next they come out in some order:

  0, 1, 2 ... N-1

Changing the seed/prefix changes the order to something like:

   K, K+1 ... N-1, 0 ... K-1

So if I have

   0, 1 ... X, Y ... N-1

where X and Y collide I would expect changing the seed to give:

   K, K+1 ... X, Y ... N-1, 0 ... K-1

Is that going to remove the collision?

> Anyway, if we're aiming for stable ordering in the output, and the data
> are currently stored in a hash table, then the only sane option is to
> sort the data, not make the hash function more predictable.

What some people care about is stability from one run to the next when
using the same executable, other people want stability from one
executable to the next and some want predictable order.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: random sort of svn status and svn diff

Posted by Branko Čibej <br...@apache.org>.
On 21.03.2012 10:29, Philip Martin wrote:
> In passing I note that the way APR "randomises" the hash is not all
> that random. It just adds a fixed prefix to the keys passed to the
> default hash function. I thought the point of the APR change was to
> make it harder for the user to predict hash collisions and I do wonder
> whether this has been achieved. If two strings S1 and S2 collide then
> does adding a constant prefix stop the collision?

Yes, it does; however, since the prefix is constant per hash table, it
won't change the total number of collisions.

The point of the randomization is that collisions aren't predictable
amongst different hash table instances, which makes collision-based DoS
attacks a bit harder. Which was the whole point of the exercise.

Anyway, if we're aiming for stable ordering in the output, and the data
are currently stored in a hash table, then the only sane option is to
sort the data, not make the hash function more predictable.

-- Brane


Re: random sort of svn status and svn diff

Posted by Philip Martin <ph...@wandisco.com>.
Daniel Shahaf <d....@daniel.shahaf.name> writes:

> Sérgio Basto wrote on Wed, Mar 21, 2012 at 07:25:05 +0000:
>> Hi, Dev list , I want to fix random sort of svn status and svn diff, on
>> recent code of svn, it just put a sort somewhere, could someone point me
>> the directory in the source code, where is the code of svn frontend
>> commands like diff and status ? 
>
> The frontend code lives in the directory subversion/svn/.
>
> As I said elsethread, you want to patch the library code too, not just
> the frontend code.
>
> Looking forward to hear what approach you're planning to take,

One option would be to replace all the calls to apr_hash_make with a new
function svn_hash_make that calls apr_hash_make_custom with the our own
copy of APR's default hash function.  This would bypass APR's random
seed and the hash order would be stable.

In passing I note that the way APR "randomises" the hash is not all that
random.  It just adds a fixed prefix to the keys passed to the default
hash function.  I thought the point of the APR change was to make it
harder for the user to predict hash collisions and I do wonder whether
this has been achieved.  If two strings S1 and S2 collide then does
adding a constant prefix stop the collision?

svnadmin create repo
svn mkdir -mm file://`pwd`/repo/A{0,1,2,3,4,5,6,7,8,9}
svn co file://`pwd`/repo wc
svn ps p v wc/*

Now status shows an unstable order:

svn st wc
 M      wc/A5
 M      wc/A6
 M      wc/A7
 M      wc/A8
 M      wc/A9
 M      wc/A0
 M      wc/A1
 M      wc/A2
 M      wc/A3
 M      wc/A4

svn st wc
 M      wc/A7
 M      wc/A8
 M      wc/A9
 M      wc/A0
 M      wc/A1
 M      wc/A2
 M      wc/A3
 M      wc/A4
 M      wc/A5
 M      wc/A6

but the "cycle" is always the same N->9,0->N-1.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: random sort of svn status and svn diff

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Sérgio Basto wrote on Wed, Mar 21, 2012 at 07:25:05 +0000:
> On Wed, 2012-03-21 at 08:00 +0100, Stefan Sperling wrote: 
> > On Wed, Mar 21, 2012 at 03:02:53AM +0000, Sérgio Basto wrote:
> > > On Wed, 2012-03-21 at 00:51 +0000, Philip Martin wrote: 
> > > > Sérgio Basto <se...@serjux.com> writes:
> > > > > why the output shouldn't be sorted by name or something else ? 
> > > > > or why svn don't have a sort option this is insane . 
> > > > 
> > > > It could be done but nobody has done it yet, the APR change is only a
> > > > few weeks old.  Sorting may need to be optional as there is a cost to
> > > > sorting that not everybody would want.
> > > 
> > > I am a volunteer to change code to do a constant sort on svn diff and
> > > status and test it , I think that is more quick that do an external
> > > function to do that for me . 
> > 
> > Great, please see
> > http://subversion.apache.org/docs/community-guide/general.html#patches
> > 
> > You should discuss the approach you want to take to fix this on the
> > dev@ list before putting too much work into a patch. That way, if there
> > are any concerns about the approach they can get resolved upfront,
> > before a lot of work has been done. Please start a new thread on the
> > dev@ list to discuss your approach. 
> 

+1

> Hi, Dev list , I want to fix random sort of svn status and svn diff, on
> recent code of svn, it just put a sort somewhere, could someone point me
> the directory in the source code, where is the code of svn frontend
> commands like diff and status ? 

The frontend code lives in the directory subversion/svn/.

As I said elsethread, you want to patch the library code too, not just
the frontend code.

Looking forward to hear what approach you're planning to take,

Daniel


Re: random sort of svn status and svn diff

Posted by Sérgio Basto <se...@serjux.com>.
On Wed, 2012-03-21 at 08:00 +0100, Stefan Sperling wrote: 
> On Wed, Mar 21, 2012 at 03:02:53AM +0000, Sérgio Basto wrote:
> > On Wed, 2012-03-21 at 00:51 +0000, Philip Martin wrote: 
> > > Sérgio Basto <se...@serjux.com> writes:
> > > > why the output shouldn't be sorted by name or something else ? 
> > > > or why svn don't have a sort option this is insane . 
> > > 
> > > It could be done but nobody has done it yet, the APR change is only a
> > > few weeks old.  Sorting may need to be optional as there is a cost to
> > > sorting that not everybody would want.
> > 
> > I am a volunteer to change code to do a constant sort on svn diff and
> > status and test it , I think that is more quick that do an external
> > function to do that for me . 
> 
> Great, please see
> http://subversion.apache.org/docs/community-guide/general.html#patches
> 
> You should discuss the approach you want to take to fix this on the
> dev@ list before putting too much work into a patch. That way, if there
> are any concerns about the approach they can get resolved upfront,
> before a lot of work has been done. Please start a new thread on the
> dev@ list to discuss your approach. 

Hi, Dev list , I want to fix random sort of svn status and svn diff, on
recent code of svn, it just put a sort somewhere, could someone point me
the directory in the source code, where is the code of svn frontend
commands like diff and status ? 

Thanks,

-- 
Sérgio M. B.




Re: random sort of svn status and svn diff

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Mar 21, 2012 at 03:02:53AM +0000, Sérgio Basto wrote:
> On Wed, 2012-03-21 at 00:51 +0000, Philip Martin wrote: 
> > Sérgio Basto <se...@serjux.com> writes:
> > > why the output shouldn't be sorted by name or something else ? 
> > > or why svn don't have a sort option this is insane . 
> > 
> > It could be done but nobody has done it yet, the APR change is only a
> > few weeks old.  Sorting may need to be optional as there is a cost to
> > sorting that not everybody would want.
> 
> I am a volunteer to change code to do a constant sort on svn diff and
> status and test it , I think that is more quick that do an external
> function to do that for me . 

Great, please see
http://subversion.apache.org/docs/community-guide/general.html#patches

You should discuss the approach you want to take to fix this on the
dev@ list before putting too much work into a patch. That way, if there
are any concerns about the approach they can get resolved upfront,
before a lot of work has been done. Please start a new thread on the
dev@ list to discuss your approach. Thanks!

Re: random sort of svn status and svn diff

Posted by Sérgio Basto <se...@serjux.com>.
On Wed, 2012-03-21 at 00:51 +0000, Philip Martin wrote: 
> Sérgio Basto <se...@serjux.com> writes:
> 
> > On Tue, 2012-03-20 at 19:23 -0500, Ryan Schmidt wrote: 
> >> On Mar 20, 2012, at 19:17, Sérgio Basto wrote:
> >> 
> >> > Hi, I am facing a problem 
> >> > I do a svn diff | lsdiff 
> >> > I got file in random order , also happens with svn status 
> >> 
> >> Correct, Subversion does not output these in a sorted order.
> >
> > but one month ago , svn diff have always same output , now it is
> > random , is a bug or regression.
> 
> That's the result of a change in the APR hash table implementation.
> Subversion often uses "hash order" and in the past this was always the
> same for a given set of hash keys.  The new APR implementation means
> that this order is no longer constant.
> 
> > why the output shouldn't be sorted by name or something else ? 
> > or why svn don't have a sort option this is insane . 
> 
> It could be done but nobody has done it yet, the APR change is only a
> few weeks old.  Sorting may need to be optional as there is a cost to
> sorting that not everybody would want.

I am a volunteer to change code to do a constant sort on svn diff and
status and test it , I think that is more quick that do an external
function to do that for me . 
I don't know any ideas for have a constant of this outputs ? 

Thanks,
-- 
Sérgio M. B.


Re: random sort of svn status and svn diff

Posted by Philip Martin <ph...@wandisco.com>.
Sérgio Basto <se...@serjux.com> writes:

> On Tue, 2012-03-20 at 19:23 -0500, Ryan Schmidt wrote: 
>> On Mar 20, 2012, at 19:17, Sérgio Basto wrote:
>> 
>> > Hi, I am facing a problem 
>> > I do a svn diff | lsdiff 
>> > I got file in random order , also happens with svn status 
>> 
>> Correct, Subversion does not output these in a sorted order.
>
> but one month ago , svn diff have always same output , now it is
> random , is a bug or regression.

That's the result of a change in the APR hash table implementation.
Subversion often uses "hash order" and in the past this was always the
same for a given set of hash keys.  The new APR implementation means
that this order is no longer constant.

> why the output shouldn't be sorted by name or something else ? 
> or why svn don't have a sort option this is insane . 

It could be done but nobody has done it yet, the APR change is only a
few weeks old.  Sorting may need to be optional as there is a cost to
sorting that not everybody would want.

-- 
Philip

Re: random sort of svn status and svn diff

Posted by Sérgio Basto <se...@serjux.com>.
On Tue, 2012-03-20 at 19:23 -0500, Ryan Schmidt wrote: 
> On Mar 20, 2012, at 19:17, Sérgio Basto wrote:
> 
> > Hi, I am facing a problem 
> > I do a svn diff | lsdiff 
> > I got file in random order , also happens with svn status 
> 
> Correct, Subversion does not output these in a sorted order.

but one month ago , svn diff have always same output , now it is
random , is a bug or regression.
why the output shouldn't be sorted by name or something else ? 
or why svn don't have a sort option this is insane . 

I use patchutils to edit diffs more than 4 years and now with random
order, mess up all results .

I not say that output have one order is seems random , I'm saying every
time I do a svn diff or status the order of files change OMG. 


> > How I sort diff and status , or at least don't get random results 
> 
> You can pipe the result to the "sort" program somehow.
> 
> 
> 

-- 
Sérgio M. B.


Re: random sort of svn status and svn diff

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Mar 20, 2012, at 19:17, Sérgio Basto wrote:

> Hi, I am facing a problem 
> I do a svn diff | lsdiff 
> I got file in random order , also happens with svn status 

Correct, Subversion does not output these in a sorted order.

> How I sort diff and status , or at least don't get random results 

You can pipe the result to the "sort" program somehow.