You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Martin Buck <mb...@gromit.dyndns.org> on 2011/11/29 14:47:27 UTC

[BUG+PATCH] Repository "corruption" when committing several versions of a huge file

Hi,

recently, I tried to commit 2 versions of a huge file (6-7 GB) to a
Subversion repository. Both commits worked fine, but after the second one, I
could no longer access the repository with the standard svn client. Running
"svnadmin verify" on the repository resulted in the same error message I got
from the svn client:

svnadmin: Reading one svndiff window read beyond the end of the
representation

After the first commit, "svnadmin verify" on the repository still worked
fine. Digging a bit deeper, it turned out that it's not the repository itself
which is corrupted, but it's the svn client which gets confused by the
repository contents (possibly on 32 bit machines only).

This happend with Subversion 1.6.12 on Debian Squeeze x86, but I could
reproduce it with 1.7.1 as well.


Recipe to reproduce:
$ svnadmin create repo
$ svn co file://$PWD/repo wc
Checked out revision 0.
$ dd bs=1024k count=5000 if=/dev/urandom of=wc/largefile
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 921.329 s, 5.7 MB/s
$ svn add wc/largefile
A  (bin)  wc/largefile
$ svn ci -m '' wc
Adding  (bin)  wc/largefile
Transmitting file data .
Committed revision 1.
$ echo foo >> wc/largefile
$ svn ci -m '' wc
Sending        wc/largefile
Transmitting file data .
Committed revision 2.
$ svnadmin verify repo
* Verified revision 0.
* Verified revision 1.
svnadmin: Reading one svndiff window read beyond the end of the representation
$


The problem is that subversion/libsvn_fs_fs/fs_fs.c:read_rep_line()
correctly parses the size of the base revision for r2 as a 64 bit number,
but then stores it in an apr_off_t which is 32 bits on my machine.

The attached patch fixes the bug for me. Does anybody see a problem with it?
Could somebody please commit it or should I open an issue in the tracker?

Thanks,
Martin

Re: [BUG+PATCH] Repository "corruption" when committing several versions of a huge file

Posted by Branko Čibej <br...@xbc.nu>.
On 29.11.2011 15:32, Daniel Shahaf wrote:
> On Tuesday, November 29, 2011 2:47 PM, "Martin Buck" <mb...@gromit.dyndns.org> wrote:
>> The problem is that subversion/libsvn_fs_fs/fs_fs.c:read_rep_line()
>> correctly parses the size of the base revision for r2 as a 64 bit number,
>> but then stores it in an apr_off_t which is 32 bits on my machine.
>>
> Why isn't apr_off_t 64 bits?

If you look at the patch, I think there's a typo there -- the size is
stored in an apr_size_t, which has 32 bits; the patch changes that to
svn_filesize_t.

-- Brane


> @@ -2575,7 +2575,7 @@
>  
>    svn_revnum_t base_revision;
>    apr_off_t base_offset;
> -  apr_size_t base_length;
> +  svn_filesize_t base_length;
> I'd like both BASE_OFFSET and BASE_LENGTH to be 64 bit types.  Would it
> work to make them both svn_filesize_t?

It is not correct to use svn_filesize_t as an argument to lseek.

It would be interesting to see if Subversion works on a system that
supports large files but has a 32-bit off_t, however, that doesn't
appear to be the case here.

-- Brane


Re: [BUG+PATCH] Repository "corruption" when committing several versions of a huge file

Posted by Branko Čibej <br...@xbc.nu>.
On 29.11.2011 16:26, Daniel Shahaf wrote:
> Okay.  Since both svn_filesize_t and apr_off_t are signed 64bits it may
> be just a bikeshed which type to use for which struct member.

It's not a bikeshed. It's a question of semantics. What does the
variable mean?
If it were a bikeshed, you wouldn't need any typedefs at all.

-- Brane


Re: [BUG+PATCH] Repository "corruption" when committing several versions of a huge file

Posted by Daniel Shahaf <da...@elego.de>.
Daniel Shahaf wrote on Tue, Nov 29, 2011 at 17:26:36 +0200:
> I'll commit your patch and nominate it for backport to 1.7.3.  While
> at it I'll also scan the code for similar problems.

(Haven't got to it yet, but still planning to...)

Re: [BUG+PATCH] Repository "corruption" when committing several versions of a huge file

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Philip Martin wrote on Wed, Jan 11, 2012 at 19:32:55 +0000:
> "Daniel Shahaf" <d....@daniel.shahaf.name> writes:
> 
> > On Tuesday, November 29, 2011 4:08 PM, "Martin Buck" <mb...@gromit.dyndns.org> wrote:
> >> On Tue, Nov 29, 2011 at 04:32:39PM +0200, Daniel Shahaf wrote:
> >> > Why isn't apr_off_t 64 bits?
> >> 
> >> Sorry, I meant apr_size_t and that seems to correspond to C89's size_t and
> >> is not supposed to be used for file sizes. apr_off_t is 64 bits, of course.
> >> 
> >> > > --- subversion-1.7.1/subversion/libsvn_fs_fs/fs_fs.c.orig	2011-10-19 19:28:55.000000000 +0200
> >> > > +++ subversion-1.7.1/subversion/libsvn_fs_fs/fs_fs.c	2011-11-25 16:53:56.000000000 +0100
> >> > > @@ -2575,7 +2575,7 @@
> >> > >  
> >> > >    svn_revnum_t base_revision;
> >> > >    apr_off_t base_offset;
> >> > > -  apr_size_t base_length;
> >> > > +  svn_filesize_t base_length;
> >> > 
> >> > I'd like both BASE_OFFSET and BASE_LENGTH to be 64 bit types.  Would it
> >> > work to make them both svn_filesize_t?
> >> 
> >> I guess it would work (on a 32 bit machine, svn_filesize_t and apr_off_t are
> >> both typedefed to "long long int" anyway), but I wanted to make the types
> >> the same as those of "offset" and "size" in
> >> libsvn_fs_fs/fs.h:representation_t where the values ultimately get assigned
> >> to.
> >> 
> >> But since this is the first time I've looked at the Subversion sources, I
> >> can't really recommend anything (besides making sure that base_length is 64
> >> bits).
> >> 
> >
> > Okay.  Since both svn_filesize_t and apr_off_t are signed 64bits it may
> > be just a bikeshed which type to use for which struct member.  I'll
> > commit your patch and nominate it for backport to 1.7.3.  While at it I'll
> > also scan the code for similar problems.
> 
> Are you still planning to commit this?  Should someone else do it?

Ouch, sorry.  It's still on my todo list, but nowhere near the top;
"someone else" should feel than free to beat me to it.

Thanks.

Daniel

> 
> -- 
> Philip

Re: [BUG+PATCH] Repository "corruption" when committing several versions of a huge file

Posted by Philip Martin <ph...@codematters.co.uk>.
"Daniel Shahaf" <d....@daniel.shahaf.name> writes:

> On Tuesday, November 29, 2011 4:08 PM, "Martin Buck" <mb...@gromit.dyndns.org> wrote:
>> On Tue, Nov 29, 2011 at 04:32:39PM +0200, Daniel Shahaf wrote:
>> > Why isn't apr_off_t 64 bits?
>> 
>> Sorry, I meant apr_size_t and that seems to correspond to C89's size_t and
>> is not supposed to be used for file sizes. apr_off_t is 64 bits, of course.
>> 
>> > > --- subversion-1.7.1/subversion/libsvn_fs_fs/fs_fs.c.orig	2011-10-19 19:28:55.000000000 +0200
>> > > +++ subversion-1.7.1/subversion/libsvn_fs_fs/fs_fs.c	2011-11-25 16:53:56.000000000 +0100
>> > > @@ -2575,7 +2575,7 @@
>> > >  
>> > >    svn_revnum_t base_revision;
>> > >    apr_off_t base_offset;
>> > > -  apr_size_t base_length;
>> > > +  svn_filesize_t base_length;
>> > 
>> > I'd like both BASE_OFFSET and BASE_LENGTH to be 64 bit types.  Would it
>> > work to make them both svn_filesize_t?
>> 
>> I guess it would work (on a 32 bit machine, svn_filesize_t and apr_off_t are
>> both typedefed to "long long int" anyway), but I wanted to make the types
>> the same as those of "offset" and "size" in
>> libsvn_fs_fs/fs.h:representation_t where the values ultimately get assigned
>> to.
>> 
>> But since this is the first time I've looked at the Subversion sources, I
>> can't really recommend anything (besides making sure that base_length is 64
>> bits).
>> 
>
> Okay.  Since both svn_filesize_t and apr_off_t are signed 64bits it may
> be just a bikeshed which type to use for which struct member.  I'll
> commit your patch and nominate it for backport to 1.7.3.  While at it I'll
> also scan the code for similar problems.

Are you still planning to commit this?  Should someone else do it?

-- 
Philip

Re: [BUG+PATCH] Repository "corruption" when committing several versions of a huge file

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
On Tuesday, November 29, 2011 4:08 PM, "Martin Buck" <mb...@gromit.dyndns.org> wrote:
> On Tue, Nov 29, 2011 at 04:32:39PM +0200, Daniel Shahaf wrote:
> > Why isn't apr_off_t 64 bits?
> 
> Sorry, I meant apr_size_t and that seems to correspond to C89's size_t and
> is not supposed to be used for file sizes. apr_off_t is 64 bits, of course.
> 
> > > --- subversion-1.7.1/subversion/libsvn_fs_fs/fs_fs.c.orig	2011-10-19 19:28:55.000000000 +0200
> > > +++ subversion-1.7.1/subversion/libsvn_fs_fs/fs_fs.c	2011-11-25 16:53:56.000000000 +0100
> > > @@ -2575,7 +2575,7 @@
> > >  
> > >    svn_revnum_t base_revision;
> > >    apr_off_t base_offset;
> > > -  apr_size_t base_length;
> > > +  svn_filesize_t base_length;
> > 
> > I'd like both BASE_OFFSET and BASE_LENGTH to be 64 bit types.  Would it
> > work to make them both svn_filesize_t?
> 
> I guess it would work (on a 32 bit machine, svn_filesize_t and apr_off_t are
> both typedefed to "long long int" anyway), but I wanted to make the types
> the same as those of "offset" and "size" in
> libsvn_fs_fs/fs.h:representation_t where the values ultimately get assigned
> to.
> 
> But since this is the first time I've looked at the Subversion sources, I
> can't really recommend anything (besides making sure that base_length is 64
> bits).
> 

Okay.  Since both svn_filesize_t and apr_off_t are signed 64bits it may
be just a bikeshed which type to use for which struct member.  I'll
commit your patch and nominate it for backport to 1.7.3.  While at it I'll
also scan the code for similar problems.

> > Are there other instances of this problem in our code?  (I'll look
> > myself later, but if you know or can easily find such it'll save me time
> > if you identified them.)
> 
> I haven't noticed any other instances, but then, I've not searched for them
> either. All I can say is that with the patch, Subversion seems to handle
> large files just fine (but I've done only very limited testing so far).
> 
> Thanks,
> Martin
> 

Re: [BUG+PATCH] Repository "corruption" when committing several versions of a huge file

Posted by Martin Buck <mb...@gromit.dyndns.org>.
On Tue, Nov 29, 2011 at 04:32:39PM +0200, Daniel Shahaf wrote:
> Why isn't apr_off_t 64 bits?

Sorry, I meant apr_size_t and that seems to correspond to C89's size_t and
is not supposed to be used for file sizes. apr_off_t is 64 bits, of course.

> > --- subversion-1.7.1/subversion/libsvn_fs_fs/fs_fs.c.orig	2011-10-19 19:28:55.000000000 +0200
> > +++ subversion-1.7.1/subversion/libsvn_fs_fs/fs_fs.c	2011-11-25 16:53:56.000000000 +0100
> > @@ -2575,7 +2575,7 @@
> >  
> >    svn_revnum_t base_revision;
> >    apr_off_t base_offset;
> > -  apr_size_t base_length;
> > +  svn_filesize_t base_length;
> 
> I'd like both BASE_OFFSET and BASE_LENGTH to be 64 bit types.  Would it
> work to make them both svn_filesize_t?

I guess it would work (on a 32 bit machine, svn_filesize_t and apr_off_t are
both typedefed to "long long int" anyway), but I wanted to make the types
the same as those of "offset" and "size" in
libsvn_fs_fs/fs.h:representation_t where the values ultimately get assigned
to.

But since this is the first time I've looked at the Subversion sources, I
can't really recommend anything (besides making sure that base_length is 64
bits).

> Are there other instances of this problem in our code?  (I'll look
> myself later, but if you know or can easily find such it'll save me time
> if you identified them.)

I haven't noticed any other instances, but then, I've not searched for them
either. All I can say is that with the patch, Subversion seems to handle
large files just fine (but I've done only very limited testing so far).

Thanks,
Martin

Re: [BUG+PATCH] Repository "corruption" when committing several versions of a huge file

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
On Tuesday, November 29, 2011 2:47 PM, "Martin Buck" <mb...@gromit.dyndns.org> wrote:
> The problem is that subversion/libsvn_fs_fs/fs_fs.c:read_rep_line()
> correctly parses the size of the base revision for r2 as a 64 bit number,
> but then stores it in an apr_off_t which is 32 bits on my machine.
> 

Why isn't apr_off_t 64 bits?

> The attached patch fixes the bug for me. Does anybody see a problem with it?
> Could somebody please commit it or should I open an issue in the tracker?
> 
> Thanks,
> Martin
> 
> Email had 1 attachment:
> + large-file-repo-corruption-fix.diff
>   1k (text/x-diff)

> --- subversion-1.7.1/subversion/libsvn_fs_fs/fs_fs.c.orig	2011-10-19 19:28:55.000000000 +0200
> +++ subversion-1.7.1/subversion/libsvn_fs_fs/fs_fs.c	2011-11-25 16:53:56.000000000 +0100
> @@ -2575,7 +2575,7 @@
>  
>    svn_revnum_t base_revision;
>    apr_off_t base_offset;
> -  apr_size_t base_length;
> +  svn_filesize_t base_length;

I'd like both BASE_OFFSET and BASE_LENGTH to be 64 bit types.  Would it
work to make them both svn_filesize_t?

>  };
>  
>  /* Read the next line from file FILE and parse it as a text
> @@ -2636,7 +2636,7 @@
>    if (! str)
>      goto error;
>    SVN_ERR(svn_cstring_atoi64(&val, str));
> -  rep_args->base_length = (apr_size_t)val;
> +  rep_args->base_length = (svn_filesize_t)val;
>  
>    *rep_args_p = rep_args;
>    return SVN_NO_ERROR;

Are there other instances of this problem in our code?  (I'll look
myself later, but if you know or can easily find such it'll save me time
if you identified them.)

Thanks,

Daniel

Re: [BUG+PATCH] Repository "corruption" when committing several versions of a huge file

Posted by Branko Čibej <br...@xbc.nu>.
On 29.11.2011 16:21, Philip Martin wrote:
> Martin Buck <mb...@gromit.dyndns.org> writes:
>
>> --- subversion-1.7.1/subversion/libsvn_fs_fs/fs_fs.c.orig	2011-10-19 19:28:55.000000000 +0200
>> +++ subversion-1.7.1/subversion/libsvn_fs_fs/fs_fs.c	2011-11-25 16:53:56.000000000 +0100
>> @@ -2575,7 +2575,7 @@
>>  
>>    svn_revnum_t base_revision;
>>    apr_off_t base_offset;
>> -  apr_size_t base_length;
>> +  svn_filesize_t base_length;
>>  };
>>  
>>  /* Read the next line from file FILE and parse it as a text
>> @@ -2636,7 +2636,7 @@
>>    if (! str)
>>      goto error;
>>    SVN_ERR(svn_cstring_atoi64(&val, str));
>> -  rep_args->base_length = (apr_size_t)val;
>> +  rep_args->base_length = (svn_filesize_t)val;
> If base_length is made 64 bits perhaps we should pass it directly to
> svn_cstring_atoi64 and avoid the cast?

Either keep the cast, or remove svn_filesize_t and use apr_int64_t
instead of it.

-- Brane


Re: [BUG+PATCH] Repository "corruption" when committing several versions of a huge file

Posted by Philip Martin <ph...@wandisco.com>.
Martin Buck <mb...@gromit.dyndns.org> writes:

> --- subversion-1.7.1/subversion/libsvn_fs_fs/fs_fs.c.orig	2011-10-19 19:28:55.000000000 +0200
> +++ subversion-1.7.1/subversion/libsvn_fs_fs/fs_fs.c	2011-11-25 16:53:56.000000000 +0100
> @@ -2575,7 +2575,7 @@
>  
>    svn_revnum_t base_revision;
>    apr_off_t base_offset;
> -  apr_size_t base_length;
> +  svn_filesize_t base_length;
>  };
>  
>  /* Read the next line from file FILE and parse it as a text
> @@ -2636,7 +2636,7 @@
>    if (! str)
>      goto error;
>    SVN_ERR(svn_cstring_atoi64(&val, str));
> -  rep_args->base_length = (apr_size_t)val;
> +  rep_args->base_length = (svn_filesize_t)val;

If base_length is made 64 bits perhaps we should pass it directly to
svn_cstring_atoi64 and avoid the cast?

>  
>    *rep_args_p = rep_args;
>    return SVN_NO_ERROR;

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com