You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by John Szakmeister <jo...@szakmeister.net> on 2003/07/20 15:03:00 UTC

[PATCH] Issue #1074: Need a svnadmin command to verify that the repository is not corrupted

This patch should resolve Issue #1074.  Currently, there is no test for this 
patch, since I have to find a way to modify the a file's data without 
updating the checksum.

----------------------------------------
Resolves Issue #1074 (Need a svnadmin command to verify that the repository
is not corrupted).

* subversion/libsvn_fs/fs.c
  (svn_fs_verify): Added this high level function to verify the data stored in
  the fs.
  (verify_reps): Helper function.
  (verity_reps_handler): Helper function.
  
* subversion/include/svn_fs.h
  (svn_fs_verify): Added function prototype.

* subversion/libsvn_fs/bdb/reps-table.c
  (svn_fs__bdb_walk_reps): Added this utility function to walk the
  representation table calling a callback function for each row.

* subversion/libsvn_fs/bdb/reps-table.h
  (svn_fs__bdb_walk_reps): Added function prototype.
  (svn_fs__bdb_rep_handler_t): Added this typedef for use with
  svn_fs__bdb_walk_reps.

* subversion/svnadmin/main.c
  (subcommand_verify): Added new function to verify the data stored in the
  repository.


Re: [PATCH] Issue #1074: Need a svnadmin command to verify that the repository is not corrupted

Posted by cm...@collab.net.
John Szakmeister <jo...@szakmeister.net> writes:

> I like method 2 as well.  I'll get everything all wrapped up and
> submit a new patch tomorrow.

Cool.  And as your read from the stream, be sure to read in
SVN_STREAM_CHUNK_SIZE-sized chunks.  Why?  Consistency.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Issue #1074: Need a svnadmin command to verify that the repository is not corrupted

Posted by John Szakmeister <jo...@szakmeister.net>.
On Tuesday 22 July 2003 10:13 am, cmpilato@collab.net wrote:
> John Szakmeister <jo...@szakmeister.net> writes:

[snip]

> +1 on using this second method.  (Sorry I didn't suggest it in my
> mail.)  As for the double verification, I see two routes to use:
>
>    1. do the second verification, or
>
>    2. change the docstring of svn_fs__rep_contents_read_stream() to
>       promise that the checksum will be verified as the chunks are
>       read.  that way, you're coding a documented api, which will make
>       it obvious why you didn't yourself do the checksumming.
>
> My choice?  Number 2 again.

I like method 2 as well.  I'll get everything all wrapped up and submit a new 
patch tomorrow.

-John


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Issue #1074: Need a svnadmin command to verify that the repository is not corrupted

Posted by cm...@collab.net.
John Szakmeister <jo...@szakmeister.net> writes:

> 2)  Use svn_fs__rep_contents_read_stream in my rep handler, and have the 
> handler read the data in chunks and compute the checksum.
> 
> I personally like method 2, but using that means that the checksum
> is actually getting checked twice: once inside of the read handler
> used for the stream, and then in my rep handler.

+1 on using this second method.  (Sorry I didn't suggest it in my
mail.)  As for the double verification, I see two routes to use:

   1. do the second verification, or

   2. change the docstring of svn_fs__rep_contents_read_stream() to
      promise that the checksum will be verified as the chunks are
      read.  that way, you're coding a documented api, which will make
      it obvious why you didn't yourself do the checksumming.

My choice?  Number 2 again.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Issue #1074: Need a svnadmin command to verify that the repository is not corrupted

Posted by John Szakmeister <jo...@szakmeister.net>.
Just a bit of an update... I went through and took a look at who is calling 
svn_fs__rep_contents:
    dag.c:
        get_dir_entries()
        set_entry()
        svn_fs__dag_get_proplist()
        delete_entry()

It appears that only functions that were creating entries lists were calling 
svn_fs__rep_contents... which makes me feel bad that I tried to use it in the 
first place. :-(

So I see two solutions to this patch:

1)  Add a function to rep-strings (like cmpilato originally suggested :-) 
whose sole purpose is to verify the rep data by reading it in chunks and 
updating the checksum.

or

2)  Use svn_fs__rep_contents_read_stream in my rep handler, and have the 
handler read the data in chunks and compute the checksum.

I personally like method 2, but using that means that the checksum is actually 
getting checked twice: once inside of the read handler used for the stream, 
and then in my rep handler.

What's the general feeling on this?

-John


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Issue #1074: Need a svnadmin command to verify that the repository is not corrupted

Posted by John Szakmeister <jo...@szakmeister.net>.
On Tuesday 22 July 2003 12:49 am, cmpilato@collab.net wrote:
> John Szakmeister <jo...@szakmeister.net> writes:
> > Resolves Issue #1074 (Need a svnadmin command to verify that the
> > repository is not corrupted).
>
> [...]
>
> Nice log message.  That's a good start! :-)

Thanks!

>
> > +/* Verify */
> > +
> > +/** Verify the filesystem data associated with @ fs. */
> > +svn_error_t *
> > +svn_fs_verify(svn_fs_t *fs,
> > +              apr_pool_t *pool);
> > +
> > +
> > +
>
> Couple of comments.  I notice you used a weird mix of coding styles
> through your patch (sometimes doing "fn(args)", sometimes "fn
> (args)").  There are lines that won't fit on an 80-column screen, and
> some minor alignment issues.  Please stick with the coding style
> already used in a given module.  Sounds picky, I know, but it really
> does make life easier for everyone.

Not picky at all... I certainly understand the need to stay within a set of 
guidelines.  I tried to be consistent within a given module, but it appears 
something slipped through the cracks. :-)  I'll go back and take a look at 
the formatting again.

> Now, about that docstring for svn_fs_verify().  Seems a little
> lightweight for a module API docstring, but more than that -- it's
> kinda wrong.  This function doesn't actually verify all the filesystem
> data.  It only verifies the file contents, file and directory property
> lists, and directory entries lists.

Honestly, I felt the same way when naming it.  At the time nothing better came 
to mind, so I stuck with it.

> How about if we call the function svn_fs_verify_contents(), and
> compose a slightly more revealing docstring for it?
>
> > +svn_error_t *svn_fs__bdb_walk_reps (svn_fs_t *fs,
> > +                                    svn_fs__bdb_rep_handler_t
> > rep_handler, +                                    trail_t *trail)
> > +{
>
> [...]
>
> > +      /* Turn the key into a null-terminated string to be used by the
> > handler */ +      rep_key = svn_string_ncreate(key.data, key.size,
> > subpool); +
> > +      /* Run the handler */
> > +      error = (*rep_handler)(fs, rep_key->data, trail);
>
> We don't really need to build a full string structure here.  Consider
> using apr_pstrmemdup() to make a regular 'const char *'.

I actually hunted through svn looking for something that would do exactly 
that... didn't even think to look at APR.  I'll change this too.

> > +/* Verifying data integrity  */
> > +
> > +/* A simple callback function used when walking the reps */
> > +svn_error_t *
> > +verify_reps_handler (svn_fs_t *fs,
> > +                     const char *rep_key,
> > +                     trail_t *trail)
> > +{
> > +  svn_string_t str;
> > +
> > +  SVN_ERR (svn_fs__rep_contents (&str, fs, rep_key, trail));
> > +
> > +  return SVN_NO_ERROR;
> > +}
>
> You know, I'm really thinking it's not such a good idea to hold the
> entire contents of a file in memory.  I'm thinking about those
> gigabyte-sized files some folks are putting into their repositories...
>
> Perhaps we should reconsider this part of the verification plan, hm? :-)

Agreed.  I remember looking at that function the first time and thinking the 
same thing.  Any ideas?

> > +
> > +/* A simple callback function used when initiating the
> > +   walking of the repository */
> > +svn_error_t *
> > +verify_reps (void *baton,
> > +             trail_t *trail)
> > +{
> > +  svn_fs_t *fs = baton;
> > +
> > +  SVN_ERR (svn_fs__bdb_walk_reps(fs, verify_reps_handler, trail));
> > +
> > +  return SVN_NO_ERROR;
> > +}
>
> We have a longstanding convention of naming this svn_fs__retry_txn()
> callbacks 'txn_body_function_name' -- so, in this case,
> txn_body_verify_reps().

Will fix this as well.

Thanks for the helpful comments.  I'll work the changes in (except the whole 
reading the entire contents of a file in) and re-post the patch.  Meanwhile, 
if you've got an idea on how you would like to see this problem of storing 
the entire contents of a file in memory, I'll get started on a fix.

-John


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Issue #1074: Need a svnadmin command to verify that the repository is not corrupted

Posted by "Glenn A. Thompson" <gt...@cdr.net>.
Hey,

>Now, about that docstring for svn_fs_verify().  Seems a little
>lightweight for a module API docstring, but more than that -- it's
>kinda wrong.  This function doesn't actually verify all the filesystem
>data.  It only verifies the file contents, file and directory property
>lists, and directory entries lists.
>
>How about if we call the function svn_fs_verify_contents(), and
>compose a slightly more revealing docstring for it?
>  
>

How about leaving it svn_fs_verify()  and adding an argument like 
verification level.

SVN_FS_V_EXHAUSTIVE
SVN_FS_V_FILE_CONTENT
SVN_FS_V_ANCESTRY
....
SVN_FS_V_WHY_BOTHER

Having verification levels to be implemented at FS implementors discretion.

gat


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Issue #1074: Need a svnadmin command to verify that the repository is not corrupted

Posted by cm...@collab.net.
John Szakmeister <jo...@szakmeister.net> writes:

> Resolves Issue #1074 (Need a svnadmin command to verify that the repository
> is not corrupted).

[...]

Nice log message.  That's a good start! :-)

> +/* Verify */
> +
> +/** Verify the filesystem data associated with @ fs. */
> +svn_error_t *
> +svn_fs_verify(svn_fs_t *fs,
> +              apr_pool_t *pool);
> +
> +
> +

Couple of comments.  I notice you used a weird mix of coding styles
through your patch (sometimes doing "fn(args)", sometimes "fn
(args)").  There are lines that won't fit on an 80-column screen, and
some minor alignment issues.  Please stick with the coding style
already used in a given module.  Sounds picky, I know, but it really
does make life easier for everyone.

Now, about that docstring for svn_fs_verify().  Seems a little
lightweight for a module API docstring, but more than that -- it's
kinda wrong.  This function doesn't actually verify all the filesystem
data.  It only verifies the file contents, file and directory property
lists, and directory entries lists.

How about if we call the function svn_fs_verify_contents(), and
compose a slightly more revealing docstring for it?

> +svn_error_t *svn_fs__bdb_walk_reps (svn_fs_t *fs,
> +                                    svn_fs__bdb_rep_handler_t rep_handler,
> +                                    trail_t *trail)
> +{

[...]

> +      /* Turn the key into a null-terminated string to be used by the handler */
> +      rep_key = svn_string_ncreate(key.data, key.size, subpool);
> +      
> +      /* Run the handler */
> +      error = (*rep_handler)(fs, rep_key->data, trail);

We don't really need to build a full string structure here.  Consider
using apr_pstrmemdup() to make a regular 'const char *'.

> +/* Verifying data integrity  */
> +
> +/* A simple callback function used when walking the reps */
> +svn_error_t *
> +verify_reps_handler (svn_fs_t *fs,
> +                     const char *rep_key,
> +                     trail_t *trail)
> +{
> +  svn_string_t str;
> +
> +  SVN_ERR (svn_fs__rep_contents (&str, fs, rep_key, trail));
> +
> +  return SVN_NO_ERROR;
> +}

You know, I'm really thinking it's not such a good idea to hold the
entire contents of a file in memory.  I'm thinking about those
gigabyte-sized files some folks are putting into their repositories...

Perhaps we should reconsider this part of the verification plan, hm? :-)

> +
> +/* A simple callback function used when initiating the
> +   walking of the repository */
> +svn_error_t *
> +verify_reps (void *baton,
> +             trail_t *trail)
> +{
> +  svn_fs_t *fs = baton;
> +
> +  SVN_ERR (svn_fs__bdb_walk_reps(fs, verify_reps_handler, trail));
> +
> +  return SVN_NO_ERROR;
> +}

We have a longstanding convention of naming this svn_fs__retry_txn()
callbacks 'txn_body_function_name' -- so, in this case,
txn_body_verify_reps().

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Issue #1074: Need a svnadmin command to verify that the repository is not corrupted

Posted by John Szakmeister <jo...@szakmeister.net>.
It was made against r6495.  Guess I should probably get it going with HEAD 
though. :-)

Look for a new one sometime tomorrow or Tuesday.

-John

On Sunday 20 July 2003 03:55 pm, David Summers wrote:
> I tried to patch this in to subversion head (r6521) and it didn't patch in
> cleanly.  Which version was this relative to?
>
>    Thanks!
>    - David
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Issue #1074: Need a svnadmin command to verify that the repository is not corrupted

Posted by David Summers <da...@summersoft.fay.ar.us>.
Yikes!  Ok, I'll rummage around here and see what I can find out.

Sorry for the false alarm, I've probably done something silly.

   - David

On Sun, 20 Jul 2003, John Szakmeister wrote:

> Just to check and see, I copied off my working directory into a temp 
> directory, reverted the changes, and updated.  Then I ran patch -u 
> <../issue1074.patch.txt from the toplevel directory and everything made it in 
> just fine.
> 
> I also updated my real working copy, and I didn't get any conflicts.  I don't 
> know what else to say except that I'm using FreeBSD 5.1 and Patch 2.1.
> 
> -John
> 
> On Sunday 20 July 2003 03:55 pm, David Summers wrote:
> > I tried to patch this in to subversion head (r6521) and it didn't patch in
> > cleanly.  Which version was this relative to?
> >
> >    Thanks!
> >    - David
> 
> 

-- 
David Wayne Summers          "Linux: Because reboots are for hardware upgrades!"
david@summersoft.fay.ar.us   PGP Key: http://summersoft.fay.ar.us/~david/pgp.txt
PGP Key fingerprint =  C0 E0 4F 50 DD A9 B6 2B  60 A1 31 7E D2 28 6D A8 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Issue #1074: Need a svnadmin command to verify that the repository is not corrupted

Posted by John Szakmeister <jo...@szakmeister.net>.
Just to check and see, I copied off my working directory into a temp 
directory, reverted the changes, and updated.  Then I ran patch -u 
<../issue1074.patch.txt from the toplevel directory and everything made it in 
just fine.

I also updated my real working copy, and I didn't get any conflicts.  I don't 
know what else to say except that I'm using FreeBSD 5.1 and Patch 2.1.

-John

On Sunday 20 July 2003 03:55 pm, David Summers wrote:
> I tried to patch this in to subversion head (r6521) and it didn't patch in
> cleanly.  Which version was this relative to?
>
>    Thanks!
>    - David


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Issue #1074: Need a svnadmin command to verify that the repository is not corrupted

Posted by David Summers <da...@summersoft.fay.ar.us>.
I tried to patch this in to subversion head (r6521) and it didn't patch in
cleanly.  Which version was this relative to?

   Thanks!
   - David

On Sun, 20 Jul 2003, John Szakmeister wrote:

> This patch should resolve Issue #1074.  Currently, there is no test for this 
> patch, since I have to find a way to modify the a file's data without 
> updating the checksum.
> 
> ----------------------------------------
> Resolves Issue #1074 (Need a svnadmin command to verify that the repository
> is not corrupted).
> 
> * subversion/libsvn_fs/fs.c
>   (svn_fs_verify): Added this high level function to verify the data stored in
>   the fs.
>   (verify_reps): Helper function.
>   (verity_reps_handler): Helper function.
>   
> * subversion/include/svn_fs.h
>   (svn_fs_verify): Added function prototype.
> 
> * subversion/libsvn_fs/bdb/reps-table.c
>   (svn_fs__bdb_walk_reps): Added this utility function to walk the
>   representation table calling a callback function for each row.
> 
> * subversion/libsvn_fs/bdb/reps-table.h
>   (svn_fs__bdb_walk_reps): Added function prototype.
>   (svn_fs__bdb_rep_handler_t): Added this typedef for use with
>   svn_fs__bdb_walk_reps.
> 
> * subversion/svnadmin/main.c
>   (subcommand_verify): Added new function to verify the data stored in the
>   repository.
> 
> 

-- 
David Wayne Summers          "Linux: Because reboots are for hardware upgrades!"
david@summersoft.fay.ar.us   PGP Key: http://summersoft.fay.ar.us/~david/pgp.txt
PGP Key fingerprint =  C0 E0 4F 50 DD A9 B6 2B  60 A1 31 7E D2 28 6D A8 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org