You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@wandisco.com> on 2010/12/20 18:21:59 UTC

[PATCH] WCNG: Enforce that a pristine checksum always references an entry in the pristine store.

I'm working towards cleaning up the pristine store automatically.

Although not strictly necessary, I think it will be helpful if a
pristine text checksum in the DB shall be guaranteed always to reference
a pristine text that is currently in the store.  In support of this
goal, the current patch enables SQLite checking for that constraint (no
effect in sqlite versions < 3.6.19) and adjusts the 1.6-to-1.7 upgrade
code to follow that rule.

I think I only made minor edits since it last passed all tests.

Any feedback before I commit it?  Philip, you probably know the upgrade
code best.  How does this look?

- Julian


Re: [PATCH] WCNG: Enforce that a pristine checksum always references an entry in the pristine store.

Posted by Julian Foad <ju...@wandisco.com>.
Philip Martin wrote:
> Philip Martin <ph...@wandisco.com> writes:
> 
> > Philip Martin <ph...@wandisco.com> writes:
> >
> >>> +        SVN_ERR(svn_io_file_checksum2(&file_info->sha1_checksum, text_base_path,
> >>> +                                      svn_checksum_sha1, result_pool));
> >>
> >> How much memory is allocated when checksumming a file?  That's all going
> >> into the long-lived result pool.  Perhaps we need a svn_stream_checksummed3
> >> that takes two pools?
> >
> > I mean svn_io_file_checksum3.  I was trying to work out how much memory
> > was allocated.
> 
> There is a 16K buffer per checksum, so 32K per file.  We need a scratch
> pool, either explicitly in this function (and copy the checksum to
> result_pool) or in svn_io_file_checksum3.

Thanks, Philip.  I fixed this by using the iterpool in this function and
then copying to result_pool, and fixed up the other two points you
mentioned.

Committed revision 1051475.

- Julian


Re: [PATCH] WCNG: Enforce that a pristine checksum always references an entry in the pristine store.

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> Philip Martin <ph...@wandisco.com> writes:
>
>>> +        SVN_ERR(svn_io_file_checksum2(&file_info->sha1_checksum, text_base_path,
>>> +                                      svn_checksum_sha1, result_pool));
>>
>> How much memory is allocated when checksumming a file?  That's all going
>> into the long-lived result pool.  Perhaps we need a svn_stream_checksummed3
>> that takes two pools?
>
> I mean svn_io_file_checksum3.  I was trying to work out how much memory
> was allocated.

There is a 16K buffer per checksum, so 32K per file.  We need a scratch
pool, either explicitly in this function (and copy the checksum to
result_pool) or in svn_io_file_checksum3.

-- 
Philip

Re: [PATCH] WCNG: Enforce that a pristine checksum always references an entry in the pristine store.

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

>> +        SVN_ERR(svn_io_file_checksum2(&file_info->sha1_checksum, text_base_path,
>> +                                      svn_checksum_sha1, result_pool));
>
> How much memory is allocated when checksumming a file?  That's all going
> into the long-lived result pool.  Perhaps we need a svn_stream_checksummed3
> that takes two pools?

I mean svn_io_file_checksum3.  I was trying to work out how much memory
was allocated.

-- 
Philip

Re: [PATCH] WCNG: Enforce that a pristine checksum always references an entry in the pristine store.

Posted by Philip Martin <ph...@wandisco.com>.
Julian Foad <ju...@wandisco.com> writes:

> @@ -1009,23 +1034,25 @@ migrate_text_bases(const char *dir_abspa
>                 complexity. :)  */
>  
>          /* Gather the two checksums. */
> -        SVN_ERR(svn_io_file_checksum2(&md5_checksum, text_base_path,
> -                                      svn_checksum_md5, iterpool));
> -        SVN_ERR(svn_io_file_checksum2(&sha1_checksum, text_base_path,
> -                                      svn_checksum_sha1, iterpool));
> +        SVN_ERR(svn_io_file_checksum2(&file_info->md5_checksum, text_base_path,
> +                                      svn_checksum_md5, result_pool));
> +        SVN_ERR(svn_io_file_checksum2(&file_info->sha1_checksum, text_base_path,
> +                                      svn_checksum_sha1, result_pool));

How much memory is allocated when checksumming a file?  That's all going
into the long-lived result pool.  Perhaps we need a svn_stream_checksummed3
that takes two pools?

>  
> +   * 1) Read the old 'entries' using the old-format reader.
> +   *
> +   * 2) Create the new DB if it hasn't already been created.
>     *
> -   * 2) Convert wcprop to the wc-ng format
> +   * 3) Use our compatibility code for writing entries to fill out the (new)
> +   *    DB state.  Use the remembered checksums, since an entry has only the
> +   *    MD5 not the SHA1 checksum, and in the case of a revert-base doesn't
> +   *    even have that.
>     *
> -   * 3) Trash old, unused files and subdirs
> +   * 4) Convert wcprop to the wc-ng format
>     *
> -   * ### (fill in other bits as they are implemented)
> +   * 5) Migrate regular properties to the WC-NG DB.
> +   * 
> +   * 6) Trash old, unused files and subdirs.

That's out-of-date, we don't remove the old files for this directory
until the whole working copy is upgraded.

> --- subversion/libsvn_wc/wc-metadata.sql	(revision 1051073)
> +++ subversion/libsvn_wc/wc-metadata.sql	(working copy)
> @@ -167,9 +167,9 @@ CREATE TABLE ACTUAL_NODE (
>             conflicts? Why do we need these in a column to refer to the
>             pristine store? Can't we just parse the checksums from
>             conflict_data as well? */
> -  older_checksum  TEXT,
> -  left_checksum  TEXT,
> -  right_checksum  TEXT,
> +  older_checksum  TEXT REFERENCES PRISTINE (checksum),
> +  left_checksum  TEXT REFERENCES PRISTINE (checksum),
> +  right_checksum  TEXT REFERENCES PRISTINE (checksum),
>  
>    PRIMARY KEY (wc_id, local_relpath)
>    );
> @@ -415,7 +415,7 @@ CREATE TABLE NODES (
>  
>    /* The SHA-1 checksum of the pristine text, if this node is a file and was
>       moved here or copied here, else NULL. */
> -  checksum  TEXT,
> +  checksum  TEXT REFERENCES PRISTINE (checksum),
>  
>    /* for kind==symlink, this specifies the target. */
>    symlink_target  TEXT,

Add a note about REFERENCES PRISTINE to the differences comment in the
format 99 section.

-- 
Philip