Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2012/10/13 16:17:43 UTC

Re: r1397773 - rep sharing in a txn - /subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c

> Author: stefan2
> Date: Sat Oct 13 05:49:50 2012
> New Revision: 1397773
> 
> URL: http://svn.apache.org/viewvc?rev=1397773&view=rev
> Log:
> Due to public request: apply rep-sharing to equal data reps within
> the same transaction. 
> 
> The idea is simple. When writing a noderev to the txn folder,
> write another file named by the rep's SHA1 and store the rep
> struct in there. Lookup is then straight-forward.

Hi Stefan.  What's the scalability?  I'm wondering about the big-O performance of storing 10000 or 100000 files in the dir.

- Julian


> * subversion/libsvn_fs_fs/fs_fs.c
>   (svn_fs_fs__put_node_revision): write SHA1-named files
>   (get_shared_rep): also look for SHA1-named files
> 
> Modified:
>     subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
> 
> Modified: subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c?rev=1397773&r1=1397772&r2=1397773&view=diff
> ==============================================================================
> --- subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c (original)
> +++ subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c Sat Oct 13 05:49:50 2012
> @@ -2585,6 +2585,10 @@ svn_fs_fs__put_node_revision(svn_fs_t *f
>    fs_fs_data_t *ffd = fs->fsap_data;
>    apr_file_t *noderev_file;
>    const char *txn_id = svn_fs_fs__id_txn_id(id);
> +  const char *sha1 = ffd->rep_sharing_allowed && noderev->data_rep
> +                   ? svn_checksum_to_cstring(noderev->data_rep->sha1_checksum,
> +                                             pool)
> +                   : NULL;
> 
>    noderev->is_fresh_txn_root = fresh_txn_root;
> 
> @@ -2603,7 +2607,32 @@ svn_fs_fs__put_node_revision(svn_fs_t *f
>                                     svn_fs_fs__fs_supports_mergeinfo(fs),
>                                     pool));
> 
> -  return svn_io_file_close(noderev_file, pool);
> +  SVN_ERR(svn_io_file_close(noderev_file, pool));
> +
> +  /* if rep sharing has been enabled and the noderev has a data rep and
> +   * its SHA-1 is known, store the rep struct under its SHA1. */
> +  if (sha1)
> +    {
> +      apr_file_t *rep_file;
> +      const char *file_name = svn_dirent_join(path_txn_dir(fs, txn_id, pool),
> +                                              sha1, pool);
> +      const char *rep_string = representation_string(noderev->data_rep,
> +                                                     ffd->format,
> +                                                     (noderev->kind
> +                                                      == svn_node_dir),
> +                                                     FALSE,
> +                                                     pool);
> +      SVN_ERR(svn_io_file_open(&rep_file, file_name,
> +                               APR_WRITE | APR_CREATE | APR_TRUNCATE
> +                               | APR_BUFFERED, APR_OS_DEFAULT, pool));
> +
> +      SVN_ERR(svn_io_file_write_full(rep_file, rep_string,
> +                                     strlen(rep_string), NULL, pool));
> +
> +      SVN_ERR(svn_io_file_close(rep_file, pool));
> +    }
> +
> +  return SVN_NO_ERROR;
> }
> 
> 
> @@ -7083,6 +7112,30 @@ get_shared_rep(representation_t **old_re
>          }
>      }
> 
> +  /* look for intra-revision matches (usually data reps but not limited
> +     to them in case props happen to look like some data rep)
> +   */
> +  if (*old_rep == NULL && rep->txn_id)
> +    {
> +      svn_node_kind_t kind;
> +      const char *file_name
> +        = svn_dirent_join(path_txn_dir(fs, rep->txn_id, pool),
> +                          svn_checksum_to_cstring(rep->sha1_checksum, pool),
> +                          pool);
> +
> +      /* in our txn, is there a rep file named with the wanted SHA1?
> +         If so, read it and use that rep.
> +       */
> +      SVN_ERR(svn_io_check_path(file_name, &kind, pool));
> +      if (kind == svn_node_file)
> +        {
> +          svn_stringbuf_t *rep_string;
> +          SVN_ERR(svn_stringbuf_from_file2(&rep_string, file_name, pool));
> +          SVN_ERR(read_rep_offsets_body(old_rep, rep_string->data,
> +                                        rep->txn_id, FALSE, pool));
> +        }
> +    }
> +
>    /* Add information that is missing in the cached data. */
>    if (*old_rep)
>      {
> 
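
The scheme in the log message (write a file named by the rep's SHA1 into the txn folder, then look it up by name) can be sketched outside Subversion with plain C stdio. This is only an illustration under stated assumptions: rep_path, store_rep and lookup_rep are hypothetical names, not libsvn_fs_fs functions, and the stored string is an opaque placeholder rather than the real serialized rep format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "<txn_dir>/<sha1_hex>" into BUF.
   Returns 0 on success, -1 if the path does not fit. */
int rep_path(char *buf, size_t size,
             const char *txn_dir, const char *sha1_hex)
{
  int n;
  assert(txn_dir != NULL && sha1_hex != NULL);
  n = snprintf(buf, size, "%s/%s", txn_dir, sha1_hex);
  return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Store the serialized rep under its SHA1, overwriting any existing
   file of that name.  Returns 0 on success, -1 on error. */
int store_rep(const char *txn_dir, const char *sha1_hex,
              const char *rep_string)
{
  char path[1024];
  size_t len = strlen(rep_string);
  FILE *f;
  int ok;

  if (rep_path(path, sizeof(path), txn_dir, sha1_hex) != 0)
    return -1;
  f = fopen(path, "wb");
  if (f == NULL)
    return -1;
  ok = (fwrite(rep_string, 1, len, f) == len);
  if (fclose(f) != 0)
    ok = 0;
  return ok ? 0 : -1;
}

/* Look for a rep file named by SHA1_HEX.  Returns 1 and fills BUF
   (NUL-terminated, possibly truncated) if found, 0 if the file cannot
   be opened (treated here as "no match"), -1 on a bad argument. */
int lookup_rep(const char *txn_dir, const char *sha1_hex,
               char *buf, size_t size)
{
  char path[1024];
  FILE *f;
  size_t n;

  if (size == 0 || rep_path(path, sizeof(path), txn_dir, sha1_hex) != 0)
    return -1;
  f = fopen(path, "rb");
  if (f == NULL)
    return 0;
  n = fread(buf, 1, size - 1, f);
  buf[n] = '\0';
  fclose(f);
  return 1;
}
```

Each lookup costs one name lookup in the txn directory, so how it scales with the number of files depends on how the underlying filesystem indexes directory entries.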

Re: r1397773 - rep sharing in a txn - /subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c

Posted by Branko Čibej <br...@wandisco.com>.
On 13.10.2012 10:17, Julian Foad wrote:
>> Author: stefan2
>> Date: Sat Oct 13 05:49:50 2012
>> New Revision: 1397773
>>
>> URL: http://svn.apache.org/viewvc?rev=1397773&view=rev
>> Log:
>> Due to public request: apply rep-sharing to equal data reps within
>> the same transaction. 
>>
>> The idea is simple. When writing a noderev to the txn folder,
>> write another file named by the rep's SHA1 and store the rep
>> struct in there. Lookup is then straight-forward.
> Hi Stefan.  What's the scalability?  I'm wondering about the big-O performance of storing 10000 or 100000 files in the dir.

Yes, I was wondering about the same thing. The short answer is that it
depends on the underlying filesystem, and that hardly seems good enough.
At the very least, these files should be stored in shards if the
filesystem itself is sharded.

-- Brane
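
One way to picture the sharding suggestion is to spread the SHA1-named files over subdirectories keyed by a prefix of the hash. The sketch below is an assumption made for illustration only: the two-hex-digit prefix (256 shards) and the function name sharded_rep_path are hypothetical, it is not how the commit behaves, and it is a different layout from the revision-number shards FSFS uses for committed revisions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: build "<txn_dir>/<first two hex digits>/<sha1_hex>"
   into BUF, so that files are spread over up to 256 subdirectories.
   Returns 0 on success, -1 if SHA1_HEX is too short or the resulting
   path does not fit into BUF. */
int sharded_rep_path(char *buf, size_t size,
                     const char *txn_dir, const char *sha1_hex)
{
  int n;
  assert(txn_dir != NULL && sha1_hex != NULL);
  if (strlen(sha1_hex) < 2)
    return -1;
  /* "%.2s" prints at most the first two characters of the hash. */
  n = snprintf(buf, size, "%s/%.2s/%s", txn_dir, sha1_hex, sha1_hex);
  return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The caller would still have to create the shard directory before writing the first file into it, which this sketch leaves out.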



Re: r1397773 - rep sharing in a txn - /subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c

Posted by Julian Foad <ju...@btopenworld.com>.
Stefan Fuhrmann wrote:

> Julian Foad wrote:
>>> The idea is simple. When writing a noderev to the txn folder,
>>> write another file named by the rep's SHA1 and store the rep
>>> struct in there. Lookup is then straight-forward.
>>
>> Hi Stefan.  What's the scalability?  I'm wondering about the
>> big-O performance of storing 10000 or 100000 files in the dir.
> 
> Yes, I don't like putting that many files into a single folder either.
> However, we do that already today by creating a file for each
> noderev. I.e. my change at worst doubled the number of files in
> the txn folder.

OK; in that case I'm satisfied that if there is a scalability problem then it was already a problem before this change.

- Julian


Re: r1397773 - rep sharing in a txn - /subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Sat, Oct 13, 2012 at 4:17 PM, Julian Foad <ju...@btopenworld.com> wrote:

> > Author: stefan2
>
> > Date: Sat Oct 13 05:49:50 2012
> > New Revision: 1397773
> >
> > URL: http://svn.apache.org/viewvc?rev=1397773&view=rev
> > Log:
> > Due to public request: apply rep-sharing to equal data reps within
> > the same transaction.
> >
> > The idea is simple. When writing a noderev to the txn folder,
> > write another file named by the rep's SHA1 and store the rep
> > struct in there. Lookup is then straight-forward.
>
> Hi Stefan.  What's the scalability?  I'm wondering about the big-O
> performance of storing 10000 or 100000 files in the dir.
>

Yes, I don't like putting that many files into a single folder either.
However, we do that already today by creating a file for each
noderev. I.e. my change at worst doubled the number of files in
the txn folder.

-- Stefan^2.
