You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Daniel Shahaf <d....@daniel.shahaf.name> on 2012/10/17 02:01:57 UTC

Re: svn commit: r1393211 - /subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c

stefan2@apache.org wrote on Tue, Oct 02, 2012 at 22:15:28 -0000:
> Author: stefan2
> Date: Tue Oct  2 22:15:28 2012
> New Revision: 1393211
> 
> URL: http://svn.apache.org/viewvc?rev=1393211&view=rev
> Log:
> Empty representations are relatively frequent (1..2%) in FSFS. 
> However, we can't easily distinghish them from non-empty plain
> representations because for the latter the expanded size value
> will be given as 0.
> 
> This patch explicitly detects the case that a delta rep is
> actually an empty rep and should be cached as such.
> 
> * subversion/libsvn_fs_fs/fs_fs.c
>   (build_rep_list): determine and correct the expanded size
>   (rep_read_get_baton): let the above determine the fulltext length
> 
> Modified:
>     subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
> 
> Modified: subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c?rev=1393211&r1=1393210&r2=1393211&view=diff
> ==============================================================================
> --- subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c (original)
> +++ subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c Tue Oct  2 22:15:28 2012
> @@ -4553,11 +4553,15 @@ set_cached_combined_window(svn_stringbuf
>     ID, and representation REP.
>     Also, set *WINDOW_P to the base window content for *LIST, if it
>     could be found in cache. Otherwise, *LIST will contain the base
> -   representation for the whole delta chain.  */
> +   representation for the whole delta chain.
> +   Finally, return the expanded size of the representation in 
> +   *EXPANDED_SIZE. It will take care of cases where only the on-disk
> +   size is known.  */
>  static svn_error_t *
>  build_rep_list(apr_array_header_t **list,
>                 svn_stringbuf_t **window_p,
>                 struct rep_state **src_state,
> +               svn_filesize_t *expanded_size,
>                 svn_fs_t *fs,
>                 representation_t *first_rep,
>                 apr_pool_t *pool)
> @@ -4572,10 +4576,24 @@ build_rep_list(apr_array_header_t **list
>    *list = apr_array_make(pool, 1, sizeof(struct rep_state *));
>    rep = *first_rep;
>  
> +  /* The value as stored in the data struct.
> +     0 is either for unknown length or actually zero length. */
> +  *expanded_size = first_rep->expanded_size;
>    while (1)
>      {
>        SVN_ERR(create_rep_state(&rs, &rep_args, &last_file,
>                                 &last_revision, &rep, fs, pool));
> +
> +      /* Unknown size or empty representation?
> +         That implies the this being the first iteration.
> +         Usually size equals on-disk size, except for empty,
> +         compressed representations (delta, size = 4).
> +         Please note that for all non-empty deltas have
> +         a 4-byte header _plus_ some data. */
> +      if (*expanded_size == 0)
> +        if (! rep_args->is_delta || first_rep->size != 4)
> +          *expanded_size = first_rep->size;

Is it correct to perform this assignment for delta reps with size!=4 ?

It seems odd that this check is in the loop but uses first_rep.  Should
this check move out of the loop, or use rep.size instead of
first_rep->size?  

> +        
>        SVN_ERR(get_cached_combined_window(window_p, rs, &is_cached, pool));
>        if (is_cached)
>          {
> @@ -4632,12 +4650,16 @@ rep_read_get_baton(struct rep_read_baton
>    b->md5_checksum_ctx = svn_checksum_ctx_create(svn_checksum_md5, pool);
>    b->checksum_finalized = FALSE;
>    b->md5_checksum = svn_checksum_dup(rep->md5_checksum, pool);
> -  b->len = rep->expanded_size ? rep->expanded_size : rep->size;
> +  b->len = rep->expanded_size;
>    b->off = 0;
>    b->fulltext_cache_key = fulltext_cache_key;
>    b->pool = svn_pool_create(pool);
>    b->filehandle_pool = svn_pool_create(pool);
>  
> +  SVN_ERR(build_rep_list(&b->rs_list, &b->base_window,
> +                         &b->src_state, &b->len, fs, rep,
> +                         b->filehandle_pool));
> +
>    if (SVN_IS_VALID_REVNUM(fulltext_cache_key.revision))
>      b->current_fulltext = svn_stringbuf_create_ensure
>                              ((apr_size_t)b->len,
> @@ -4645,10 +4667,6 @@ rep_read_get_baton(struct rep_read_baton
>    else
>      b->current_fulltext = NULL;
>  
> -  SVN_ERR(build_rep_list(&b->rs_list, &b->base_window,
> -                         &b->src_state, fs, rep,
> -                         b->filehandle_pool));
> -
>    /* Save our output baton. */
>    *rb_p = b;
>  
> 
> 

Re: svn commit: r1393211 - /subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Wed, Oct 17, 2012 at 2:01 AM, Daniel Shahaf <d....@daniel.shahaf.name>wrote:

> stefan2@apache.org wrote on Tue, Oct 02, 2012 at 22:15:28 -0000:
> > Author: stefan2
> > Date: Tue Oct  2 22:15:28 2012
> > New Revision: 1393211
> >
> > URL: http://svn.apache.org/viewvc?rev=1393211&view=rev
> > Log:
> > Empty representations are relatively frequent (1..2%) in FSFS.
> > However, we can't easily distinghish them from non-empty plain
> > representations because for the latter the expanded size value
> > will be given as 0.
> >
> > This patch explicitly detects the case that a delta rep is
> > actually an empty rep and should be cached as such.
> >
> > * subversion/libsvn_fs_fs/fs_fs.c
> >   (build_rep_list): determine and correct the expanded size
> >   (rep_read_get_baton): let the above determine the fulltext length
> >
> > Modified:
> >     subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
> >
> > Modified: subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
> > URL:
> http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c?rev=1393211&r1=1393210&r2=1393211&view=diff
> >
> ==============================================================================
> > --- subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c (original)
> > +++ subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c Tue Oct  2 22:15:28
> 2012
> > @@ -4553,11 +4553,15 @@ set_cached_combined_window(svn_stringbuf
> >     ID, and representation REP.
> >     Also, set *WINDOW_P to the base window content for *LIST, if it
> >     could be found in cache. Otherwise, *LIST will contain the base
> > -   representation for the whole delta chain.  */
> > +   representation for the whole delta chain.
> > +   Finally, return the expanded size of the representation in
> > +   *EXPANDED_SIZE. It will take care of cases where only the on-disk
> > +   size is known.  */
> >  static svn_error_t *
> >  build_rep_list(apr_array_header_t **list,
> >                 svn_stringbuf_t **window_p,
> >                 struct rep_state **src_state,
> > +               svn_filesize_t *expanded_size,
> >                 svn_fs_t *fs,
> >                 representation_t *first_rep,
> >                 apr_pool_t *pool)
> > @@ -4572,10 +4576,24 @@ build_rep_list(apr_array_header_t **list
> >    *list = apr_array_make(pool, 1, sizeof(struct rep_state *));
> >    rep = *first_rep;
> >
> > +  /* The value as stored in the data struct.
> > +     0 is either for unknown length or actually zero length. */
> > +  *expanded_size = first_rep->expanded_size;
> >    while (1)
> >      {
> >        SVN_ERR(create_rep_state(&rs, &rep_args, &last_file,
> >                                 &last_revision, &rep, fs, pool));
> > +
> > +      /* Unknown size or empty representation?
> > +         That implies the this being the first iteration.
> > +         Usually size equals on-disk size, except for empty,
> > +         compressed representations (delta, size = 4).
> > +         Please note that for all non-empty deltas have
> > +         a 4-byte header _plus_ some data. */
> > +      if (*expanded_size == 0)
> > +        if (! rep_args->is_delta || first_rep->size != 4)
> > +          *expanded_size = first_rep->size;
>
> Is it correct to perform this assignment for delta reps with size!=4 ?
>

first_rep is a representation instead of a just a single delta window,
i.e. it represents the full content of a node. If the <size> is 0, it is
assumed to be equal to <length> and that seems to have been the
case ever since but the structure description does not mention it.
r1400426 add that info.

So, it is *always* safe to set a 0 <size> to <length> *unless* the
expanded content size is actually 0. For plain data, <length> would
be 0, too making the assignment correct. A compressed empty
representation is exactly 4 bytes long ("SVN\0x1"). Any other
representation is longer.

Since <length> can never be negative, the only way <size> may
get the wrong value is if the representation content is empty but uses
a longer physical representation on disk and we end up with the
expectation of a longer data stream than what we actually get.
That has been the norm in the past but the stream would simply
end after 0 bytes and the content would never be cached in the
fulltext cache. The patch should now correct the <size> info in
(almost?) all cases to make fulltext cache effective for them as well.


> It seems odd that this check is in the loop but uses first_rep.  Should
> this check move out of the loop, or use rep.size instead of
> first_rep->size?
>

We need to read rep_args, i.e. the rep header, to detect whether
the representation is plain or deltified. r1400436 moved that out of
the loop - at the expense of some extra code.

-- Stefan^2.

-- 
*

Join us this October at Subversion Live
2012<http://www.wandisco.com/svn-live-2012>
 for two days of best practice SVN training, networking, live demos,
committer meet and greet, and more! Space is limited, so get signed up
today<http://www.wandisco.com/svn-live-2012>
!
*