You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Daniel Shahaf <d....@daniel.shahaf.name> on 2017/08/30 05:07:21 UTC

Re: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/

Good morning Evgeny,

kotkov@apache.org wrote on Thu, 27 Jul 2017 09:00 +0000:
> +++ subversion/trunk/subversion/include/svn_delta.h Thu Jul 27 09:00:43 2017
> @@ -678,6 +690,9 @@ svn_txdelta_skip_svndiff_window(apr_file
>   * @{
>   */
>  
> +/* Forward declarations. */
> +typedef struct svn_delta_editor_t svn_delta_editor_t;
> +
>  /** A structure full of callback functions the delta source will invoke
>   * as it produces the delta.
>   *
> @@ -859,7 +874,7 @@ svn_txdelta_skip_svndiff_window(apr_file
>   * dead; the only further operation which may be called on the editor
>   * is @c abort_edit.
>   */
> -typedef struct svn_delta_editor_t
> +struct svn_delta_editor_t
>  {
>    /** Set the target revision for this edit to @a target_revision.  This
>     * call, if used, should precede all other editor calls.
> @@ -1131,9 +1146,38 @@ typedef struct svn_delta_editor_t
>    svn_error_t *(*abort_edit)(void *edit_baton,
>                               apr_pool_t *scratch_pool);
>  
> +  /** Apply a text delta stream, yielding the new revision of a file.
> +   *
> +   * @a file_baton indicates the file we're creating or updating, and the
> +   * ancestor file on which it is based; it is the baton set by some
> +   * prior @c add_file or @c open_file callback.
> +   *
> +   * @a open_func is a function that opens a #svn_txdelta_stream_t object.
> +   * @a open_baton is provided by the caller.
> +   *
> +   * @a base_checksum is the hex MD5 digest for the base text against
> +   * which the delta is being applied; it is ignored if NULL, and may
> +   * be ignored even if not NULL.  If it is not ignored, it must match

What's the rationale for allowing drivees to ignore the checksum?

This leeway enables failure modes that wouldn't be possible without it.
(Drivers that are aware of this leeway will validate checksums even if the
drivee doesn't, leading to duplicate work; drivers that are unaware of this
requirement might not get checksum errors they should have.)

I get that you just copied this part of the docstring from apply_textdelta(),
but I'd like to understand what's the rationale here.  (And to see if this
leeway should be deprecated)

> +   * the checksum of the base text against which svndiff data is being
> +   * applied; if it does not, @c apply_textdelta_stream call which detects
> +   * the mismatch will return the error SVN_ERR_CHECKSUM_MISMATCH
> +   * (if there is no base text, there may still be an error if
> +   * @a base_checksum is neither NULL nor the hex MD5 checksum of the
> +   * empty string).

To the best of my knowledge, we don't special case the empty string's checksum,
d41d8cd98f00b204e9800998ecf8427e, anywhere.  We do special-case
00000000000000000000000000000000, though.  I assume the parenthetical should be
fixed accordingly (both here and in apply_textdelta())?

> +   *
> +   * Any temporary allocations may be performed in @a scratch_pool.

Need to add an @since tag here.

> +   */
> +  svn_error_t *(*apply_textdelta_stream)(

Could you update the docstring of svn_delta_editor_t itself to mention this new
callback?  All other callbacks are discussed there.

>

Is adding this function an ABI-compatible change?  The docstring of
svn_delta_editor_t does say """

 * @note Don't try to allocate one of these yourself.  Instead, always
 * use svn_delta_default_editor() or some other constructor, to ensure
 * that unused slots are filled in with no-op functions.

""", but an API consumer might have interpreted this note as meaning "You may
use malloc(..., sizeof(svn_delta_editor_t)) if you take care to initialize all
struct members", in which case, his code will not be ABI compatible with 1.10.

> +    const svn_delta_editor_t *editor,

This parameter is undocumented.

Cheers,

Daniel

> +    void *file_baton,
> +    const char *base_checksum,
> +    svn_txdelta_stream_open_func_t open_func,
> +    void *open_baton,
> +    apr_pool_t *scratch_pool);
> +
>    /* Be sure to update svn_delta_get_cancellation_editor() and
>     * svn_delta_default_editor() if you add a new callback here. */
> -} svn_delta_editor_t;
> +};
>  
>  
>  /** Return a default delta editor template, allocated in @a pool.
> 

Re: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/

Posted by James McCoy <ja...@jamessan.com>.
On Sat, Sep 02, 2017 at 08:18:00AM +0000, Daniel Shahaf wrote:
> [replying to multiple]
> 
> James McCoy wrote on Fri, Sep 01, 2017 at 08:15:05 -0400:
> > On Fri, Sep 01, 2017 at 03:07:16AM +0000, Daniel Shahaf wrote:
> > > I'm not sure I'm getting through here.
> > > 
> > > The note does say "Don't allocate one of these yourself" but in the
> > > second sentence implies that the reason for this is that if you allocate
> > > it yourself and don't initialize all pointer-to-function members,
> > > Trouble will ensue.  Therefore, the note could be construed as meaning
> > > that it is okay to allocate one of these yourself if you take care to
> > > initialize all pointer members.
> > 
> > The fact that a user can choose not to use the recommended API doesn't
> > mean that there is an ABI break.  A user can already choose to allocate
> > the structure themselves and not initialize everything.  Adding an item
> > to the struct doesn't change that.
> > 
> 
> I'm afraid I don't follow.  If a user allocates the struct themselves
> and doesn't initialize all members, then (1) they are in violation of
> the API's precondition, (2) their code will (depending on the repository
> data) dereference an uninitialized pointer-to-function at runtime.
> That's undefined behaviour de jure and some form of crash de facto.

Right, but it's not an ABI break.  It's misuing the API.

> > I ran abi-compliance-checker[0] against the head of branches/1.9.x and
> > trunk.  It shows no incompatibility issues.  The report is attached for
> > anyone that wants to view it.
> > 
> > [0]: http://ispras.linuxbase.org/index.php/ABI_compliance_checker
> 
> Thanks for adding this datapoint, but, how does it support your
> argument?  The checker's output confirms what we already knew without
> it: that adding a member to a struct type can result in breakage when
> code compiled against 1.9 is run against 1.10 without rebuilding:

It confirms that there's not an ABI break.  Code that's misuing the API
can break, yes, but not code that's using the API correctly.

Cheers,
-- 
James
GPG Key: 4096R/91BF BF4D 6956 BD5D F7B7  2D23 DFE6 91AE 331B A3DB

Re: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Evgeny Kotkov <ev...@visualsvn.com> writes:

> Daniel Shahaf <d....@daniel.shahaf.name> writes:
>
>>> +   *
>>> +   * Any temporary allocations may be performed in @a scratch_pool.
>>
>> Need to add an @since tag here.
>
> [...]
>
>>> +   */
>>> +  svn_error_t *(*apply_textdelta_stream)(
>>
>> Could you update the docstring of svn_delta_editor_t itself to mention this
>> new callback?  All other callbacks are discussed there.
>
> [...]
>
>>> +    const svn_delta_editor_t *editor,
>>
>> This parameter is undocumented.
>
> Thank you for the review.  I agree with these three points, and will try
> to make the suggested tweaks to the documentation.

Committed in r1816063.


Daniel Shahaf <d....@daniel.shahaf.name> writes:

> +1, but note that the "shouldn't" language in current HEAD, which this
> patch [once rebased to HEAD] will remove, was copied from some other
> docstring.  I made no note of which one specifically, because I think
> that "shouldn't" language is our standard formula for docstrings of
> non-opaque struct types.

Committed in r1816066.


Regards,
Evgeny Kotkov

Re: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
> > - * @note Don't try to allocate one of these yourself.  Instead, always
> > - * use svn_delta_default_editor() or some other constructor, to ensure
> > - * that unused slots are filled in with no-op functions.
> > + * @note Fields may be added to the end of this structure in future
> > + * versions.  Therefore, users shouldn't allocate structures of this
> > + * type, to preserve binary compatibility.
> > + *
> > + * @note It is recommended to use svn_delta_default_editor() or some other
> > + * constructor, to ensure that unused slots are filled in with no-op functions.
>
> --- subversion/include/svn_delta.h      (revision 1806549)
> +++ subversion/include/svn_delta.h      (working copy)
> @@ -694,8 +694,10 @@ svn_txdelta_skip_svndiff_window(apr_file_t *file,
>   * as it produces the delta.
>   *
>   * @note Don't try to allocate one of these yourself.  Instead, always
> - * use svn_delta_default_editor() or some other constructor, to ensure
> - * that unused slots are filled in with no-op functions.
> + * use svn_delta_default_editor() or some other constructor, to avoid
> + * backwards compatibility problems if the structure is extended in
> + * future releases and to ensure that unused slots are filled in with
> + * no-op functions.

+1, but note that the "shouldn't" language in current HEAD, which this
patch [once rebased to HEAD] will remove, was copied from some other
docstring.  I made no note of which one specifically, because I think
that "shouldn't" language is our standard formula for docstrings of
non-opaque struct types.

Let's also backport this docs patch to 1.9.8.  (As a docs patch it needs
just one vote)

Cheers,

Daniel

P.S. I still think a release notes entry would be warranted, but not
strongly enough to reopen that angle.

Re: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Daniel Shahaf <d....@daniel.shahaf.name> writes:

>> Also, I am not against the idea of tweaking the note by saying something
>> like "...because we may add new fields in the future", but I don't think
>> that it is required either.  (In other words, I am +0 to that.)
>
> Done in r1807028.

I think that although r1807028 provides the additional information to the
users, it simultaneously makes the API requirements less strict:

> - * @note Don't try to allocate one of these yourself.  Instead, always
> - * use svn_delta_default_editor() or some other constructor, to ensure
> - * that unused slots are filled in with no-op functions.
> + * @note Fields may be added to the end of this structure in future
> + * versions.  Therefore, users shouldn't allocate structures of this
> + * type, to preserve binary compatibility.
> + *
> + * @note It is recommended to use svn_delta_default_editor() or some other
> + * constructor, to ensure that unused slots are filled in with no-op functions.

While it does clarify that new fields can be added to the struct, this
changeset also replaces words like "don't try" and "always" with "shouldn't"
and "is recommended" thus making the requirements a recommendation:

 - "Don't try to allocate one of these yourself" became "users shouldn't
   allocate structures"

 - "Instead, always use svn_delta_default_editor()" is now "It is
   recommended to use svn_delta_default_editor()"

Perhaps, a better way would be to keep the original wording, but mention
that the structure may be extended, as in this alternative patch:

[[[
--- subversion/include/svn_delta.h      (revision 1806549)
+++ subversion/include/svn_delta.h      (working copy)
@@ -694,8 +694,10 @@ svn_txdelta_skip_svndiff_window(apr_file_t *file,
  * as it produces the delta.
  *
  * @note Don't try to allocate one of these yourself.  Instead, always
- * use svn_delta_default_editor() or some other constructor, to ensure
- * that unused slots are filled in with no-op functions.
+ * use svn_delta_default_editor() or some other constructor, to avoid
+ * backwards compatibility problems if the structure is extended in
+ * future releases and to ensure that unused slots are filled in with
+ * no-op functions.
  *
  * <h3>Function Usage</h3>
  *
]]]


Regards,
Evgeny Kotkov

Re: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Shahaf wrote on Sat, Sep 02, 2017 at 08:18:00 +0000:
> However, I find the docstring ambiguous and you do not.

.oO ( Doesn't this _ipso facto_ mean the docstring is ambiguous? )

Re: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/

Posted by Stefan Hett <st...@egosoft.com>.
On 9/1/2017 5:07 AM, Daniel Shahaf wrote:
> Evgeny Kotkov wrote on Thu, 31 Aug 2017 19:05 +0300:
>> Daniel Shahaf <d....@daniel.shahaf.name> writes:
>>> Is adding this function an ABI-compatible change?  The docstring of
>>> svn_delta_editor_t does say """
>>>
>>>   * @note Don't try to allocate one of these yourself.  Instead, always
>>>   * use svn_delta_default_editor() or some other constructor, to ensure
>>>   * that unused slots are filled in with no-op functions.
>>>
>>> """, but an API consumer might have interpreted this note as meaning "You may
>>> use malloc(..., sizeof(svn_delta_editor_t)) if you take care to initialize
>>> all struct members", in which case, his code will not be ABI compatible
>>> with 1.10.
>> I think that adding this callback does not affect the ABI compatibility.
>> The note says "Don't try to allocate one of these yourself", whereas the
>> malloc(..., sizeof(svn_delta_editor_t)) example does exactly the opposite.
> I'm not sure I'm getting through here.
>
> The note does say "Don't allocate one of these yourself" but in the
> second sentence implies that the reason for this is that if you allocate
> it yourself and don't initialize all pointer-to-function members,
> Trouble will ensue.  Therefore, the note could be construed as meaning
> that it is okay to allocate one of these yourself if you take care to
> initialize all pointer members.
>
> In other words: the comment could have been interpreted as a piece
> of advice --- "it is more robust to use _default_editor() than to allocate
> with malloc" --- as opposed to a blanket prohibition on allocating this
> struct type with malloc.
>
> (If one uses malloc and doesn't initialize all pointers, the result
> would be that an uninitialized pointer-to-function struct member is
> dereferenced.)
>
> In contrast, most of our other structs explicitly say that "Don't
> allocate yourself because we may add new fields in the future".  This
> struct does not give that reason.
>
> Makes sense?
>
> I suppose can just add an API erratum and/or release notes entry about this.
>
> Thanks,
>
> Daniel
> (I'll reply to your other points separately)
>
I think your proposal to add an erratum and "correct/update" the 
documentation about allocating the struct is the right way to go here.
IMO this is only a minor inconsistency in the documentation (i.e. it 
should be as clear as the rest of the API documentation with regards to 
what the implications of not using the appropriate allocation methods are).

FWIW: I always read that current note already like this and interpreted 
the sentence as just giving an example of why this is important and what 
might go wrong if you don't follow the advice. But I certainly also see 
that it might be interpreted in another way. So no harm in being precise 
here and adding an erratum, I guess...

-- 
Regards,
Stefan Hett


Re: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/

Posted by James McCoy <ja...@jamessan.com>.
On Fri, Sep 01, 2017 at 03:07:16AM +0000, Daniel Shahaf wrote:
> Evgeny Kotkov wrote on Thu, 31 Aug 2017 19:05 +0300:
> > Daniel Shahaf <d....@daniel.shahaf.name> writes:
> > > Is adding this function an ABI-compatible change?  The docstring of
> > > svn_delta_editor_t does say """
> > >
> > >  * @note Don't try to allocate one of these yourself.  Instead, always
> > >  * use svn_delta_default_editor() or some other constructor, to ensure
> > >  * that unused slots are filled in with no-op functions.
> > >
> > > """, but an API consumer might have interpreted this note as meaning "You may
> > > use malloc(..., sizeof(svn_delta_editor_t)) if you take care to initialize
> > > all struct members", in which case, his code will not be ABI compatible
> > > with 1.10.
> > 
> > I think that adding this callback does not affect the ABI compatibility.
> > The note says "Don't try to allocate one of these yourself", whereas the
> > malloc(..., sizeof(svn_delta_editor_t)) example does exactly the opposite.
> 
> I'm not sure I'm getting through here.
> 
> The note does say "Don't allocate one of these yourself" but in the
> second sentence implies that the reason for this is that if you allocate
> it yourself and don't initialize all pointer-to-function members,
> Trouble will ensue.  Therefore, the note could be construed as meaning
> that it is okay to allocate one of these yourself if you take care to
> initialize all pointer members.

The fact that a user can choose not to use the recommended API doesn't
mean that there is an ABI break.  A user can already choose to allocate
the structure themselves and not initialize everything.  Adding an item
to the struct doesn't change that.

I ran abi-compliance-checker[0] against the head of branches/1.9.x and
trunk.  It shows no incompatibility issues.  The report is attached for
anyone that wants to view it.

[0]: http://ispras.linuxbase.org/index.php/ABI_compliance_checker

Cheers,
-- 
James
GPG Key: 4096R/91BF BF4D 6956 BD5D F7B7  2D23 DFE6 91AE 331B A3DB

Re: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Daniel Shahaf <d....@daniel.shahaf.name> writes:

> I'm not sure I'm getting through here.
>
> The note does say "Don't allocate one of these yourself" but in the
> second sentence implies that the reason for this is that if you allocate
> it yourself and don't initialize all pointer-to-function members,
> Trouble will ensue.  Therefore, the note could be construed as meaning
> that it is okay to allocate one of these yourself if you take care to
> initialize all pointer members.
>
> In other words: the comment could have been interpreted as a piece
> of advice --- "it is more robust to use _default_editor() than to allocate
> with malloc" --- as opposed to a blanket prohibition on allocating this
> struct type with malloc.
>
> (If one uses malloc and doesn't initialize all pointers, the result
> would be that an uninitialized pointer-to-function struct member is
> dereferenced.)
>
> In contrast, most of our other structs explicitly say that "Don't
> allocate yourself because we may add new fields in the future".  This
> struct does not give that reason.
>
> Makes sense?
>
> I suppose can just add an API erratum and/or release notes entry about this.

I think that the important thing about this documentation is that it
states that:

 (1) The API user shouldn't try to allocate the struct himself.

 (2) A constructor such as svn_delta_default_editor() should *always*
     be used.

To my mind, the current statement is clear and it is not possible to
interpret it as if it's allowed to malloc() the struct and initialize it
manually.

My opinion here is that neither the API erratum nor including this in the
release notes are required, and doing so would just unnecessarily restate
the information that's already available.

Also, I am not against the idea of tweaking the note by saying something
like "...because we may add new fields in the future", but I don't think
that it is required either.  (In other words, I am +0 to that.)


Regards,
Evgeny Kotkov

Re: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Evgeny Kotkov wrote on Thu, 31 Aug 2017 19:05 +0300:
> Daniel Shahaf <d....@daniel.shahaf.name> writes:
> > Is adding this function an ABI-compatible change?  The docstring of
> > svn_delta_editor_t does say """
> >
> >  * @note Don't try to allocate one of these yourself.  Instead, always
> >  * use svn_delta_default_editor() or some other constructor, to ensure
> >  * that unused slots are filled in with no-op functions.
> >
> > """, but an API consumer might have interpreted this note as meaning "You may
> > use malloc(..., sizeof(svn_delta_editor_t)) if you take care to initialize
> > all struct members", in which case, his code will not be ABI compatible
> > with 1.10.
> 
> I think that adding this callback does not affect the ABI compatibility.
> The note says "Don't try to allocate one of these yourself", whereas the
> malloc(..., sizeof(svn_delta_editor_t)) example does exactly the opposite.

I'm not sure I'm getting through here.

The note does say "Don't allocate one of these yourself" but in the
second sentence implies that the reason for this is that if you allocate
it yourself and don't initialize all pointer-to-function members,
Trouble will ensue.  Therefore, the note could be construed as meaning
that it is okay to allocate one of these yourself if you take care to
initialize all pointer members.

In other words: the comment could have been interpreted as a piece
of advice --- "it is more robust to use _default_editor() than to allocate
with malloc" --- as opposed to a blanket prohibition on allocating this
struct type with malloc.

(If one uses malloc and doesn't initialize all pointers, the result
would be that an uninitialized pointer-to-function struct member is
dereferenced.)

In contrast, most of our other structs explicitly say that "Don't
allocate yourself because we may add new fields in the future".  This
struct does not give that reason.

Makes sense?

I suppose can just add an API erratum and/or release notes entry about this.

Thanks,

Daniel
(I'll reply to your other points separately)

Re: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Daniel Shahaf <d....@daniel.shahaf.name> writes:

>> +   *
>> +   * Any temporary allocations may be performed in @a scratch_pool.
>
> Need to add an @since tag here.

[...]

>> +   */
>> +  svn_error_t *(*apply_textdelta_stream)(
>
> Could you update the docstring of svn_delta_editor_t itself to mention this
> new callback?  All other callbacks are discussed there.

[...]

>> +    const svn_delta_editor_t *editor,
>
> This parameter is undocumented.

Thank you for the review.  I agree with these three points, and will try
to make the suggested tweaks to the documentation.

>> +  /** Apply a text delta stream, yielding the new revision of a file.
>> +   *
>> +   * @a file_baton indicates the file we're creating or updating, and the
>> +   * ancestor file on which it is based; it is the baton set by some
>> +   * prior @c add_file or @c open_file callback.
>> +   *
>> +   * @a open_func is a function that opens a #svn_txdelta_stream_t object.
>> +   * @a open_baton is provided by the caller.
>> +   *
>> +   * @a base_checksum is the hex MD5 digest for the base text against
>> +   * which the delta is being applied; it is ignored if NULL, and may
>> +   * be ignored even if not NULL.  If it is not ignored, it must match
>
> What's the rationale for allowing drivees to ignore the checksum?
>
> This leeway enables failure modes that wouldn't be possible without it.
> (Drivers that are aware of this leeway will validate checksums even if the
> drivee doesn't, leading to duplicate work; drivers that are unaware of this
> requirement might not get checksum errors they should have.)
>
> I get that you just copied this part of the docstring from apply_textdelta(),
> but I'd like to understand what's the rationale here.  (And to see if this
> leeway should be deprecated)

My interpretation of the documentation is that "the result checksum must
be validated by the implementation, whereas validating the checksum of the
base text may be omitted".

Perhaps, there are cases where the base checksum won't be validated by
our existing implementations (what about BDB, for instance?), but in the
meanwhile, I'm not too sure about the gain from _always_ requiring such
checks.

I think that, from the editor driver's point of view, the important thing
is the result checksum.  If the implementation also validates the base
checksum, that may allow it to skip actually applying the delta in case
of the early mismatch, give away better error messages ("I have a base
checksum mismatch" instead of "I can't apply instruction 7 at offset
12345") and maybe even detect the coding mistakes which cause the
delta to be applied to an unexpected base, but still yielding the
expected resulting fulltext.

Having all these properties is nice, but probably not mandatory.  And I
think that lifting the optionality of this check could shoot us in the foot
in the future, if we find ourselves writing an implementation where it is
particularly tricky to always implement the base text checks — for instance,
due to the used protocol or any other technical reasons.

Hope that I am not missing something subtle here :)

>> +   * the checksum of the base text against which svndiff data is being
>> +   * applied; if it does not, @c apply_textdelta_stream call which detects
>> +   * the mismatch will return the error SVN_ERR_CHECKSUM_MISMATCH
>> +   * (if there is no base text, there may still be an error if
>> +   * @a base_checksum is neither NULL nor the hex MD5 checksum of the
>> +   * empty string).
>
> To the best of my knowledge, we don't special case the empty string's
> checksum, d41d8cd98f00b204e9800998ecf8427e, anywhere.  We do special-case
> 00000000000000000000000000000000, though.  I assume the parenthetical should
> be fixed accordingly (both here and in apply_textdelta())?

I agree that this part of the docstring (same in svn_fs_apply_textdelta())
looks odd, and I don't think that the digest of an empty string is specially
handled somewhere in the implementations.

Perhaps, it would make sense to check on whether this has been true at
some point in the past and also tweak the docstring.

> Is adding this function an ABI-compatible change?  The docstring of
> svn_delta_editor_t does say """
>
>  * @note Don't try to allocate one of these yourself.  Instead, always
>  * use svn_delta_default_editor() or some other constructor, to ensure
>  * that unused slots are filled in with no-op functions.
>
> """, but an API consumer might have interpreted this note as meaning "You may
> use malloc(..., sizeof(svn_delta_editor_t)) if you take care to initialize
> all struct members", in which case, his code will not be ABI compatible
> with 1.10.

I think that adding this callback does not affect the ABI compatibility.
The note says "Don't try to allocate one of these yourself", whereas the
malloc(..., sizeof(svn_delta_editor_t)) example does exactly the opposite.


Thanks,
Evgeny Kotkov

Re: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Daniel Shahaf <d....@daniel.shahaf.name> writes:

>> +   *
>> +   * Any temporary allocations may be performed in @a scratch_pool.
>
> Need to add an @since tag here.

[...]

>> +   */
>> +  svn_error_t *(*apply_textdelta_stream)(
>
> Could you update the docstring of svn_delta_editor_t itself to mention this
> new callback?  All other callbacks are discussed there.

[...]

>> +    const svn_delta_editor_t *editor,
>
> This parameter is undocumented.

Thank you for the review.  I agree with these three points, and will try
to make the suggested tweaks to the documentation.

>> +  /** Apply a text delta stream, yielding the new revision of a file.
>> +   *
>> +   * @a file_baton indicates the file we're creating or updating, and the
>> +   * ancestor file on which it is based; it is the baton set by some
>> +   * prior @c add_file or @c open_file callback.
>> +   *
>> +   * @a open_func is a function that opens a #svn_txdelta_stream_t object.
>> +   * @a open_baton is provided by the caller.
>> +   *
>> +   * @a base_checksum is the hex MD5 digest for the base text against
>> +   * which the delta is being applied; it is ignored if NULL, and may
>> +   * be ignored even if not NULL.  If it is not ignored, it must match
>
> What's the rationale for allowing drivees to ignore the checksum?
>
> This leeway enables failure modes that wouldn't be possible without it.
> (Drivers that are aware of this leeway will validate checksums even if the
> drivee doesn't, leading to duplicate work; drivers that are unaware of this
> requirement might not get checksum errors they should have.)
>
> I get that you just copied this part of the docstring from apply_textdelta(),
> but I'd like to understand what's the rationale here.  (And to see if this
> leeway should be deprecated)

My interpretation of the documentation is that "the result checksum must
be validated by the implementation, whereas validating the checksum of the
base text may be omitted".

Perhaps, there are cases where the base checksum won't be validated by
our existing implementations (what about BDB, for instance?), but in the
meanwhile, I'm not too sure about the gain from _always_ requiring such
checks.

I think that, from the editor driver's point of view, the important thing
is the result checksum.  If the implementation also validates the base
checksum, that may allow it to skip actually applying the delta in case
of the early mismatch, give away better error messages ("I have a base
checksum mismatch" instead of "I can't apply instruction 7 at offset
12345") and maybe even detect the coding mistakes which cause the
delta to be applied to an unexpected base, but still yielding the
expected resulting fulltext.

Having all these properties is nice, but probably not mandatory.  And I
think that lifting the optionality of this check could shoot us in the foot
in the future, if we find ourselves writing an implementation where it is
particularly tricky to always implement the base text checks — for instance,
due to the used protocol or any other technical reasons.

Hope that I am not missing something subtle here :)

>> +   * the checksum of the base text against which svndiff data is being
>> +   * applied; if it does not, @c apply_textdelta_stream call which detects
>> +   * the mismatch will return the error SVN_ERR_CHECKSUM_MISMATCH
>> +   * (if there is no base text, there may still be an error if
>> +   * @a base_checksum is neither NULL nor the hex MD5 checksum of the
>> +   * empty string).
>
> To the best of my knowledge, we don't special case the empty string's
> checksum, d41d8cd98f00b204e9800998ecf8427e, anywhere.  We do special-case
> 00000000000000000000000000000000, though.  I assume the parenthetical should
> be fixed accordingly (both here and in apply_textdelta())?

I agree that this part of the docstring (same in svn_fs_apply_textdelta())
looks odd, and I don't think that the digest of an empty string is specially
handled somewhere in the implementations.

Perhaps, it would make sense to check on whether this has been true at
some point in the past and also tweak the docstring.

> Is adding this function an ABI-compatible change?  The docstring of
> svn_delta_editor_t does say """
>
>  * @note Don't try to allocate one of these yourself.  Instead, always
>  * use svn_delta_default_editor() or some other constructor, to ensure
>  * that unused slots are filled in with no-op functions.
>
> """, but an API consumer might have interpreted this note as meaning "You may
> use malloc(..., sizeof(svn_delta_editor_t)) if you take care to initialize
> all struct members", in which case, his code will not be ABI compatible
> with 1.10.

I think that adding this callback does not affect the ABI compatibility.
The note says "Don't try to allocate one of these yourself", whereas the
malloc(..., sizeof(svn_delta_editor_t)) example does exactly the opposite.


Thanks,
Evgeny Kotkov

Re: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Bert Huijben wrote on Wed, 30 Aug 2017 12:24 +0200:
> I think this just documents current behavior. Yes a 1.9+ client
> against a 1.9+ server will always have a checksum, but this is not the
> case when mixing older clients and servers.
>
> Original serf versions (form before we declared this stable) typically
> never provided the checksum. And in some cases bulk requests didn't
> have all the checksums either. I remember fixing a few cases around
> WC-NG to make sure all ra layers reported the same errors in some
> exceptional cases.

Thanks for the clarification, Bert.  It sounds like the various commit
editors (libsvn_ra's and libsvn_repos's, at least) should document that
they do verify checksums --- I didn't check whether they already
document that --- and the svn_delta_editor_t docs should be updated to
state that the leeway for receivers not to verify checksums is deprecated.


RE: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Daniel Shahaf [mailto:d.s@daniel.shahaf.name]
> Sent: woensdag 30 augustus 2017 07:07
> To: kotkov@apache.org
> Cc: dev@subversion.apache.org; commits@subversion.apache.org
> Subject: Re: svn commit: r1803143 - in /subversion/trunk/subversion:
> include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/
> 
> Good morning Evgeny,
> 
> kotkov@apache.org wrote on Thu, 27 Jul 2017 09:00 +0000:
> > +++ subversion/trunk/subversion/include/svn_delta.h Thu Jul 27 09:00:43
> 2017
> > @@ -678,6 +690,9 @@ svn_txdelta_skip_svndiff_window(apr_file
> >   * @{
> >   */
> >
> > +/* Forward declarations. */
> > +typedef struct svn_delta_editor_t svn_delta_editor_t;
> > +
> >  /** A structure full of callback functions the delta source will invoke
> >   * as it produces the delta.
> >   *
> > @@ -859,7 +874,7 @@ svn_txdelta_skip_svndiff_window(apr_file
> >   * dead; the only further operation which may be called on the editor
> >   * is @c abort_edit.
> >   */
> > -typedef struct svn_delta_editor_t
> > +struct svn_delta_editor_t
> >  {
> >    /** Set the target revision for this edit to @a target_revision.  This
> >     * call, if used, should precede all other editor calls.
> > @@ -1131,9 +1146,38 @@ typedef struct svn_delta_editor_t
> >    svn_error_t *(*abort_edit)(void *edit_baton,
> >                               apr_pool_t *scratch_pool);
> >
> > +  /** Apply a text delta stream, yielding the new revision of a file.
> > +   *
> > +   * @a file_baton indicates the file we're creating or updating, and the
> > +   * ancestor file on which it is based; it is the baton set by some
> > +   * prior @c add_file or @c open_file callback.
> > +   *
> > +   * @a open_func is a function that opens a #svn_txdelta_stream_t
> object.
> > +   * @a open_baton is provided by the caller.
> > +   *
> > +   * @a base_checksum is the hex MD5 digest for the base text against
> > +   * which the delta is being applied; it is ignored if NULL, and may
> > +   * be ignored even if not NULL.  If it is not ignored, it must match
> 
> What's the rationale for allowing drivees to ignore the checksum?
> 
> This leeway enables failure modes that wouldn't be possible without it.
> (Drivers that are aware of this leeway will validate checksums even if the
> drivee doesn't, leading to duplicate work; drivers that are unaware of this
> requirement might not get checksum errors they should have.)
> 
> I get that you just copied this part of the docstring from apply_textdelta(),
> but I'd like to understand what's the rationale here.  (And to see if this
> leeway should be deprecated)

I think this just documents current behavior. Yes a 1.9+ client against a 1.9+ server will always have a checksum, but this is not the case when mixing older clients and servers.

Original serf versions (form before we declared this stable) typically never provided the checksum. And in some cases bulk requests didn't have all the checksums either. I remember fixing a few cases around WC-NG to make sure all ra layers reported the same errors in some exceptional cases.

	Bert


RE: svn commit: r1803143 - in /subversion/trunk/subversion: include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Daniel Shahaf [mailto:d.s@daniel.shahaf.name]
> Sent: woensdag 30 augustus 2017 07:07
> To: kotkov@apache.org
> Cc: dev@subversion.apache.org; commits@subversion.apache.org
> Subject: Re: svn commit: r1803143 - in /subversion/trunk/subversion:
> include/ libsvn_delta/ libsvn_ra_serf/ mod_dav_svn/
> 
> Good morning Evgeny,
> 
> kotkov@apache.org wrote on Thu, 27 Jul 2017 09:00 +0000:
> > +++ subversion/trunk/subversion/include/svn_delta.h Thu Jul 27 09:00:43
> 2017
> > @@ -678,6 +690,9 @@ svn_txdelta_skip_svndiff_window(apr_file
> >   * @{
> >   */
> >
> > +/* Forward declarations. */
> > +typedef struct svn_delta_editor_t svn_delta_editor_t;
> > +
> >  /** A structure full of callback functions the delta source will invoke
> >   * as it produces the delta.
> >   *
> > @@ -859,7 +874,7 @@ svn_txdelta_skip_svndiff_window(apr_file
> >   * dead; the only further operation which may be called on the editor
> >   * is @c abort_edit.
> >   */
> > -typedef struct svn_delta_editor_t
> > +struct svn_delta_editor_t
> >  {
> >    /** Set the target revision for this edit to @a target_revision.  This
> >     * call, if used, should precede all other editor calls.
> > @@ -1131,9 +1146,38 @@ typedef struct svn_delta_editor_t
> >    svn_error_t *(*abort_edit)(void *edit_baton,
> >                               apr_pool_t *scratch_pool);
> >
> > +  /** Apply a text delta stream, yielding the new revision of a file.
> > +   *
> > +   * @a file_baton indicates the file we're creating or updating, and the
> > +   * ancestor file on which it is based; it is the baton set by some
> > +   * prior @c add_file or @c open_file callback.
> > +   *
> > +   * @a open_func is a function that opens a #svn_txdelta_stream_t
> object.
> > +   * @a open_baton is provided by the caller.
> > +   *
> > +   * @a base_checksum is the hex MD5 digest for the base text against
> > +   * which the delta is being applied; it is ignored if NULL, and may
> > +   * be ignored even if not NULL.  If it is not ignored, it must match
> 
> What's the rationale for allowing drivees to ignore the checksum?
> 
> This leeway enables failure modes that wouldn't be possible without it.
> (Drivers that are aware of this leeway will validate checksums even if the
> drivee doesn't, leading to duplicate work; drivers that are unaware of this
> requirement might not get checksum errors they should have.)
> 
> I get that you just copied this part of the docstring from apply_textdelta(),
> but I'd like to understand what's the rationale here.  (And to see if this
> leeway should be deprecated)

I think this just documents current behavior. Yes a 1.9+ client against a 1.9+ server will always have a checksum, but this is not the case when mixing older clients and servers.

Original serf versions (form before we declared this stable) typically never provided the checksum. And in some cases bulk requests didn't have all the checksums either. I remember fixing a few cases around WC-NG to make sure all ra layers reported the same errors in some exceptional cases.

	Bert