You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Neels Janosch Hofmeyr <ne...@elego.de> on 2008/06/03 01:48:48 UTC

[PATCH] issue 1796: defective or malicious client can corrupt repository log messages

There was a discussion on issue 1796 up to a week ago, in which a bunch
of shortcomings of the same kind as 1796 have been revealed:

The subversion server and client do not validate props in places where
they should:
- where the server receives props from a client out there. (#1796)
- where the server reads props from the repository file system.
- where the svn client reads props from a server out there.
(Approval by kfogel)

This patch starts by fixing the specific problems of issue 1796, only:
- where the server receives props from a client out there. (#1796)
, and limited only to the log message prop (SVN_PROP_REVISION_LOG).

More patches, continuing in above list, are to follow.

Also, in the threads about issue 1796 recently, some people asked for a
way to reproduce 1796 without forging their svn client. Note that the C
test included in this patch is a good way to do so. It may be
illustrative to investigate the repository after the test has run, using
current trunk: the corrupt data shows in the repository filesystem.

Also note that this is my first "complex" patch to subversion, so please
feel very free to tell me about anything I could have done better.

Thanks!


[[[
Fix issue #1796: defective or malicious client can corrupt repository
log messages.
Also adding regression test for 1796.

* subversion/include/private/svn_utf_private.h: Add this private header
    file and move the declaration of svn_utf__is_valid from
    libsvn_subr/utf_impl.h here, because this function is needed in
    libsvn_repos.

* subversion/libsvn_subr/utf_impl.h: Include private/svn_utf_private.h.
  (svn_utf__is_valid): Move declaration away to svn_utf_private.h
    because this function is needed in libsvn_repos.
  (svn_utf__last_valid): Add comment to also see svn_utf__is_valid.

* subversion/libsvn_repos/fs-wrap.c(validate_prop): Add two validations
    for SVN_PROP_REVISION_LOG's value. Validate UTF-8 encoding using
    svn_utf__is_valid, and validate consistent LF eol style by looking
    for and rejecting CR (\r) characters.

* subversion/tests/libsvn_repos/repos-test.c
  (prop_validation): Add this regression test for issue 1796, which
    tries to commit two invalid log messages concerning UTF-8 and LF.
  (prop_validation_commit_with_revprop): Add this helper function for
    prop_validation, which runs a commit with a given revprop.

Patch by:  Neels Janosch Hofmeyr <ne...@elego.de>
Review by: Karl Fogel <kf...@red-bean.com>
           Daniel Shahaf <d....@daniel.shahaf.co.il>
           Stefan Sperling <st...@elego.de>
           Branko Cibej <br...@xbc.nu>
Found by:  garrick_olson
]]]

-- 
Neels Hofmeyr -- elego Software Solutions GmbH
Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23458696  mobile: +49 177 2345869  fax: +49 30 23458695
http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelsreg: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194




Re: [PATCH] issue 1796: defective or malicious client can corrupt repository log messages

Posted by Daniel Shahaf <d....@daniel.shahaf.co.il>.
Daniel Shahaf wrote on Tue, 3 Jun 2008 at 19:41 +0300:
> Neels Janosch Hofmeyr wrote on Tue, 3 Jun 2008 at 16:40 +0200:
> > Stefan Sperling wrote:
> > > On Tue, Jun 03, 2008 at 09:23:02AM +0300, Daniel Shahaf wrote:
> > >> You should list committers by their canonical usernames from HACKING:
> > >> kfogel, danielsh, stsp, etc.  
> > 
> > The examples I found in HACKING are in the style that I wrote them in. I
> > am far happier to use the tigris usernames!
> 
> Stefan added this sentence to HACKING recently:
> 
>     Full and partial committers may be listed similarly, but preferably
>     by their canonical usernames from COMMITTERS (the leftmost column in
>     that file).
> 

I now noticed that Karl changed this sentence a couple of hours ago.

> And I see an example that uses that syntax.  Do you think it could be 
> clearer?


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue 1796: defective or malicious client can corrupt repository log messages

Posted by Karl Fogel <kf...@red-bean.com>.
Daniel Shahaf <d....@daniel.shahaf.co.il> writes:
> Stefan added this sentence to HACKING recently:
>
>     Full and partial committers may be listed similarly, but preferably
>     by their canonical usernames from COMMITTERS (the leftmost column in
>     that file).
>
> And I see an example that uses that syntax.  Do you think it could be 
> clearer?

I tweaked that some more in r31566.

> You don't need to record everyone that participated in the discussions;
> that's what list archives are for, link to them.

Yes.  See, for example, r31422, r31320, r31179...

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue 1796: defective or malicious client can corrupt repository log messages

Posted by Daniel Shahaf <d....@daniel.shahaf.co.il>.
Neels Janosch Hofmeyr wrote on Tue, 3 Jun 2008 at 16:40 +0200:
> 
> 
> Stefan Sperling wrote:
> > On Tue, Jun 03, 2008 at 09:23:02AM +0300, Daniel Shahaf wrote:
> >>> Patch by:  Neels Janosch Hofmeyr <ne...@elego.de>
> >>> Review by: Karl Fogel <kf...@red-bean.com>
> >>>            Daniel Shahaf <d....@daniel.shahaf.co.il>
> >>>            Stefan Sperling <st...@elego.de>
> >>>            Branko Čibej <br...@xbc.nu>
> >> You should list committers by their canonical usernames from HACKING:
> >> kfogel, danielsh, stsp, etc.  
> 
> The examples I found in HACKING are in the style that I wrote them in. I
> am far happier to use the tigris usernames!
> 

Stefan added this sentence to HACKING recently:

    Full and partial committers may be listed similarly, but preferably
    by their canonical usernames from COMMITTERS (the leftmost column in
    that file).

And I see an example that uses that syntax.  Do you think it could be 
clearer?

> >> But since you haven't posted a version of
> >> this patch before, it is inappropriate to list all these people as
> >> reviewers: they haven't reviewed this patch. 
> > 
> > Yes, only people who have reviewed this particular patch should
> > be listed. So right now, Daniel is the only one who should be on
> > this list.
> > 
> 
> I see. But how can I credit in a way that shows that a lot of discussion
> has been happening? I have also gotten key information by some guys,
> without which the patch would've been stupid.
> 

You don't need to record everyone that participated in the discussions;
that's what list archives are for, link to them.

> >> If you want to credit
> >> the other people than direct reviewers, a parenthetical might work:
> >>
> >>     Review by: jrandom
> >>     (and here you mention jconstant)
> > 
> > Daniel, I don't understand at all what you mean by the above...
> > What were you trying to say?
> Yeah, I don't get this either.
> 

Rephrase:

If you want to credit someone specifically, but can't credit them using 
the standard fields ("(Review|Suggested|Found) by"), remember that you can 
still credit them in the free-text parts of the log message (such as the 
parenthetical comments).

Beware of inflation, though.  Not every suggestion deserves a log
message entry, for otherwise we'd spend all day bookkeeping, recording
who suggested what...

> Thanks for the great feedback, Daniel! I'll come up with a second
> version of the patch one of these days.
> 
> 

Sure.

Re: [PATCH] issue 1796: defective or malicious client can corrupt repository log messages

Posted by Neels Janosch Hofmeyr <ne...@elego.de>.

Stefan Sperling wrote:
> On Tue, Jun 03, 2008 at 09:23:02AM +0300, Daniel Shahaf wrote:
>>> Patch by:  Neels Janosch Hofmeyr <ne...@elego.de>
>>> Review by: Karl Fogel <kf...@red-bean.com>
>>>            Daniel Shahaf <d....@daniel.shahaf.co.il>
>>>            Stefan Sperling <st...@elego.de>
>>>            Branko Čibej <br...@xbc.nu>
>> You should list committers by their canonical usernames from HACKING:
>> kfogel, danielsh, stsp, etc.  

The examples I found in HACKING are in the style that I wrote them in. I
am far happier to use the tigris usernames!

>> But since you haven't posted a version of
>> this patch before, it is inappropriate to list all these people as
>> reviewers: they haven't reviewed this patch. 
> 
> Yes, only people who have reviewed this particular patch should
> be listed. So right now, Daniel is the only one who should be on
> this list.
> 

I see. But how can I credit in a way that shows that a lot of discussion
has been happening? I have also gotten key information by some guys,
without which the patch would've been stupid.

>> If you want to credit
>> the other people than direct reviewers, a parenthetical might work:
>>
>>     Review by: jrandom
>>     (and here you mention jconstant)
> 
> Daniel, I don't understand at all what you mean by the above...
> What were you trying to say?
Yeah, I don't get this either.

Thanks for the great feedback, Daniel! I'll come up with a second
version of the patch one of these days.

-- 
Neels Hofmeyr -- elego Software Solutions GmbH
Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23458696  mobile: +49 177 2345869  fax: +49 30 23458695
http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelsreg: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194


Re: [PATCH] issue 1796: defective or malicious client can corrupt repository log messages

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Jun 03, 2008 at 09:23:02AM +0300, Daniel Shahaf wrote:
> > Patch by:  Neels Janosch Hofmeyr <ne...@elego.de>
> > Review by: Karl Fogel <kf...@red-bean.com>
> >            Daniel Shahaf <d....@daniel.shahaf.co.il>
> >            Stefan Sperling <st...@elego.de>
> >            Branko Čibej <br...@xbc.nu>
> 
> You should list committers by their canonical usernames from HACKING:
> kfogel, danielsh, stsp, etc.  But since you haven't posted a version of
> this patch before, it is inappropriate to list all these people as
> reviewers: they haven't reviewed this patch. 

Yes, only people who have reviewed this particular patch should
be listed. So right now, Daniel is the only one who should be on
this list.

> If you want to credit
> the other people than direct reviewers, a parenthetical might work:
> 
>     Review by: jrandom
>     (and here you mention jconstant)

Daniel, I don't understand at all what you mean by the above...
What were you trying to say?

Stefan

Re: [PATCH] issue 1796: defective or malicious client can corrupt repository log messages

Posted by Daniel Shahaf <d....@daniel.shahaf.co.il>.
Neels Janosch Hofmeyr wrote on Mon, 9 Jun 2008 at 02:02 +0200:
> Daniel Shahaf wrote:
> > Neels Janosch Hofmeyr wrote on Thu, 5 Jun 2008 at 03:25 +0200:
> >> Daniel Shahaf wrote:
> >>> Hmm, that docstring (and validate_prop()'s docstring) also promise that
> >>> you'll return SVN_ERR_BAD_PROPERTY_VALUE.  So you need to wrap your
> >>> errors by that error.
> >> That's right, so it would suffice to just throw a
> >> SVN_ERR_BAD_PROPERTY_VALUE for both cases of inalid UTF-8 and
> >> inconsistent EOL style, and I don't need a SVN_ERR_INVALID_UTF_8 even if
> >> it was there already. (The function validate_prop directly correlates
> >> with the mentioned comment, so it wouldn't make sense to throw
> >> differentiated errors only to wrap them to a more general error within
> >> the same function.)
> >>
> > 
> > Why do you think that it "wouldn't make sense"?  Do you think this
> > information is useless to callers of these APIs?
> 
> The point is that validate_prop() is currently a private function in
> subversion/libsvn_repos/fs-wrap.c, and that it is only ever used in
> cases where the specification is to return only
> SVN_ERR_BAD_PROPERTY_VALUE. To return more precise errors would mean
> that the callers in fs-wrap.c would *all* wrap code around the
> validate_prop() call that abstracts every error to
> SVN_ERR_BAD_PROPERTY_VALUE.

I assumed it would just be

    if (log message is not valid UTF-8)
        return svn_error_create(SVN_ERR_BAD_PROPERTY_VALUE,
                                svn_error_create(SVN_ERR_EOL_BIKESHED),
                                NULL);

locally in validate_prop().  However,

> It would make more sense (I tried to say) to just make validate_prop()
> return SVN_ERR_BAD_PROPERTY_VALUE, until someone actually needs more
> differentiated error codes.
> 

+1 here.  Let's move on.

Daniel


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue 1796: defective or malicious client can corrupt repository log messages

Posted by Neels Janosch Hofmeyr <ne...@elego.de>.
Daniel Shahaf wrote:
> Neels Janosch Hofmeyr wrote on Thu, 5 Jun 2008 at 03:25 +0200:
>> Daniel Shahaf wrote:
>>> [...]
>>> Please also update svn_repos_fs_change_node_prop's docstring, which
>>> claims that only SVN_PROP_REVISION_DATE is validated.
>>>
>>> Hmm, that docstring (and validate_prop()'s docstring) also promise that
>>> you'll return SVN_ERR_BAD_PROPERTY_VALUE.  So you need to wrap your
>>> errors by that error.
>> That's right, so it would suffice to just throw a
>> SVN_ERR_BAD_PROPERTY_VALUE for both cases of inalid UTF-8 and
>> inconsistent EOL style, and I don't need a SVN_ERR_INVALID_UTF_8 even if
>> it was there already. (The function validate_prop directly correlates
>> with the mentioned comment, so it wouldn't make sense to throw
>> differentiated errors only to wrap them to a more general error within
>> the same function.)
>>
> 
> Why do you think that it "wouldn't make sense"?  Do you think this
> information is useless to callers of these APIs?

The point is that validate_prop() is currently a private function in
subversion/libsvn_repos/fs-wrap.c, and that it is only ever used in
cases where the specification is to return only
SVN_ERR_BAD_PROPERTY_VALUE. To return more precise errors would mean
that the callers in fs-wrap.c would *all* wrap code around the
validate_prop() call that abstracts every error to
SVN_ERR_BAD_PROPERTY_VALUE.
It would make more sense (I tried to say) to just make validate_prop()
return SVN_ERR_BAD_PROPERTY_VALUE, until someone actually needs more
differentiated error codes.

>>>> +static svn_error_t *
>>>> +prop_validation_commit_with_revprop(const char *filename,
>>>> +                                    const void *prop_key,
>>>> +                                    apr_ssize_t prop_klen,
>>>> +                                    const void *prop_val,
>>>> +                                    svn_repos_t *repos,
>>> Why are they "const void *"?
>> ...because the arguments to apr_hash_set() are also const void*. I've
>> changed prop_key to const char*, but left prop_val a const void*, since,
>> technically, it could be something different (depending on prop_klen,
>> see apr_hash_set()). Is that good enough?
>>
> 
> How could it be something different?  You pass revprop_table to
> svn_repos_get_commit_editor5(), which documents that it expects "a hash
> mapping <tt>const char *</tt> property names to @c svn_string_t values".

Hm, that's true. The patch being committed already, I might post another
patch to change that.


>> Hmm, I can just give base_revision a zero and files will still be added
>> to the HEAD, right?
> 
> Files are always added to HEAD, so I don't understand your question.
Question already answered :)

> (Did we grow some reverse-obliterate feature?  Retroactively add files
> to previous revisions?)
No, I'm just a noob asking questions.


>>>> +  /* Test an invalid commit log message: LF */
>>>> +  err = prop_validation_commit_with_revprop
>> [...]
>>>> +  if (err == SVN_NO_ERROR)
>>>> +    return svn_error_create(SVN_ERR_TEST_FAILED, err,
>>>> +                            "Failed to reject a log with inconsistent "
>>>> +                            "line ending style");
>>>> +  else
>>>> +    if (err->apr_err != SVN_ERR_IO_INCONSISTENT_EOL)
>>>> +      return svn_error_create(SVN_ERR_TEST_FAILED, err,
>>>> +                              "Expected SVN_ERR_IO_INCONSISTENT_EOL, "
>>>> +                              "got another error.");
>>>> +  svn_error_clear(err);
>>>> +
>>> So svn_error_clear() runs for two paths: on the "correct" error and on
>>> SVN_NO_ERROR.
>> No, it doesn't... It only runs for the third path, where err->apr_err ==
>> SVN_ERR_IO_INCONSISTENT_EOL, because of the return statements.
>>
> 
> Errr, right.  Hmm, I suppose it could be made clearer, for example
> 
> -  svn_error_clear(err);
> +  else
> +    svn_error_clear(err);
Well, it's used in the current way in dozens of other places in that
file, so I guess it's appropriate to stick with that.

> but it's probably my fault for reviewing the patch before I woke up.
I'd say you did pretty good...

We've got it in upstream, nice work :)

-- 
Neels Hofmeyr -- elego Software Solutions GmbH
Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23458696  mobile: +49 177 2345869  fax: +49 30 23458695
http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelsreg: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194


Re: [PATCH] issue 1796: defective or malicious client can corrupt repository log messages

Posted by Daniel Shahaf <d....@daniel.shahaf.co.il>.
Neels Janosch Hofmeyr wrote on Thu, 5 Jun 2008 at 03:25 +0200:
> Daniel Shahaf wrote:
> >> +          if (svn_utf__is_valid(value->data, value->len) == FALSE)
> >> +            return svn_error_create
> >> +              (APR_EINVAL, NULL,
> >> +               _("Cannot accept log message because it is not encoded in "
> >> +                 "UTF-8"));
> >> +        
> > 
> > More specific error code (SVN_ERR_INVALID_UTF_8)?
> 
> You mean, I should create a new error code?
> (I can't find SVN_ERR_INVALID_UTF_8 anywhere)
> 

Don't confuse me with the "facts";  I think SVN_ERR_INVALID_UTF_8 is
a better error code than APR_EINVAL here -- and I'll still think
so even if it doesn't exist. :)

> ;)
> kfogel might say: "Every error code got created some day"
> 
> > [...]
> > Please also update svn_repos_fs_change_node_prop's docstring, which
> > claims that only SVN_PROP_REVISION_DATE is validated.
> > 
> > Hmm, that docstring (and validate_prop()'s docstring) also promise that
> > you'll return SVN_ERR_BAD_PROPERTY_VALUE.  So you need to wrap your
> > errors by that error.
> 
> That's right, so it would suffice to just throw a
> SVN_ERR_BAD_PROPERTY_VALUE for both cases of inalid UTF-8 and
> inconsistent EOL style, and I don't need a SVN_ERR_INVALID_UTF_8 even if
> it was there already. (The function validate_prop directly correlates
> with the mentioned comment, so it wouldn't make sense to throw
> differentiated errors only to wrap them to a more general error within
> the same function.)
> 

Why do you think that it "wouldn't make sense"?  Do you think this
information is useless to callers of these APIs?

> 
> 
> >> +static svn_error_t *
> >> +prop_validation_commit_with_revprop(const char *filename,
> >> +                                    const void *prop_key,
> >> +                                    apr_ssize_t prop_klen,
> >> +                                    const void *prop_val,
> >> +                                    svn_repos_t *repos,
> > 
> > Why are they "const void *"?
> 
> ...because the arguments to apr_hash_set() are also const void*. I've
> changed prop_key to const char*, but left prop_val a const void*, since,
> technically, it could be something different (depending on prop_klen,
> see apr_hash_set()). Is that good enough?
> 

How could it be something different?  You pass revprop_table to
svn_repos_get_commit_editor5(), which documents that it expects "a hash
mapping <tt>const char *</tt> property names to @c svn_string_t values".

> 
> >> +  /* Make an arbitrary change and commit using above values... */
> >> +
> >> +  SVN_ERR(svn_repos_get_commit_editor5(&editor, &edit_baton, repos,
> >> +                                       NULL, "file://test", "/",
> >> +                                       revprop_table, NULL, NULL,
> >> +                                       NULL, NULL, pool));
> >> +
> >> +  SVN_ERR(editor->open_root(edit_baton, 1, pool, &root_baton));
> >                                           ^
> > 
> > Should this be 1 or 0?  I'm not familiar with the editor or the C tests
> > infrastructure, but based on the docstrings I think this should be zero?
> > (I understand the code works, but I'd like to understand why.)
> 
> I just copied this from another test and didn't pay attention to this
> parameter.
> Hmm, I can just give base_revision a zero and files will still be added
> to the HEAD, right?
> 

Files are always added to HEAD, so I don't understand your question.
(Did we grow some reverse-obliterate feature?  Retroactively add files
to previous revisions?)

> 
> >> +  /* Test an invalid commit log message: LF */
> >> +  err = prop_validation_commit_with_revprop
> [...]
> >> +  if (err == SVN_NO_ERROR)
> >> +    return svn_error_create(SVN_ERR_TEST_FAILED, err,
> >> +                            "Failed to reject a log with inconsistent "
> >> +                            "line ending style");
> >> +  else
> >> +    if (err->apr_err != SVN_ERR_IO_INCONSISTENT_EOL)
> >> +      return svn_error_create(SVN_ERR_TEST_FAILED, err,
> >> +                              "Expected SVN_ERR_IO_INCONSISTENT_EOL, "
> >> +                              "got another error.");
> >> +  svn_error_clear(err);
> >> +
> > 
> > So svn_error_clear() runs for two paths: on the "correct" error and on
> > SVN_NO_ERROR.
> 
> No, it doesn't... It only runs for the third path, where err->apr_err ==
> SVN_ERR_IO_INCONSISTENT_EOL, because of the return statements.
> 

Errr, right.  Hmm, I suppose it could be made clearer, for example

-  svn_error_clear(err);
+  else
+    svn_error_clear(err);

but it's probably my fault for reviewing the patch before I woke up.

> > Looks good.
> Thanks :)
> But, with your improvements it looks much much better!
> 
> A new version of the patch coming right up...
> 
> 

Thanks :)  I'll look at the new version some day.  (Slightly crazy 
schedule here.)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue 1796: defective or malicious client can corrupt repository log messages

Posted by Neels Janosch Hofmeyr <ne...@elego.de>.

Daniel Shahaf wrote:
>> Also adding regression test for 1796.
>                                   ^
> 1796 what?

potatoes :)
(changed)

>> +          if (svn_utf__is_valid(value->data, value->len) == FALSE)
>> +            return svn_error_create
>> +              (APR_EINVAL, NULL,
>> +               _("Cannot accept log message because it is not encoded in "
>> +                 "UTF-8"));
>> +        
> 
> More specific error code (SVN_ERR_INVALID_UTF_8)?

You mean, I should create a new error code?
(I can't find SVN_ERR_INVALID_UTF_8 anywhere)

;)
kfogel might say: "Every error code got created some day"

> [...]
> Please also update svn_repos_fs_change_node_prop's docstring, which
> claims that only SVN_PROP_REVISION_DATE is validated.
> 
> Hmm, that docstring (and validate_prop()'s docstring) also promise that
> you'll return SVN_ERR_BAD_PROPERTY_VALUE.  So you need to wrap your
> errors by that error.

That's right, so it would suffice to just throw a
SVN_ERR_BAD_PROPERTY_VALUE for both cases of inalid UTF-8 and
inconsistent EOL style, and I don't need a SVN_ERR_INVALID_UTF_8 even if
it was there already. (The function validate_prop directly correlates
with the mentioned comment, so it wouldn't make sense to throw
differentiated errors only to wrap them to a more general error within
the same function.)



>> +static svn_error_t *
>> +prop_validation_commit_with_revprop(const char *filename,
>> +                                    const void *prop_key,
>> +                                    apr_ssize_t prop_klen,
>> +                                    const void *prop_val,
>> +                                    svn_repos_t *repos,
> 
> Why are they "const void *"?

...because the arguments to apr_hash_set() are also const void*. I've
changed prop_key to const char*, but left prop_val a const void*, since,
technically, it could be something different (depending on prop_klen,
see apr_hash_set()). Is that good enough?


>> +  /* Make an arbitrary change and commit using above values... */
>> +
>> +  SVN_ERR(svn_repos_get_commit_editor5(&editor, &edit_baton, repos,
>> +                                       NULL, "file://test", "/",
>> +                                       revprop_table, NULL, NULL,
>> +                                       NULL, NULL, pool));
>> +
>> +  SVN_ERR(editor->open_root(edit_baton, 1, pool, &root_baton));
>                                            ^
> 
> Should this be 1 or 0?  I'm not familiar with the editor or the C tests
> infrastructure, but based on the docstrings I think this should be zero?
> (I understand the code works, but I'd like to understand why.)

I just copied this from another test and didn't pay attention to this
parameter.
Hmm, I can just give base_revision a zero and files will still be added
to the HEAD, right?


>> +  /* Test an invalid commit log message: LF */
>> +  err = prop_validation_commit_with_revprop
[...]
>> +  if (err == SVN_NO_ERROR)
>> +    return svn_error_create(SVN_ERR_TEST_FAILED, err,
>> +                            "Failed to reject a log with inconsistent "
>> +                            "line ending style");
>> +  else
>> +    if (err->apr_err != SVN_ERR_IO_INCONSISTENT_EOL)
>> +      return svn_error_create(SVN_ERR_TEST_FAILED, err,
>> +                              "Expected SVN_ERR_IO_INCONSISTENT_EOL, "
>> +                              "got another error.");
>> +  svn_error_clear(err);
>> +
> 
> So svn_error_clear() runs for two paths: on the "correct" error and on
> SVN_NO_ERROR.

No, it doesn't... It only runs for the third path, where err->apr_err ==
SVN_ERR_IO_INCONSISTENT_EOL, because of the return statements.

> Looks good.
Thanks :)
But, with your improvements it looks much much better!

A new version of the patch coming right up...

-- 
Neels Hofmeyr -- elego Software Solutions GmbH
Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23458696  mobile: +49 177 2345869  fax: +49 30 23458695
http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelsreg: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194


Re: [PATCH] issue 1796: defective or malicious client can corrupt repository log messages

Posted by Daniel Shahaf <d....@daniel.shahaf.co.il>.
Neels Janosch Hofmeyr wrote on Tue, 3 Jun 2008 at 03:48 +0200:
> There was a discussion on issue 1796 up to a week ago, in which a bunch
> of shortcomings of the same kind as 1796 have been revealed:
> 
> The subversion server and client do not validate props in places where
> they should:
> - where the server receives props from a client out there. (#1796)
> - where the server reads props from the repository file system.
> - where the svn client reads props from a server out there.
> (Approval by kfogel)
> 
> This patch starts by fixing the specific problems of issue 1796, only:
> - where the server receives props from a client out there. (#1796)
> , and limited only to the log message prop (SVN_PROP_REVISION_LOG).
> 
> More patches, continuing in above list, are to follow.
> 
> Also, in the threads about issue 1796 recently, some people asked for a
> way to reproduce 1796 without forging their svn client. Note that the C
> test included in this patch is a good way to do so. It may be
> illustrative to investigate the repository after the test has run, using
> current trunk: the corrupt data shows in the repository filesystem.
> 

...

> Also note that this is my first "complex" patch to subversion,

It's also one of the first non-trivial patches I review.
> so please
> feel very free to tell me about anything I could have done better.
> 
> Thanks!
> 
> 
> [[[
> Fix issue #1796: defective or malicious client can corrupt repository
> log messages.
> Also adding regression test for 1796.
                                  ^
1796 what?

> 
> * subversion/include/private/svn_utf_private.h: Add this private header
>     file and move the declaration of svn_utf__is_valid from
>     libsvn_subr/utf_impl.h here, because this function is needed in
>     libsvn_repos.
> 

svn_utf__is_valid should be mentioned using the (symbol_name): syntax.

> * subversion/libsvn_subr/utf_impl.h: Include private/svn_utf_private.h.
>   (svn_utf__is_valid): Move declaration away to svn_utf_private.h
>     because this function is needed in libsvn_repos.
>   (svn_utf__last_valid): Add comment to also see svn_utf__is_valid.
> 
> * subversion/libsvn_repos/fs-wrap.c(validate_prop): Add two validations

Space before parenthesis.

>     for SVN_PROP_REVISION_LOG's value. Validate UTF-8 encoding using
>     svn_utf__is_valid, and validate consistent LF eol style by looking
>     for and rejecting CR (\r) characters.
> 
> * subversion/tests/libsvn_repos/repos-test.c
>   (prop_validation): Add this regression test for issue 1796, which
>     tries to commit two invalid log messages concerning UTF-8 and LF.
>   (prop_validation_commit_with_revprop): Add this helper function for
>     prop_validation, which runs a commit with a given revprop.
> 
> Patch by:  Neels Janosch Hofmeyr <ne...@elego.de>
> Review by: Karl Fogel <kf...@red-bean.com>
>            Daniel Shahaf <d....@daniel.shahaf.co.il>
>            Stefan Sperling <st...@elego.de>
>            Branko Čibej <br...@xbc.nu>

You should list committers by their canonical usernames from HACKING:
kfogel, danielsh, stsp, etc.  But since you haven't posted a version of
this patch before, it is inappropriate to list all these people as
reviewers: they haven't reviewed this patch.  If you want to credit
the other people than direct reviewers, a parenthetical might work:

    Review by: jrandom
    (and here you mention jconstant)

> Found by:  garrick_olson

According to HACKING, mentioning the issue number is sufficient.  Not
that it hurts to credit someone :)

> ]]]
> 
> 

> Index: subversion/libsvn_repos/fs-wrap.c
> ===================================================================
> --- subversion/libsvn_repos/fs-wrap.c	(revision 31559)
> +++ subversion/libsvn_repos/fs-wrap.c	(working copy)
> @@ -28,6 +28,7 @@
>  #include "svn_time.h"
>  #include "repos.h"
>  #include "svn_private_config.h"
> +#include "private/svn_utf_private.h"
>  
>  
>  /*** Commit wrappers ***/
> @@ -169,6 +170,25 @@
>    /* Validate "svn:" properties. */
>    if (svn_prop_is_svn_prop(name) && value != NULL)
>      {
> +      /* validate log message (UTF-8/LF) */
> +      if (strcmp(name, SVN_PROP_REVISION_LOG) == 0)
> +        {
> +          /* validate encoding of log message (UTF-8) */

I'd probably capitalize these two comments and make them complete
sentences.

> +          if (svn_utf__is_valid(value->data, value->len) == FALSE)
> +            return svn_error_create
> +              (APR_EINVAL, NULL,
> +               _("Cannot accept log message because it is not encoded in "
> +                 "UTF-8"));
> +        

More specific error code (SVN_ERR_INVALID_UTF_8)?

> +          /* Disallow inconsistent line ending style, by simply looking for
> +           * carriage return characters ('\r'). */
> +          if (strstr(value->data, "\r") != NULL)

strchr?

> +            return svn_error_create
> +              (SVN_ERR_IO_INCONSISTENT_EOL, NULL,
> +               _("Cannot accept non-LF line endings "
> +                 "in log message"));
> +        }
> +

Please also update svn_repos_fs_change_node_prop's docstring, which
claims that only SVN_PROP_REVISION_DATE is validated.

Hmm, that docstring (and validate_prop()'s docstring) also promise that
you'll return SVN_ERR_BAD_PROPERTY_VALUE.  So you need to wrap your
errors by that error.

>        /* "svn:date" should be a valid date. */
>        if (strcmp(name, SVN_PROP_REVISION_DATE) == 0)
>          {
> Index: subversion/tests/libsvn_repos/repos-test.c
> ===================================================================
> --- subversion/tests/libsvn_repos/repos-test.c	(revision 31559)
> +++ subversion/tests/libsvn_repos/repos-test.c	(working copy)
> @@ -2187,6 +2187,144 @@
>  
>  
>  
> +/* Test if prop values received by the server are validated.
> + * These tests "send" property values to the server and diagnose the
> + * behaviour.
> + */
> +
> +/* Helper function that makes an arbitrary change to a repository and runs
> + * a commit with a specific revision property set to a certain value.

Should say that the property's name and value are obtained from PROP_KEY
and PROP_VAL.

> + * The filename argument names a file in the test repository to add in

In internal docstrings, capitalize parameter names, so s/filename/FILENAME/.
(Without that, when you say "filename" unquoted it is not obvious that
you're using it as a proper noun -- as a parameter name.)

> + * this commit, e.g. "/A/B/should_fail_1".
> + * If you run this function multiple times on the same repository,
> + * make sure to use differing filename arguments

Why?

> + * (e.g. increment a
> + * number every time you call this helper, as in "/A/B/should_fail_2",
> + * "/A/B/should_fail_3" etc)
> + */
> +static svn_error_t *
> +prop_validation_commit_with_revprop(const char *filename,
> +                                    const void *prop_key,
> +                                    apr_ssize_t prop_klen,
> +                                    const void *prop_val,
> +                                    svn_repos_t *repos,

Why are they "const void *"?

> +                                    apr_pool_t *pool)
> +{
> +  const svn_delta_editor_t *editor;
> +  void *edit_baton;
> +  void *root_baton;
> +  void *file_baton;
> +
> +  /* Prepare revision properties */
> +  apr_hash_t *revprop_table = apr_hash_make(pool);
> +
> +  /* Add the requested property */
> +  apr_hash_set(revprop_table, prop_key, prop_klen, prop_val);
> +
> +  /* Set usual author and log props, if not set already */
> +  if (strcmp((const char*)prop_key, SVN_PROP_REVISION_AUTHOR) != 0)

Normally I'd suggest defining a local const char * variable and use it
(see how we normally use apr_hash_get), but see above.

> +    {
> +      apr_hash_set(revprop_table, SVN_PROP_REVISION_AUTHOR,
> +                   APR_HASH_KEY_STRING,
> +                   svn_string_create("plato", pool));
> +    }
> +
> +  if (strcmp((const char*)prop_key, SVN_PROP_REVISION_LOG) != 0)
> +    {
> +      apr_hash_set(revprop_table, SVN_PROP_REVISION_LOG,
> +                   APR_HASH_KEY_STRING,
> +                   svn_string_create("revision log", pool));
> +    }
> +
> +  /* Make an arbitrary change and commit using above values... */
> +
> +  SVN_ERR(svn_repos_get_commit_editor5(&editor, &edit_baton, repos,
> +                                       NULL, "file://test", "/",
> +                                       revprop_table, NULL, NULL,
> +                                       NULL, NULL, pool));
> +
> +  SVN_ERR(editor->open_root(edit_baton, 1, pool, &root_baton));
                                           ^

Should this be 1 or 0?  I'm not familiar with the editor or the C tests
infrastructure, but based on the docstrings I think this should be zero?
(I understand the code works, but I'd like to understand why.)

> +
> +  SVN_ERR(editor->add_file( filename , root_baton, NULL,
                              ^        ^
s/ //

> +                           SVN_INVALID_REVNUM, pool,
> +                           &file_baton));
> +

Start an edit but not finish it? (close_file, close_edit...)  Why?

> +  return SVN_NO_ERROR;
> +}
> +
> +
> +/* Expect failure of invalid commit in these cases:
> + *  - log message contains invalid UTF-8 octet (issue 1796)
> + *  - log message contains invalid linefeed style (non-LF) (issue 1796)
> + */
> +static svn_error_t *
> +prop_validation(const char **msg,
> +                svn_boolean_t msg_only,
> +                svn_test_opts_t *opts,
> +                apr_pool_t *pool)
> +{
> +  svn_error_t *err;
> +  svn_repos_t *repos;
> +  svn_fs_t *fs;
               ^^
Unused local variable.

> +  const char non_utf8_string[5] = { 'a', 0xff, 'b', '\n', 0 };
> +  const char *non_lf_string = "a\r\nb\n\rc\rd\n";
> +  apr_pool_t *subpool = svn_pool_create(pool);
> +
> +  *msg = "test if revprops are validated by repos";
> +
> +  if (msg_only)
> +    return SVN_NO_ERROR;
> +
> +  /* Create a filesystem and repository. */
> +  SVN_ERR(svn_test__create_repos(&repos, "test-repo-prop-validation",
> +                                 opts, subpool));
> +  fs = svn_repos_fs(repos);
> +
> +
> +  /* Test an invalid commit log message: UTF-8 */
> +  err = prop_validation_commit_with_revprop
> +            ("/non_utf8_log_msg",
> +             SVN_PROP_REVISION_LOG, APR_HASH_KEY_STRING,
> +             svn_string_create(non_utf8_string, subpool),
> +             repos, subpool);
> +
> +  if (err == SVN_NO_ERROR)
> +    return svn_error_create(SVN_ERR_TEST_FAILED, err,
> +                            "Failed to reject a log with invalid "
> +                            "UTF-8");
> +  else
> +    if (err->apr_err != APR_EINVAL)
> +      return svn_error_create(SVN_ERR_TEST_FAILED, err,
> +                              "Expected APR_EINVAL, got another error.");
> +  svn_error_clear(err);
> +
> +
> +  /* Test an invalid commit log message: LF */
> +  err = prop_validation_commit_with_revprop
> +            ("/non_lf_log_msg",
> +             SVN_PROP_REVISION_LOG, APR_HASH_KEY_STRING,
> +             svn_string_create(non_lf_string, subpool),
> +             repos, subpool);
> +
> +  if (err == SVN_NO_ERROR)
> +    return svn_error_create(SVN_ERR_TEST_FAILED, err,
> +                            "Failed to reject a log with inconsistent "
> +                            "line ending style");
> +  else
> +    if (err->apr_err != SVN_ERR_IO_INCONSISTENT_EOL)
> +      return svn_error_create(SVN_ERR_TEST_FAILED, err,
> +                              "Expected SVN_ERR_IO_INCONSISTENT_EOL, "
> +                              "got another error.");
> +  svn_error_clear(err);
> +

So svn_error_clear() runs for two paths: on the "correct" error and on
SVN_NO_ERROR.

> +
> +  /* Done. */
> +  svn_pool_destroy(subpool);
> +
> +  return SVN_NO_ERROR;
> +}
> +
> +
> +
>  /* The test table.  */
>  
>  struct svn_test_descriptor_t test_funcs[] =
> @@ -2203,5 +2341,6 @@
>      SVN_TEST_PASS(commit_continue_txn),
>      SVN_TEST_PASS(node_location_segments),
>      SVN_TEST_PASS(reporter_depth_exclude),
> +    SVN_TEST_PASS(prop_validation),
>      SVN_TEST_NULL
>    };
> 


Looks good.

Daniel