You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Branko Čibej <br...@apache.org> on 2018/12/14 15:01:09 UTC

New canonicalization functions [was: Subversion Exception!]

On 13.12.2018 17:00, Branko Čibej wrote:
> On 13.12.2018 16:53, Michael Pilato wrote:
>> On 12/13/18 10:45 AM, Branko Čibej wrote:
>>> Uh. I forgot about the malfunction handler. However this doesn't really
>>> help, other than putting possibly sensitive paths into the crash handler
>>> info? We really shouldn't do it this way, users *will* just copy and
>>> paste the output tot he 'net.
>> Ahem.  What Grandpa *meant* to say was:
>>
>> "Oh, cool!  So there _is_ a way to report the non-canonical path. 
>> Thanks for figuring this out, Julian!  Unfortunately, it comes at a 
>> cost, namely that of revealing potentially sensitive paths in the output 
>> which I strongly suspect will get copied and paste to the 'net.  If we 
>> could mitigate that part of it, this might turn out to be truly beneficial."
>
> Well, no, I meant to say exactly what I said. But if I were in a
> politically correct fame of mind, then I might have said something like
> what you wrote.
>
> Re FUD: it's not just paths, it's also URLs, and people do consider one
> or the other sensitive. Of course ... in the end that's no worse than
> printing paths or URLs in error messages.
>
> I still think we should add canonicalisation functions that validate
> their own output.


r1848943, for review. Obviouisly we'd have to start using these in our
own code for them to be any good, but TSVN could use them, too. Doing
that is independent of changing the is-canonical assertions as Johan and
Julian suggested.

-- Brane


Re: New canonicalization functions [was: Subversion Exception!]

Posted by Branko Čibej <br...@apache.org>.
On Fri, 14 Dec 2018, 16:18 Julian Foad <julianfoad@apache.org wrote:

> Branko Čibej wrote:
> > r1848943, for review. [...]
>
> Woo hoo! Thanks, Brane.
>
> Are you willing to add random-input testing for them?
>
> Minor comments:
>   * the 'relpath' one is not needed because, AFAIK, it's possible to
> canonicalize any relpath and we already do;
>


Possibly. I admit I was quite short on time, so I went for consistency
first.


  * the non-canonical 'result' and scratch pool params are probably
> over-kill; a subjective judgment, I know, so just a suggestion.
>


My thinking was that a GUI client may want to handle canonicalisation
errors differently than just by displaying the error message we happen to
generate.

Separate result and scratch pools are just policy, I think we've learned by
now that taking shortcuts with pools eventually leads to problems.

-- Brane

Re: New canonicalization functions [was: Subversion Exception!]

Posted by Branko Čibej <br...@apache.org>.
On 15.12.2018 12:24, Branko Čibej wrote:
> On 14.12.2018 19:05, Branko Čibej wrote:
>> On Fri, 14 Dec 2018, 18:14 Julian Foad <julianfoad@apache.org wrote:
>>
>>> Julian Foad wrote:
>>>> Are you willing to add random-input testing for them?
>>> The attached patch 'dirent-uri-test-random-2.patch' tests rules like:
>>>
>>>   * every result should pass an X_is_canonical() test (obvious by code
>>> inspection);
>>>   * every other input should produce SVN_ERR_CANONICALIZATION_FAILED;
>>>   * when a path is "canonical", it should be unchanged by "canonicalize".
>>>
>>> Some findings:
>>>
>>>   * svn_uri_canonicalize_safe("") aborts;
>>>   * svn_uri_canonicalize_safe("/foo") aborts;
>>>
>> We can fix this in the private "canonicalize()" function that all these
>> eventually call. As before, I didn't have time to unknit the internal
>> implementation.
>
> I saw a funny thing in that function the other day ...
>
>   if (SVN_PATH_IS_EMPTY(path))
>     {
>       assert(type != type_uri);
>       return "";
>     }


r1848990. There were two assertions in that function.

Note that we'll also have to create "safe" versions of the
_internal_style() functions, because those pass their result through
their _canonicalize() colleagues.

-- Brane

P.S.: Replacing all assertions in libsvn_subr/dirent_uri.c will cause
code churn of awseome proportions.


Re: New canonicalization functions [was: Subversion Exception!]

Posted by Branko Čibej <br...@apache.org>.
On 14.12.2018 19:05, Branko Čibej wrote:
> On Fri, 14 Dec 2018, 18:14 Julian Foad <julianfoad@apache.org wrote:
>
>> Julian Foad wrote:
>>> Are you willing to add random-input testing for them?
>> The attached patch 'dirent-uri-test-random-2.patch' tests rules like:
>>
>>   * every result should pass an X_is_canonical() test (obvious by code
>> inspection);
>>   * every other input should produce SVN_ERR_CANONICALIZATION_FAILED;
>>   * when a path is "canonical", it should be unchanged by "canonicalize".
>>
>> Some findings:
>>
>>   * svn_uri_canonicalize_safe("") aborts;
>>   * svn_uri_canonicalize_safe("/foo") aborts;
>>
>
> We can fix this in the private "canonicalize()" function that all these
> eventually call. As before, I didn't have time to unknit the internal
> implementation.


I saw a funny thing in that function the other day ...

  if (SVN_PATH_IS_EMPTY(path))
    {
      assert(type != type_uri);
      return "";
    }


-- Brane


Re: New canonicalization functions [was: Subversion Exception!]

Posted by Julian Foad <ju...@apache.org>.
Branko Čibej wrote:
> What inconsistencies? A canonical URI has the scheme and host name in
> lowercase, but the path, query, fragment and user info parts are case-
> sensitive.> 
> Similarly, I think we (or APR?) make drive letters lowercase.

In both cases I found examples where is_canonical said an input was
canonical but canonicalize changed its case.
To everything else: ack.

- Julian


Re: New canonicalization functions [was: Subversion Exception!]

Posted by Branko Čibej <br...@apache.org>.
On Fri, 14 Dec 2018, 18:14 Julian Foad <julianfoad@apache.org wrote:

> Julian Foad wrote:
> > Are you willing to add random-input testing for them?
>
> The attached patch 'dirent-uri-test-random-2.patch' tests rules like:
>
>   * every result should pass an X_is_canonical() test (obvious by code
> inspection);
>   * every other input should produce SVN_ERR_CANONICALIZATION_FAILED;
>   * when a path is "canonical", it should be unchanged by "canonicalize".
>
> Some findings:
>
>   * svn_uri_canonicalize_safe("") aborts;
>   * svn_uri_canonicalize_safe("/foo") aborts;
>


We can fix this in the private "canonicalize()" function that all these
eventually call. As before, I didn't have time to unknit the internal
implementation.


  * upper/lower case inconsistencies in URIs
>



What inconsistencies? A canonical URI has the scheme and host name in
lowercase, but the path, query, fragment and user info parts are
case-sensitive.



> I previously also found upper/lower case inconsistencies in dirent drive
> letters, when running these tests with "#define SVN_USE_DOS_PATHS" set in
> dirent_uri.c, but am right now failing to replicate that.
>


Similarly, I think we (or APR?) make drive letters lowercase.


-- Brane

Re: New canonicalization functions [was: Subversion Exception!]

Posted by Julian Foad <ju...@apache.org>.
Julian Foad wrote:
> Are you willing to add random-input testing for them?

The attached patch 'dirent-uri-test-random-2.patch' tests rules like:

  * every result should pass an X_is_canonical() test (obvious by code inspection);
  * every other input should produce SVN_ERR_CANONICALIZATION_FAILED;
  * when a path is "canonical", it should be unchanged by "canonicalize".

Some findings:

  * svn_uri_canonicalize_safe("") aborts;
  * svn_uri_canonicalize_safe("/foo") aborts;
  * upper/lower case inconsistencies in URIs

I previously also found upper/lower case inconsistencies in dirent drive letters, when running these tests with "#define SVN_USE_DOS_PATHS" set in dirent_uri.c, but am right now failing to replicate that.

>   * the 'relpath' one is not needed because, AFAIK, it's possible to canonicalize any
>     relpath and we already do;

Maybe that will no longer be true if we decide to disallow some characters (control characters) and/or check for valid UTF-8 in the future.

-- 
- Julian

Re: New canonicalization functions [was: Subversion Exception!]

Posted by Julian Foad <ju...@apache.org>.
Branko Čibej wrote:
> r1848943, for review. [...]

Woo hoo! Thanks, Brane.

Are you willing to add random-input testing for them?

Minor comments:
  * the 'relpath' one is not needed because, AFAIK, it's possible to canonicalize any relpath and we already do;
  * the non-canonical 'result' and scratch pool params are probably over-kill; a subjective judgment, I know, so just a suggestion.

-- 
- Julian