Posted to dev@subversion.apache.org by Ben Collins-Sussman <su...@collab.net> on 2002/06/25 19:52:31 UTC

The Data Sanitization Plan

Karl and I are talking about how to tackle issues #494 and #667
together as a unit.  After reading and discussing, here's our Plan.

Essentially, we are imposing an Official set of burdens upon all svn
client applications (i.e. any users of the libsvn_* libraries):

  1. all paths passed to libsvn_* are assumed to be

          - using '/' separators
          - using canonicalized case
          - in UTF-8
          
  2. all URLs passed to libsvn_* are assumed to be

          - properly URL-escaped
          - in UTF-8

  3. all log messages passed to libsvn_* are assumed to be
 
          - in UTF-8


Our thoughts are to create three utility routines in libsvn_subr,
something like:

    svn_sanitize_path()
    svn_sanitize_url()
    svn_sanitize_logmsg()

These utility routines would perform all the necessary transforms on
the data, according to the requirements above.

After that, it's a simple matter of making all of our cmdline client
code use these routines before passing data to libsvn_*.


==> How to implement the sanitization routines

  * Karl says apr_file_path_merge() will give us the "canonical" case
    of path components.  That sounds good.  Except the name of that
    func is really weird.  :-)

  * Rumor has it that there exist various apr iconv routines to
    convert data to UTF-8.  We've already decided that each
    sanitization func is going to take a locale argument; in the case
    of our cmdline client, this information can be gathered either by
    getting the system locale, or by using a particular locale from
    the commandline (--locale ?)

  * We can easily write a routine to convert '\' into '/'

  * The only real question is whether (and how) our cmdline client
    should "automatically escape" URLs.  Is this too dangerous?  Is
    there some reasonable heuristic to use?  It would stink if I had
    to type this:

        svn diff -r3:4 http://path/to/my%20file
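
The '\' to '/' step from the list above can be sketched roughly as
follows.  This is only an illustration under stated assumptions: the
name sketch_backslash_to_slash() is made up, and it uses plain malloc()
where the real libsvn_subr code would presumably take an APR pool.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a Subversion or APR function: return a
   newly malloc'ed copy of PATH with every '\\' replaced by '/'.
   The caller frees the result.  Returns NULL if allocation fails. */
char *
sketch_backslash_to_slash(const char *path)
{
  size_t len = strlen(path);
  char *copy = malloc(len + 1);
  size_t i;

  if (copy == NULL)
    return NULL;

  /* Loop up to and including LEN so the terminating NUL is copied. */
  for (i = 0; i <= len; i++)
    copy[i] = (path[i] == '\\') ? '/' : path[i];

  return copy;
}
```

Note that this blindly rewrites every backslash, so it is only
appropriate on platforms where '\' is a separator and never a legal
filename character.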


Feedback welcome...



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: The Data Sanitization Plan

Posted by Eric Gillespie <ep...@pretzelnet.org>.

<br...@xbc.nu> writes:

> Depends on your shell, right. The Windows shell will merrily try to 
> interpret %, no matter how many parens you wrap the parameters in.

OK, but one supported platform not supporting something is no reason
to reject a valid URL.

--  
Eric Gillespie <*> epg@pretzelnet.org

Build a fire for a man, and he'll be warm for a day.  Set a man on
fire, and he'll be warm for the rest of his life. -Terry Pratchett


Re: The Data Sanitization Plan

Posted by Branko Čibej <br...@xbc.nu>.

Eric Gillespie wrote:

>mark benedetto king <bk...@inquira.com> writes:
>
>  
>
>>The paste-from-browser problem is much harder: metacharacters in URLs
>>confuse command interpreters.  We can't solve this problem.  It doesn't
>>make sense to introduce inconsistency to svn over a problem that we
>>can't solve.
>>    
>>
>
>Incorrect.  It's as simple as hitting ' before pasting and '
>afterwards; this is not exactly arcane.  As a matter of fact, i do
>this all the time.  I'd hate to see svn accept invalid URLs but
>not valid ones...
>  
>
Depends on your shell, right. The Windows shell will merrily try to 
interpret %, no matter how many parens you wrap the parameters in.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: The Data Sanitization Plan

Posted by mark benedetto king <bk...@inquira.com>.

On Thu, Jun 27, 2002 at 10:39:00AM +0200, Michael Wood wrote:
>
> Why should 0.9.9 do anything different with the above?  Does it unescape
> the URLs on the address bar or something?
>

I think the behavior may be best described as "it shows the literal version
of the link that it followed".

I seem to have taken this discussion way off subject.  I was only
trying to point out that protecting users from browser->cmdline pasting
is a battle that we cannot win.

Back to the original topic:  If I read the RFCs correctly, URLs
cannot contain ' '.  This means, to me, that the URL validator should
reject strings containing spaces (or other illegal sequences), yielding
the message "That is not a well-formed URL".  I think this makes for
unambiguous encoding, and unsurprising behavior.
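
A minimal sketch of such a validator, assuming "illegal sequences"
means spaces, control characters, non-ASCII bytes, and a '%' that is
not followed by two hex digits.  The function name is hypothetical and
the check is far from a complete RFC grammar check.

```c
#include <ctype.h>

/* Hypothetical check, not a Subversion API.  Returns 1 if URL is
   non-empty, contains no spaces, control characters, or non-ASCII
   bytes, and every '%' is followed by two hex digits.  Returns 0
   otherwise. */
int
sketch_url_looks_well_formed(const char *url)
{
  const unsigned char *p = (const unsigned char *) url;

  if (*p == '\0')
    return 0;

  for (; *p != '\0'; p++)
    {
      /* Reject space, control characters, DEL, and bytes >= 0x80. */
      if (*p <= 0x20 || *p >= 0x7f)
        return 0;

      if (*p == '%')
        {
          /* Short-circuit keeps us from reading past the NUL. */
          if (!isxdigit(p[1]) || !isxdigit(p[2]))
            return 0;
          p += 2;
        }
    }

  return 1;
}
```

A caller that gets 0 back would print "That is not a well-formed URL"
and stop, rather than guess at what the user meant.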

--ben


user input of paths and urls

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

This was the "Re: The Data Sanitization Plan" thread.

This mail is just to rename the thread, because the topic being
discussed now is independent of the Data Sanitization Plan (issue
#494), and I'd like to keep comments on the two separate.

Any comments on #494 are important now.  Comments on what users
type/paste, etc. (before it ever gets to the libsvn_client API), are
important later.

(I have nothing to add to either thread right now.)

-K

Michael Wood <mw...@its.uct.ac.za> writes:
> On Wed, Jun 26, 2002 at 08:17:57PM -0400, mark benedetto king wrote:
> > On Wed, Jun 26, 2002 at 07:14:31PM -0500, Eric Gillespie wrote:
> > > mark benedetto king <bk...@inquira.com> writes:
> > >
> > > >  http://foo.com/';rm -rf /;echo 'sorry!
> > >
> > > No, that's what you're advocating.  I'd be pasting:
> > >
> > > http://foo.com/%27%3Brm%20-rf%20/%3Becho+%27sorry%21
> > >
> > 
> > Visit: http://www.boredom.org/~egrep/demo.html
> > 
> > Click the link.
> > 
> > Highlight your browser's URL-bar.
> > 
> > then type:
> > 
> > echo '
> > 
> > then paste
> > 
> > then '[enter]
> 
> eh?
> 
> $ echo 'http://www.boredom.org/~egrep/demo.html?%27;ls;echo%27'
> http://www.boredom.org/~egrep/demo.html?%27;ls;echo%27
> 
> > Note: I've only tested this with Mozilla 0.9.9
> 
> OK, so my about:mozilla says "Mozilla 0.9.5+"
> 
> Why should 0.9.9 do anything different with the above?  Does it unescape
> the URLs on the address bar or something?


Re: The Data Sanitization Plan

Posted by Michael Wood <mw...@its.uct.ac.za>.

On Wed, Jun 26, 2002 at 08:17:57PM -0400, mark benedetto king wrote:
> On Wed, Jun 26, 2002 at 07:14:31PM -0500, Eric Gillespie wrote:
> > mark benedetto king <bk...@inquira.com> writes:
> >
> > >  http://foo.com/';rm -rf /;echo 'sorry!
> >
> > No, that's what you're advocating.  I'd be pasting:
> >
> > http://foo.com/%27%3Brm%20-rf%20/%3Becho+%27sorry%21
> >
> 
> Visit: http://www.boredom.org/~egrep/demo.html
> 
> Click the link.
> 
> Highlight your browser's URL-bar.
> 
> then type:
> 
> echo '
> 
> then paste
> 
> then '[enter]

eh?

$ echo 'http://www.boredom.org/~egrep/demo.html?%27;ls;echo%27'
http://www.boredom.org/~egrep/demo.html?%27;ls;echo%27

> Note: I've only tested this with Mozilla 0.9.9

OK, so my about:mozilla says "Mozilla 0.9.5+"

Why should 0.9.9 do anything different with the above?  Does it unescape
the URLs on the address bar or something?

-- 
Michael Wood <mw...@its.uct.ac.za>


Re: The Data Sanitization Plan

Posted by Eric Gillespie <ep...@pretzelnet.org>.

mark benedetto king <bk...@inquira.com> writes:

> Visit: http://www.boredom.org/~egrep/demo.html
 
Point taken.  Nevertheless:

1) We're not protected from lame filenames in the other scheme
   either.
 
2) In all my years of pasting URLs into my shell, i've never
   encountered such a URL.

3) If i see a hostile URL, i will not paste it.

4) Last but most important:

You should never have files named like that, in svn or otherwise.
This sort of thing is exactly why.  You may say that's a limitation
svn should not impose.  I say you don't need files like that anyway;
where Windows users tend to put spaces in their filenames, everyone
i have known just uses dashes.  Do you really need these kinds of
characters?

OK, you don't buy that one; i have more :).  svn could go out of
its way to reject valid URLs due to shell meta-characters, but it
buys you nothing; nothing else handles them.  Still, some of our
users really really really want a file called "Bob's Shopping
List!".  Fine.  But tell me: is that kind of user going to be
pasting URLs into a shell?  Not my boss.

--  
Eric Gillespie <*> epg@pretzelnet.org

Build a fire for a man, and he'll be warm for a day.  Set a man on
fire, and he'll be warm for the rest of his life. -Terry Pratchett


Re: The Data Sanitization Plan

Posted by mark benedetto king <bk...@inquira.com>.

On Wed, Jun 26, 2002 at 07:14:31PM -0500, Eric Gillespie wrote:
> mark benedetto king <bk...@inquira.com> writes:
>
> >  http://foo.com/';rm -rf /;echo 'sorry!
>
> No, that's what you're advocating.  I'd be pasting:
>
> http://foo.com/%27%3Brm%20-rf%20/%3Becho+%27sorry%21
>

Visit: http://www.boredom.org/~egrep/demo.html

Click the link.

Highlight your browser's URL-bar.

then type:

echo '

then paste

then '[enter]


Note: I've only tested this with Mozilla 0.9.9

--ben



Re: The Data Sanitization Plan

Posted by Eric Gillespie <ep...@pretzelnet.org>.

mark benedetto king <bk...@inquira.com> writes:

>  http://foo.com/';rm -rf /;echo 'sorry!

No, that's what you're advocating.  I'd be pasting:

http://foo.com/%27%3Brm%20-rf%20/%3Becho+%27sorry%21

--  
Eric Gillespie <*> epg@pretzelnet.org

Build a fire for a man, and he'll be warm for a day.  Set a man on
fire, and he'll be warm for the rest of his life. -Terry Pratchett


Re: The Data Sanitization Plan

Posted by mark benedetto king <bk...@inquira.com>.

On Wed, Jun 26, 2002 at 04:08:17PM -0500, Eric Gillespie wrote:
> mark benedetto king <bk...@inquira.com> writes:
>
> > The paste-from-browser problem is much harder: metacharacters in URLs
> > confuse command interpreters.  We can't solve this problem.  It doesn't
> > make sense to introduce inconsistency to svn over a problem that we
> > can't solve.
>
> Incorrect.  It's as simple as hitting ' before pasting and '
> afterwards; this is not exactly arcane.  As a matter of fact, i do
> this all the time.  I'd hate to see svn accept invalid URLs but
> not valid ones...
>

How about:

 http://foo.com/';rm -rf /;echo 'sorry!

:-)

--ben




Re: The Data Sanitization Plan

Posted by Eric Gillespie <ep...@pretzelnet.org>.

mark benedetto king <bk...@inquira.com> writes:

> The paste-from-browser problem is much harder: metacharacters in URLs
> confuse command interpreters.  We can't solve this problem.  It doesn't
> make sense to introduce inconsistency to svn over a problem that we
> can't solve.

Incorrect.  It's as simple as hitting ' before pasting and '
afterwards; this is not exactly arcane.  As a matter of fact, i do
this all the time.  I'd hate to see svn accept invalid URLs but
not valid ones...

--  
Eric Gillespie <*> epg@pretzelnet.org

Build a fire for a man, and he'll be warm for a day.  Set a man on
fire, and he'll be warm for the rest of his life. -Terry Pratchett


Re: Re: The Data Sanitization Plan

Posted by mark benedetto king <bk...@inquira.com>.

On Wed, Jun 26, 2002 at 01:33:10AM -0700, Greg Stein wrote:
> On Tue, Jun 25, 2002 at 01:55:23PM -0700, Bill Tutt wrote:
>
> > This escapes most of the data when necessary and still allows a little
> > fudge room from pasting in URIs from browser address bars.
> >
> > The only other alternative to that is a stat() like approach. i.e. see
> > if either non-escaped target, or the escaped target exist, and pick the
> > first one that exists as your winner.
>
> That doesn't work. What if you're importing? Does the new file need to be
> escaped, or not?
>

Or if they both exist.

POLA/UI-Consistency argues for escaping '%' as well.

The paste-from-browser problem is much harder: metacharacters in URLs
confuse command interpreters.  We can't solve this problem.  It doesn't
make sense to introduce inconsistency to svn over a problem that we
can't solve.

Let's say someone *does* paste a URL-encoded string onto the command-line:

$ svn co http://www.foo.com/repo%20name -d dir

They'll get an error message.  It will be:

subversion/libsvn_ra_dav/options.c:126
svn_error: #21097 : <RA layer didn't receive requested OPTIONS info>
  The OPTIONS response did not include the requested activity-collection-set.
(Check the URL again;  this often means that the URL is not WebDAV-enabled.)

This is probably good enough to convince the user that pasting URL-encoded
data is not such a good idea ("Check the URL again").

--ben


Re: Re: The Data Sanitization Plan

Posted by Greg Stein <gs...@lyra.org>.

On Tue, Jun 25, 2002 at 01:55:23PM -0700, Bill Tutt wrote:
>...
> > >         svn diff -r3:4 http://path/to/my%20file
> > 
> > I'm almost positive that you will have to type it that way. 
> 
> Ick. No way. No typing in of URL escapes. That's EVIL to our users.
> That's a big usability -1.

Of course it is a pain.

However, as I said: a client can certainly choose to do the escaping. But
the libraries won't do it.

>...
> The really icky part I see here isn't the question mark related one.
> What should the command line client do when it sees this command line:
> 	svn diff -r 3:4 http://path/to/my%20file
> 
> Should it escape the % or not? We don't want the URI accidentally double
> escaped to something like: http://path/to/my%2520file

Yup. Just another variation. I had to pick one for an example :-)

> Of course, just to be annoying, there could theoretically be a file
> called my%20file in the repository. What to do, what to do..

Yup.

> I'd tend towards this heuristic for deciding whether to escape the URI
> for a command line client only:
> * Escape all normally escaped characters except for the percentage sign.
> (0x25)

Heuristic? Heh. That's also known as "we'll get it wrong one day"

> This escapes most of the data when necessary and still allows a little
> fudge room from pasting in URIs from browser address bars.
> 
> The only other alternative to that is a stat() like approach. i.e. see
> if either non-escaped target, or the escaped target exist, and pick the
> first one that exists as your winner.

That doesn't work. What if you're importing? Does the new file need to be
escaped, or not?

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


RE: Re: The Data Sanitization Plan

Posted by Bill Tutt <ra...@lyra.org>.


> From: Greg Stein [mailto:gstein@lyra.org]
> 
> >   * The only real question is whether (and how) our cmdline client
> >     should "automatically escape" URLs.  Is this too dangerous?  Is
> >     there some reasonable heuristic to use?  It would stink if I had
> >     to type this:
> >
> >         svn diff -r3:4 http://path/to/my%20file
> 
> I'm almost positive that you will have to type it that way. 
> 

Ick. No way. No typing in of URL escapes. That's EVIL to our users.
That's a big usability -1. 

svn diff -r3:4 'http://path/to/my file' should be perfectly acceptable
command line input. That's clearly how a GUI is likely to get the data
when it is about ready to call the SVN APIs.

The really icky part I see here isn't the question mark related one.
What should the command line client do when it sees this command line:
	svn diff -r 3:4 http://path/to/my%20file

Should it escape the % or not? We don't want the URI accidentally double
escaped to something like: http://path/to/my%2520file

Of course, just to be annoying, there could theoretically be a file
called my%20file in the repository. What to do, what to do..

I'd tend towards this heuristic for deciding whether to escape the URI
for a command line client only:
* Escape all normally escaped characters except for the percentage sign.
(0x25)

This escapes most of the data when necessary and still allows a little
fudge room from pasting in URIs from browser address bars.
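
To make the heuristic concrete, here is a rough sketch.  The function
name and the exact set of characters treated as "safe" are assumptions
made for illustration, not an agreed list, and it uses malloc() rather
than APR pools.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: percent-escape URL for command-line use, but
   pass '%' through untouched so that already-escaped input is not
   escaped a second time.  The "safe" set below is an assumption.
   Returns a malloc'ed string (caller frees), or NULL on failure. */
char *
sketch_escape_except_percent(const char *url)
{
  static const char hex[] = "0123456789ABCDEF";
  static const char safe[] = "-_.!~*'()/:%";
  size_t len = strlen(url);
  char *out = malloc(3 * len + 1);   /* worst case: every byte escaped */
  char *dst = out;
  const unsigned char *src;

  if (out == NULL)
    return NULL;

  for (src = (const unsigned char *) url; *src != '\0'; src++)
    {
      if ((*src < 0x80 && isalnum(*src)) || strchr(safe, *src) != NULL)
        *dst++ = (char) *src;          /* copy safe bytes as-is */
      else
        {
          *dst++ = '%';                /* escape everything else */
          *dst++ = hex[*src >> 4];
          *dst++ = hex[*src & 0x0f];
        }
    }
  *dst = '\0';

  return out;
}
```

With this, 'http://path/to/my file' becomes http://path/to/my%20file,
and http://path/to/my%20file is left alone instead of turning into
my%2520file.  The cost is the one already mentioned: a file literally
named my%20file cannot be addressed this way.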

The only other alternative to that is a stat() like approach. i.e. see
if either non-escaped target, or the escaped target exist, and pick the
first one that exists as your winner.

Bill



Re: The Data Sanitization Plan

Posted by Greg Stein <gs...@lyra.org>.

On Tue, Jun 25, 2002 at 02:52:31PM -0500, Ben Collins-Sussman wrote:
>...
>   1. all paths passed to libsvn_* are assumed to be
> 
>           - using '/' separators
>           - using canonicalized case
>           - in UTF-8

Yes.

>   2. all URLs passed to libsvn_* are assumed to be
> 
>           - properly URL-escaped
>           - in UTF-8

Yes.

>   3. all log messages passed to libsvn_* are assumed to be
>  
>           - in UTF-8

Yes.

> Our thoughts are to create three utility routines in libsvn_subr,
> something like:
> 
>     svn_sanitize_path()
>     svn_sanitize_url()
>     svn_sanitize_logmsg()

The last one is not needed. We can simply use a function that converts from
the locale charset to UTF-8. I don't think we need to call out a special
function for that.

Note that Marcus has a big-ass patch outstanding. Some/all of that patch
needs to be applied to the codebase. In particular, there are some utility
functions for converting to UTF-8.

Also: it has a patch to APR's xlate functions which should be applied to
APR.

>...
>   * Karl says apr_file_path_merge() will give us the "canonical" case
>     of path components.  That sounds good.  Except the name of that
>     func is really weird.  :-)

It comes from the semantic, "given a canonicalized root, merge <this>
fragment into the path."

>   * Rumor has it that there exist various apr iconv routines to
>     convert data to UTF-8.  We've already decided that each

Look at Marcus' patch.

>     sanitization func is going to take a locale argument; in the case
>     of our cmdline client, this information can be gathered either by
>     getting the system locale, or by using a particular locale from
>     the commandline (--locale ?)

Yes, take a source character set (string). The client can use
APR_LOCALE_CHARSET from apr_xlate.h if they want to use the system locale.

>   * We can easily write a routine to convert '\' into '/'

Already done (by Branko). See svn_path_internal_style(). It modifies a
stringbuf in place, so we may want to consider changing that to be:

  const char * svn_path_internal_style(const char *path, apr_pool_t *pool);

>   * The only real question is whether (and how) our cmdline client
>     should "automatically escape" URLs.  Is this too dangerous?  Is
>     there some reasonable heuristic to use?  It would stink if I had
>     to type this:
> 
>         svn diff -r3:4 http://path/to/my%20file

I'm almost positive that you will have to type it that way. Consider these
two URLs:

  http://example.com/repos/notes/conversion?
  http://example.com/repos/notes/conversion%3f

The URLs have entirely different meanings. If we escape the first one, then
we change the meaning of the URL. The ambiguity is then, "which of the two
meanings were intended?"

However, if we carefully read through RFC 2396, we may find that we can
reliably escape the URLs. For example, in the above set of URLs, the URL
with the trailing "?" is not a "legal" URL for our purposes. Queries,
parameters, and anchors are not allowed in our URLs for repositories. Thus,
we could "assume" that a "?" present in the URL is intended to be escaped,
rather than forming a "query" within the URL.

But this is going to require a review of which characters are in and which
are out. And then to consider this stuff in reference to UTF-8 encoding...

Eek :-) For now, I would recommend /not/ escaping (because it could damage
the URL) unless/until we have a firm document on the viability of escaping
the inputs.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: The Data Sanitization Plan

Posted by Kevin Pilch-Bisson <ke...@pilch-bisson.net>.

On Tue, Jun 25, 2002 at 02:52:31PM -0500, Ben Collins-Sussman wrote:
>snip 
>   1. all paths passed to libsvn_* are assumed to be
> 
>           - using '/' separators
>           - using canonicalized case
>           - in UTF-8
>snip           
If that is the case then the value of svn:ignore also needs to follow the same
semantics.

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kevin Pilch-Bisson                    http://www.pilch-bisson.net
     "Historically speaking, the presences of wheels in Unix
     has never precluded their reinvention." - Larry Wall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Re: The Data Sanitization Plan

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

Ben Collins-Sussman <su...@collab.net> writes:
> ==> How to implement the sanitization routines
> 
>   * Karl says apr_file_path_merge() will give us the "canonical" case
>     of path components.  That sounds good.  Except the name of that
>     func is really weird.  :-)

Quick clarification:

apr_filepath_merge(), with the APR_FILEPATH_TRUENAME mask.  (I haven't
actually tried this yet, just going on what's in the headers.)


Re: The Data Sanitization Plan

Posted by Branko Čibej <br...@xbc.nu>.

Ben Collins-Sussman wrote:

>  3. all log messages passed to libsvn_* are assumed to be
> 
>          - in UTF-8
>
I think we have to agree on a line ending convention for the log 
messages, too. If we're going to convert them, we might as well go all 
the way.
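
As a rough sketch of what the line-ending part could look like,
assuming we settle on LF as the stored form (that choice is only an
assumption here, and the function name is made up):

```c
#include <stddef.h>

/* Hypothetical helper: rewrite MSG in place so that CRLF and lone CR
   line endings become LF.  Choosing LF as the canonical form is an
   assumption made for this sketch.  Returns the new length. */
size_t
sketch_normalize_eol(char *msg)
{
  char *src = msg;
  char *dst = msg;

  while (*src != '\0')
    {
      if (*src == '\r')
        {
          *dst++ = '\n';
          src++;
          if (*src == '\n')   /* swallow the LF of a CRLF pair */
            src++;
        }
      else
        *dst++ = *src++;
    }
  *dst = '\0';

  return (size_t) (dst - msg);
}
```

The output is never longer than the input, which is why the rewrite
can safely happen in place.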

BTW, the log message file templates for the $EDITOR functionality should 
be created with native line endings; otherwise, Windows users, for 
example, won't be able to use notepad as their editor.

>Our thoughts are to create three utility routines in libsvn_subr,
>something like:
>
>    svn_sanitize_path()
>    svn_sanitize_url()
>
Don't forget that the command line client needs conversions in _both_ 
directions.

>    svn_sanitize_logmsg()
>
Given the above, we _do_ need this last function, because log message 
sanitization won't be just a charset conversion. Sorry, Greg. :-)

>  * We can easily write a routine to convert '\' into '/'
>
I think apr_filepath_merge already does that, if you don't set the 
APR_FILEPATH_NATIVE flag. And vice versa, for output conversions.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: The Data Sanitization Plan

Posted by Marcus Comstedt <ma...@mc.pp.se>.

Karl Fogel <kf...@newton.ch.collab.net> writes:

> Should also point out that this is where we *finally* make use of
> Marcus "Infinite Patience" Comstedt's UTF-8 patch(es).

I think I can gather enough enthusiasm to make another merge and post
an updated patch.  :-)  I have fixed svnlook and svnadmin, and
verified proper operation with a file: URL containing non-ascii
characters.  I haven't been able to test the http:/https: case since I
haven't had time to set up an Apache.

> >   * Rumor has it that there exist various apr iconv routines to
> >     convert data to UTF-8.  We've already decided that each
> >     sanitization func is going to take a locale argument; in the case
> >     of our cmdline client, this information can be gathered either by
> >     getting the system locale, or by using a particular locale from
> >     the commandline (--locale ?)

This locale argument is somewhat problematic, since it should in this
case presumably be used also when "desanitizing" the paths.  Since
this has to be done whenever a file is opened or stat()ed, the data flow
for this parameter would be pretty ubiquitous.

  // Marcus


Re: The Data Sanitization Plan

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

Should also point out that this is where we *finally* make use of
Marcus "Infinite Patience" Comstedt's UTF-8 patch(es).

-K

Ben Collins-Sussman <su...@collab.net> writes:
> ==> How to implement the sanitization routines
> 
>   * Karl says apr_file_path_merge() will give us the "canonical" case
>     of path components.  That sounds good.  Except the name of that
>     func is really weird.  :-)
> 
>   * Rumor has it that there exist various apr iconv routines to
>     convert data to UTF-8.  We've already decided that each
>     sanitization func is going to take a locale argument; in the case
>     of our cmdline client, this information can be gathered either by
>     getting the system locale, or by using a particular locale from
>     the commandline (--locale ?)
> 
>   * We can easily write a routine to convert '\' into '/'
> 
>   * The only real question is whether (and how) our cmdline client
>     should "automatically escape" URLs.  Is this too dangerous?  Is
>     there some reasonable heuristic to use?  It would stink if I had
>     to type this:
> 
>         svn diff -r3:4 http://path/to/my%20file
> 
> 
> Feedback welcome...
> 
> 
> 

Re: The Data Sanitization Plan

Posted by Branko Čibej <br...@xbc.nu>.

Josef Wolf wrote:

>On Tue, Jun 25, 2002 at 02:52:31PM -0500, Ben Collins-Sussman wrote:
>
>  
>
>>  1. all paths passed to libsvn_* are assumed to be
>>
>>          - using '/' separators
>>          - using canonicalized case
>>    
>>
>              ^^^^^^^^^^^^^^^^^^^^^^^^
>Do you really want case-independency on unix-like-systems?
>
>Just curious...
>

"Canonicalized" in this context means "with correct case", not 
case-independent.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: The Data Sanitization Plan

Posted by Josef Wolf <jw...@raven.inka.de>.

On Tue, Jun 25, 2002 at 02:52:31PM -0500, Ben Collins-Sussman wrote:

>   1. all paths passed to libsvn_* are assumed to be
> 
>           - using '/' separators
>           - using canonicalized case
              ^^^^^^^^^^^^^^^^^^^^^^^^
Do you really want case-independency on unix-like-systems?

Just curious...

-- 
-- Josef Wolf -- jw@raven.inka.de --


Re: The Data Sanitization Plan

Posted by Greg Stein <gs...@lyra.org>.

On Tue, Jun 25, 2002 at 01:25:29PM -0700, Bill Tutt wrote:
> I don't suppose this means that you'll be using a real honest to
> goodness URI struct in the Subversion code now does it?

I hope not.

Consider the following API:

svn_error_t *
svn_client_delete (svn_client_commit_info_t **commit_info,
                   const char *path,
                   svn_boolean_t force,
                   svn_client_auth_baton_t *auth_baton,
                   svn_client_get_commit_log_t log_msg_func,
                   void *log_msg_baton,
                   svn_wc_notify_func_t notify_func,
                   void *notify_baton,
                   apr_pool_t *pool);


The 'path' variable can be a local path or a URL. Are you seriously
suggesting that we need to convert every use of a path into a union of a
local path and a URL? Bleck. That sure as hell makes working with the API a
lot harder.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: The Data Sanitization Plan

Posted by William Uther <wi...@cs.cmu.edu>.

On 25/6/02 3:52 PM, "Ben Collins-Sussman" <su...@collab.net> wrote:

> 
> Karl and I are talking about how to tackle issues #494 and #667
> together as a unit.  After reading and discussing, here's our Plan.
> 
> Essentially, we are imposing an Official set of burdens upon all svn
> client applications, (i.e. any users of the libsvn_* libraries):
> 
> 1. all paths passed to libsvn_* are assumed to be
> 
>         - using '/' separators
>         - using canonicalized case
>         - in UTF-8

I don't think this entirely solves issue 667.

I just added the following to that issue:

Path canonicalization handles most, but not all, of this issue.

 - 'svn mv myFile myfile' still needs to be able to rename a file to a new
case.  Canonicalizing this into a nop isn't the answer.
 - Canonicalization only really helps when the problem comes from the
command line.  If there is a conflict when files are checked out/updated,
there needs to be a reasonable error message.  e.g. if the repos contains
both 'myfile' and 'myFile' in a directory then I believe that'll cause
issues on Mac OS X when you try and check out.

  The solution should probably generalize to an error when trying to check
out any file that cannot be represented on the current system because of
either an illegal character in the file name or because two different
filenames alias to the same file on the current filesystem (because of
case).  (I agree svn doesn't necessarily have to work with these files, but
it should at least get the error message right.)
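
A minimal sketch of the aliasing check, assuming ASCII-only case
folding.  Real filesystems have their own folding rules, so this is
illustrative only, and both function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Return 1 if A and B are equal when ASCII case is ignored. */
static int
sketch_same_ignoring_case(const char *a, const char *b)
{
  while (*a != '\0' && *b != '\0')
    {
      if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
        return 0;
      a++;
      b++;
    }
  return *a == *b;   /* both must have ended together */
}

/* Hypothetical helper: return the index of the first entry in NAMES
   that aliases an earlier entry on a case-insensitive filesystem, or
   -1 if there is no such entry.  Quadratic, which is fine for a
   single directory's worth of names in a sketch. */
int
sketch_find_case_collision(const char *const names[], size_t count)
{
  size_t i, j;

  for (i = 1; i < count; i++)
    for (j = 0; j < i; j++)
      if (sketch_same_ignoring_case(names[i], names[j]))
        return (int) i;

  return -1;
}
```

A checkout could run something like this over each directory's entries
and report the offending name, rather than failing obscurely when the
second file overwrites the first.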

Later,

\x/ill          :-}


Re: The Data Sanitization Plan

Posted by Branko Čibej <br...@xbc.nu>.

cmpilato@collab.net wrote:

>Branko Čibej <br...@xbc.nu> writes:
>
>  
>
>>Bill Tutt wrote:
>>
>>    
>>
>>>I don't suppose this means that you'll be using a real honest to
>>>goodness URI struct in the Subversion code now does it?
>>>
>>>      
>>>
>>Seconded. IMHO any work on these changes should also have us start using 
>>the apr_uri_* functions from apr-util.
>>    
>>
>
>+1.  Been waiting for this to happen, and I seem to recall that APU
>actually added "file" as a valid schema (when first we required
>apr-util on the client-side, this was not the case).
>
Yes, it does. Of course, we have to fix it to accept file:///c:/... on 
Windows ... but, at least we won't have to special-case that in SVN.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: The Data Sanitization Plan

Posted by cm...@collab.net.

Branko Čibej <br...@xbc.nu> writes:

> Bill Tutt wrote:
> 
> >I don't suppose this means that you'll be using a real honest to
> >goodness URI struct in the Subversion code now does it?
> >
> Seconded. IMHO any work on these changes should also have us start using 
> the apr_uri_* functions from apr-util.

+1.  Been waiting for this to happen, and I seem to recall that APU
actually added "file" as a valid schema (when first we required
apr-util on the client-side, this was not the case).


Re: The Data Sanitization Plan

Posted by Branko Čibej <br...@xbc.nu>.

Bill Tutt wrote:

>I don't suppose this means that you'll be using a real honest to
>goodness URI struct in the Subversion code now does it?
>
Seconded. IMHO any work on these changes should also have us start using 
the apr_uri_* functions from apr-util.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



RE: The Data Sanitization Plan

Posted by Bill Tutt <ra...@lyra.org>.

I don't suppose this means that you'll be using a real honest to
goodness URI struct in the Subversion code now does it?


Keeps on poking,
Bill
----
Do you want a dangerous fugitive staying in your flat?
No.
Well, don't upset him and he'll be a nice fugitive staying in your flat.
 



