You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Ivan Zhakov <iv...@visualsvn.com> on 2011/07/19 09:49:29 UTC

Lock token encoding (was Re: svn commit: r1147882 - /subversion/trunk/subversion/libsvn_repos/hooks.c)

On Mon, Jul 18, 2011 at 23:50, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Ivan notes on IRC that this fixes issues with non-ASCII lock tokens.
> (At least, he reports errors in 'svn ls -v' over http://; but for me
> 'ls' and 'info' work fine over svn://.)
>
> Are lock tokens even permitted to be non-ASCII?
> (both backends generate ASCII-only lock tokens (in the same manner))
>
> If they are, are they required to be in UTF-8?  Or can they be, say,
> arbitrary octet strings?
>
I didn't find explicit specification for lock token encoding, but it
handled everywhere as normal utf-8 string.

That's why I decided to implement minimum patch just to decide lock
token returned from hook output to utf-8 string.

But I'm fine to require lock token to be ASCII only string.

-- 
Ivan Zhakov

Re: Lock token encoding (was Re: svn commit: r1147882 - /subversion/trunk/subversion/libsvn_repos/hooks.c)

Posted by Ivan Zhakov <iv...@visualsvn.com>.
Hi Daniel,

Thanks for looking into this issue. See my comments below.

>> >> FWIW, I think UTF-8 is a fine choice, especially if we've been internally
>> >> handling this string as UTF-8 already.  But let's document the fact
>> >> somewhere, please (preferably in the pre-lock-hook template comments).
>> >>
>> >
>> > Looking again, svn_fs.h:2059 documents that lock tokens are URI's.
>> > (and, consequently, are printable ASCII)
>> >
>> > We also need them to be XML-safe for their use in ra_dav LOCK responses.
>> >
>> > So, I'm thinking that:
>> >
>> > * We should validate the supplied lock tokens, not for being well-formed
>> >  UTF-8 as in Ivan's change, but for being ASCII-only.
>> >
>> > * We should move the validation to the FS layer.
>> >
>> > Thoughts?
>> >
>> I'm fine with this approach, but I think we should leave explicit
>> validation of lock tokens in pre-lock hook handling for better
>> diagnostic.
>
> I don't think that's necessary, given that we don't (and likely won't)
> do anything between calling the pre-lock hook and calling svn_fs_lock().
>
> Anyway, untested patch:
>
[..]
> ===================================================================
> --- subversion/libsvn_fs/fs-loader.c    (revision 1147872)
> +++ subversion/libsvn_fs/fs-loader.c    (working copy)
> @@ -28,6 +28,7 @@
>  #include <apr_md5.h>
>  #include <apr_thread_mutex.h>
>  #include <apr_uuid.h>
> +#include <apr_uri.h>
>
>  #include "svn_types.h"
>  #include "svn_dso.h"
> @@ -1303,6 +1304,31 @@ svn_fs_lock(svn_lock_t **lock, svn_fs_t *fs, const
>         return svn_error_create
>           (SVN_ERR_XML_UNESCAPABLE_DATA, NULL,
>            _("Lock comment contains illegal characters"));
> +    }
> +
> +  /* Enforce that the token be an XML-safe URI. */
> +  if (token)
> +    {
> +      apr_uri_t uri;
> +      apr_status_t status;
> +
> +      status = apr_uri_parse(pool, token, &uri);
> +      if (status)
> +        return svn_error_createf(SVN_ERR_FS_BAD_LOCK_TOKEN,
> +                                 svn_error_wrap_apr(status, NULL),
> +                                 _("Can't parse token '%s' as a URI"),
> +                                 token);
> +
> +      if (strcmp(uri.scheme, "opaquelocktoken"))
> +        return svn_error_createf(SVN_ERR_FS_BAD_LOCK_TOKEN, NULL,
> +                                 _("Lock token URI '%s' has bad scheme; "
> +                                   "expected '%s'"),
> +                                 token, "opaquelocktoken");
> +
In my test apr_uri_parse() returning uri.scheme == NULL. See attached
patch with test.

-- 
Ivan Zhakov

Re: Lock token encoding (was Re: svn commit: r1147882 - /subversion/trunk/subversion/libsvn_repos/hooks.c)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Ivan Zhakov wrote on Tue, Jul 19, 2011 at 21:54:19 +0400:
> On Tue, Jul 19, 2011 at 21:46, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> > C. Michael Pilato wrote on Tue, Jul 19, 2011 at 09:45:09 -0400:
> >> On 07/19/2011 03:49 AM, Ivan Zhakov wrote:
> >> > On Mon, Jul 18, 2011 at 23:50, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> >> >> Ivan notes on IRC that this fixes issues with non-ASCII lock tokens.
> >> >> (At least, he reports errors in 'svn ls -v' over http://; but for me
> >> >> 'ls' and 'info' work fine over svn://.)
> >> >>
> >> >> Are lock tokens even permitted to be non-ASCII?
> >> >> (both backends generate ASCII-only lock tokens (in the same manner))
> >> >>
> >> >> If they are, are they required to be in UTF-8?  Or can they be, say,
> >> >> arbitrary octet strings?
> >> >>
> >> > I didn't find explicit specification for lock token encoding, but it
> >> > handled everywhere as normal utf-8 string.
> >> >
> >> > That's why I decided to implement minimum patch just to decide lock
> >> > token returned from hook output to utf-8 string.
> >> >
> >> > But I'm fine to require lock token to be ASCII only string.
> >>
> >> Having this as an open question is unacceptable, so we need to make a
> >> decision about it.  To my knowledge, ASCII is the only thing used for lock
> >> tokens in the code we ship.  There's no telling what encoding could be in
> >
> > For the record, our code generates tokens in the form
> > "opaquelocktoken:$UUID".
> >
> >> use if folks are taking advantage of the pre-lock-hook token dictation
> >> feature -- Apache will give them a "C" locale environment, but sure anyone
> >> who has advanced enough configuration to require taking advantage of this
> >> feature is also setting the locale from inside those hooks.
> >>
> >> FWIW, I think UTF-8 is a fine choice, especially if we've been internally
> >> handling this string as UTF-8 already.  But let's document the fact
> >> somewhere, please (preferably in the pre-lock-hook template comments).
> >>
> >
> > Looking again, svn_fs.h:2059 documents that lock tokens are URI's.
> > (and, consequently, are printable ASCII)
> >
> > We also need them to be XML-safe for their use in ra_dav LOCK responses.
> >
> > So, I'm thinking that:
> >
> > * We should validate the supplied lock tokens, not for being well-formed
> >  UTF-8 as in Ivan's change, but for being ASCII-only.
> >
> > * We should move the validation to the FS layer.
> >
> > Thoughts?
> >
> I'm fine with this approach, but I think we should leave explicit
> validation of lock tokens in pre-lock hook handling for better
> diagnostic.

I don't think that's necessary, given that we don't (and likely won't)
do anything between calling the pre-lock hook and calling svn_fs_lock().

Anyway, untested patch:

[[[
Index: subversion/libsvn_fs_fs/lock.c
===================================================================
--- subversion/libsvn_fs_fs/lock.c	(revision 1147872)
+++ subversion/libsvn_fs_fs/lock.c	(working copy)
@@ -950,7 +950,7 @@ svn_fs_fs__generate_lock_token(const char **token,
      want to use the fs UUID + some incremented number?  For now, we
      generate a URI that matches the DAV RFC.  We could change this to
      some other URI scheme someday, if we wish. */
-  *token = apr_pstrcat(pool, "opaquelocktoken:",
+  *token = apr_pstrcat(pool, "opaquelocktoken:", /* ### */
                        svn_uuid_generate(pool), (char *)NULL);
   return SVN_NO_ERROR;
 }
Index: subversion/libsvn_fs/fs-loader.c
===================================================================
--- subversion/libsvn_fs/fs-loader.c	(revision 1147872)
+++ subversion/libsvn_fs/fs-loader.c	(working copy)
@@ -28,6 +28,7 @@
 #include <apr_md5.h>
 #include <apr_thread_mutex.h>
 #include <apr_uuid.h>
+#include <apr_uri.h>
 
 #include "svn_types.h"
 #include "svn_dso.h"
@@ -1303,6 +1304,31 @@ svn_fs_lock(svn_lock_t **lock, svn_fs_t *fs, const
         return svn_error_create
           (SVN_ERR_XML_UNESCAPABLE_DATA, NULL,
            _("Lock comment contains illegal characters"));
+    }
+
+  /* Enforce that the token be an XML-safe URI. */
+  if (token)
+    {
+      apr_uri_t uri;
+      apr_status_t status;
+
+      status = apr_uri_parse(pool, token, &uri);
+      if (status)
+        return svn_error_createf(SVN_ERR_FS_BAD_LOCK_TOKEN,
+                                 svn_error_wrap_apr(status, NULL),
+                                 _("Can't parse token '%s' as a URI"),
+                                 token);
+
+      if (strcmp(uri.scheme, "opaquelocktoken"))
+        return svn_error_createf(SVN_ERR_FS_BAD_LOCK_TOKEN, NULL,
+                                 _("Lock token URI '%s' has bad scheme; "
+                                   "expected '%s'"),
+                                 token, "opaquelocktoken");
+                                   
+      if (! svn_xml_is_xml_safe(token, strlen(token)))
+        return svn_error_create(
+           SVN_ERR_FS_BAD_LOCK_TOKEN, NULL,
+           _("Lock token URI is not XML-safe"));
     }
 
   if (expiration_date < 0)
Index: subversion/libsvn_fs_base/lock.c
===================================================================
--- subversion/libsvn_fs_base/lock.c	(revision 1147872)
+++ subversion/libsvn_fs_base/lock.c	(working copy)
@@ -250,7 +250,7 @@ svn_fs_base__generate_lock_token(const char **toke
      want to use the fs UUID + some incremented number?  For now, we
      generate a URI that matches the DAV RFC.  We could change this to
      some other URI scheme someday, if we wish. */
-  *token = apr_pstrcat(pool, "opaquelocktoken:",
+  *token = apr_pstrcat(pool, "opaquelocktoken:", /* ### */
                        svn_uuid_generate(pool), (char *)NULL);
   return SVN_NO_ERROR;
 }
]]]

I'll commit something along these lines (after testing that it works and
so on) if I don't hear otherwise.

Re: Lock token encoding (was Re: svn commit: r1147882 - /subversion/trunk/subversion/libsvn_repos/hooks.c)

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On Tue, Jul 19, 2011 at 21:46, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> C. Michael Pilato wrote on Tue, Jul 19, 2011 at 09:45:09 -0400:
>> On 07/19/2011 03:49 AM, Ivan Zhakov wrote:
>> > On Mon, Jul 18, 2011 at 23:50, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
>> >> Ivan notes on IRC that this fixes issues with non-ASCII lock tokens.
>> >> (At least, he reports errors in 'svn ls -v' over http://; but for me
>> >> 'ls' and 'info' work fine over svn://.)
>> >>
>> >> Are lock tokens even permitted to be non-ASCII?
>> >> (both backends generate ASCII-only lock tokens (in the same manner))
>> >>
>> >> If they are, are they required to be in UTF-8?  Or can they be, say,
>> >> arbitrary octet strings?
>> >>
>> > I didn't find explicit specification for lock token encoding, but it
>> > handled everywhere as normal utf-8 string.
>> >
>> > That's why I decided to implement minimum patch just to decide lock
>> > token returned from hook output to utf-8 string.
>> >
>> > But I'm fine to require lock token to be ASCII only string.
>>
>> Having this as an open question is unacceptable, so we need to make a
>> decision about it.  To my knowledge, ASCII is the only thing used for lock
>> tokens in the code we ship.  There's no telling what encoding could be in
>
> For the record, our code generates tokens in the form
> "opaquelocktoken:$UUID".
>
>> use if folks are taking advantage of the pre-lock-hook token dictation
>> feature -- Apache will give them a "C" locale environment, but sure anyone
>> who has advanced enough configuration to require taking advantage of this
>> feature is also setting the locale from inside those hooks.
>>
>> FWIW, I think UTF-8 is a fine choice, especially if we've been internally
>> handling this string as UTF-8 already.  But let's document the fact
>> somewhere, please (preferably in the pre-lock-hook template comments).
>>
>
> Looking again, svn_fs.h:2059 documents that lock tokens are URI's.
> (and, consequently, are printable ASCII)
>
> We also need them to be XML-safe for their use in ra_dav LOCK responses.
>
> So, I'm thinking that:
>
> * We should validate the supplied lock tokens, not for being well-formed
>  UTF-8 as in Ivan's change, but for being ASCII-only.
>
> * We should move the validation to the FS layer.
>
> Thoughts?
>
I'm fine with this approach, but I think we should leave explicit
validation of lock tokens in pre-lock hook handling for better
diagnostic.


-- 
Ivan Zhakov

Re: Lock token encoding (was Re: svn commit: r1147882 - /subversion/trunk/subversion/libsvn_repos/hooks.c)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
C. Michael Pilato wrote on Tue, Jul 19, 2011 at 09:45:09 -0400:
> On 07/19/2011 03:49 AM, Ivan Zhakov wrote:
> > On Mon, Jul 18, 2011 at 23:50, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> >> Ivan notes on IRC that this fixes issues with non-ASCII lock tokens.
> >> (At least, he reports errors in 'svn ls -v' over http://; but for me
> >> 'ls' and 'info' work fine over svn://.)
> >>
> >> Are lock tokens even permitted to be non-ASCII?
> >> (both backends generate ASCII-only lock tokens (in the same manner))
> >>
> >> If they are, are they required to be in UTF-8?  Or can they be, say,
> >> arbitrary octet strings?
> >>
> > I didn't find explicit specification for lock token encoding, but it
> > handled everywhere as normal utf-8 string.
> > 
> > That's why I decided to implement minimum patch just to decide lock
> > token returned from hook output to utf-8 string.
> > 
> > But I'm fine to require lock token to be ASCII only string.
> 
> Having this as an open question is unacceptable, so we need to make a
> decision about it.  To my knowledge, ASCII is the only thing used for lock
> tokens in the code we ship.  There's no telling what encoding could be in

For the record, our code generates tokens in the form
"opaquelocktoken:$UUID".

> use if folks are taking advantage of the pre-lock-hook token dictation
> feature -- Apache will give them a "C" locale environment, but sure anyone
> who has advanced enough configuration to require taking advantage of this
> feature is also setting the locale from inside those hooks.
> 
> FWIW, I think UTF-8 is a fine choice, especially if we've been internally
> handling this string as UTF-8 already.  But let's document the fact
> somewhere, please (preferably in the pre-lock-hook template comments).
> 

Looking again, svn_fs.h:2059 documents that lock tokens are URI's.
(and, consequently, are printable ASCII)

We also need them to be XML-safe for their use in ra_dav LOCK responses.

So, I'm thinking that:

* We should validate the supplied lock tokens, not for being well-formed
  UTF-8 as in Ivan's change, but for being ASCII-only.

* We should move the validation to the FS layer.

Thoughts?


> -- 
> C. Michael Pilato <cm...@collab.net>
> CollabNet   <>   www.collab.net   <>   Distributed Development On Demand
> 



Re: Lock token encoding (was Re: svn commit: r1147882 - /subversion/trunk/subversion/libsvn_repos/hooks.c)

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 07/19/2011 03:49 AM, Ivan Zhakov wrote:
> On Mon, Jul 18, 2011 at 23:50, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
>> Ivan notes on IRC that this fixes issues with non-ASCII lock tokens.
>> (At least, he reports errors in 'svn ls -v' over http://; but for me
>> 'ls' and 'info' work fine over svn://.)
>>
>> Are lock tokens even permitted to be non-ASCII?
>> (both backends generate ASCII-only lock tokens (in the same manner))
>>
>> If they are, are they required to be in UTF-8?  Or can they be, say,
>> arbitrary octet strings?
>>
> I didn't find explicit specification for lock token encoding, but it
> handled everywhere as normal utf-8 string.
> 
> That's why I decided to implement minimum patch just to decide lock
> token returned from hook output to utf-8 string.
> 
> But I'm fine to require lock token to be ASCII only string.

Having this as an open question is unacceptable, so we need to make a
decision about it.  To my knowledge, ASCII is the only thing used for lock
tokens in the code we ship.  There's no telling what encoding could be in
use if folks are taking advantage of the pre-lock-hook token dictation
feature -- Apache will give them a "C" locale environment, but sure anyone
who has advanced enough configuration to require taking advantage of this
feature is also setting the locale from inside those hooks.

FWIW, I think UTF-8 is a fine choice, especially if we've been internally
handling this string as UTF-8 already.  But let's document the fact
somewhere, please (preferably in the pre-lock-hook template comments).

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand