You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by VK Sameer <sa...@collab.net> on 2004/12/09 06:23:57 UTC

[PATCH] issue 1954 - try 2

Hello,

This is still incomplete, but it should have covered all the review
points.

Thanks
Sameer

Re: [PATCH] issue 1954 - try 2

Posted by kf...@collab.net.
"Peter N. Lundblad" <pe...@famlundblad.se> writes:
> Rename this to svn_path_check_valid and drop the above one. INvalid paths
> are really not expected very often, so there is no point in cluttering the
> API with an extra function that just returns a boolean. Also, drop the
> UTF8 check in this function. Nearly all our APIs that take strings expect
> valid UTF8. Idally the servers should check *all* strings that are
> expected to be in UTF8 as a means of input validation. That's not unique
> to paths. Exporting the UTF8 validation, as you do later in the patch, is
> a good idea for this purpose.

Oh -- I hadn't read all of Peter's comments when I wrote my long reply
just now.  (I probably should have read Peter's post carefully first.)

I basically agree with everything Peter and Philip have said.  My main
point was just to reserve the name 'svn_path_is_valid' for a true
boolean function.  Peter's point about there being no special need for
paths to test UTF8-ness is a good one.

I don't know why I thought the the no_control_chars() function needed
a public interface.  It doesn't really even need an internal
interface, it can just be part of svn_path_check_valid().  All we need
to do is loop and check for control chars.  Much simpler :-).

-Karl


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue 1954 - try 2

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Thu, 9 Dec 2004, VK Sameer wrote:

Index: subversion/include/svn_path.h
===================================================================
--- subversion/include/svn_path.h	(revision 12257)
+++ subversion/include/svn_path.h	(working copy)
@@ -356,6 +356,39 @@
                                const char *path2,
                                apr_pool_t *pool);

+/**
+ * @since New in 1.2.
+ *
+ * Check whether @a path is a valid Subversion path
+ * @a pool is used for temporary allocation
+ * @a check_utf8 turns on additional checks
+ *

Use periods between the sentences above.

+ * A valid Subversion pathname is a UTF-8 string without control
+ * characters, except for SPACE (0x20). "Valid" means Subversion

When did space become a control character? I know you got this docstring
in a review, but it's still not correct in this regard.

+ * can store the pathname in a repository. There may be other
+ * OS-specific limitations on what paths can be represented in
+ * a working copy.
+ */
+svn_boolean_t svn_path_is_valid (const char *path,
+                                 apr_pool_t *pool,
+                                 svn_boolean_t check_utf8);
+
+/**
+ * @since New in 1.2.
+ *
+ * Check whether @a path is a valid Subversion path and generate
+ * error if invalid. See docstring of svn_path_is_valid() for
+ * definition of valid pathname
+ * @a pool is used for temporary allocation
+ * @a check_utf8 turns on additional checks
+ *
+ * returns SVN_NO_ERROR if valid
+ * returns SVN_ERR_FS_PATH_SYNTAX if invalid
+ */
+svn_error_t *svn_path_error_if_invalid (const char *path,
+                                        apr_pool_t *pool,
+                                        svn_boolean_t check_utf8);
+
Rename this to svn_path_check_valid and drop the above one. INvalid paths
are really not expected very often, so there is no point in cluttering the
API with an extra function that just returns a boolean. Also, drop the
UTF8 check in this function. Nearly all our APIs that take strings expect
valid UTF8. Idally the servers should check *all* strings that are
expected to be in UTF8 as a means of input validation. That's not unique
to paths. Exporting the UTF8 validation, as you do later in the patch, is
a good idea for this purpose.

I agree with philip regarding the rest of the patch.

Thansk for taking this,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue 1954 - try 2

Posted by kf...@collab.net.
Philip Martin <ph...@codematters.co.uk> writes:
> The reason to have a function return a boolean is because we expect
> the "error" to occur, we intend to handle it, and we don't want the
> overhead of creating and destroying an error.  I don't think that
> applies to this stuff, the error is likely to be irrecoverable.  So I
> guess we don't need the boolean function.

We can dispense with the boolean function for now, and just have one
public interface, that returns a specific SVN_ERR_ error if the path
is invalid.  If someday we want to ask the question in a
non-error-generating way, we can add the boolean interface then.

> The function name svn_path_error_if_invalid is ugly.

How about svn_path_check_valid()?

I would prefer not to use up the intuitive boolean interface name
"svn_path_is_valid" on an error-generating function.  The vast
majority of Subversion uses

   svn_foo_is_bar()

for a standard boolean-by-reference form.  If we ever want to add a
boolean interface for path validity, we'll want that name available.
Therefore, let's give the error-generating version a different name.

It doesn't have to be "svn_path_error_if_invalid".  As long as it's
*not* "svn_path_is_valid", I'm happy :-).  "svn_path_check_valid" is
fine, no doubt there are many other possible names.

> I'm guessing we don't need the check_utf8 flag.  You have already
> added a new public interface to the existing check-UTF-8 function, so
> all we really need is a new check-control-character-function, one that
> returns an error.  Where it is know that the path is UTF-8 just call
> the second, if the path is not known to be UTF-8 call both.

I think it would be good to have a single svn_path_foo() function to
do this.  I'm assuming it would look something like this:

   svn_error_t *
   svn_path_check_valid (const char *path, apr_pool_t *pool)
   {
     svn_error_t *err;

     err = svn_utf_check_cstring_utf8 (path, pool);
     if (err && (err->apr_err == SVN_ERR_BLAH_NOT_UTF8))
       {
         return svn_error_createf
                (SVN_ERR_PATH_INVALID,
                "Path '%s' is not encoded in UTF8",
                svn_some_fuzzy_conversion_function (path, pool));
       }
     else if (err)
       {
         return err;
       }

     err = svn_utf_check_cstring_no_controls (path, pool);
     if (err && (err->apr_err == SVN_ERR_BLAH_HAS_CONTROL_CHARS))
       {
         return svn_error_createf
                (SVN_ERR_PATH_INVALID,
                "Path '%s' is not encoded in UTF8",
                svn_some_fuzzy_conversion_function (path, pool));
       }
     else if (err)
       {
         return err;
       }

     return SVN_NO_ERROR;
   }

(Untested code, and the error names are improvised, of course).

True, this ends up resulting in three new public interfaces:

   svn_utf_check_cstring_utf8()
   svn_utf_check_cstring_no_controls()
   svn_path_check_valid()

But the calling sites would become very readable -- they wouldn't even
need comments, because the name of the function will say everything
that needs to be said.  That seems like a bargain.

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue 1954 - try 2

Posted by Philip Martin <ph...@codematters.co.uk>.
VK Sameer <sa...@collab.net> writes:

> --- subversion/include/svn_path.h	(revision 12257)
> +++ subversion/include/svn_path.h	(working copy)
> @@ -356,6 +356,39 @@
>                                 const char *path2,
>                                 apr_pool_t *pool);
>  
> +/**
> + * @since New in 1.2.
> + *
> + * Check whether @a path is a valid Subversion path
> + * @a pool is used for temporary allocation
> + * @a check_utf8 turns on additional checks
> + *
> + * A valid Subversion pathname is a UTF-8 string without control
> + * characters, except for SPACE (0x20). "Valid" means Subversion
> + * can store the pathname in a repository. There may be other
> + * OS-specific limitations on what paths can be represented in
> + * a working copy.
> + */
> +svn_boolean_t svn_path_is_valid (const char *path,
> +                                 apr_pool_t *pool,
> +                                 svn_boolean_t check_utf8);

Most of our interfaces have pool as the last parameter.

> +
> +/**
> + * @since New in 1.2.
> + *
> + * Check whether @a path is a valid Subversion path and generate
> + * error if invalid. See docstring of svn_path_is_valid() for
> + * definition of valid pathname
> + * @a pool is used for temporary allocation
> + * @a check_utf8 turns on additional checks
> + *
> + * returns SVN_NO_ERROR if valid
> + * returns SVN_ERR_FS_PATH_SYNTAX if invalid
> + */
> +svn_error_t *svn_path_error_if_invalid (const char *path,
> +                                        apr_pool_t *pool,
> +                                        svn_boolean_t check_utf8);

The function name svn_path_error_if_invalid is ugly.

> +
>  
>  /** URI/URL stuff
>   *
> Index: subversion/include/svn_utf.h
> ===================================================================
> --- subversion/include/svn_utf.h	(revision 12257)
> +++ subversion/include/svn_utf.h	(working copy)
> @@ -171,6 +171,16 @@
>                                                 const svn_string_t *src,
>                                                 apr_pool_t *pool);
>  
> +/** @since New in 1.2
> + *
> + * Public interface around check_cstring_utf8, which checks

The public don't care about check_cstring_utf8, so if that function
has any important documentation it should be here.

> + * whether the NULL-terminated sequence @a DATA is valid UTF-8
> + * 
> + * returns SVN_NO_ERROR if valid
> + * returns APR_EINVAL   if invalid

I don't think Subversion should generate APR_XXX errors, although
that's probably the existing code rather than your stuff.

> + */
> +svn_error_t *svn_utf_check_cstring_utf8 (const char *data, apr_pool_t *pool);

That makes three new public API functions just to check paths!

> +
>  #ifdef __cplusplus
>  }
>  #endif /* __cplusplus */
> Index: subversion/libsvn_subr/utf.c
> ===================================================================
> --- subversion/libsvn_subr/utf.c	(revision 12257)
> +++ subversion/libsvn_subr/utf.c	(working copy)
> @@ -476,6 +476,19 @@
>    return SVN_NO_ERROR;
>  }
>  
> +/* @since New in 1.2
> + *
> + * Public interface around check_cstring_utf8, which checks
> + * whether the NULL-terminated sequence @a DATA is valid UTF-8
> + * 
> + * returns SVN_NO_ERROR if valid
> + * returns APR_EINVAL   if invalid
> + */

Better not to duplicate the docstring.

> +svn_error_t *
> +svn_utf_check_cstring_utf8 (const char *data, apr_pool_t *pool)
> +{
> +  return (check_cstring_utf8 (data, pool));

Is there any point keeping check_cstring_utf8?

> +}
>  
>  svn_error_t *
>  svn_utf_stringbuf_to_utf8 (svn_stringbuf_t **dest,
> Index: subversion/libsvn_subr/path.c
> ===================================================================
> --- subversion/libsvn_subr/path.c	(revision 12257)
> +++ subversion/libsvn_subr/path.c	(working copy)
> @@ -29,6 +29,7 @@
>  #include "svn_private_config.h"         /* for SVN_PATH_LOCAL_SEPARATOR */
>  #include "svn_utf.h"
>  #include "svn_io.h"                     /* for svn_io_stat() */
> +#include "svn_ctype.h"
>  
>  
>  /* The canonical empty path.  Can this be changed?  Well, change the empty
> @@ -1248,3 +1249,69 @@
>    else
>      return svn_utf_cstring_to_utf8 (path_utf8, path_apr, pool);
>  }
> +
> +/* catchall function that performs various checks on path
> + * and fills out params based on switches.
> + * If check_utf8 is TRUE, validates path for utf8 correctness
> + * Otherwise, only checks for existence of control chars.
> + * If generate_error is TRUE and path is invalid, an error message
> + * is returned in err
> + * (created to avoid repeating code in two functions,
> + * one which returns an error message and one which doesn't)
> + */
> +static svn_boolean_t
> +validate_svn_path (const char *path, apr_pool_t *pool,
> +                   svn_boolean_t check_utf8,
> +                   svn_boolean_t generate_error,
> +                   svn_error_t *err)

Huh?  You are passing an error in?

> +{
> +  const char *c = path;

I don't like initialisation spread out like this for no reason, for()
statements have a perfectly good initialisation clause.

> +
> +  if (check_utf8)
> +    {
> +      svn_error_t *utf_err = svn_utf_check_cstring_utf8 (path, pool);
> +      if (utf_err)
> +        {
> +          if (generate_error)
> +            err = svn_error_createf (SVN_ERR_FS_PATH_SYNTAX, utf_err,
> +                                     "Invalid UTF-8 in path: '%s'", 
> +                                     svn_path_local_style (path, pool));

You can't print that path it's invalid UTF-8.  It's academic since the
error never gets returned.

> +          else
> +            svn_error_clear (utf_err);
> +
> +          return FALSE;
> +        }
> +    }
> +
> +  for (; *c; c++)
> +    if (svn_ctype_iscntrl(*c))
> +      {
> +        if (generate_error)
> +          err = svn_error_createf (SVN_ERR_FS_PATH_SYNTAX, NULL,
> +                               "Invalid character '%x' in path '%s'",
> +                               *c,

char could be signed, control characters are probably less than
SCHAR_MAX but why take the risk?  Cast to unsigned instead.

> +                               svn_path_local_style (path, pool));
> +        return FALSE;
> +      }
> +
> +  return TRUE;
> +}
> +
> +svn_boolean_t
> +svn_path_is_valid (const char *path, apr_pool_t *pool,
> +                   svn_boolean_t check_utf8)
> +{
> +  svn_error_t *err = SVN_NO_ERROR;

Pointless initialisation.

> +
> +  return validate_svn_path (path, pool, check_utf8, FALSE, err);
> +}
> +
> +svn_error_t*
> +svn_path_error_if_invalid (const char *path, apr_pool_t *pool,
> +                           svn_boolean_t check_utf8)
> +{
> +  svn_error_t* err = SVN_NO_ERROR;
> +
> +  validate_svn_path (path, pool, check_utf8, TRUE, err);
> +  return err;

This always returns SVN_NO_ERROR.

> +}

The whole interface looks slightly odd, but I can't really put my
finger on the problem.  Do we really need two functions?  Do we really
need the check_utf8 flag?

The reason to have a function return a boolean is because we expect
the "error" to occur, we intend to handle it, and we don't want the
overhead of creating and destroying an error.  I don't think that
applies to this stuff, the error is likely to be irrecoverable.  So I
guess we don't need the boolean function.

I'm guessing we don't need the check_utf8 flag.  You have already
added a new public interface to the existing check-UTF-8 function, so
all we really need is a new check-control-character-function, one that
returns an error.  Where it is know that the path is UTF-8 just call
the second, if the path is not known to be UTF-8 call both.

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org