You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by VK Sameer <sa...@collab.net> on 2004/12/09 06:23:57 UTC
[PATCH] issue 1954 - try 2
Hello,
This is still incomplete, but it should have covered all the review
points.
Thanks
Sameer
Re: [PATCH] issue 1954 - try 2
Posted by kf...@collab.net.
"Peter N. Lundblad" <pe...@famlundblad.se> writes:
> Rename this to svn_path_check_valid and drop the above one. INvalid paths
> are really not expected very often, so there is no point in cluttering the
> API with an extra function that just returns a boolean. Also, drop the
> UTF8 check in this function. Nearly all our APIs that take strings expect
> valid UTF8. Idally the servers should check *all* strings that are
> expected to be in UTF8 as a means of input validation. That's not unique
> to paths. Exporting the UTF8 validation, as you do later in the patch, is
> a good idea for this purpose.
Oh -- I hadn't read all of Peter's comments when I wrote my long reply
just now. (I probably should have read Peter's post carefully first.)
I basically agree with everything Peter and Philip have said. My main
point was just to reserve the name 'svn_path_is_valid' for a true
boolean function. Peter's point about there being no special need for
paths to test UTF8-ness is a good one.
I don't know why I thought the the no_control_chars() function needed
a public interface. It doesn't really even need an internal
interface, it can just be part of svn_path_check_valid(). All we need
to do is loop and check for control chars. Much simpler :-).
-Karl
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: [PATCH] issue 1954 - try 2
Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Thu, 9 Dec 2004, VK Sameer wrote:
Index: subversion/include/svn_path.h
===================================================================
--- subversion/include/svn_path.h (revision 12257)
+++ subversion/include/svn_path.h (working copy)
@@ -356,6 +356,39 @@
const char *path2,
apr_pool_t *pool);
+/**
+ * @since New in 1.2.
+ *
+ * Check whether @a path is a valid Subversion path
+ * @a pool is used for temporary allocation
+ * @a check_utf8 turns on additional checks
+ *
Use periods between the sentences above.
+ * A valid Subversion pathname is a UTF-8 string without control
+ * characters, except for SPACE (0x20). "Valid" means Subversion
When did space become a control character? I know you got this docstring
in a review, but it's still not correct in this regard.
+ * can store the pathname in a repository. There may be other
+ * OS-specific limitations on what paths can be represented in
+ * a working copy.
+ */
+svn_boolean_t svn_path_is_valid (const char *path,
+ apr_pool_t *pool,
+ svn_boolean_t check_utf8);
+
+/**
+ * @since New in 1.2.
+ *
+ * Check whether @a path is a valid Subversion path and generate
+ * error if invalid. See docstring of svn_path_is_valid() for
+ * definition of valid pathname
+ * @a pool is used for temporary allocation
+ * @a check_utf8 turns on additional checks
+ *
+ * returns SVN_NO_ERROR if valid
+ * returns SVN_ERR_FS_PATH_SYNTAX if invalid
+ */
+svn_error_t *svn_path_error_if_invalid (const char *path,
+ apr_pool_t *pool,
+ svn_boolean_t check_utf8);
+
Rename this to svn_path_check_valid and drop the above one. INvalid paths
are really not expected very often, so there is no point in cluttering the
API with an extra function that just returns a boolean. Also, drop the
UTF8 check in this function. Nearly all our APIs that take strings expect
valid UTF8. Idally the servers should check *all* strings that are
expected to be in UTF8 as a means of input validation. That's not unique
to paths. Exporting the UTF8 validation, as you do later in the patch, is
a good idea for this purpose.
I agree with philip regarding the rest of the patch.
Thansk for taking this,
//Peter
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: [PATCH] issue 1954 - try 2
Posted by kf...@collab.net.
Philip Martin <ph...@codematters.co.uk> writes:
> The reason to have a function return a boolean is because we expect
> the "error" to occur, we intend to handle it, and we don't want the
> overhead of creating and destroying an error. I don't think that
> applies to this stuff, the error is likely to be irrecoverable. So I
> guess we don't need the boolean function.
We can dispense with the boolean function for now, and just have one
public interface, that returns a specific SVN_ERR_ error if the path
is invalid. If someday we want to ask the question in a
non-error-generating way, we can add the boolean interface then.
> The function name svn_path_error_if_invalid is ugly.
How about svn_path_check_valid()?
I would prefer not to use up the intuitive boolean interface name
"svn_path_is_valid" on an error-generating function. The vast
majority of Subversion uses
svn_foo_is_bar()
for a standard boolean-by-reference form. If we ever want to add a
boolean interface for path validity, we'll want that name available.
Therefore, let's give the error-generating version a different name.
It doesn't have to be "svn_path_error_if_invalid". As long as it's
*not* "svn_path_is_valid", I'm happy :-). "svn_path_check_valid" is
fine, no doubt there are many other possible names.
> I'm guessing we don't need the check_utf8 flag. You have already
> added a new public interface to the existing check-UTF-8 function, so
> all we really need is a new check-control-character-function, one that
> returns an error. Where it is know that the path is UTF-8 just call
> the second, if the path is not known to be UTF-8 call both.
I think it would be good to have a single svn_path_foo() function to
do this. I'm assuming it would look something like this:
svn_error_t *
svn_path_check_valid (const char *path, apr_pool_t *pool)
{
svn_error_t *err;
err = svn_utf_check_cstring_utf8 (path, pool);
if (err && (err->apr_err == SVN_ERR_BLAH_NOT_UTF8))
{
return svn_error_createf
(SVN_ERR_PATH_INVALID,
"Path '%s' is not encoded in UTF8",
svn_some_fuzzy_conversion_function (path, pool));
}
else if (err)
{
return err;
}
err = svn_utf_check_cstring_no_controls (path, pool);
if (err && (err->apr_err == SVN_ERR_BLAH_HAS_CONTROL_CHARS))
{
return svn_error_createf
(SVN_ERR_PATH_INVALID,
"Path '%s' is not encoded in UTF8",
svn_some_fuzzy_conversion_function (path, pool));
}
else if (err)
{
return err;
}
return SVN_NO_ERROR;
}
(Untested code, and the error names are improvised, of course).
True, this ends up resulting in three new public interfaces:
svn_utf_check_cstring_utf8()
svn_utf_check_cstring_no_controls()
svn_path_check_valid()
But the calling sites would become very readable -- they wouldn't even
need comments, because the name of the function will say everything
that needs to be said. That seems like a bargain.
-Karl
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: [PATCH] issue 1954 - try 2
Posted by Philip Martin <ph...@codematters.co.uk>.
VK Sameer <sa...@collab.net> writes:
> --- subversion/include/svn_path.h (revision 12257)
> +++ subversion/include/svn_path.h (working copy)
> @@ -356,6 +356,39 @@
> const char *path2,
> apr_pool_t *pool);
>
> +/**
> + * @since New in 1.2.
> + *
> + * Check whether @a path is a valid Subversion path
> + * @a pool is used for temporary allocation
> + * @a check_utf8 turns on additional checks
> + *
> + * A valid Subversion pathname is a UTF-8 string without control
> + * characters, except for SPACE (0x20). "Valid" means Subversion
> + * can store the pathname in a repository. There may be other
> + * OS-specific limitations on what paths can be represented in
> + * a working copy.
> + */
> +svn_boolean_t svn_path_is_valid (const char *path,
> + apr_pool_t *pool,
> + svn_boolean_t check_utf8);
Most of our interfaces have pool as the last parameter.
> +
> +/**
> + * @since New in 1.2.
> + *
> + * Check whether @a path is a valid Subversion path and generate
> + * error if invalid. See docstring of svn_path_is_valid() for
> + * definition of valid pathname
> + * @a pool is used for temporary allocation
> + * @a check_utf8 turns on additional checks
> + *
> + * returns SVN_NO_ERROR if valid
> + * returns SVN_ERR_FS_PATH_SYNTAX if invalid
> + */
> +svn_error_t *svn_path_error_if_invalid (const char *path,
> + apr_pool_t *pool,
> + svn_boolean_t check_utf8);
The function name svn_path_error_if_invalid is ugly.
> +
>
> /** URI/URL stuff
> *
> Index: subversion/include/svn_utf.h
> ===================================================================
> --- subversion/include/svn_utf.h (revision 12257)
> +++ subversion/include/svn_utf.h (working copy)
> @@ -171,6 +171,16 @@
> const svn_string_t *src,
> apr_pool_t *pool);
>
> +/** @since New in 1.2
> + *
> + * Public interface around check_cstring_utf8, which checks
The public don't care about check_cstring_utf8, so if that function
has any important documentation it should be here.
> + * whether the NULL-terminated sequence @a DATA is valid UTF-8
> + *
> + * returns SVN_NO_ERROR if valid
> + * returns APR_EINVAL if invalid
I don't think Subversion should generate APR_XXX errors, although
that's probably the existing code rather than your stuff.
> + */
> +svn_error_t *svn_utf_check_cstring_utf8 (const char *data, apr_pool_t *pool);
That makes three new public API functions just to check paths!
> +
> #ifdef __cplusplus
> }
> #endif /* __cplusplus */
> Index: subversion/libsvn_subr/utf.c
> ===================================================================
> --- subversion/libsvn_subr/utf.c (revision 12257)
> +++ subversion/libsvn_subr/utf.c (working copy)
> @@ -476,6 +476,19 @@
> return SVN_NO_ERROR;
> }
>
> +/* @since New in 1.2
> + *
> + * Public interface around check_cstring_utf8, which checks
> + * whether the NULL-terminated sequence @a DATA is valid UTF-8
> + *
> + * returns SVN_NO_ERROR if valid
> + * returns APR_EINVAL if invalid
> + */
Better not to duplicate the docstring.
> +svn_error_t *
> +svn_utf_check_cstring_utf8 (const char *data, apr_pool_t *pool)
> +{
> + return (check_cstring_utf8 (data, pool));
Is there any point keeping check_cstring_utf8?
> +}
>
> svn_error_t *
> svn_utf_stringbuf_to_utf8 (svn_stringbuf_t **dest,
> Index: subversion/libsvn_subr/path.c
> ===================================================================
> --- subversion/libsvn_subr/path.c (revision 12257)
> +++ subversion/libsvn_subr/path.c (working copy)
> @@ -29,6 +29,7 @@
> #include "svn_private_config.h" /* for SVN_PATH_LOCAL_SEPARATOR */
> #include "svn_utf.h"
> #include "svn_io.h" /* for svn_io_stat() */
> +#include "svn_ctype.h"
>
>
> /* The canonical empty path. Can this be changed? Well, change the empty
> @@ -1248,3 +1249,69 @@
> else
> return svn_utf_cstring_to_utf8 (path_utf8, path_apr, pool);
> }
> +
> +/* catchall function that performs various checks on path
> + * and fills out params based on switches.
> + * If check_utf8 is TRUE, validates path for utf8 correctness
> + * Otherwise, only checks for existence of control chars.
> + * If generate_error is TRUE and path is invalid, an error message
> + * is returned in err
> + * (created to avoid repeating code in two functions,
> + * one which returns an error message and one which doesn't)
> + */
> +static svn_boolean_t
> +validate_svn_path (const char *path, apr_pool_t *pool,
> + svn_boolean_t check_utf8,
> + svn_boolean_t generate_error,
> + svn_error_t *err)
Huh? You are passing an error in?
> +{
> + const char *c = path;
I don't like initialisation spread out like this for no reason, for()
statements have a perfectly good initialisation clause.
> +
> + if (check_utf8)
> + {
> + svn_error_t *utf_err = svn_utf_check_cstring_utf8 (path, pool);
> + if (utf_err)
> + {
> + if (generate_error)
> + err = svn_error_createf (SVN_ERR_FS_PATH_SYNTAX, utf_err,
> + "Invalid UTF-8 in path: '%s'",
> + svn_path_local_style (path, pool));
You can't print that path it's invalid UTF-8. It's academic since the
error never gets returned.
> + else
> + svn_error_clear (utf_err);
> +
> + return FALSE;
> + }
> + }
> +
> + for (; *c; c++)
> + if (svn_ctype_iscntrl(*c))
> + {
> + if (generate_error)
> + err = svn_error_createf (SVN_ERR_FS_PATH_SYNTAX, NULL,
> + "Invalid character '%x' in path '%s'",
> + *c,
char could be signed, control characters are probably less than
SCHAR_MAX but why take the risk? Cast to unsigned instead.
> + svn_path_local_style (path, pool));
> + return FALSE;
> + }
> +
> + return TRUE;
> +}
> +
> +svn_boolean_t
> +svn_path_is_valid (const char *path, apr_pool_t *pool,
> + svn_boolean_t check_utf8)
> +{
> + svn_error_t *err = SVN_NO_ERROR;
Pointless initialisation.
> +
> + return validate_svn_path (path, pool, check_utf8, FALSE, err);
> +}
> +
> +svn_error_t*
> +svn_path_error_if_invalid (const char *path, apr_pool_t *pool,
> + svn_boolean_t check_utf8)
> +{
> + svn_error_t* err = SVN_NO_ERROR;
> +
> + validate_svn_path (path, pool, check_utf8, TRUE, err);
> + return err;
This always returns SVN_NO_ERROR.
> +}
The whole interface looks slightly odd, but I can't really put my
finger on the problem. Do we really need two functions? Do we really
need the check_utf8 flag?
The reason to have a function return a boolean is because we expect
the "error" to occur, we intend to handle it, and we don't want the
overhead of creating and destroying an error. I don't think that
applies to this stuff, the error is likely to be irrecoverable. So I
guess we don't need the boolean function.
I'm guessing we don't need the check_utf8 flag. You have already
added a new public interface to the existing check-UTF-8 function, so
all we really need is a new check-control-character-function, one that
returns an error. Where it is know that the path is UTF-8 just call
the second, if the path is not known to be UTF-8 call both.
--
Philip Martin
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org