You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Karl Fogel <kf...@red-bean.com> on 2008/08/27 20:23:44 UTC

Incorrectness in svn_opt__split_arg_at_peg_revision() [was: regression from 1.4.3 to 1.5.1 in handling filenames containg at (@) signs]

I think svn_opt__split_arg_at_peg_revision() is buggy.  This discovery
was spurred by a bug report (quoted at the end of this mail).

Here's the svn_opt__split_arg_at_peg_revision API:

   /* Extract the peg revision, if any, from UTF8_TARGET. Return the peg
    * revision in *PEG_REVISION and the true target portion in *TRUE_TARGET.
    * *PEG_REVISION will be an empty string if no peg revision is found.
    *
    * UTF8_TARGET need not be canonical. *TRUE_TARGET will not be canonical
    * unless UTF8_TARGET is.
    *
    * Note that *PEG_REVISION will still contain the '@' symbol as the first
    * character if a peg revision was found.
    *
    * All allocations are done in POOL.
    */
   svn_error_t *
   svn_opt__split_arg_at_peg_revision(const char **true_target,
                                      const char **peg_revision,
                                      const char *utf8_target,
                                      apr_pool_t *pool);

And here's how we call it in svn_client_args_to_target_array() in
libsvn_client/cmdline.c (indentation adjusted for email):

    /*
     * This is needed so that the target can be properly canonicalized,
     * otherwise the canonicalization does not treat a ".@BASE" as a "."
     * with a BASE peg revision, and it is not canonicalized to "@BASE".
     * If any peg revision exists, it is appended to the final
     * canonicalized path or URL.  Do not use svn_opt_parse_path()
     * because the resulting peg revision is a structure that would have
     * to be converted back into a string.  Converting from a string date
     * to the apr_time_t field in the svn_opt_revision_value_t and back to
     * a string would not necessarily preserve the exact bytes of the
     * input date, so its easier just to keep it in string form.
     */
    SVN_ERR(svn_opt__split_arg_at_peg_revision(&true_target, &peg_rev,
                                               utf8_target, pool));

And here's an annotated GDB session, starting from right before the
above call, showing the bugginess:

   (gdb) p utf8_target
   $6 = 0x9e47570 "dir/@file"

   ### Hmmm, I want to test what happens with "dir/@file@" instead,
   ### so I'll just set utf8_target to that: 

   (gdb) p utf8_target = "dir/@file@"
   $7 = 0x9e49988 "dir/@file@"

   ### Great.  Now, let's step into svn_opt__split_arg_at_peg_revision():

   (gdb) s
   svn_opt__split_arg_at_peg_revision (true_target=0xbfe058b8,             \
                                       peg_revision=0xbfe058b4,            \
                                       utf8_target=0x9e49988 "dir/@file@", \
                                       pool=0x9e45990)                     \
                                       at subversion/libsvn_subr/opt.c:1098
   (gdb) n
   (gdb) n
   (gdb) n
   (gdb) ...

   ### Etc, etc.  Watch as we get to this code near the end:
   ###
   ###    if (peg_start)
   ###      {
   ###        *true_target = apr_pstrmemdup(pool, utf8_target, j);
   ###        *peg_revision = apr_pstrdup(pool, peg_start);
   ###      }
   ###
   ### Whoa!  Surprisingly, 'peg_start' is non-NULL and non-"" here:

   (gdb) p peg_start
   $8 = 0x9e49991 "@"

   ### Now, why is that?  There shouldn't be any peg revision on this
   ### path; indeed, the whole point of the final '@' was to prevent
   ### there from being a peg revision.  But if you look at the
   ### for-loop earlier in svn_opt__split_arg_at_peg_revision(), you
   ### can see exactly why it wrongly thinks there is a peg revision
   ### (that is, a 'peg_start'): on the very first iteration of the
   ### loop, the following condition will be true, so we'll break out
   ### with a real 'peg_start':
   ###
   ###     if (utf8_target[j] == '@')
   ###       {
   ###         peg_start = &utf8_target[j];
   ###         break;
   ###       }
   ###
   ### And from this point on things are wrong, of course, though at
   ### least they're wrong in the expected way:

   (gdb) n
   (gdb) n
   (gdb) n
   svn_client_args_to_target_array (targets_p=0xbfe05988, \
                                    os=0x9e45b38,         \
                                    known_targets=0x0,    \
                                    ctx=0x9e46328,        \
                                    pool=0x9e45990)       \
                                    at subversion/libsvn_client/cmdline.c:229
   (gdb) p true_target
   $11 = 0x9e47598 "dir/@file"
   (gdb) p peg_rev
   $12 = 0x9e475a8 "@"

   ### Mmmm.  Yup.  So, 'peg_rev' should really be "", right?

Should I just make the obvious fix -- that is, set 'peg_start' to NULL
if we're on the first iteration of the loop -- or is there something
deeper going on here?

-Karl

scottnotrobot@gmail.com writes:
> in 1.4.3, this works:
>
> mkdir dir
> svn add dir
> svn ci -m test dir
> touch dir/@file
> svn add dir/@file
> svn ci -m test dir/@file
>
> but in 1.5.1, when trying to add @file, i get:
>
> svn: warning: 'dir@file' not found
>
> i also tried: "svn add dir/@file@", but i get:
>
> svn: warning: 'dir/@file@' not found

Yeah, there's a bug here, I think; see my long explanation above.

Also, note that the script below succeeds in adding and committing
"dir/@file@", because I adjust the 'touch' line to use that filename
too.  However, I shouldn't have to adjust the 'touch' line to use
"dir/@file@"!  Instead, Subversion should transform "dir/@file@" to
"dir/@file", as you understandably expected.

-----------------------------------------------------------------------
#!/bin/sh

# The next line is the only line you should need to adjust.
SVNDIR=/home/kfogel/src/subversion

SVN=${SVNDIR}/subversion/svn/svn
SVNADMIN=${SVNDIR}/subversion/svnadmin/svnadmin
URL=file:///`pwd`/repos

rm -rf repos wc
${SVNADMIN} create repos
${SVN} co -q ${URL} wc

cd wc
mkdir dir
${SVN} add dir
${SVN} ci -m test dir
touch dir/@file@
${SVN} add dir/@file@
${SVN} ci -m test dir/@file@
cd ..

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Incorrectness in svn_opt__split_arg_at_peg_revision()

Posted by Troy Curtis Jr <tr...@gmail.com>.
On Wed, Aug 27, 2008 at 8:44 PM,  <sc...@gmail.com> wrote:
> one more observation in case it helps... this problem seems limited to
> filenames with @ as the first character only... if the @ comes after the
> first, things seem ok to me:
>
> i.e. this works in 1.5.1:
>
> mkdir dir
> svn add dir
> svn ci -m test dir
> touch dir/a@file
> svn add dir/a@file
> svn ci -m test dir/a@file
>
> -Scott

Yep the reason here is that 'dir/@file' is split into 'dir/' and
'@file' (because the characters following the '@' symbol are detected
as a peg revision).  Then 'dir/' is canonicalized into 'dir' (trailing
slashes are not canonical).  Then it is stuck back together as
'dir@file' which of course doesn't exist.

dir/a@file is split into 'dir/a' and '@file', but now there is no
trailing slash for it to 'canonicalize' away so it is joined back
together as 'dir/a@file'.

Troy



-- 
"Beware of spyware. If you can, use the Firefox browser." - USA Today
Download now at http://getfirefox.com
Registered Linux User #354814 ( http://counter.li.org/)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Incorrectness in svn_opt__split_arg_at_peg_revision()

Posted by sc...@gmail.com.
one more observation in case it helps... this problem seems limited to 
filenames with @ as the first character only... if the @ comes after the 
first, things seem ok to me:

i.e. this works in 1.5.1:

mkdir dir
svn add dir
svn ci -m test dir
touch dir/a@file
svn add dir/a@file
svn ci -m test dir/a@file

-Scott

On Wed, 27 Aug 2008, Julian Foad wrote:

> On Wed, 2008-08-27 at 16:56 -0400, Karl Fogel wrote:
>> Julian Foad <ju...@btopenworld.com> writes:
>>> No. It is by design that this function returns the entire peg-specifier
>>> part of the input string, rather than a canonical peg specifier derived
>>> from it. The callers rely on this.
>>>
>>> Now, it's maybe not a great design, but that's how it is at the moment.
>>>
>>> And the doc string is not clear on this point.
>>
>> So in this instance, the caller is simply using the returned values
>> wrongly?  (Forgetting for the moment the possibility of improving the
>> design -- I'm just talking about whether the caller is working correctly
>> with the API as it stands today.)
>
> This instance being svn_client_args_to_target_array()? No, that's using
> it correctly. It just splits off the peg-specifier part of the string
> temporarily, canonicalises the rest, and the re-appends the
> peg-specifier string, to leave a result string which is still properly
> peg-escaped (i.e. may end with "@" or with "@PEGREV" or with neither,
> just as it did on input).
>
>> Who is supposed to be responsible for peg-rev escaping?  That is, if
>> someone puts "@" on the end of a path (presumably for the sole purpose
>> of establishing that there is *no* peg rev here), what API is
>> responsible for handling that?
>
> svn_opt_parse_path() - the API that parses the string into a path part
> and a peg. (Not this one that "splits" the string.)
>
> I notice now that svn_opt_parse_path()'s doc string doesn't mention its
> behaviour when just a trailing "@" is found, and furthermore the sole
> example of this shown in its doc string appears to be wrong, showing
> that the revision "base" would be returned in this case.
>
> - Julian
>
>
>> I would have thought it would be svn_opt__split_arg_at_peg_revision().
>> That is, it hadn't occurred to me to consider the final "@" to be "the
>> entire peg-specifier part of the input string".  Rather, I'd have
>> considered it a marker that indicates that there is no peg-specifier
>> here at all.  Clearly I'm wrong, but I still don't quite understand what
>> the system is supposed to be, then... Do you understand it?
>>
>> -Karl
>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Incorrectness in svn_opt__split_arg_at_peg_revision()

Posted by Julian Foad <ju...@btopenworld.com>.
On Wed, 2008-08-27 at 16:56 -0400, Karl Fogel wrote:
> Julian Foad <ju...@btopenworld.com> writes:
> > No. It is by design that this function returns the entire peg-specifier
> > part of the input string, rather than a canonical peg specifier derived
> > from it. The callers rely on this.
> >
> > Now, it's maybe not a great design, but that's how it is at the moment.
> >
> > And the doc string is not clear on this point.
> 
> So in this instance, the caller is simply using the returned values
> wrongly?  (Forgetting for the moment the possibility of improving the
> design -- I'm just talking about whether the caller is working correctly
> with the API as it stands today.)

This instance being svn_client_args_to_target_array()? No, that's using
it correctly. It just splits off the peg-specifier part of the string
temporarily, canonicalises the rest, and the re-appends the
peg-specifier string, to leave a result string which is still properly
peg-escaped (i.e. may end with "@" or with "@PEGREV" or with neither,
just as it did on input).

> Who is supposed to be responsible for peg-rev escaping?  That is, if
> someone puts "@" on the end of a path (presumably for the sole purpose
> of establishing that there is *no* peg rev here), what API is
> responsible for handling that?

svn_opt_parse_path() - the API that parses the string into a path part
and a peg. (Not this one that "splits" the string.)

I notice now that svn_opt_parse_path()'s doc string doesn't mention its
behaviour when just a trailing "@" is found, and furthermore the sole
example of this shown in its doc string appears to be wrong, showing
that the revision "base" would be returned in this case.

- Julian


> I would have thought it would be svn_opt__split_arg_at_peg_revision().
> That is, it hadn't occurred to me to consider the final "@" to be "the
> entire peg-specifier part of the input string".  Rather, I'd have
> considered it a marker that indicates that there is no peg-specifier
> here at all.  Clearly I'm wrong, but I still don't quite understand what
> the system is supposed to be, then... Do you understand it?
> 
> -Karl




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Incorrectness in svn_opt__split_arg_at_peg_revision()

Posted by Karl Fogel <kf...@red-bean.com>.
Julian Foad <ju...@btopenworld.com> writes:
> No. It is by design that this function returns the entire peg-specifier
> part of the input string, rather than a canonical peg specifier derived
> from it. The callers rely on this.
>
> Now, it's maybe not a great design, but that's how it is at the moment.
>
> And the doc string is not clear on this point.

So in this instance, the caller is simply using the returned values
wrongly?  (Forgetting for the moment the possibility of improving the
design -- I'm just talking about whether the caller is working correctly
with the API as it stands today.)

Who is supposed to be responsible for peg-rev escaping?  That is, if
someone puts "@" on the end of a path (presumably for the sole purpose
of establishing that there is *no* peg rev here), what API is
responsible for handling that?

I would have thought it would be svn_opt__split_arg_at_peg_revision().
That is, it hadn't occurred to me to consider the final "@" to be "the
entire peg-specifier part of the input string".  Rather, I'd have
considered it a marker that indicates that there is no peg-specifier
here at all.  Clearly I'm wrong, but I still don't quite understand what
the system is supposed to be, then... Do you understand it?

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Incorrectness in svn_opt__split_arg_at_peg_revision() [was: regression from 1.4.3 to 1.5.1 in handling filenames containg at (@) signs]

Posted by Julian Foad <ju...@btopenworld.com>.
Karl Fogel wrote:
> I think svn_opt__split_arg_at_peg_revision() is buggy.  This discovery
> was spurred by a bug report (quoted at the end of this mail).
> 
> Here's the svn_opt__split_arg_at_peg_revision API:
> 
>    /* Extract the peg revision, if any, from UTF8_TARGET. Return the peg
>     * revision in *PEG_REVISION and the true target portion in *TRUE_TARGET.
>     * *PEG_REVISION will be an empty string if no peg revision is found.
>     *
>     * UTF8_TARGET need not be canonical. *TRUE_TARGET will not be canonical
>     * unless UTF8_TARGET is.
>     *
>     * Note that *PEG_REVISION will still contain the '@' symbol as the first
>     * character if a peg revision was found.
>     *
>     * All allocations are done in POOL.
>     */
>    svn_error_t *
>    svn_opt__split_arg_at_peg_revision(const char **true_target,
>                                       const char **peg_revision,
>                                       const char *utf8_target,
>                                       apr_pool_t *pool);
[...]
>     SVN_ERR(svn_opt__split_arg_at_peg_revision(&true_target, &peg_rev,
>                                                utf8_target, pool));
[...]
>    ### Hmmm, I want to test what happens with "dir/@file@" instead,
>    ### so I'll just set utf8_target to that: 
[...]
>    (gdb) p true_target
>    $11 = 0x9e47598 "dir/@file"
>    (gdb) p peg_rev
>    $12 = 0x9e475a8 "@"
> 
>    ### Mmmm.  Yup.  So, 'peg_rev' should really be "", right?

No. It is by design that this function returns the entire peg-specifier
part of the input string, rather than a canonical peg specifier derived
from it. The callers rely on this.

Now, it's maybe not a great design, but that's how it is at the moment.

And the doc string is not clear on this point.

- Julian


> Should I just make the obvious fix -- that is, set 'peg_start' to NULL
> if we're on the first iteration of the loop -- or is there something
> deeper going on here?
[...]



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Incorrectness in svn_opt__split_arg_at_peg_revision() [was: regression from 1.4.3 to 1.5.1 in handling filenames containg at (@) signs]

Posted by Troy Curtis Jr <tr...@gmail.com>.
>
>   (gdb) n
>   (gdb) n
>   (gdb) n
>   svn_client_args_to_target_array (targets_p=0xbfe05988, \
>                                    os=0x9e45b38,         \
>                                    known_targets=0x0,    \
>                                    ctx=0x9e46328,        \
>                                    pool=0x9e45990)       \
>                                    at subversion/libsvn_client/cmdline.c:229
>   (gdb) p true_target
>   $11 = 0x9e47598 "dir/@file"
>   (gdb) p peg_rev
>   $12 = 0x9e475a8 "@"
>
>   ### Mmmm.  Yup.  So, 'peg_rev' should really be "", right?
>
> Should I just make the obvious fix -- that is, set 'peg_start' to NULL
> if we're on the first iteration of the loop -- or is there something
> deeper going on here?
>
> -Karl
>

The problem there is with URLs containing '@'. i.e.

http://host/myrepo/@file

To see it you use

svn info http://host/myrepo/@file@

Which will work because the 'info' command calls svn_opt_parse_path()
on its arguments because it knows it might have a peg revision.   The
problem with the 'add' subcommand is that is does not call
svn_opt_parse_path().

So if svn_opt__split_arg_at_peg_revision() returns NULL, then it will
break the URL case (because now the string 'http://host/myrepo/@file'
is passed to svn_opt_parse_path() which tries to parse the peg rev
'file')

If svn_opt__split_arg_at_peg_revision() returns '@' for the empty
string, then svn_opt_parse_path() will always work.

I think the better solution would be changing all the subcommands to
use svn_opt_parse_path(), even if they can't ever really accept peg
revisions.  In fact, calling svn_opt_parse_path() in those cases might
provide much more clear error messages.  For those subcommands that
cannot accept peg revisions, svn_opt_parse_path() returning anything
other than svn_opt_revision_unspecified would be an error that could
say:
"Peg revisions are not valid for the 'add' command."
Instead of the current:
"Path not found 'dir/myfile@234'"
when it interprets the entire string as a path.

Troy
--
"Beware of spyware. If you can, use the Firefox browser." - USA Today
Download now at http://getfirefox.com
Registered Linux User #354814 ( http://counter.li.org/)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org