You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2006/02/12 20:13:49 UTC

Re: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Lieven Govaerts wrote:
> 
> I updated my patch of last week to fix issue 1780. 

Thanks.

> The algorithm implemented here isn't changed, so this patch will not solve
> the problem of usernames starting or ending with '$'. It only solves user-
> or filenames containing a '$'.

More precisely, it solves the problem of user names or URLs containing dollar 
signs, unless a dollar sign is the first character or follows a space.

I think this level of functionality is worth having, and I would commit it 
except that I have two concerns with the implementation.

1) It fails (with an assert) when searching for a keyword in a line which is 
longer than SVN_KEYWORD_MAX_LEN.  Although we don't support arbitrarily long 
keywords, we must continue to support arbitrarily long lines.  To reproduce, 
just let it try keyword translation on a line consisting of "$" plus (MAX - 2) 
spaces plus "$$".

2) It takes order(N-squared) time to process a line containing N dollar signs. 
  On my system I found many text files with more than 25 dollar signs in the 
same line, including:

* Text representations of binary data, e.g. /etc/X11/xdm/pixmaps/xorg.xpm
* Shell scripts, makefiles, etc., e.g. subversion/build/libtool.m4
* Config files referencing env-vars, e.g. .kde/share/config/kdeglobals
* Our test for cases like this: subversion/tests/libsvn_wc/translate-test.c

I wouldn't think it strange for someone to use a line of about eighty dollar 
signs to separate sections in a text file.  Processing ten lines, each 
consisting of 250 dollar signs, took over a second on my rather slow system. 
Certainly that's not a huge amount of time for a rare case, but we ought to 
cope well with unusual cases if we can.

I note that "svn status" processes the locally-modified file every time I run 
"svn status"; that's a separate problem that we might want to look at.

The bug (1) should be straightforward to fix.  I'm not sure what to say about 
the efficiency issue (2).  I'd very much like to see it improved, but I might 
not insist on it.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Posted by Julian Foad <ju...@btopenworld.com>.
Lieven Govaerts wrote:
> Aha, this makes sense, and explains why random replacing of code didn't
> trigger the tests to fail.
[...]
> The new tests seem to confirm your assumption on
> how the algorithm should behave, so I include them in the attached patch.

I've committed this as r18465, with just a small tweak (I changed a "*len" 
parameter to "len" where it was used only as input).

Thank you very much for your work.


Other folks, should I close issue #1780 ("Keyword values with dollar signs 
cause badness"), saying something like:

> Committed a patch by Lieven Govaerts that mostly fixes this (r18465).  Any '$' 
> signs in a keyword value are handled correctly except as the first character or 
> immediately after a space, which is all that can be done without a quoting 
> mechanism.
> 
> This fix covers most likely occurrences, and therefore I'm marking this issue as 
> fixed.  If some more extensive fix is desired, perhaps involving an escaping 
> mechanism or limiting the set of valid keyword values, that should be filed as a 
> separate enhancement.

... or should I leave it open, changing it to Enhancement and perhaps lowering 
the priority or setting the target milestone to "2.0"?

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Posted by Lieven Govaerts <lg...@mobsol.be>.
Aha, this makes sense, and explains why random replacing of code didn't
trigger the tests to fail.

I changed the test lines 82-84 to contain the $$$ signs. 

When I run the tests now, I see that we need the '>=' instead of '>' (
translate-test run will crash otherwise ). Moving the test "b->keyword_off
>= SVN_KEYWORD_MAX_LEN" before the translate will fail atleast the
translate-test in line 80. The new tests seem to confirm your assumption on
how the algorithm should behave, so I include them in the attached patch.

Thanks for taking your time on this patch.

Lieven.

> -----Original Message-----
> From: Julian Foad [mailto:julianfoad@btopenworld.com] 
> Sent: dinsdag 14 februari 2006 23:52
> To: Lieven Govaerts
> Cc: dev@subversion.tigris.org
> Subject: Re: [PATCH] issue 1780: Keyword values with dollar 
> signs causebadness.
> 
> Lieven Govaerts wrote:
> >>From: Julian Foad [mailto:julianfoad@btopenworld.com]
> > 
> >>>              if (b->keyword_off > SVN_KEYWORD_MAX_LEN ||
> >>
> >>Should that be ">="?  By the time it's greater, we'll already have 
> >>written at least one character off the end of the buffer.
> > 
> > I don't think so. b-keyword_off was post-incremented in the 
> previous 
> > line, so we can't write off the end of the buffer.
> 
> If it's MAX+1 at this point, then the character was written 
> into buffer[MAX], which is off the end of the buffer:
> 
>    char keyword_buf[SVN_KEYWORD_MAX_LEN];
> 
>    b->keyword_buf[b->keyword_off++] = *p++;
> 
> Yes?  (I notice that the "keyword_name" buffer has a length 
> of MAX+1, but "keyword_buf" doesn't.  I don't think 
> keyword_name needs to be that long; I think MAX-2 characters, 
> plus one for a null, is enough.)
> 
> 
> >>Check what's going to happen at the boundary condition: 
> will it do the 
> >>right thing for a valid keyword-pattern of exactly MAX_LEN?  
> >>MAX_LEN-1?  MAX_LEN+1?
> >>What about a non-keyword pattern of those lengths?
> > 
> > I've added lines 76-82 in translate-test.c for testing for these 
> > border cases. They all seem to work as expected.
> 
> Thanks, but I don't think the too-long patterns (lines 81 and 
> 82) are caught by your code, they're caught by the subsequent 
> existing test (because a non-'$' 
> character appears at position MAX).  I think you need dollar 
> signs in the 254th/255th/256th positions so that your code is 
> executed at those positions. 
> Something like:
> 
>    $Author: aaaaaa$$$ $
> 
> ... where the length up to the end of "$$$" is 254/255/256.
> 
> 
> >>I think "b->keyword_off >= SVN_KEYWORD_MAX_LEN" will need 
> to go after 
> >>"translate_keyword()", but see what you think.
> > 
> > No problem with that.
> > 
> > However, I don't see a change in behaviour when I implement the >= 
> > change nor the line-order change.
> 
> This is what I expect:
> 
> If we change ">" to ">=" only, then a keyword of length 
> exactly MAX will not be processed because "off >= MAX" will 
> be true and will prevent the "translate_keyword" call.
> 
> If we change ">" to ">=" and move the test to after the 
> "translate_keyword" 
> call, then the keyword of length MAX will be processed.
> 
> 
> > The new tests should have captured this change. I guess that 
> > b->keyword_off == SVN_KEYWORD_MAX_LEN will not be reached there 
> > because it's already handled earlier in the else block at line 998.
> 
> > The attached patch is based on yours and includes these two changes 
> > and some extra tests, no new changes in the algorithm itself.
> 
> Thanks.  Let me know what you think about the above.
> 
> - Julian

Re: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Posted by Julian Foad <ju...@btopenworld.com>.
Lieven Govaerts wrote:
>>From: Julian Foad [mailto:julianfoad@btopenworld.com] 
> 
>>>              if (b->keyword_off > SVN_KEYWORD_MAX_LEN ||
>>
>>Should that be ">="?  By the time it's greater, we'll already 
>>have written at least one character off the end of the buffer.
> 
> I don't think so. b-keyword_off was post-incremented in the previous line,
> so we can't write off the end of the buffer. 

If it's MAX+1 at this point, then the character was written into buffer[MAX], 
which is off the end of the buffer:

   char keyword_buf[SVN_KEYWORD_MAX_LEN];

   b->keyword_buf[b->keyword_off++] = *p++;

Yes?  (I notice that the "keyword_name" buffer has a length of MAX+1, but 
"keyword_buf" doesn't.  I don't think keyword_name needs to be that long; I 
think MAX-2 characters, plus one for a null, is enough.)


>>Check what's going to happen at the boundary condition: will 
>>it do the right thing for a valid keyword-pattern of exactly 
>>MAX_LEN?  MAX_LEN-1?  MAX_LEN+1? 
>>What about a non-keyword pattern of those lengths?
> 
> I've added lines 76-82 in translate-test.c for testing for these border
> cases. They all seem to work as expected. 

Thanks, but I don't think the too-long patterns (lines 81 and 82) are caught by 
your code, they're caught by the subsequent existing test (because a non-'$' 
character appears at position MAX).  I think you need dollar signs in the 
254th/255th/256th positions so that your code is executed at those positions. 
Something like:

   $Author: aaaaaa$$$ $

... where the length up to the end of "$$$" is 254/255/256.


>>I think "b->keyword_off >= SVN_KEYWORD_MAX_LEN" will need to 
>>go after "translate_keyword()", but see what you think.
> 
> No problem with that.
> 
> However, I don't see a change in behaviour when I implement the >= change
> nor the line-order change.

This is what I expect:

If we change ">" to ">=" only, then a keyword of length exactly MAX will not be 
processed because "off >= MAX" will be true and will prevent the 
"translate_keyword" call.

If we change ">" to ">=" and move the test to after the "translate_keyword" 
call, then the keyword of length MAX will be processed.


> The new tests should have captured this change. I
> guess that b->keyword_off == SVN_KEYWORD_MAX_LEN will not be reached there
> because it's already handled earlier in the else block at line 998.

> The attached patch is based on yours and includes these two changes and some
> extra tests, no new changes in the algorithm itself.

Thanks.  Let me know what you think about the above.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Posted by Lieven Govaerts <lg...@mobsol.be>.
 

> -----Original Message-----
> From: Julian Foad [mailto:julianfoad@btopenworld.com] 

> 
> >               if (b->keyword_off > SVN_KEYWORD_MAX_LEN ||
> 
> Should that be ">="?  By the time it's greater, we'll already 
> have written at least one character off the end of the buffer.

I don't think so. b-keyword_off was post-incremented in the previous line,
so we can't write off the end of the buffer. 

> Check what's going to happen at the boundary condition: will 
> it do the right thing for a valid keyword-pattern of exactly 
> MAX_LEN?  MAX_LEN-1?  MAX_LEN+1? 
> What about a non-keyword pattern of those lengths?

I've added lines 76-82 in translate-test.c for testing for these border
cases. They all seem to work as expected. 

> I think "b->keyword_off >= SVN_KEYWORD_MAX_LEN" will need to 
> go after "translate_keyword()", but see what you think.

No problem with that.

However, I don't see a change in behaviour when I implement the >= change
nor the line-order change. The new tests should have captured this change. I
guess that b->keyword_off == SVN_KEYWORD_MAX_LEN will not be reached there
because it's already handled earlier in the else block at line 998.

> 
> >                   keyword_matches == FALSE ||
> >                   translate_keyword (b->keyword_buf, 
> &b->keyword_off,
> >                                      keyword_name, 
> b->expand, b->keywords))
> >                 {
> 
> 
> Since I've done some tidying while studying it, I attach my 
> version of the patch and also a diff from yours to mine.  The 
> changes are mostly issues of style and comments.  The 
> significant changes are: use "apr_size_t" for 
> "next_sign_off", because that is the type used for other 
> values to which it relates (b->keyword_off); don't initialise 
> the new flag at its definition because it is initialised at 
> its first use; add a missing comma in the array of test lines.
> 
> If you think the two things above (changing the limit test to 
> ">=" and moving it to the end of the condition) are the only 
> further changes needed, I can do that and commit it for you, 
> otherwise I look forward to seeing the next version.

The attached patch is based on yours and includes these two changes and some
extra tests, no new changes in the algorithm itself.

> - Julian
> 

Re: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Posted by Julian Foad <ju...@btopenworld.com>.
Lieven Govaerts wrote:
> Julian, 
> 
> attached you'll find a new patch for issue 1780, that I think adresses your
> remarks. 

Thanks.


> I added line 76 in translate-test.c to test for this problem. Solved it in
> the attached patch.

Good.

>>2) It takes order(N-squared) time to process a line 
>>containing N dollar signs. 
> 
> I tried to improve performance, by now only using the matching algorithm
> when a valid keyword has been found. So it will only skip dollar signs when
> a keyword was matched first. Whenever something like '$testblahblah$' is
> found, it'll just skip that part of text. 

Good.

> I had to separate the keyword matching functionality in translate_keyword in
> its own function match_keyword, to support this performance improvement.
> Since these are internal functions and not used outside subst.c this is no
> problem. Correct assumption?

Correct.

> That makes the patch a bit larger, and maybe a bit less clear.

It's fine.  Well, actually, the structural part:

   if (!matches)
     {X}
   if (A || !matches || C)
     {Y}
   else
     {Z}

was quite difficult to understand, but it's OK.


> [[[
> Fix issue #1780: correct keyword expansion when URL or Author value contains
> '$'.
> 
> * subversion/libsvn_subr/subst.c
>   (translate_chunk): add algorithm that keeps searching for next '$' in the
> value of a valid expanded keyword until a correct value is found or search
> limits are reached.
>   (match_keyword): new local function that matches a valid keyword in a char
> buffer. 
>   (translate_keyword): added keyword_name parameter, translate a keyword
> keyword_name with its value. Keyword matching was put in match_keyword.
> 
> * subversion/tests/libsvn_wc/translate-test.c
>   (lines[]): added lines 74-76 to test on '$' in Author and URL keywords.
> ]]]

>               if (b->keyword_off > SVN_KEYWORD_MAX_LEN ||

Should that be ">="?  By the time it's greater, we'll already have written at 
least one character off the end of the buffer.

Check what's going to happen at the boundary condition: will it do the right 
thing for a valid keyword-pattern of exactly MAX_LEN?  MAX_LEN-1?  MAX_LEN+1? 
What about a non-keyword pattern of those lengths?

I think "b->keyword_off >= SVN_KEYWORD_MAX_LEN" will need to go after 
"translate_keyword()", but see what you think.

>                   keyword_matches == FALSE ||
>                   translate_keyword (b->keyword_buf, &b->keyword_off,
>                                      keyword_name, b->expand, b->keywords))
>                 {


Since I've done some tidying while studying it, I attach my version of the 
patch and also a diff from yours to mine.  The changes are mostly issues of 
style and comments.  The significant changes are: use "apr_size_t" for 
"next_sign_off", because that is the type used for other values to which it 
relates (b->keyword_off); don't initialise the new flag at its definition 
because it is initialised at its first use; add a missing comma in the array of 
test lines.

If you think the two things above (changing the limit test to ">=" and moving 
it to the end of the condition) are the only further changes needed, I can do 
that and commit it for you, otherwise I look forward to seeing the next version.

- Julian

RE: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Posted by Lieven Govaerts <lg...@mobsol.be>.
Julian, 

attached you'll find a new patch for issue 1780, that I think adresses your
remarks. 

> -----Original Message-----
> From: Julian Foad [mailto:julianfoad@btopenworld.com] 
> Sent: zondag 12 februari 2006 21:14
> 
> More precisely, it solves the problem of user names or URLs 
> containing dollar signs, unless a dollar sign is the first 
> character or follows a space.
> 
> I think this level of functionality is worth having, and I 
> would commit it except that I have two concerns with the 
> implementation.
> 
> 1) It fails (with an assert) when searching for a keyword in 
> a line which is longer than SVN_KEYWORD_MAX_LEN.  Although we 
> don't support arbitrarily long keywords, we must continue to 
> support arbitrarily long lines.  To reproduce, just let it 
> try keyword translation on a line consisting of "$" plus (MAX 
> - 2) spaces plus "$$".

I added line 76 in translate-test.c to test for this problem. Solved it in
the attached patch.

> 2) It takes order(N-squared) time to process a line 
> containing N dollar signs. 
>   On my system I found many text files with more than 25 
> dollar signs in the same line, including:
> 
> * Text representations of binary data, e.g. 
> /etc/X11/xdm/pixmaps/xorg.xpm
> * Shell scripts, makefiles, etc., e.g. subversion/build/libtool.m4
> * Config files referencing env-vars, e.g. .kde/share/config/kdeglobals
> * Our test for cases like this: 
> subversion/tests/libsvn_wc/translate-test.c
> 
> I wouldn't think it strange for someone to use a line of 
> about eighty dollar signs to separate sections in a text 
> file.  Processing ten lines, each consisting of 250 dollar 
> signs, took over a second on my rather slow system. 
> Certainly that's not a huge amount of time for a rare case, 
> but we ought to cope well with unusual cases if we can.

I tried to improve performance, by now only using the matching algorithm
when a valid keyword has been found. So it will only skip dollar signs when
a keyword was matched first. Whenever something like '$testblahblah$' is
found, it'll just skip that part of text. 
 
I had to separate the keyword matching functionality in translate_keyword in
its own function match_keyword, to support this performance improvement.
Since these are internal functions and not used outside subst.c this is no
problem. Correct assumption?
That makes the patch a bit larger, and maybe a bit less clear.

Log message:

[[[
Fix issue #1780: correct keyword expansion when URL or Author value contains
'$'.

* subversion/libsvn_subr/subst.c
  (translate_chunk): add algorithm that keeps searching for next '$' in the
value of a valid expanded keyword until a correct value is found or search
limits are reached.
  (match_keyword): new local function that matches a valid keyword in a char
buffer. 
  (translate_keyword): added keyword_name parameter, translate a keyword
keyword_name with its value. Keyword matching was put in match_keyword.

* subversion/tests/libsvn_wc/translate-test.c
  (lines[]): added lines 74-76 to test on '$' in Author and URL keywords.
]]]


> I note that "svn status" processes the locally-modified file 
> every time I run "svn status"; that's a separate problem that 
> we might want to look at.
> 
> The bug (1) should be straightforward to fix.  I'm not sure 
> what to say about the efficiency issue (2).  I'd very much 
> like to see it improved, but I might not insist on it.
> 
> - Julian

Lieven.