You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Lieven Govaerts <lg...@mobsol.be> on 2006/01/09 23:21:33 UTC

[PATCH] issue 1780: Keyword values with dollar signs cause badness.

Hi, 


attached is a patch that solves issue 1780. This issue reports problems
when filenames containing '$' chars are used with keyword expansion.

The original algorithm for finding expanded keywords is changed a bit, the
result is this:

 1. - check for a '$' sign
    - if one is found, get a list of characters up to the next '$'
    
 2. try to translate the resulting string as a keyword
 
 3a. if that succeeds, go to the boring state
 
 3b  - if translation does not succeed, make a note of the location of
       the ending '$' sign, but continue looking for the next ending '$'
     - if a next '$' is found, go back to step 2.
     
 4. If the string is getting larger than 255 chars or EOL is encounterd, 
    stop finding the ending '$', but instead continue with the '$' on 
    our stored location.
 
Can you agree with this change in algorithm? I don't see any problems in 
the algorithm itself, but there may be some work on the implementation. 
Performance impact is very low, due to the max. keyword length of 255 chars.

As soon as I get the unit tests running I'll try to provide one which tests
some common ( and less common ) scenarios.

All input is welcome.

regards,

Lieven.

-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.371 / Virus Database: 267.14.15/223 - Release Date: 6/01/2006
 
  

Re: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Posted by Julian Foad <ju...@btopenworld.com>.
Lieven Govaerts wrote:
> Aha, this makes sense, and explains why random replacing of code didn't
> trigger the tests to fail.
[...]
> The new tests seem to confirm your assumption on
> how the algorithm should behave, so I include them in the attached patch.

I've committed this as r18465, with just a small tweak (I changed a "*len" 
parameter to "len" where it was used only as input).

Thank you very much for your work.


Other folks, should I close issue #1780 ("Keyword values with dollar signs 
cause badness"), saying something like:

> Committed a patch by Lieven Govaerts that mostly fixes this (r18465).  Any '$' 
> signs in a keyword value are handled correctly except as the first character or 
> immediately after a space, which is all that can be done without a quoting 
> mechanism.
> 
> This fix covers most likely occurrences, and therefore I'm marking this issue as 
> fixed.  If some more extensive fix is desired, perhaps involving an escaping 
> mechanism or limiting the set of valid keyword values, that should be filed as a 
> separate enhancement.

... or should I leave it open, changing it to Enhancement and perhaps lowering 
the priority or setting the target milestone to "2.0"?

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Posted by Lieven Govaerts <lg...@mobsol.be>.
Aha, this makes sense, and explains why random replacing of code didn't
trigger the tests to fail.

I changed the test lines 82-84 to contain the $$$ signs. 

When I run the tests now, I see that we need the '>=' instead of '>' (
translate-test run will crash otherwise ). Moving the test "b->keyword_off
>= SVN_KEYWORD_MAX_LEN" before the translate will fail atleast the
translate-test in line 80. The new tests seem to confirm your assumption on
how the algorithm should behave, so I include them in the attached patch.

Thanks for taking your time on this patch.

Lieven.

> -----Original Message-----
> From: Julian Foad [mailto:julianfoad@btopenworld.com] 
> Sent: dinsdag 14 februari 2006 23:52
> To: Lieven Govaerts
> Cc: dev@subversion.tigris.org
> Subject: Re: [PATCH] issue 1780: Keyword values with dollar 
> signs causebadness.
> 
> Lieven Govaerts wrote:
> >>From: Julian Foad [mailto:julianfoad@btopenworld.com]
> > 
> >>>              if (b->keyword_off > SVN_KEYWORD_MAX_LEN ||
> >>
> >>Should that be ">="?  By the time it's greater, we'll already have 
> >>written at least one character off the end of the buffer.
> > 
> > I don't think so. b-keyword_off was post-incremented in the 
> previous 
> > line, so we can't write off the end of the buffer.
> 
> If it's MAX+1 at this point, then the character was written 
> into buffer[MAX], which is off the end of the buffer:
> 
>    char keyword_buf[SVN_KEYWORD_MAX_LEN];
> 
>    b->keyword_buf[b->keyword_off++] = *p++;
> 
> Yes?  (I notice that the "keyword_name" buffer has a length 
> of MAX+1, but "keyword_buf" doesn't.  I don't think 
> keyword_name needs to be that long; I think MAX-2 characters, 
> plus one for a null, is enough.)
> 
> 
> >>Check what's going to happen at the boundary condition: 
> will it do the 
> >>right thing for a valid keyword-pattern of exactly MAX_LEN?  
> >>MAX_LEN-1?  MAX_LEN+1?
> >>What about a non-keyword pattern of those lengths?
> > 
> > I've added lines 76-82 in translate-test.c for testing for these 
> > border cases. They all seem to work as expected.
> 
> Thanks, but I don't think the too-long patterns (lines 81 and 
> 82) are caught by your code, they're caught by the subsequent 
> existing test (because a non-'$' 
> character appears at position MAX).  I think you need dollar 
> signs in the 254th/255th/256th positions so that your code is 
> executed at those positions. 
> Something like:
> 
>    $Author: aaaaaa$$$ $
> 
> ... where the length up to the end of "$$$" is 254/255/256.
> 
> 
> >>I think "b->keyword_off >= SVN_KEYWORD_MAX_LEN" will need 
> to go after 
> >>"translate_keyword()", but see what you think.
> > 
> > No problem with that.
> > 
> > However, I don't see a change in behaviour when I implement the >= 
> > change nor the line-order change.
> 
> This is what I expect:
> 
> If we change ">" to ">=" only, then a keyword of length 
> exactly MAX will not be processed because "off >= MAX" will 
> be true and will prevent the "translate_keyword" call.
> 
> If we change ">" to ">=" and move the test to after the 
> "translate_keyword" 
> call, then the keyword of length MAX will be processed.
> 
> 
> > The new tests should have captured this change. I guess that 
> > b->keyword_off == SVN_KEYWORD_MAX_LEN will not be reached there 
> > because it's already handled earlier in the else block at line 998.
> 
> > The attached patch is based on yours and includes these two changes 
> > and some extra tests, no new changes in the algorithm itself.
> 
> Thanks.  Let me know what you think about the above.
> 
> - Julian

Re: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Posted by Julian Foad <ju...@btopenworld.com>.
Lieven Govaerts wrote:
>>From: Julian Foad [mailto:julianfoad@btopenworld.com] 
> 
>>>              if (b->keyword_off > SVN_KEYWORD_MAX_LEN ||
>>
>>Should that be ">="?  By the time it's greater, we'll already 
>>have written at least one character off the end of the buffer.
> 
> I don't think so. b-keyword_off was post-incremented in the previous line,
> so we can't write off the end of the buffer. 

If it's MAX+1 at this point, then the character was written into buffer[MAX], 
which is off the end of the buffer:

   char keyword_buf[SVN_KEYWORD_MAX_LEN];

   b->keyword_buf[b->keyword_off++] = *p++;

Yes?  (I notice that the "keyword_name" buffer has a length of MAX+1, but 
"keyword_buf" doesn't.  I don't think keyword_name needs to be that long; I 
think MAX-2 characters, plus one for a null, is enough.)


>>Check what's going to happen at the boundary condition: will 
>>it do the right thing for a valid keyword-pattern of exactly 
>>MAX_LEN?  MAX_LEN-1?  MAX_LEN+1? 
>>What about a non-keyword pattern of those lengths?
> 
> I've added lines 76-82 in translate-test.c for testing for these border
> cases. They all seem to work as expected. 

Thanks, but I don't think the too-long patterns (lines 81 and 82) are caught by 
your code, they're caught by the subsequent existing test (because a non-'$' 
character appears at position MAX).  I think you need dollar signs in the 
254th/255th/256th positions so that your code is executed at those positions. 
Something like:

   $Author: aaaaaa$$$ $

... where the length up to the end of "$$$" is 254/255/256.


>>I think "b->keyword_off >= SVN_KEYWORD_MAX_LEN" will need to 
>>go after "translate_keyword()", but see what you think.
> 
> No problem with that.
> 
> However, I don't see a change in behaviour when I implement the >= change
> nor the line-order change.

This is what I expect:

If we change ">" to ">=" only, then a keyword of length exactly MAX will not be 
processed because "off >= MAX" will be true and will prevent the 
"translate_keyword" call.

If we change ">" to ">=" and move the test to after the "translate_keyword" 
call, then the keyword of length MAX will be processed.


> The new tests should have captured this change. I
> guess that b->keyword_off == SVN_KEYWORD_MAX_LEN will not be reached there
> because it's already handled earlier in the else block at line 998.

> The attached patch is based on yours and includes these two changes and some
> extra tests, no new changes in the algorithm itself.

Thanks.  Let me know what you think about the above.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Posted by Lieven Govaerts <lg...@mobsol.be>.
 

> -----Original Message-----
> From: Julian Foad [mailto:julianfoad@btopenworld.com] 

> 
> >               if (b->keyword_off > SVN_KEYWORD_MAX_LEN ||
> 
> Should that be ">="?  By the time it's greater, we'll already 
> have written at least one character off the end of the buffer.

I don't think so. b-keyword_off was post-incremented in the previous line,
so we can't write off the end of the buffer. 

> Check what's going to happen at the boundary condition: will 
> it do the right thing for a valid keyword-pattern of exactly 
> MAX_LEN?  MAX_LEN-1?  MAX_LEN+1? 
> What about a non-keyword pattern of those lengths?

I've added lines 76-82 in translate-test.c for testing for these border
cases. They all seem to work as expected. 

> I think "b->keyword_off >= SVN_KEYWORD_MAX_LEN" will need to 
> go after "translate_keyword()", but see what you think.

No problem with that.

However, I don't see a change in behaviour when I implement the >= change
nor the line-order change. The new tests should have captured this change. I
guess that b->keyword_off == SVN_KEYWORD_MAX_LEN will not be reached there
because it's already handled earlier in the else block at line 998.

> 
> >                   keyword_matches == FALSE ||
> >                   translate_keyword (b->keyword_buf, 
> &b->keyword_off,
> >                                      keyword_name, 
> b->expand, b->keywords))
> >                 {
> 
> 
> Since I've done some tidying while studying it, I attach my 
> version of the patch and also a diff from yours to mine.  The 
> changes are mostly issues of style and comments.  The 
> significant changes are: use "apr_size_t" for 
> "next_sign_off", because that is the type used for other 
> values to which it relates (b->keyword_off); don't initialise 
> the new flag at its definition because it is initialised at 
> its first use; add a missing comma in the array of test lines.
> 
> If you think the two things above (changing the limit test to 
> ">=" and moving it to the end of the condition) are the only 
> further changes needed, I can do that and commit it for you, 
> otherwise I look forward to seeing the next version.

The attached patch is based on yours and includes these two changes and some
extra tests, no new changes in the algorithm itself.

> - Julian
> 

Re: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Posted by Julian Foad <ju...@btopenworld.com>.
Lieven Govaerts wrote:
> Julian, 
> 
> attached you'll find a new patch for issue 1780, that I think adresses your
> remarks. 

Thanks.


> I added line 76 in translate-test.c to test for this problem. Solved it in
> the attached patch.

Good.

>>2) It takes order(N-squared) time to process a line 
>>containing N dollar signs. 
> 
> I tried to improve performance, by now only using the matching algorithm
> when a valid keyword has been found. So it will only skip dollar signs when
> a keyword was matched first. Whenever something like '$testblahblah$' is
> found, it'll just skip that part of text. 

Good.

> I had to separate the keyword matching functionality in translate_keyword in
> its own function match_keyword, to support this performance improvement.
> Since these are internal functions and not used outside subst.c this is no
> problem. Correct assumption?

Correct.

> That makes the patch a bit larger, and maybe a bit less clear.

It's fine.  Well, actually, the structural part:

   if (!matches)
     {X}
   if (A || !matches || C)
     {Y}
   else
     {Z}

was quite difficult to understand, but it's OK.


> [[[
> Fix issue #1780: correct keyword expansion when URL or Author value contains
> '$'.
> 
> * subversion/libsvn_subr/subst.c
>   (translate_chunk): add algorithm that keeps searching for next '$' in the
> value of a valid expanded keyword until a correct value is found or search
> limits are reached.
>   (match_keyword): new local function that matches a valid keyword in a char
> buffer. 
>   (translate_keyword): added keyword_name parameter, translate a keyword
> keyword_name with its value. Keyword matching was put in match_keyword.
> 
> * subversion/tests/libsvn_wc/translate-test.c
>   (lines[]): added lines 74-76 to test on '$' in Author and URL keywords.
> ]]]

>               if (b->keyword_off > SVN_KEYWORD_MAX_LEN ||

Should that be ">="?  By the time it's greater, we'll already have written at 
least one character off the end of the buffer.

Check what's going to happen at the boundary condition: will it do the right 
thing for a valid keyword-pattern of exactly MAX_LEN?  MAX_LEN-1?  MAX_LEN+1? 
What about a non-keyword pattern of those lengths?

I think "b->keyword_off >= SVN_KEYWORD_MAX_LEN" will need to go after 
"translate_keyword()", but see what you think.

>                   keyword_matches == FALSE ||
>                   translate_keyword (b->keyword_buf, &b->keyword_off,
>                                      keyword_name, b->expand, b->keywords))
>                 {


Since I've done some tidying while studying it, I attach my version of the 
patch and also a diff from yours to mine.  The changes are mostly issues of 
style and comments.  The significant changes are: use "apr_size_t" for 
"next_sign_off", because that is the type used for other values to which it 
relates (b->keyword_off); don't initialise the new flag at its definition 
because it is initialised at its first use; add a missing comma in the array of 
test lines.

If you think the two things above (changing the limit test to ">=" and moving 
it to the end of the condition) are the only further changes needed, I can do 
that and commit it for you, otherwise I look forward to seeing the next version.

- Julian

RE: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Posted by Lieven Govaerts <lg...@mobsol.be>.
Julian, 

attached you'll find a new patch for issue 1780, that I think adresses your
remarks. 

> -----Original Message-----
> From: Julian Foad [mailto:julianfoad@btopenworld.com] 
> Sent: zondag 12 februari 2006 21:14
> 
> More precisely, it solves the problem of user names or URLs 
> containing dollar signs, unless a dollar sign is the first 
> character or follows a space.
> 
> I think this level of functionality is worth having, and I 
> would commit it except that I have two concerns with the 
> implementation.
> 
> 1) It fails (with an assert) when searching for a keyword in 
> a line which is longer than SVN_KEYWORD_MAX_LEN.  Although we 
> don't support arbitrarily long keywords, we must continue to 
> support arbitrarily long lines.  To reproduce, just let it 
> try keyword translation on a line consisting of "$" plus (MAX 
> - 2) spaces plus "$$".

I added line 76 in translate-test.c to test for this problem. Solved it in
the attached patch.

> 2) It takes order(N-squared) time to process a line 
> containing N dollar signs. 
>   On my system I found many text files with more than 25 
> dollar signs in the same line, including:
> 
> * Text representations of binary data, e.g. 
> /etc/X11/xdm/pixmaps/xorg.xpm
> * Shell scripts, makefiles, etc., e.g. subversion/build/libtool.m4
> * Config files referencing env-vars, e.g. .kde/share/config/kdeglobals
> * Our test for cases like this: 
> subversion/tests/libsvn_wc/translate-test.c
> 
> I wouldn't think it strange for someone to use a line of 
> about eighty dollar signs to separate sections in a text 
> file.  Processing ten lines, each consisting of 250 dollar 
> signs, took over a second on my rather slow system. 
> Certainly that's not a huge amount of time for a rare case, 
> but we ought to cope well with unusual cases if we can.

I tried to improve performance, by now only using the matching algorithm
when a valid keyword has been found. So it will only skip dollar signs when
a keyword was matched first. Whenever something like '$testblahblah$' is
found, it'll just skip that part of text. 
 
I had to separate the keyword matching functionality in translate_keyword in
its own function match_keyword, to support this performance improvement.
Since these are internal functions and not used outside subst.c this is no
problem. Correct assumption?
That makes the patch a bit larger, and maybe a bit less clear.

Log message:

[[[
Fix issue #1780: correct keyword expansion when URL or Author value contains
'$'.

* subversion/libsvn_subr/subst.c
  (translate_chunk): add algorithm that keeps searching for next '$' in the
value of a valid expanded keyword until a correct value is found or search
limits are reached.
  (match_keyword): new local function that matches a valid keyword in a char
buffer. 
  (translate_keyword): added keyword_name parameter, translate a keyword
keyword_name with its value. Keyword matching was put in match_keyword.

* subversion/tests/libsvn_wc/translate-test.c
  (lines[]): added lines 74-76 to test on '$' in Author and URL keywords.
]]]


> I note that "svn status" processes the locally-modified file 
> every time I run "svn status"; that's a separate problem that 
> we might want to look at.
> 
> The bug (1) should be straightforward to fix.  I'm not sure 
> what to say about the efficiency issue (2).  I'd very much 
> like to see it improved, but I might not insist on it.
> 
> - Julian

Lieven.

Re: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Posted by Julian Foad <ju...@btopenworld.com>.
Lieven Govaerts wrote:
> 
> I updated my patch of last week to fix issue 1780. 

Thanks.

> The algorithm implemented here isn't changed, so this patch will not solve
> the problem of usernames starting or ending with '$'. It only solves user-
> or filenames containing a '$'.

More precisely, it solves the problem of user names or URLs containing dollar 
signs, unless a dollar sign is the first character or follows a space.

I think this level of functionality is worth having, and I would commit it 
except that I have two concerns with the implementation.

1) It fails (with an assert) when searching for a keyword in a line which is 
longer than SVN_KEYWORD_MAX_LEN.  Although we don't support arbitrarily long 
keywords, we must continue to support arbitrarily long lines.  To reproduce, 
just let it try keyword translation on a line consisting of "$" plus (MAX - 2) 
spaces plus "$$".

2) It takes order(N-squared) time to process a line containing N dollar signs. 
  On my system I found many text files with more than 25 dollar signs in the 
same line, including:

* Text representations of binary data, e.g. /etc/X11/xdm/pixmaps/xorg.xpm
* Shell scripts, makefiles, etc., e.g. subversion/build/libtool.m4
* Config files referencing env-vars, e.g. .kde/share/config/kdeglobals
* Our test for cases like this: subversion/tests/libsvn_wc/translate-test.c

I wouldn't think it strange for someone to use a line of about eighty dollar 
signs to separate sections in a text file.  Processing ten lines, each 
consisting of 250 dollar signs, took over a second on my rather slow system. 
Certainly that's not a huge amount of time for a rare case, but we ought to 
cope well with unusual cases if we can.

I note that "svn status" processes the locally-modified file every time I run 
"svn status"; that's a separate problem that we might want to look at.

The bug (1) should be straightforward to fix.  I'm not sure what to say about 
the efficiency issue (2).  I'd very much like to see it improved, but I might 
not insist on it.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Posted by Lieven Govaerts <lg...@mobsol.be>.
Hi, 


I updated my patch of last week to fix issue 1780. 

The algorithm implemented here isn't changed, so this patch will not solve
the problem of usernames starting or ending with '$'. It only solves user-
or filenames containing a '$'.

This patch contains:
- code changes to subst.c implementing the new algorithm
- unit tests in translate-test.c

Log message:

[[[
Fix issue #1780: correct keyword expansion when URL or Author value contains
'$'.

* subversion/libsvn_subr/subst.c
  (translate_chunk): add algorithm that keeps searching for next '$' until
valid keyword is found or search limits are reached.

* subversion/tests/libsvn_wc/translate-test.c
  (lines[]): added two lines to test on '$' in Author and URL keywords.
]]]

regards,

Lieven.



> -----Original Message-----
> From: Lieven Govaerts [mailto:lgo@mobsol.be] 
> Sent: dinsdag 10 januari 2006 21:07
> To: 'John Peacock'; 'Michael Pilato'
> Cc: 'svn-dev'
> Subject: RE: [PATCH] issue 1780: Keyword values with dollar 
> signs causebadness.
> 
> Michael, John,
> 
> > John Peacock wrote:
> > 
> > Michael Pilato wrote:
> > > The algorithm is a pretty good one, though I think it still
> > falls down
> > > if, say, an Author's name begins with a '$'.  The expanded
> > form would
> > > be '$ Author: $username $', which happens to be a superset of the 
> > > completely valid format '$ Author: $'.
> 
> You're absolutely right, this patch doesn't solve that 
> situation, it will give the same result as the code without the patch.
> 
> > Which brings up the topic of a generalized escaping mechanism for 
> > keyword *values*, rather than trying larger and larger 
> strings (as the 
> > proposed patch does).
> 
...
> Concerning performance, a keyword is not only limited by 
> SVN_KEYWORD_MAX_LEN chars, but also by a '\r' or '\n'. So the 
> only scenario where the performance is affected by this patch is with:
>   - a file with keyword expasion enabled
>   - containing very long lines without CR's
>   - using filenames with $ signs.
> 
> And even then, SVN_KEYWORD_MAX_LEN is defined as 255 for now, 
> it will never exceed eg. 1024 right?
> 

-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.371 / Virus Database: 267.14.17/229 - Release Date: 13/01/2006
 
  

RE: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Posted by Lieven Govaerts <lg...@mobsol.be>.
Michael, John,

> John Peacock wrote:
> 
> Michael Pilato wrote:
> > The algorithm is a pretty good one, though I think it still 
> falls down 
> > if, say, an Author's name begins with a '$'.  The expanded 
> form would 
> > be '$ Author: $username $', which happens to be a superset of the 
> > completely valid format '$ Author: $'.

You're absolutely right, this patch doesn't solve that situation, it will
give the same result as the code without the patch.

> Which brings up the topic of a generalized escaping mechanism 
> for keyword *values*, rather than trying larger and larger 
> strings (as the proposed patch does).

Yep, that's the choice between one step closer to handling the most common
situations, or providing a definitive but incompatible solution.

The situation that I encounter the most is that people use a $ sign in the
middle of the filenames, not at the beginning or the end. My patch solves
that scenario just fine. 

Concerning performance, a keyword is not only limited by SVN_KEYWORD_MAX_LEN
chars, but also by a '\r' or '\n'. So the only scenario where the
performance is affected by this patch is with:
  - a file with keyword expasion enabled
  - containing very long lines without CR's
  - using filenames with $ signs.

And even then, SVN_KEYWORD_MAX_LEN is defined as 255 for now, it will never
exceed eg. 1024 right?

I'm willing to discuss and implement and alternative solution btw, but
anything that we come up with has to be implemented so it can parse both
current and new types of keyword expansion.

We can use double quotes to contain the keyword values:

$Date: "2002-07-22 21:42:37 -0700 (Mon, 22 Jul 2002)" $
$Id: "calc.c 148 2002-07-28 21:30:43Z sally" $
$HeadURL: "http://svn.collab.net/repos/trunk/README" $
$Revision: "144" $

Easy to implement and clear for the users. The only people that will be
affected are those that actually use these values in code instead of as
comments. 

Lieven.

-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.371 / Virus Database: 267.14.16/225 - Release Date: 9/01/2006
 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Posted by John Peacock <jp...@rowman.com>.
Michael Pilato wrote:
> The algorithm is a pretty good one, though I think it still falls down
> if, say, an Author's name begins with a '$'.  The expanded form would be
> '$ Author: $username $', which happens to be a superset of the
> completely valid format '$ Author: $'.

Which brings up the topic of a generalized escaping mechanism for 
keyword *values*, rather than trying larger and larger strings (as the 
proposed patch does).

John

-- 
John Peacock
Director of Information Research and Technology
Rowman & Littlefield Publishing Group
4501 Forbes Boulevard
Suite H
Lanham, MD  20706
301-459-3366 x.5010
fax 301-429-5748

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue 1780: Keyword values with dollar signs causebadness.

Posted by Michael Pilato <cm...@collab.net>.
On Mon, 2006-01-09 at 21:22 -0500, John Peacock wrote:
> Lieven Govaerts wrote:
> > Can you agree with this change in algorithm? I don't see any problems in 
> > the algorithm itself, but there may be some work on the implementation. 
> > Performance impact is very low, due to the max. keyword length of 255 chars.
> 
> At a cursory glance, it seems like the change would work, but one thing you may
> not be aware of was that SVN_KEYWORD_MAX_LEN #define may change.  The reason is
> that it is surprisingly easy to get a path longer than 255 characters in big
> projects (like GCC, to use a real life example).  I don't know if the #define is
> going to be eliminated entirely or just increase by a couple powers of two.
> Some testing is going to have to take place as to whether this extra looping
> will have any performance issues.
> 
> In any case, some test cases and a good log message (see HACKING) are essential
> to get a thorough review...

Two things:

The algorithm is a pretty good one, though I think it still falls down
if, say, an Author's name begins with a '$'.  The expanded form would be
'$ Author: $username $', which happens to be a superset of the
completely valid format '$ Author: $'.

Definitely should use SVN_KEYWORD_MAX_LEN, not a hard-coded 255.  I'll
reserve comment on what SVN_KEYWORD_MAX_LEN's numerical equivalent
should be (255 or otherwise).

-- 
C. Michael Pilato <cm...@collab.net> 
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: [PATCH] issue 1780: Keyword values with dollar signs cause badness.

Posted by John Peacock <jp...@rowman.com>.
Lieven Govaerts wrote:
> Can you agree with this change in algorithm? I don't see any problems in 
> the algorithm itself, but there may be some work on the implementation. 
> Performance impact is very low, due to the max. keyword length of 255 chars.

At a cursory glance, it seems like the change would work, but one thing you may
not be aware of was that SVN_KEYWORD_MAX_LEN #define may change.  The reason is
that it is surprisingly easy to get a path longer than 255 characters in big
projects (like GCC, to use a real life example).  I don't know if the #define is
going to be eliminated entirely or just increase by a couple powers of two.
Some testing is going to have to take place as to whether this extra looping
will have any performance issues.

In any case, some test cases and a good log message (see HACKING) are essential
to get a thorough review...

John

-- 
John Peacock
Director of Information Research and Technology
Rowman & Littlefield Publishing Group
4720 Boston Way
Lanham, MD 20706
301-459-3366 x.5010
fax 301-429-5747

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org