You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by VK Sameer <sa...@collab.net> on 2005/01/26 05:43:04 UTC

[PATCH] issue #2147 - v1

Hello,

Attached is a fix for issue #2147, with much help from Karl Fogel.

Regards
Sameer

Re: [PATCH] issue #2147 - v1

Posted by Julian Foad <ju...@btopenworld.com>.
VK Sameer wrote:
> Julian Foad wrote on 01/26/2005 09:07 AM:
>> Julian Foad wrote:
>>> VK Sameer wrote:
>>>>      SVN_ERR( dav_svn__send_xml(lrb->bb, lrb->output,
>>>>                                 "<D:comment>%s</D:comment>" DEBUG_CR,
>>>> -                               apr_xml_quote_string(pool, msg, 0)) );
>>>> +                               svn_xml_fuzzy_escape (
>>>> +                                 apr_xml_quote_string (pool, msg, 0),
>>>> +                                 pool)) );
>>>
>>> No.  You should strip control characters before you XML-quote it, 
>>> otherwise you are relying on (1) your function producing validly 
>>> quoted XML (which it may do at present but is not documented to do), 
>>> and (2) the XML-quoting function accepting and passing through 
>>> control characters (which such a function need not be expected to do).
>>
>> (1)
>> Thinking some more about this, I see now that you originally intended 
>> svn_xml_fuzzy_escape to accept nearly-well-formed XML text (but with 
>> some control characters) and return well-formed XML text, and that's 
>> why it is named "svn_xml_...".  Ideally you wanted it to perform some 
>> sort of XML quoting which would encode the control characters in XML 
>> such that they would be decoded by the XML receiver, but you have 
>> found that this is not possible.  (I agree: the XML spec says control 
>> characters cannot be represented, encoded or otherwise.)  Therefore 
>> you have had to settle for performing an ad-hoc, mostly 
>> human-readable, non-reversible encoding.  Because of this, I stand by 
>> what I wrote above: your fuzzy function is not, and should not be 
>> documented for use on well-formed XML text; it should be applied to 
>> the actual (non-escaped) log message.
> 
> Wouldn't that cause unnecessary encoding for those transport layers that 
> _can_ handle control characters? Control characters in a log message are 
> not a problem when using file:// or svn://.

No, I just meant swapping the order of fuzzy_escape and xml_quote where it is 
used in dav_svn__send_xml(...).  I'm not suggesting it be used in other RA 
methods or any higher up the call chain than this; sorry if that wasn't clear.


>> Whether the fuzzy function's name and/or doc-string should change to 
>> reflect that it doesn't produce XML but does produce XML-safe text, I 
>> don't know.
> 
> I'm a little lost now. Why shouldn't the output be considered XML?

Your function doesn't do XML escaping (quoting), so the output can only be 
well-formed XML if the input was.  And yet the input can't usefully be 
well-formed XML because well-formed XML doesn't contain control characters 
(except TAB, NL, CR).

The way your patch uses it currently is:

   Arbitrary UTF-8 text: the log message
       ->  apr_xml_quote_string()  ->
   Fairly-well-formed XML, but not quite
       ->  svn_xml_fuzzy_escape()  ->
   Well-formed XML.

I believe that apr_xml_quote_string() is not guaranteed to accept arbitrary 
UTF-8 text and not intended to produce nearly-XML, but to take XML-safe text 
and produce well-formed XML.  I know its doc-string doesn't say this explicitly.

I recommend:

   Arbitrary UTF-8 text: the log message
       ->  svn_xml_fuzzy_escape()  ->
   XML-safe text
       ->  apr_xml_quote_string()  ->
   Well-formed XML.

In other words, let apr_xml_quote_string work with XML-safe text and generate 
well-formed XML, and define your svn_xml_fuzzy_escape as converting arbitrary 
UTF-8 text into XML-safe UTF-8 text.  In this way, the input and output data 
formats of both functions are formats that are commonly used and understood.

Note the notion of "XML-safe text": text that can be represented in XML by 
XML-escaping it.  The function svn_xml_is_xml_safe guarantees that a string is 
XML-safe (though it's a rather restrictive implementation, only supporting the 
ASCII subset of XML-safe strings, so it's not good enough for your use).


>> (2)
>> Now I'm confused about what you are escaping.  You are escaping all 
>> ASCII control characters (as defined by svn_ctype_iscntrl).  That 
>> includes valid XML characters CR, LF and TAB.  Shouldn't you be 
>> escaping only non-XML control characters?
> 
> Fixed, thanks for pointing that out.

Sorry, but it's not fixed in patch v2.  The set of valid XML control characters 
is TAB, NL, CR; this is not the same as the set of ASCII white space characters 
(which includes VT, for example).

See 'svn_xml_is_xml_safe', but note that it is not what you want in terms of 
characters above 127 (it only accepts ASCII, not UTF-8).

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue #2147 - v1

Posted by VK Sameer <sa...@collab.net>.
Julian Foad wrote on 01/26/2005 09:07 AM:
> Julian Foad wrote:
> 
>> VK Sameer wrote:
>>
>>>      SVN_ERR( dav_svn__send_xml(lrb->bb, lrb->output,
>>>                                 "<D:comment>%s</D:comment>" DEBUG_CR,
>>> -                               apr_xml_quote_string(pool, msg, 0)) );
>>> +                               svn_xml_fuzzy_escape (
>>> +                                 apr_xml_quote_string (pool, msg, 0),
>>> +                                 pool)) );
>>
>>
>> No.  You should strip control characters before you XML-quote it, 
>> otherwise you are relying on (1) your function producing validly 
>> quoted XML (which it may do at present but is not documented to do), 
>> and (2) the XML-quoting function accepting and passing through control 
>> characters (which such a function need not be expected to do).
> 
> 
> (1)
> Thinking some more about this, I see now that you originally intended 
> svn_xml_fuzzy_escape to accept nearly-well-formed XML text (but with 
> some control characters) and return well-formed XML text, and that's why 
> it is named "svn_xml_...".  Ideally you wanted it to perform some sort 
> of XML quoting which would encode the control characters in XML such 
> that they would be decoded by the XML receiver, but you have found that 
> this is not possible.  (I agree: the XML spec says control characters 
> cannot be represented, encoded or otherwise.)  Therefore you have had to 
> settle for performing an ad-hoc, mostly human-readable, non-reversible 
> encoding.  Because of this, I stand by what I wrote above: your fuzzy 
> function is not, and should not be documented for use on well-formed XML 
> text; it should be applied to the actual (non-escaped) log message.

Wouldn't that cause unnecessary encoding for those transport layers that 
_can_ handle control characters? Control characters in a log message are 
not a problem when using file:// or svn://.

> Whether the fuzzy function's name and/or doc-string should change to 
> reflect that it doesn't produce XML but does produce XML-safe text, I 
> don't know.

I'm a little lost now. Why shouldn't the output be considered XML?

> (2)
> Now I'm confused about what you are escaping.  You are escaping all 
> ASCII control characters (as defined by svn_ctype_iscntrl).  That 
> includes valid XML characters CR, LF and TAB.  Shouldn't you be escaping 
> only non-XML control characters?

Fixed, thanks for pointing that out.

Regards
Sameer

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue #2147 - v1

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Fri, 28 Jan 2005, VK Sameer wrote:

> Peter N. Lundblad wrote on 01/26/2005 12:29 PM:
> > On Wed, 26 Jan 2005, Julian Foad wrote:
> >
> >
> >>(2)
> >>Now I'm confused about what you are escaping.  You are escaping all ASCII
> >>control characters (as defined by svn_ctype_iscntrl).  That includes valid XML
> >>characters CR, LF and TAB.  Shouldn't you be escaping only non-XML control
> >>characters?
> >>
> >
> > And non-ASCII invalid XML characters as well (in the future, when we have
> > functions to convert to Unicode scalars). Could you design the function so
> > it can be extended for that later? (Thinking of the API, not the
> > implementation.)
> >
>
> Sorry, Peter, I don't think I understood.
> Unless you mean accepting a set of bytes instead of a single char?
>
You mean my message was somewhat cryptic? Can never be that way;) Hmmm,
maybe.

What I mean is that what you want to do is eliminate characters that can't
be represented in XML, and you want to specify that in the docstring. As
Julian points out, this is not the same as ASCII control characters. I
think you should drop using svn_ctype.h functions for this purpose and
just list the invalid characters in the function in some way.

Also, note that XML lists FFFE and FFFF as invalid in XML. I am not sure
that these can be represented in vaid UTF8, but if they can, you ideally
should handle them too. (XML also excludes the surrogate characters, but
they are forbidden by UTF8 anyway). So, FFFE and FFFF was the non-ASCII
part.

Hope this clarifies,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue #2147 - v1

Posted by VK Sameer <sa...@collab.net>.
Peter N. Lundblad wrote on 01/26/2005 12:29 PM:
> On Wed, 26 Jan 2005, Julian Foad wrote:
> 
> 
>>Julian Foad wrote:
>>
>>>VK Sameer wrote:
>>>
>>>>     SVN_ERR( dav_svn__send_xml(lrb->bb, lrb->output,
>>>>                                "<D:comment>%s</D:comment>" DEBUG_CR,
>>>>-                               apr_xml_quote_string(pool, msg, 0)) );
>>>>+                               svn_xml_fuzzy_escape (
>>>>+                                 apr_xml_quote_string (pool, msg, 0),
>>>>+                                 pool)) );
>>>
>>>No.  You should strip control characters before you XML-quote it,
>>>otherwise you are relying on (1) your function producing validly quoted
>>>XML (which it may do at present but is not documented to do), and (2)
>>>the XML-quoting function accepting and passing through control
>>>characters (which such a function need not be expected to do).
>>
>>(2)
>>Now I'm confused about what you are escaping.  You are escaping all ASCII
>>control characters (as defined by svn_ctype_iscntrl).  That includes valid XML
>>characters CR, LF and TAB.  Shouldn't you be escaping only non-XML control
>>characters?
>>
> 
> And non-ASCII invalid XML characters as well (in the future, when we have
> functions to convert to Unicode scalars). Could you design the function so
> it can be extended for that later? (Thinking of the API, not the
> implementation.)
> 

Sorry, Peter, I don't think I understood.
Unless you mean accepting a set of bytes instead of a single char?

Regards
Sameer

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue #2147 - v1

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Wed, 26 Jan 2005, Julian Foad wrote:

> Julian Foad wrote:
> > VK Sameer wrote:
> >>      SVN_ERR( dav_svn__send_xml(lrb->bb, lrb->output,
> >>                                 "<D:comment>%s</D:comment>" DEBUG_CR,
> >> -                               apr_xml_quote_string(pool, msg, 0)) );
> >> +                               svn_xml_fuzzy_escape (
> >> +                                 apr_xml_quote_string (pool, msg, 0),
> >> +                                 pool)) );
> >
> > No.  You should strip control characters before you XML-quote it,
> > otherwise you are relying on (1) your function producing validly quoted
> > XML (which it may do at present but is not documented to do), and (2)
> > the XML-quoting function accepting and passing through control
> > characters (which such a function need not be expected to do).
>
> (2)
> Now I'm confused about what you are escaping.  You are escaping all ASCII
> control characters (as defined by svn_ctype_iscntrl).  That includes valid XML
> characters CR, LF and TAB.  Shouldn't you be escaping only non-XML control
> characters?
>
And non-ASCII invalid XML characters as well (in the future, when we have
functions to convert to Unicode scalars). Could you design the function so
it can be extended for that later? (Thinking of the API, not the
implementation.)

Regards,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue #2147 - v1

Posted by Julian Foad <ju...@btopenworld.com>.
Julian Foad wrote:
> VK Sameer wrote:
>>      SVN_ERR( dav_svn__send_xml(lrb->bb, lrb->output,
>>                                 "<D:comment>%s</D:comment>" DEBUG_CR,
>> -                               apr_xml_quote_string(pool, msg, 0)) );
>> +                               svn_xml_fuzzy_escape (
>> +                                 apr_xml_quote_string (pool, msg, 0),
>> +                                 pool)) );
> 
> No.  You should strip control characters before you XML-quote it, 
> otherwise you are relying on (1) your function producing validly quoted 
> XML (which it may do at present but is not documented to do), and (2) 
> the XML-quoting function accepting and passing through control 
> characters (which such a function need not be expected to do).

(1)
Thinking some more about this, I see now that you originally intended 
svn_xml_fuzzy_escape to accept nearly-well-formed XML text (but with some 
control characters) and return well-formed XML text, and that's why it is named 
"svn_xml_...".  Ideally you wanted it to perform some sort of XML quoting which 
would encode the control characters in XML such that they would be decoded by 
the XML receiver, but you have found that this is not possible.  (I agree: the 
XML spec says control characters cannot be represented, encoded or otherwise.) 
  Therefore you have had to settle for performing an ad-hoc, mostly 
human-readable, non-reversible encoding.  Because of this, I stand by what I 
wrote above: your fuzzy function is not, and should not be documented for use 
on well-formed XML text; it should be applied to the actual (non-escaped) log 
message.

Whether the fuzzy function's name and/or doc-string should change to reflect 
that it doesn't produce XML but does produce XML-safe text, I don't know.


(2)
Now I'm confused about what you are escaping.  You are escaping all ASCII 
control characters (as defined by svn_ctype_iscntrl).  That includes valid XML 
characters CR, LF and TAB.  Shouldn't you be escaping only non-XML control 
characters?

(Sorry, I've not tested because I haven't been able to set up my own 
mod_dav_svn server.)


- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue #2147 - v1

Posted by Julian Foad <ju...@btopenworld.com>.
VK Sameer wrote:
> Julian Foad wrote on 01/26/2005 08:37 AM:
>> Why does this return "const char *" when all the other 
>> svn_xml_escape_... functions return void and append their output to a 
>> stringbuf?  (Not saying you necessarily should change it.)
> 
> So that it can be chained in the caller. The intent was to match the 
> existing call to apr_cml_quote_string.

OK.


>> "for (q = string; ...)" would be clearer.
> 
> Hmm, I'm repeating the assignment later, for different reasons, though. 
> This seems more consistent.

Well, it's a minor point.  I don't mind much.


>> No.  You should strip control characters before you XML-quote it, 
>> otherwise you are relying on (1) your function producing validly 
>> quoted XML (which it may do at present but is not documented to do), and
> 
> I don't think I understood your point.
> Does apr_xml_quote_string require a validly-quoted XML string?
> Also, svn_xml_fuzzy_escape() does not do XML in the sense of quoting, 
> unlike say svn_xml_escape_cdata().

I have responded in a separate e-mail.

>> (2) the XML-quoting function accepting and passing through control 
>> characters (which such a function need not be expected to do).
> 
> It does do it now, which is why there is a bug report filed :)
> Both the svn and apr functions seem to pass through bytes they are not
> looking for.

Yes, they do seem to, but you shouldn't rely on a function's undefined behaviour.


>>> +svn:log
>>> +V 37
>>> +this msg contains a Ctrl-T:  <- here
>>
>> Eww.  Is this likely to be preserved in people's editors?  Perhaps you 
>> could find a programmatic way of inserting it into the string without 
>> it having to be a literal control character in the source file.
> 
> OK, I did a quick search and didn't understand the Python code that 
> showed up. Could somebody with Python experience help? Perl interpolates
> the value of a variable into a here document. How do I do that in Python?

Sorry, I don't have much Python experience.  I could probably figure something 
out by looking at existing code and the Python documentation, but I don't think 
I could do that more quickly than you could.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue #2147 - v1

Posted by VK Sameer <sa...@collab.net>.
Julian Foad wrote on 01/26/2005 08:37 AM:
> VK Sameer wrote:
> 
> And here's my rather picky review. :-)

No problem. :)

>  > ------------------------------------------------------------------------
>  >
>  > Fixed issue #2147 - control characters in log message cause failure 
> (over DAV)
>  >
>  > * subversion/mod_dav_svn/log.c: modified log_receiver to call
>  >     svn_xml_fuzzy_escape()
> 
> But what is the intention or result of this change?  In other words, in 
> what way does calling that new function affect what this function does?  
> For example:
> 
> * subversion/mod_dav_svn/log.c (log_receiver): call svn_xml_fuzzy_escape
>     to ensure no control characters remain in the log message.

Fixed

> You should name in parentheses each top-level program object (function, 
> variable, etc.) that is modified/added/removed:

OK.

>> Index: subversion/include/svn_xml.h
>> +/** Return @c string on finding no ASCII control characters. Otherwise
>> + *  return a copy created with @c pool, with control characters
> 
> 
> "@a string" and "@a pool" - they are Arguments rather than 
> Cross-references (that's how I think of the "@a" and "@c" mnemonics, 
> even if that's not the formal definition).

My bad - fixed.

>> + *  replaced by "0x\d+" sequences.
> 
> 
> Presumably hex digits, not "\d".

Yup, confused me in the .c file as well. Fixed.

>> + */
>> +const char* svn_xml_fuzzy_escape (const char *string,
> 
> 
> Put space before "*", not after it.

OK.

> Why does this return "const char *" when all the other 
> svn_xml_escape_... functions return void and append their output to a 
> stringbuf?  (Not saying you necessarily should change it.)

So that it can be chained in the caller. The intent was to match the 
existing call to apr_cml_quote_string.

>>  /* Generalized Subversion XML Parsing */
>> Index: subversion/libsvn_subr/xml.c
>> ===================================================================
>> --- subversion/libsvn_subr/xml.c    (revision 12853)
>> +++ subversion/libsvn_subr/xml.c    (working copy)
>> @@ -25,6 +25,7 @@
>>  #include "svn_pools.h"
>>  #include "svn_xml.h"
>>  #include "svn_error.h"
>> +#include "svn_ctype.h"
>>  
>>  #ifdef SVN_HAVE_OLD_EXPAT
>>  #include "xmlparse.h"
>> @@ -265,7 +266,47 @@
>>    xml_escape_attr (outstr, string, (apr_size_t) strlen (string), pool);
>>  }
>>  
>> +/* Escape any ASCII control chars. */
> 
> 
> Comment not needed at implementation, because the description is given 
> at the prototype.

OK.

>> +const char*
> 
> 
> Space before "*".

OK.

>> +svn_xml_fuzzy_escape (const char *string, apr_pool_t *pool)
>> +{
>> +  const char *end = string + strlen (string);
>> +  const char *p = string, *q;
>> +  svn_stringbuf_t *outstr;
>>  
>> +  for (q = p; q < end; q++)
> 
> 
> "for (q = string; ...)" would be clearer.

Hmm, I'm repeating the assignment later, for different reasons, though. 
This seems more consistent.

>> +    if (svn_ctype_iscntrl(*q))
> 
> Space after function name.

OK.

>> +      while (q < end && !svn_ctype_iscntrl(*q))
> 
> Space.

OK.

>> +      /* Append the control character in 0x%x format. */
> 
> 
> (Format is described differently in the doc string.)

Fixed.

>> +      char escaped_char[7];
> 
> 
> You can't put declarations after code in C.  (Obviously your compiler 
> let you; perhaps you can increase its warnings level.)  Move it to the 
> top of the block.

Done.

> Seven characters: '0', 'x', xdigit, [xdigit,] '\0', and what?  (It's not 
> that having a couple of extra bytes is a waste of memory, it's that it 
> confuses the reader: I'm now looking around trying to find what might 
> use the extra bytes.)

I'm guessing the \d got me thinking there might be 6 bytes. Fixed.

>> +++ subversion/mod_dav_svn/log.c    (working copy)
>> @@ -98,9 +98,10 @@
>>    if (msg)
>>      SVN_ERR( dav_svn__send_xml(lrb->bb, lrb->output,
>>                                 "<D:comment>%s</D:comment>" DEBUG_CR,
>> -                               apr_xml_quote_string(pool, msg, 0)) );
>> +                               svn_xml_fuzzy_escape (
>> +                                 apr_xml_quote_string (pool, msg, 0),
>> +                                 pool)) );
> 
> 
> No.  You should strip control characters before you XML-quote it, 
> otherwise you are relying on (1) your function producing validly quoted 
> XML (which it may do at present but is not documented to do), and

I don't think I understood your point.
Does apr_xml_quote_string require a validly-quoted XML string?
Also, svn_xml_fuzzy_escape() does not do XML in the sense of quoting, 
unlike say svn_xml_escape_cdata().

> (2) the XML-quoting function accepting and passing through control 
> characters (which such a function need not be expected to do).

It does do it now, which is why there is a bug report filed :)
Both the svn and apr functions seem to pass through bytes they are not
looking for.

Changing the order is not a big deal. It seems logical, though, to call 
the fuzzy_escaper as the last thing before sending the XML data.

>>        apr_hash_index_t *hi;
>> Index: subversion/tests/clients/cmdline/log_tests.py
>> ===================================================================
>> --- subversion/tests/clients/cmdline/log_tests.py    (revision 12853)
>> +++ subversion/tests/clients/cmdline/log_tests.py    (working copy)
>> @@ -516,6 +516,70 @@
>>    svntest.actions.run_and_verify_svn (None, [], SVNAnyOutput,
>>                                        'log', '-r', '2', mu2_URL)
>>    
>> +#----------------------------------------------------------------------
>> +def escape_control_chars(sbox):
>> +  "make sure mod_dav_svn escapes control chars"
>> +
>> +  dump_str = """SVN-fs-dump-format-version: 2
...
>> +svn:log
>> +V 37
>> +this msg contains a Ctrl-T:  <- here
> 
> 
> Eww.  Is this likely to be preserved in people's editors?  Perhaps you 
> could find a programmatic way of inserting it into the string without it 
> having to be a literal control character in the source file.

OK, I did a quick search and didn't understand the Python code that 
showed up. Could somebody with Python experience help? Perl interpolates
the value of a variable into a here document. How do I do that in Python?

Thanks for the detailed review.

Sameer

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] issue #2147 - v1

Posted by Julian Foad <ju...@btopenworld.com>.
VK Sameer wrote:
> Attached is a fix for issue #2147, with much help from Karl Fogel.

And here's my rather picky review. :-)

- Julian


 > ------------------------------------------------------------------------
 >
 > Fixed issue #2147 - control characters in log message cause failure (over DAV)
 >
 > * subversion/mod_dav_svn/log.c: modified log_receiver to call
 >     svn_xml_fuzzy_escape()

But what is the intention or result of this change?  In other words, in what 
way does calling that new function affect what this function does?  For example:

* subversion/mod_dav_svn/log.c (log_receiver): call svn_xml_fuzzy_escape
     to ensure no control characters remain in the log message.

 > * subversion/include/svn_xml.h,
 >   subversion/libsvn_subr/xml.c: added svn_xml_fuzzy_escape()

You should name in parentheses each top-level program object (function, 
variable, etc.) that is modified/added/removed:

   (svn_xml_fuzzy_escape): New function.

 > * subversion/tests/clients/cmdline/log_tests.py:
 >     added escape_control_chars test

   (escape_control_chars): New test.
   (test_list): Added the new test.

 > ------------------------------------------------------------------------

> Index: subversion/include/svn_xml.h
> ===================================================================
> --- subversion/include/svn_xml.h	(revision 12853)
> +++ subversion/include/svn_xml.h	(working copy)
> @@ -117,7 +117,14 @@
>                                    const char *string,
>                                    apr_pool_t *pool);
>  
> +/** Return @c string on finding no ASCII control characters. Otherwise
> + *  return a copy created with @c pool, with control characters

"@a string" and "@a pool" - they are Arguments rather than Cross-references 
(that's how I think of the "@a" and "@c" mnemonics, even if that's not the 
formal definition).

> + *  replaced by "0x\d+" sequences.

Presumably hex digits, not "\d".

Marking the sequence with ordinary characters '0' and 'x' doesn't seem like the 
most friendly quoting method for human-readable text (except obviously it's 
friendly to C programmers).  I would have thought a notation starting with an 
unusual character like "\" or "&" and having a fixed number of digits (two, 
since it's always escaping a single byte) would be better.  But hey, I suppose 
the exact method used is a minor detail in the context of this whole patch - 
it's evidently not intended to be reversible or parseable, and is just a 
work-around to avoid failure.

> + */
> +const char* svn_xml_fuzzy_escape (const char *string,

Put space before "*", not after it.

Why does this return "const char *" when all the other svn_xml_escape_... 
functions return void and append their output to a stringbuf?  (Not saying you 
necessarily should change it.)

> +                                  apr_pool_t *pool);
>  
> +
>  /*---------------------------------------------------------------*/
>  
>  /* Generalized Subversion XML Parsing */
> Index: subversion/libsvn_subr/xml.c
> ===================================================================
> --- subversion/libsvn_subr/xml.c	(revision 12853)
> +++ subversion/libsvn_subr/xml.c	(working copy)
> @@ -25,6 +25,7 @@
>  #include "svn_pools.h"
>  #include "svn_xml.h"
>  #include "svn_error.h"
> +#include "svn_ctype.h"
>  
>  #ifdef SVN_HAVE_OLD_EXPAT
>  #include "xmlparse.h"
> @@ -265,7 +266,47 @@
>    xml_escape_attr (outstr, string, (apr_size_t) strlen (string), pool);
>  }
>  
> +/* Escape any ASCII control chars. */

Comment not needed at implementation, because the description is given at the 
prototype.

> +const char*

Space before "*".

> +svn_xml_fuzzy_escape (const char *string, apr_pool_t *pool)
> +{
> +  const char *end = string + strlen (string);
> +  const char *p = string, *q;
> +  svn_stringbuf_t *outstr;
>  
> +  for (q = p; q < end; q++)

"for (q = string; ...)" would be clearer.

> +    if (svn_ctype_iscntrl(*q))

Space after function name.

> +      break;
> +
> +  /* return original string if no control characters found*/
> +  if (q == end)
> +    return string;
> +
> +  outstr = svn_stringbuf_create ("", pool);
> +  while (1)
> +    {
> +      q = p;
> +      /* Traverse till either control character or eos */
> +      while (q < end && !svn_ctype_iscntrl(*q))

Space.

> +        q++;
> +
> +      /* copy chunk before marker */
> +      svn_stringbuf_appendbytes (outstr, p, q - p);
> +
> +      if (q == end)
> +        break;
> +
> +      /* Append the control character in 0x%x format. */

(Format is described differently in the doc string.)

> +      char escaped_char[7];

You can't put declarations after code in C.  (Obviously your compiler let you; 
perhaps you can increase its warnings level.)  Move it to the top of the block.

Seven characters: '0', 'x', xdigit, [xdigit,] '\0', and what?  (It's not that 
having a couple of extra bytes is a waste of memory, it's that it confuses the 
reader: I'm now looking around trying to find what might use the extra bytes.)

> +      sprintf (escaped_char, "0x%x", *q);
> +      svn_stringbuf_appendcstr (outstr, escaped_char);
> +
> +      p = q + 1;
> +    }
> +
> +  return outstr->data;
> +}
> +
>  
>  /*** Map from the Expat callback types to the SVN XML types. ***/
>  
> Index: subversion/mod_dav_svn/log.c
> ===================================================================
> --- subversion/mod_dav_svn/log.c	(revision 12853)
> +++ subversion/mod_dav_svn/log.c	(working copy)
> @@ -98,9 +98,10 @@
>    if (msg)
>      SVN_ERR( dav_svn__send_xml(lrb->bb, lrb->output,
>                                 "<D:comment>%s</D:comment>" DEBUG_CR,
> -                               apr_xml_quote_string(pool, msg, 0)) );
> +                               svn_xml_fuzzy_escape (
> +                                 apr_xml_quote_string (pool, msg, 0),
> +                                 pool)) );

No.  You should strip control characters before you XML-quote it, otherwise you 
are relying on (1) your function producing validly quoted XML (which it may do 
at present but is not documented to do), and (2) the XML-quoting function 
accepting and passing through control characters (which such a function need 
not be expected to do).

>  
> -
>    if (changed_paths)
>      {
>        apr_hash_index_t *hi;
> Index: subversion/tests/clients/cmdline/log_tests.py
> ===================================================================
> --- subversion/tests/clients/cmdline/log_tests.py	(revision 12853)
> +++ subversion/tests/clients/cmdline/log_tests.py	(working copy)
> @@ -516,6 +516,70 @@
>    svntest.actions.run_and_verify_svn (None, [], SVNAnyOutput,
>                                        'log', '-r', '2', mu2_URL)
>    
> +#----------------------------------------------------------------------
> +def escape_control_chars(sbox):
> +  "make sure mod_dav_svn escapes control chars"
> +
> +  dump_str = """SVN-fs-dump-format-version: 2
> +
> +UUID: ffcae364-69ee-0310-a980-ca5f10462af2
> +
> +Revision-number: 0
> +Prop-content-length: 56
> +Content-length: 56
> +
> +K 8
> +svn:date
> +V 27
> +2005-01-24T10:09:21.759592Z
> +PROPS-END
> +
> +Revision-number: 1
> +Prop-content-length: 128
> +Content-length: 128
> +
> +K 7
> +svn:log
> +V 37
> +this msg contains a Ctrl-T:  <- here

Eww.  Is this likely to be preserved in people's editors?  Perhaps you could 
find a programmatic way of inserting it into the string without it having to be 
a literal control character in the source file.

> +K 10
> +svn:author
> +V 7
> +jrandom
> +K 8
> +svn:date
> +V 27
> +2005-01-24T10:09:22.012524Z
> +PROPS-END
> +"""
> +
> +  # Create virgin repos and working copy
> +  svntest.main.safe_rmtree(sbox.repo_dir, 1)
> +  svntest.main.create_repos(sbox.repo_dir)
> +  svntest.main.set_repos_paths(sbox.repo_dir)
> +
> +  URL = svntest.main.current_repo_url
> +
> +  # load dumpfile with control character into repos to get
> +  # a log with control char content
> +  output, errput = \
> +    svntest.main.run_command_stdin(
> +    "%s load --quiet %s" % (svntest.main.svnadmin_binary, sbox.repo_dir),
> +    None, 1, dump_str)
> +
> +  # run log
> +  output, errput = svntest.actions.run_and_verify_svn ("", None, [], 'log', URL)
> +
> +  # verify output contains expected fuzzy escape sequence
> +  match_re = "this msg contains a Ctrl-T.*"
> +  for line in output:
> +    if re.match (match_re, line):
> +      break
> +  else:
> +    raise svntest.Failure ("Failed to find " + str(match_re) + " in " + str(output),
> +                           "error: " + str(errput))
> +
> +
>  ########################################################################
>  # Run the tests
>  
> @@ -530,6 +594,7 @@
>                log_with_path_args,
>                url_missing_in_head,
>                log_through_copyfrom_history,
> +              escape_control_chars,
>               ]
>  
>  if __name__ == '__main__':

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org