You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Daniel Shahaf <d....@daniel.shahaf.name> on 2020/05/14 13:55:43 UTC

Re: svn commit: r1877712 - in /subversion/trunk/subversion/tests/cmdline: entries-dump.c svntest/main.py

futatuki@apache.org wrote on Thu, 14 May 2020 01:46 -0000:
> Log:
> Use safe bytes literals when set string values in working copy entries. 
> 

If someone reads this log message in a year, I don't think it'd be clear
to them what the problem being fixed was.  I think it would be useful to
be more specific about the circumstances; for example:

    entries-dump: Escape string-typed attribute values when serializing
    them as Python string literals.

    Before this commit, a filesystem node named «foo\bar» (a single,
    7-character path component) would cause «e.name = 'foo\bar'» to be
    emitted.  The unescaped backslash would manifest as a test failure or
    a SyntaxError, depending on the following characters.

    This was triggered by svnadmin_tests 42 verify_metadata_only under
    Python 3 on Windows.

The reference to "svnadmin_tests 42 verify_metadata_only" is only
a placeholder, to be replaced by the correct reference.

I specifically used the term "filesystem node" to clarify that «foo\bar»
was to be taken as an svn_fspath__* path (where backslash has no
special meaning), as opposed to, say, an svn_dirent_* path (where
backslash is a delimiter on Windows).

> * subversion/tests/cmdline/entries-dump.c
>   (print_prefix): New function.
>   (str_value):
>   - Add argument to specify pool.
>   - Print human readable value of "value" as is in comment, then set it
>     as str value by using hex escaped bytes literal.
>   (entries_dump): Add pool argument to str_value() calls.
>   (main):
>   - Print "Entry" class definition as prefix before entry_dump() or tree_dump() 
>   - Style fix on if statement (using blocks). 
>   (): Add include files for assert() and svn_xml_escape_attr_cstring()
> * subversion/tests/cmdline/svntest/main.py
>   (run_entiresdump, run_entriesdump_tree): Move definition of "Entry" class
>   into generated code by entries-dump execution. 
> 
> Review By: danielsh

For future reference, the "${Verb}ed by" phrases in log messages are
case-sensitive:

    tools/dev/contribulyze.py:521:field_re = re.compile(
    tools/dev/contribulyze.py:522:           '^(Patch|Review(ed)?|Suggested|Found|Inspired|Tested|Reported) by:'
    tools/dev/contribulyze.py:523:           '\s*\S.*$')
    ⋮
    tools/dev/contribulyze.py:566:            m = field_re.match(line)
    ⋮
    tools/dev/contribulyze.py:604:                  m = field_re.match(line)

This doesn't matter in this case, of course, since the person being
credited is already a full committer; however, when crediting people who
aren't already full committers, the correct case form should be used so
the regexp matches and the contribulyzer pages are updated.  (Unless we
change the script)

Cheers,

Daniel


Re: svn commit: r1877712 - in /subversion/trunk/subversion/tests/cmdline: entries-dump.c svntest/main.py

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Yasuhito FUTATSUKI wrote on Mon, 18 May 2020 04:53 +0900:
> On 2020/05/15 10:32, Daniel Shahaf wrote:
> > Something is not clear to me in this paragraph.  I do see that if we had
> > written «printf("'%s'", value)», SyntaxError's could still have resulted;
> > however, I don't see how that is a reason not to choose single quotes.
> > We _could_ still have chosen single quotes if we had escaped backslashes,
> > single quotes, and newlines in «value» before printing it, couldn't we?
> > 
> > It's not clear to me what this paragraph is trying to explain: whether
> > it's trying to explain why the code generates a «bytes» object as
> > opposed to a «str» object, or to explain the choice to escape every byte
> > (even those that don't _require_ escaping according to Python's bytes
> > literals syntax), or something else.
> > 
> > Also, s/ether/either/.  
> 
> Thanks for the review, again. I intended to to explain there can be more
> case that we should use escape expression, and I'm sure it wasn't
> achieved.
> 

Thanks for clarifying this.  The new log message is clear.  I have only
minor tweaks to propose.

> Before this commit, a filesystem node named "foo\bar" (a single,
> 7-character path component) would cause "e.name = 'foo\bar'" to be
> emitted.  The unescaped backslash would manifest as a test failure or
> a SyntaxError, depending on the following characters.

Suggest s/The/In general, the/, otherwise the reader would be left
wondering how the Python literal «'foo\bar'» could possibly generate
a SyntaxError, since «\b» specifically happens to be a valid escape
sequence.

> This was triggered by update_tests.py 76 windows_update_backslash under
> Python 3 on Windows.
> 
> There can be some other characters that should be escaped.  For example,
> user names can contain "'" (a single quote character) and/or """ (a
> double quote character), which would potentially cause a SyntaxError
> even if we choose either of them to quote string literals.  To avoid to
> overlook such potentially unsafe characters, I decide to use hex value
> escape for all characters.  

Grammar: s/to overlook/overlooking/, s/decide/decided/.

> Furthermore, to ensure that values are decoded to Unicode as UTF-8 byte
> sequences when we use hex value escape under Python 3, we print them as
> bytes value and then encode it.

s/bytes value/bytes values/.  (Furthermore, note that the singular form
would have required an indefinite article: "a bytes value" rather than
*"bytes value".)

> * subversion/tests/cmdline/entries-dump.c
>   (print_prefix): New function.
>   (str_value):
>   - Add argument to specify pool.
>   - Print human readable value of "value" as is in comment, then set it
>     as str value by using hex escaped bytes literal.
>   (entries_dump): Add pool argument to str_value() calls.
>   (main):
>   - Print "Entry" class definition as prefix before entry_dump() or tree_dump()
>   - Style fix on if statement (using blocks).
>   (): Add include files for assert() and svn_xml_escape_attr_cstring()
> * subversion/tests/cmdline/svntest/main.py
>   (run_entiresdump, run_entriesdump_tree): Move definition of "Entry" class
>   into generated code by entries-dump execution.

Please add an empty line above the "* …/main.py" line to make reading
easier.

+1 to commit^W propedit.  And thanks!

Cheers,

Daniel

> Found by: svn-windows-ra buildbot
> Review by: danielsh
> ]]]
> 
> Thanks,


Re: svn commit: r1877712 - in /subversion/trunk/subversion/tests/cmdline: entries-dump.c svntest/main.py

Posted by Yasuhito FUTATSUKI <fu...@poem.co.jp>.
On 2020/05/15 10:32, Daniel Shahaf wrote:
> Yasuhito FUTATSUKI wrote on Thu, 14 May 2020 21:44 +00:00:
>> Thank you for the review and lesson on my commit message. I intended
>> to send this commit message with the patch before commit, but missed
>> by mistakes.
>>
>> I added note about quotation characters which could also cause SyntaxError.
>>
>> Before replace this log message, I'd like to get a review again.
>>
> 
> Thanks for your diligence.
> 
>> [[[
>> entries-dump: Escape string-typed attribute values when serializing
>> them as Python string literals.
>>
>> Before this commit, a filesystem node named "foo\bar" (a single,
>> 7-character path component) would cause "e.name = 'foo\bar'" to be
>> emitted.  The unescaped backslash would manifest as a test failure or
>> a SyntaxError, depending on the following characters.
>>
> 
> *nod* Changing the Unicode quotes to double quotes is fine.
> 
> (I generally use Unicode quotes so it's clear which quotes delimit code
> from English prose and which quotes are literal parts of the code.  
> However, that's just me.)
> 
>> Also, user names can contain "'" (a single quote character) and/or """
>> (a double quote character), which would potentially cause a SyntaxError
>> even if we choose ether of them to quote string literals.
> 
> Something is not clear to me in this paragraph.  I do see that if we had
> written «printf("'%s'", value)», SyntaxError's could still have resulted;
> however, I don't see how that is a reason not to choose single quotes.
> We _could_ still have chosen single quotes if we had escaped backslashes,
> single quotes, and newlines in «value» before printing it, couldn't we?
> 
> It's not clear to me what this paragraph is trying to explain: whether
> it's trying to explain why the code generates a «bytes» object as
> opposed to a «str» object, or to explain the choice to escape every byte
> (even those that don't _require_ escaping according to Python's bytes
> literals syntax), or something else.
> 
> Also, s/ether/either/.

Thanks for the review, again. I intended to to explain there can be more
case that we should use escape expression, and I'm sure it wasn't
achieved.

[[[
entries-dump: Escape string-typed attribute values when serializing
them as Python string literals.

Before this commit, a filesystem node named "foo\bar" (a single,
7-character path component) would cause "e.name = 'foo\bar'" to be
emitted.  The unescaped backslash would manifest as a test failure or
a SyntaxError, depending on the following characters.

This was triggered by update_tests.py 76 windows_update_backslash under
Python 3 on Windows.

There can be some other characters that should be escaped.  For example,
user names can contain "'" (a single quote character) and/or """ (a
double quote character), which would potentially cause a SyntaxError
even if we choose either of them to quote string literals.  To avoid to
overlook such potentially unsafe characters, I decide to use hex value
escape for all characters.  

Furthermore, to ensure that values are decoded to Unicode as UTF-8 byte
sequences when we use hex value escape under Python 3, we print them as
bytes value and then encode it.

* subversion/tests/cmdline/entries-dump.c
  (print_prefix): New function.
  (str_value):
  - Add argument to specify pool.
  - Print human readable value of "value" as is in comment, then set it
    as str value by using hex escaped bytes literal.
  (entries_dump): Add pool argument to str_value() calls.
  (main):
  - Print "Entry" class definition as prefix before entry_dump() or tree_dump()
  - Style fix on if statement (using blocks).
  (): Add include files for assert() and svn_xml_escape_attr_cstring()
* subversion/tests/cmdline/svntest/main.py
  (run_entiresdump, run_entriesdump_tree): Move definition of "Entry" class
  into generated code by entries-dump execution.

Found by: svn-windows-ra buildbot
Review by: danielsh
]]]

Thanks,
-- 
Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>

Re: svn commit: r1877712 - in /subversion/trunk/subversion/tests/cmdline: entries-dump.c svntest/main.py

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Yasuhito FUTATSUKI wrote on Thu, 14 May 2020 21:44 +00:00:
> Thank you for the review and lesson on my commit message. I intended
> to send this commit message with the patch before commit, but missed
> by mistakes.
> 
> I added note about quotation characters which could also cause SyntaxError.
> 
> Before replace this log message, I'd like to get a review again.
> 

Thanks for your diligence.

> [[[
> entries-dump: Escape string-typed attribute values when serializing
> them as Python string literals.
> 
> Before this commit, a filesystem node named "foo\bar" (a single,
> 7-character path component) would cause "e.name = 'foo\bar'" to be
> emitted.  The unescaped backslash would manifest as a test failure or
> a SyntaxError, depending on the following characters.
> 

*nod* Changing the Unicode quotes to double quotes is fine.

(I generally use Unicode quotes so it's clear which quotes delimit code
from English prose and which quotes are literal parts of the code.  
However, that's just me.)

> Also, user names can contain "'" (a single quote character) and/or """
> (a double quote character), which would potentially cause a SyntaxError
> even if we choose ether of them to quote string literals.

Something is not clear to me in this paragraph.  I do see that if we had
written «printf("'%s'", value)», SyntaxError's could still have resulted;
however, I don't see how that is a reason not to choose single quotes.
We _could_ still have chosen single quotes if we had escaped backslashes,
single quotes, and newlines in «value» before printing it, couldn't we?

It's not clear to me what this paragraph is trying to explain: whether
it's trying to explain why the code generates a «bytes» object as
opposed to a «str» object, or to explain the choice to escape every byte
(even those that don't _require_ escaping according to Python's bytes
literals syntax), or something else.

Also, s/ether/either/.

> * subversion/tests/cmdline/entries-dump.c
>   (print_prefix): New function.
>   (str_value):
>   - Add argument to specify pool.
>   - Print human readable value of "value" as is in comment, then set it
>     as str value by using hex escaped bytes literal.
>   (entries_dump): Add pool argument to str_value() calls.
>   (main):
>   - Print "Entry" class definition as prefix before entry_dump() or tree_dump()
>   - Style fix on if statement (using blocks).
>   (): Add include files for assert() and svn_xml_escape_attr_cstring()
> * subversion/tests/cmdline/svntest/main.py
>   (run_entiresdump, run_entriesdump_tree): Move definition of "Entry" class
>   into generated code by entries-dump execution.
> 
> Found by: svn-windows-ra buildbot
> Review by: danielsh

Looks good.

Thanks!

Daniel

> ]]]

Re: svn commit: r1877712 - in /subversion/trunk/subversion/tests/cmdline: entries-dump.c svntest/main.py

Posted by Yasuhito FUTATSUKI <fu...@poem.co.jp>.
On 2020/05/14 22:55, Daniel Shahaf wrote:
> futatuki@apache.org wrote on Thu, 14 May 2020 01:46 -0000:
>> Log:
>> Use safe bytes literals when set string values in working copy entries. 
>>
> 
> If someone reads this log message in a year, I don't think it'd be clear
> to them what the problem being fixed was.  I think it would be useful to
> be more specific about the circumstances; for example:> 
>     entries-dump: Escape string-typed attribute values when serializing>     them as Python string literals.
> 
>     Before this commit, a filesystem node named «foo\bar» (a single,
>     7-character path component) would cause «e.name = 'foo\bar'» to be
>     emitted.  The unescaped backslash would manifest as a test failure or
>     a SyntaxError, depending on the following characters.
> 
>     This was triggered by svnadmin_tests 42 verify_metadata_only under
>     Python 3 on Windows.
> 
> The reference to "svnadmin_tests 42 verify_metadata_only" is only
> a placeholder, to be replaced by the correct reference.
> 
> I specifically used the term "filesystem node" to clarify that «foo\bar»
> was to be taken as an svn_fspath__* path (where backslash has no
> special meaning), as opposed to, say, an svn_dirent_* path (where
> backslash is a delimiter on Windows).

Thank you for the review and lesson on my commit message. I intended
to send this commit message with the patch before commit, but missed
by mistakes.

I added note about quotation characters which could also cause SyntaxError.

Before replace this log message, I'd like to get a review again.

[[[
entries-dump: Escape string-typed attribute values when serializing
them as Python string literals.

Before this commit, a filesystem node named "foo\bar" (a single,
7-character path component) would cause "e.name = 'foo\bar'" to be
emitted.  The unescaped backslash would manifest as a test failure or
a SyntaxError, depending on the following characters.

This was triggered by update_tests.py 76 windows_update_backslash under
Python 3 on Windows.

Also, user names can contain "'" (a single quote character) and/or """
(a double quote character), which would potentially cause a SyntaxError
even if we choose ether of them to quote string literals.

* subversion/tests/cmdline/entries-dump.c
  (print_prefix): New function.
  (str_value):
  - Add argument to specify pool.
  - Print human readable value of "value" as is in comment, then set it
    as str value by using hex escaped bytes literal.
  (entries_dump): Add pool argument to str_value() calls.
  (main):
  - Print "Entry" class definition as prefix before entry_dump() or tree_dump()
  - Style fix on if statement (using blocks).
  (): Add include files for assert() and svn_xml_escape_attr_cstring()
* subversion/tests/cmdline/svntest/main.py
  (run_entiresdump, run_entriesdump_tree): Move definition of "Entry" class
  into generated code by entries-dump execution.

Found by: svn-windows-ra buildbot
Review by: danielsh
]]]
 
Thanks,
-- 
Yasuhito FUTATSUKI <fu...@yf.bsdclub.org>