You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by Lars Volker <lv...@cloudera.com> on 2017/02/01 01:53:01 UTC

Re: Python string formatting

Thanks for pointing this out. Yes, I thought of trying autopep8 and running
an exhaustive build. I assumed it wouldn't break tests, but that is
probably a naive assumption. However, if nothing breaks in an exhaustive
run I'd be quite confident that nothing else will. That leaves the risk
that a change made by autopep8 disables a test so we don't notice the
breakage. Other than manually reviewing the code I don't have an idea how
to prevent that.

Thoughts?

On Tue, Jan 31, 2017 at 8:29 PM, Jim Apple <jb...@cloudera.com> wrote:

> Will you use autopep8? If so, how will you check that it doesn't break
> something on an infrequently-used codepath?
>
> On Tue, Jan 31, 2017 at 11:12 AM, Michael Brown <mi...@cloudera.com>
> wrote:
> > I agree.
> >
> > On Tue, Jan 31, 2017 at 10:44 AM, Lars Volker <lv...@cloudera.com> wrote:
> >> Thanks for the feedback here and in the review.
> >>
> >> I agree that we should aim for a style as close to PEP8 as possible.
> >> However, I also didn't want to overshoot and my first goal was to get
> some
> >> useful tooling set up, so that I don't have to constantly worry about
> >> formatting. Once I had figured out some tooling, I thought I might as
> well
> >> share it and solicit feedback.
> >>
> >> Regarding the next steps, I'm open for anything really. I didn't know
> about
> >> the --diff switch of flake8, that looks very useful. Even better of
> course
> >> would be, if all python code could be converted to PEP8.
> >>
> >> Here is a list of all PEP8 violations and their count, obtained with
> >> "pycodestyle --statistics -qq tests":
> >>
> >> 9017    E111 indentation is not a multiple of four
> >> 902     E114 indentation is not a multiple of four (comment)
> >> 2       E116 unexpected indentation (comment)
> >> 24      E122 continuation line missing indentation or outdented
> >> 5       E124 closing bracket does not match visual indentation
> >> 105     E125 continuation line with same indent as next logical line
> >> 43      E127 continuation line over-indented for visual indent
> >> 1038    E128 continuation line under-indented for visual indent
> >> 7       E131 continuation line unaligned for hanging indent
> >> 13      E201 whitespace after '('
> >> 8       E202 whitespace before ']'
> >> 55      E203 whitespace before ':'
> >> 5       E211 whitespace before '['
> >> 5       E221 multiple spaces before operator
> >> 7       E222 multiple spaces after operator
> >> 9       E225 missing whitespace around operator
> >> 1       E227 missing whitespace around bitwise or shift operator
> >> 127     E231 missing whitespace after ':'
> >> 157     E251 unexpected spaces around keyword / parameter equals
> >> 20      E261 at least two spaces before inline comment
> >> 21      E265 block comment should start with '# '
> >> 1       E266 too many leading '#' for block comment
> >> 1       E271 multiple spaces after keyword
> >> 4       E301 expected 1 blank line, found 0
> >> 313     E302 expected 2 blank lines, found 1
> >> 16      E303 too many blank lines (2)
> >> 13      E305 expected 2 blank lines after class or function definition,
> >> found 1
> >> 6       E306 expected 1 blank line before a nested definition, found 0
> >> 7       E402 module level import not at top of file
> >> 3800    E501 line too long (80 > 79 characters)
> >> 278     E502 the backslash is redundant between brackets
> >> 87      E701 multiple statements on one line (colon)
> >> 74      E703 statement ends with a semicolon
> >> 12      E711 comparison to None should be 'if cond is None:'
> >> 9       E712 comparison to False should be 'if cond is False:' or 'if
> not
> >> cond:'
> >> 2       E713 test for membership should be 'not in'
> >> 2       E741 ambiguous variable name 'l'
> >> 1       W292 no newline at end of file
> >> 9       W391 blank line at end of file
> >> 2       W601 .has_key() is deprecated, use 'in'
> >> 19      W602 deprecated form of raising exception
> >>
> >> If we take out the well known ones (indent, line width), it does not
> look
> >> too far fetched to me to change it all to PEP8.
> >>
> >> Thoughts?
> >>
> >>
> >>
> >> On Tue, Jan 31, 2017 at 5:59 PM, Michael Brown <mi...@cloudera.com>
> wrote:
> >>
> >>> Thanks. I made some comments on the review, but I see now I should
> >>> probably share my general view here.
> >>>
> >>> My general view is, if we are going to codify our Python style guide,
> >>> I would rather we codify style conventions that are closer to standard
> >>> Python style conventions, rather than codify what is currently done. I
> >>> am willing to keep 2-space indents and 90-char lines, but I don't
> >>> think anything else should be part of the conventions when those
> >>> conventions involves ignoring PEP-008. My instinct tells me the Python
> >>> conventions weren't conventions at all, but came up organically
> >>> without regard to actually reading conventions or using tooling.
> >>> Otherwise, we'd have already had a Python style guide, right?
> >>>
> >>> If the concern is "But there are too many noisy errors if I am editing
> >>> an existing, large file, so we should ignore these anyway", something
> >>> like this is possible:
> >>>
> >>> git diff | flake8 --diff
> >>>
> >>> This will only show PEP-008 problems on changed code, not whole files.
> >>>
> >>>
> >>>
> >>> On Mon, Jan 30, 2017 at 3:20 PM, Lars Volker <lv...@cloudera.com> wrote:
> >>> > Cool, thanks Michael for the reply. I added a section on Python to
> the
> >>> Impala
> >>> > Style Guide
> >>> > <https://cwiki.apache.org/confluence/display/IMPALA/
> Impala+Style+Guide>.
> >>> > Please feel free to edit it or let me know if I should make changes.
> I
> >>> will
> >>> > also send out a review to add a .pep8rc file to the repository.
> >>> >
> >>> > On Fri, Jan 27, 2017 at 11:56 PM, Michael Brown <mi...@cloudera.com>
> >>> wrote:
> >>> >
> >>> >> I prefer str.format() over the % operator, because:
> >>> >>
> >>> >> https://docs.python.org/2.7/library/stdtypes.html#str.format
> >>> >>
> >>> >> "This method of string formatting is the new standard in Python 3,
> and
> >>> >> should be preferred to the % formatting described in String
> Formatting
> >>> >> Operations in new code."
> >>> >>
> >>> >> Without an Impala Python style guide, I tend to use what I see on
> >>> >> docs.python.org, modulo our 2-space indent and 90-char line policy.
> >>> >>
> >>> >>
> >>> >> On Fri, Jan 27, 2017 at 2:44 PM, Lars Volker <lv...@cloudera.com>
> wrote:
> >>> >> > Hi All,
> >>> >> >
> >>> >> > do we have a strong preference for either old style or new style
> >>> string
> >>> >> > formatting in Python?
> >>> >> >
> >>> >> > "Hello %s!" % ("world") *vs* "Hello {0}!".format("world")
> >>> >> >
> >>> >> > The Impala Style Guide
> >>> >> > <https://cwiki.apache.org/confluence/display/IMPALA/
> >>> Impala+Style+Guide>
> >>> >> doesn't
> >>> >> > mention Python at all.
> >>> >> >
> >>> >> > Thanks, Lars
> >>> >>
> >>>
>

Re: Python string formatting

Posted by Lars Volker <lv...@cloudera.com>.
Thanks for the feedback. Printing all tests and making sure the names stay
the same is a very good idea.

I also just stumbled over yapf (https://github.com/google/yapf) which is a
python code formatter based on clang-format's logic. I will check that one
out too and try to come up with a patch for review when I have some spare
time.

On Wed, Feb 1, 2017 at 3:33 AM, Jim Apple <jb...@cloudera.com> wrote:

> 1. Autopep8 might break a part of a test that is rarely run, so even
> if exhaustive works, something may have gotten messed up.
>
> 2. David pointed out that you can print the tests that are scheduled
> to be run by using pytest's dry-run option. You can then check that
> the number of tests (and maybe even the configuration and names of
> them?) stay the same after autopep8/
>
> 3. You could use https://docs.python.org/2/library/ast.html to make
> sure that only the whitespace is being changed and the AST is staying
> the same. That might be overkill.
>
> On Tue, Jan 31, 2017 at 5:53 PM, Lars Volker <lv...@cloudera.com> wrote:
> > Thanks for pointing this out. Yes, I thought of trying autopep8 and
> running
> > an exhaustive build. I assumed it wouldn't break tests, but that is
> > probably a naive assumption. However, if nothing breaks in an exhaustive
> > run I'd be quite confident that nothing else will. That leaves the risk
> > that a change made by autopep8 disables a test so we don't notice the
> > breakage. Other than manually reviewing the code I don't have an idea how
> > to prevent that.
> >
> > Thoughts?
> >
> > On Tue, Jan 31, 2017 at 8:29 PM, Jim Apple <jb...@cloudera.com> wrote:
> >
> >> Will you use autopep8? If so, how will you check that it doesn't break
> >> something on an infrequently-used codepath?
> >>
> >> On Tue, Jan 31, 2017 at 11:12 AM, Michael Brown <mi...@cloudera.com>
> >> wrote:
> >> > I agree.
> >> >
> >> > On Tue, Jan 31, 2017 at 10:44 AM, Lars Volker <lv...@cloudera.com>
> wrote:
> >> >> Thanks for the feedback here and in the review.
> >> >>
> >> >> I agree that we should aim for a style as close to PEP8 as possible.
> >> >> However, I also didn't want to overshoot and my first goal was to get
> >> some
> >> >> useful tooling set up, so that I don't have to constantly worry about
> >> >> formatting. Once I had figured out some tooling, I thought I might as
> >> well
> >> >> share it and solicit feedback.
> >> >>
> >> >> Regarding the next steps, I'm open for anything really. I didn't know
> >> about
> >> >> the --diff switch of flake8, that looks very useful. Even better of
> >> course
> >> >> would be, if all python code could be converted to PEP8.
> >> >>
> >> >> Here is a list of all PEP8 violations and their count, obtained with
> >> >> "pycodestyle --statistics -qq tests":
> >> >>
> >> >> 9017    E111 indentation is not a multiple of four
> >> >> 902     E114 indentation is not a multiple of four (comment)
> >> >> 2       E116 unexpected indentation (comment)
> >> >> 24      E122 continuation line missing indentation or outdented
> >> >> 5       E124 closing bracket does not match visual indentation
> >> >> 105     E125 continuation line with same indent as next logical line
> >> >> 43      E127 continuation line over-indented for visual indent
> >> >> 1038    E128 continuation line under-indented for visual indent
> >> >> 7       E131 continuation line unaligned for hanging indent
> >> >> 13      E201 whitespace after '('
> >> >> 8       E202 whitespace before ']'
> >> >> 55      E203 whitespace before ':'
> >> >> 5       E211 whitespace before '['
> >> >> 5       E221 multiple spaces before operator
> >> >> 7       E222 multiple spaces after operator
> >> >> 9       E225 missing whitespace around operator
> >> >> 1       E227 missing whitespace around bitwise or shift operator
> >> >> 127     E231 missing whitespace after ':'
> >> >> 157     E251 unexpected spaces around keyword / parameter equals
> >> >> 20      E261 at least two spaces before inline comment
> >> >> 21      E265 block comment should start with '# '
> >> >> 1       E266 too many leading '#' for block comment
> >> >> 1       E271 multiple spaces after keyword
> >> >> 4       E301 expected 1 blank line, found 0
> >> >> 313     E302 expected 2 blank lines, found 1
> >> >> 16      E303 too many blank lines (2)
> >> >> 13      E305 expected 2 blank lines after class or function
> definition,
> >> >> found 1
> >> >> 6       E306 expected 1 blank line before a nested definition, found
> 0
> >> >> 7       E402 module level import not at top of file
> >> >> 3800    E501 line too long (80 > 79 characters)
> >> >> 278     E502 the backslash is redundant between brackets
> >> >> 87      E701 multiple statements on one line (colon)
> >> >> 74      E703 statement ends with a semicolon
> >> >> 12      E711 comparison to None should be 'if cond is None:'
> >> >> 9       E712 comparison to False should be 'if cond is False:' or 'if
> >> not
> >> >> cond:'
> >> >> 2       E713 test for membership should be 'not in'
> >> >> 2       E741 ambiguous variable name 'l'
> >> >> 1       W292 no newline at end of file
> >> >> 9       W391 blank line at end of file
> >> >> 2       W601 .has_key() is deprecated, use 'in'
> >> >> 19      W602 deprecated form of raising exception
> >> >>
> >> >> If we take out the well known ones (indent, line width), it does not
> >> look
> >> >> too far fetched to me to change it all to PEP8.
> >> >>
> >> >> Thoughts?
> >> >>
> >> >>
> >> >>
> >> >> On Tue, Jan 31, 2017 at 5:59 PM, Michael Brown <mi...@cloudera.com>
> >> wrote:
> >> >>
> >> >>> Thanks. I made some comments on the review, but I see now I should
> >> >>> probably share my general view here.
> >> >>>
> >> >>> My general view is, if we are going to codify our Python style
> guide,
> >> >>> I would rather we codify style conventions that are closer to
> standard
> >> >>> Python style conventions, rather than codify what is currently
> done. I
> >> >>> am willing to keep 2-space indents and 90-char lines, but I don't
> >> >>> think anything else should be part of the conventions when those
> >> >>> conventions involves ignoring PEP-008. My instinct tells me the
> Python
> >> >>> conventions weren't conventions at all, but came up organically
> >> >>> without regard to actually reading conventions or using tooling.
> >> >>> Otherwise, we'd have already had a Python style guide, right?
> >> >>>
> >> >>> If the concern is "But there are too many noisy errors if I am
> editing
> >> >>> an existing, large file, so we should ignore these anyway",
> something
> >> >>> like this is possible:
> >> >>>
> >> >>> git diff | flake8 --diff
> >> >>>
> >> >>> This will only show PEP-008 problems on changed code, not whole
> files.
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Mon, Jan 30, 2017 at 3:20 PM, Lars Volker <lv...@cloudera.com>
> wrote:
> >> >>> > Cool, thanks Michael for the reply. I added a section on Python to
> >> the
> >> >>> Impala
> >> >>> > Style Guide
> >> >>> > <https://cwiki.apache.org/confluence/display/IMPALA/
> >> Impala+Style+Guide>.
> >> >>> > Please feel free to edit it or let me know if I should make
> changes.
> >> I
> >> >>> will
> >> >>> > also send out a review to add a .pep8rc file to the repository.
> >> >>> >
> >> >>> > On Fri, Jan 27, 2017 at 11:56 PM, Michael Brown <
> mikeb@cloudera.com>
> >> >>> wrote:
> >> >>> >
> >> >>> >> I prefer str.format() over the % operator, because:
> >> >>> >>
> >> >>> >> https://docs.python.org/2.7/library/stdtypes.html#str.format
> >> >>> >>
> >> >>> >> "This method of string formatting is the new standard in Python
> 3,
> >> and
> >> >>> >> should be preferred to the % formatting described in String
> >> Formatting
> >> >>> >> Operations in new code."
> >> >>> >>
> >> >>> >> Without an Impala Python style guide, I tend to use what I see on
> >> >>> >> docs.python.org, modulo our 2-space indent and 90-char line
> policy.
> >> >>> >>
> >> >>> >>
> >> >>> >> On Fri, Jan 27, 2017 at 2:44 PM, Lars Volker <lv...@cloudera.com>
> >> wrote:
> >> >>> >> > Hi All,
> >> >>> >> >
> >> >>> >> > do we have a strong preference for either old style or new
> style
> >> >>> string
> >> >>> >> > formatting in Python?
> >> >>> >> >
> >> >>> >> > "Hello %s!" % ("world") *vs* "Hello {0}!".format("world")
> >> >>> >> >
> >> >>> >> > The Impala Style Guide
> >> >>> >> > <https://cwiki.apache.org/confluence/display/IMPALA/
> >> >>> Impala+Style+Guide>
> >> >>> >> doesn't
> >> >>> >> > mention Python at all.
> >> >>> >> >
> >> >>> >> > Thanks, Lars
> >> >>> >>
> >> >>>
> >>
>

Re: Python string formatting

Posted by Jim Apple <jb...@cloudera.com>.
1. Autopep8 might break a part of a test that is rarely run, so even
if exhaustive works, something may have gotten messed up.

2. David pointed out that you can print the tests that are scheduled
to be run by using pytest's dry-run option. You can then check that
the number of tests (and maybe even the configuration and names of
them?) stay the same after autopep8/

3. You could use https://docs.python.org/2/library/ast.html to make
sure that only the whitespace is being changed and the AST is staying
the same. That might be overkill.

On Tue, Jan 31, 2017 at 5:53 PM, Lars Volker <lv...@cloudera.com> wrote:
> Thanks for pointing this out. Yes, I thought of trying autopep8 and running
> an exhaustive build. I assumed it wouldn't break tests, but that is
> probably a naive assumption. However, if nothing breaks in an exhaustive
> run I'd be quite confident that nothing else will. That leaves the risk
> that a change made by autopep8 disables a test so we don't notice the
> breakage. Other than manually reviewing the code I don't have an idea how
> to prevent that.
>
> Thoughts?
>
> On Tue, Jan 31, 2017 at 8:29 PM, Jim Apple <jb...@cloudera.com> wrote:
>
>> Will you use autopep8? If so, how will you check that it doesn't break
>> something on an infrequently-used codepath?
>>
>> On Tue, Jan 31, 2017 at 11:12 AM, Michael Brown <mi...@cloudera.com>
>> wrote:
>> > I agree.
>> >
>> > On Tue, Jan 31, 2017 at 10:44 AM, Lars Volker <lv...@cloudera.com> wrote:
>> >> Thanks for the feedback here and in the review.
>> >>
>> >> I agree that we should aim for a style as close to PEP8 as possible.
>> >> However, I also didn't want to overshoot and my first goal was to get
>> some
>> >> useful tooling set up, so that I don't have to constantly worry about
>> >> formatting. Once I had figured out some tooling, I thought I might as
>> well
>> >> share it and solicit feedback.
>> >>
>> >> Regarding the next steps, I'm open for anything really. I didn't know
>> about
>> >> the --diff switch of flake8, that looks very useful. Even better of
>> course
>> >> would be, if all python code could be converted to PEP8.
>> >>
>> >> Here is a list of all PEP8 violations and their count, obtained with
>> >> "pycodestyle --statistics -qq tests":
>> >>
>> >> 9017    E111 indentation is not a multiple of four
>> >> 902     E114 indentation is not a multiple of four (comment)
>> >> 2       E116 unexpected indentation (comment)
>> >> 24      E122 continuation line missing indentation or outdented
>> >> 5       E124 closing bracket does not match visual indentation
>> >> 105     E125 continuation line with same indent as next logical line
>> >> 43      E127 continuation line over-indented for visual indent
>> >> 1038    E128 continuation line under-indented for visual indent
>> >> 7       E131 continuation line unaligned for hanging indent
>> >> 13      E201 whitespace after '('
>> >> 8       E202 whitespace before ']'
>> >> 55      E203 whitespace before ':'
>> >> 5       E211 whitespace before '['
>> >> 5       E221 multiple spaces before operator
>> >> 7       E222 multiple spaces after operator
>> >> 9       E225 missing whitespace around operator
>> >> 1       E227 missing whitespace around bitwise or shift operator
>> >> 127     E231 missing whitespace after ':'
>> >> 157     E251 unexpected spaces around keyword / parameter equals
>> >> 20      E261 at least two spaces before inline comment
>> >> 21      E265 block comment should start with '# '
>> >> 1       E266 too many leading '#' for block comment
>> >> 1       E271 multiple spaces after keyword
>> >> 4       E301 expected 1 blank line, found 0
>> >> 313     E302 expected 2 blank lines, found 1
>> >> 16      E303 too many blank lines (2)
>> >> 13      E305 expected 2 blank lines after class or function definition,
>> >> found 1
>> >> 6       E306 expected 1 blank line before a nested definition, found 0
>> >> 7       E402 module level import not at top of file
>> >> 3800    E501 line too long (80 > 79 characters)
>> >> 278     E502 the backslash is redundant between brackets
>> >> 87      E701 multiple statements on one line (colon)
>> >> 74      E703 statement ends with a semicolon
>> >> 12      E711 comparison to None should be 'if cond is None:'
>> >> 9       E712 comparison to False should be 'if cond is False:' or 'if
>> not
>> >> cond:'
>> >> 2       E713 test for membership should be 'not in'
>> >> 2       E741 ambiguous variable name 'l'
>> >> 1       W292 no newline at end of file
>> >> 9       W391 blank line at end of file
>> >> 2       W601 .has_key() is deprecated, use 'in'
>> >> 19      W602 deprecated form of raising exception
>> >>
>> >> If we take out the well known ones (indent, line width), it does not
>> look
>> >> too far fetched to me to change it all to PEP8.
>> >>
>> >> Thoughts?
>> >>
>> >>
>> >>
>> >> On Tue, Jan 31, 2017 at 5:59 PM, Michael Brown <mi...@cloudera.com>
>> wrote:
>> >>
>> >>> Thanks. I made some comments on the review, but I see now I should
>> >>> probably share my general view here.
>> >>>
>> >>> My general view is, if we are going to codify our Python style guide,
>> >>> I would rather we codify style conventions that are closer to standard
>> >>> Python style conventions, rather than codify what is currently done. I
>> >>> am willing to keep 2-space indents and 90-char lines, but I don't
>> >>> think anything else should be part of the conventions when those
>> >>> conventions involves ignoring PEP-008. My instinct tells me the Python
>> >>> conventions weren't conventions at all, but came up organically
>> >>> without regard to actually reading conventions or using tooling.
>> >>> Otherwise, we'd have already had a Python style guide, right?
>> >>>
>> >>> If the concern is "But there are too many noisy errors if I am editing
>> >>> an existing, large file, so we should ignore these anyway", something
>> >>> like this is possible:
>> >>>
>> >>> git diff | flake8 --diff
>> >>>
>> >>> This will only show PEP-008 problems on changed code, not whole files.
>> >>>
>> >>>
>> >>>
>> >>> On Mon, Jan 30, 2017 at 3:20 PM, Lars Volker <lv...@cloudera.com> wrote:
>> >>> > Cool, thanks Michael for the reply. I added a section on Python to
>> the
>> >>> Impala
>> >>> > Style Guide
>> >>> > <https://cwiki.apache.org/confluence/display/IMPALA/
>> Impala+Style+Guide>.
>> >>> > Please feel free to edit it or let me know if I should make changes.
>> I
>> >>> will
>> >>> > also send out a review to add a .pep8rc file to the repository.
>> >>> >
>> >>> > On Fri, Jan 27, 2017 at 11:56 PM, Michael Brown <mi...@cloudera.com>
>> >>> wrote:
>> >>> >
>> >>> >> I prefer str.format() over the % operator, because:
>> >>> >>
>> >>> >> https://docs.python.org/2.7/library/stdtypes.html#str.format
>> >>> >>
>> >>> >> "This method of string formatting is the new standard in Python 3,
>> and
>> >>> >> should be preferred to the % formatting described in String
>> Formatting
>> >>> >> Operations in new code."
>> >>> >>
>> >>> >> Without an Impala Python style guide, I tend to use what I see on
>> >>> >> docs.python.org, modulo our 2-space indent and 90-char line policy.
>> >>> >>
>> >>> >>
>> >>> >> On Fri, Jan 27, 2017 at 2:44 PM, Lars Volker <lv...@cloudera.com>
>> wrote:
>> >>> >> > Hi All,
>> >>> >> >
>> >>> >> > do we have a strong preference for either old style or new style
>> >>> string
>> >>> >> > formatting in Python?
>> >>> >> >
>> >>> >> > "Hello %s!" % ("world") *vs* "Hello {0}!".format("world")
>> >>> >> >
>> >>> >> > The Impala Style Guide
>> >>> >> > <https://cwiki.apache.org/confluence/display/IMPALA/
>> >>> Impala+Style+Guide>
>> >>> >> doesn't
>> >>> >> > mention Python at all.
>> >>> >> >
>> >>> >> > Thanks, Lars
>> >>> >>
>> >>>
>>