You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Ismaël Mejía <ie...@gmail.com> on 2020/01/21 20:51:33 UTC

Re: [DISCUSS] Autoformat python code with Black

Last time we discussed this there seems not to be much progress into
autoformatting.
This tool looks more tweakable, so maybe it could be more appropriate for
Beam's use case.
https://github.com/google/yapf/
WDYT?


On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <lg...@apache.org> wrote:

> +1 for any autoformatter for Python SDK that does the job. My experience
> is that since spotless in Java SDK I would never start a new Java project
> without it. So many great benefits not only for one person coding but for
> all community.
>
> It is a GitHub UI issue that you cannot easily browse past the reformat.
> It is not actually that hard, but does take a couple extra clicks to get
> GitHub to display blame before a reformat. It is easier with the command
> line. I do a lot of code history digging and the global Java reformat is
> not really a problem.
>
> It's actually one more click on Github but I agree it's not the best way
> to search the history. The most convenient and clear one I've found so far
> is in Jetbrains IDEs (Intelij) where you can:
>
> right click on line number -> "annotate" -> click again -> "annotate
> previous revision" -> ...
>
> You can also use "compare with" to see the diff between two revisions.
>
> Łukasz
>
>
>
>
>
> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org> napisał(a):
>
>> +1 pending good enough tooling (I can't quite tell - seems there are some
>> issues?)
>>
>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
>> ka.kucharczyk@gmail.com> wrote:
>>
>>> What else actually we gain? My guess is faster PR review iteration. We
>>> will skip some of conversations about code style.
>>>
>> ...
>>
>>> Last but not least, new contributor may be less discouraged. When I
>>> started contribute I didn’t know how to format my code and I lost a lot of
>>> time to add pylint and adjust IntelliJ. I eventually failed. Currently I
>>> write code intuitively and when I don’t forget I rerun tox.
>>>
>>
>> This is a huge benefit. This is why I supported it so much for Java. It
>> is a community benefit. You do not have to be a contributor to the Python
>> SDK to support this. That is why I am writing here. Just eliminate all
>> discussion of formatting. It doesn't really matter what the resulting
>> format is, if it is not crazy to read. I strongly oppose maintaining a
>> non-default format.
>>
>> Reformating 20k lines or 200k is not hard. The Java global reformat
>> touched 50k lines. It does not really matter how big it is. Definitely do
>> it all at once if you think the tool is good enough. And you should pin a
>> version, so churn is not a problem. You can upgrade the version and
>> reformat in a PR later and that is also easy.
>>
>> It is a GitHub UI issue that you cannot easily browse past the reformat.
>> It is not actually that hard, but does take a couple extra clicks to get
>> GitHub to display blame before a reformat. It is easier with the command
>> line. I do a lot of code history digging and the global Java reformat is
>> not really a problem.
>>
>> Kenn
>>
>>
>>
>>> Also everything will be formatted in a same way, so eventually it would
>>> be easier to read.
>>>
>>> Moreover, as it was mentioned in previous emails - a lot of Jenkins
>>> failures won’t take place, so we save time and resources.
>>>
>>>
>>> One of disadvantages is that our pipelines has custom syntax and after
>>> formatting they looks a little bit weird, but maybe extending the only
>>> configurable option in Black - lines, from 88 to 110 would be solution.
>>>
>>> Second one is that Black requires Python 3 to be run. I don’t know how
>>> big obstacle it would be.
>>>
>>> I believe there are two options how it would be possible to introduce
>>> Black. First: just do it, it will hurt but then it would be ok (same as a
>>> dentist appointment). Of course it may require some work to adjust linters.
>>> On the other hand we can do it gradually and start including sdk parts one
>>> by one - maybe it will be less painful?
>>>
>>> As an example I can share one of projects [2] I know that uses Black
>>> (they use also other cool checkers and pre-commit [3]). This is how looks
>>> their build with all checks [4].
>>>
>>> To sum up I believe that if we want improve our coding experience, we
>>> should improve our toolset. Black seems be recent and quite popular tool
>>> what makes think they won’t stop developing it.
>>>
>>> [1]
>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>
>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>
>>> [3] https://pre-commit.com
>>>
>>> [4]
>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>
>>>
>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <ro...@google.com>
>>> wrote:
>>>
>>>> Reformatting to 4 spaces seems a non-starter to me, as it would change
>>>> nearly every single line in the codebase (and the loss of all context as
>>>> well as that particular line).
>>>>
>>>> This is probably why the 2-space fork exists. However, we don't conform
>>>> to that either--we use 2 spaces for indentation, but 4 for continuation
>>>> indentation. (As for the history of this, this goes back to Google's
>>>> internal style guide, probably motivated by consistency with C++, Java, ...
>>>> and the fact that with an indent level of 4 one ends up wrapping lines
>>>> quite frequently (it's telling that black's default line length is 88)).
>>>> This turns out to be an easy change to the codebase.
>>>>
>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that this
>>>> tool introduces a huge amount of vertical whitespace (e.g. closing
>>>> parentheses on their own line), e.g.
>>>>
>>>> def foo(
>>>>     args
>>>> ):
>>>>   if (
>>>>       long expression)
>>>>   ):
>>>>     func(
>>>>         args
>>>>     )
>>>>
>>>> I wrote a simple post-processor to put closing parentheses on the same
>>>> lines, as well as omit the newline after "if (", and disabling formatting
>>>> of strings, which reduce the churn in our codebase to 15k lines (adding
>>>> about 4k) out of 200k total.
>>>>
>>>> https://github.com/apache/beam/pull/8712/files
>>>>
>>>> It's still very opinionated, often in different ways then me, and
>>>> doesn't understand the semantics of the code, but possibly something we
>>>> could live with given the huge advantages of an autoformatter.
>>>>
>>>> An intermediate point would be to allow, but not require,
>>>> autoformatting of changed lines.
>>>>
>>>> As for being beta quality, it looks like it's got a decent number of
>>>> contributors and in my book being in the python github project is a strong
>>>> positive signal. But, due to the above issues, I think we'd have to
>>>> maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space issue
>>>> is a 2-line change, and the rest implemented as a post-processing step (for
>>>> now, incomplete), so it'd be easy to stay in sync with upstream.)
>>>>
>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <ie...@gmail.com>
>>>> wrote:
>>>> >
>>>> > > I think the question is if it can be configured in a way to fit our
>>>> > > current linter's style. I don't think it is feasible to reformat the
>>>> > > entire Python SDK.
>>>> >
>>>> > It cannot be configured to do what we actually do because Black is
>>>> > configurable only to support the standard python codestyle guidelines
>>>> > (PEP-8) which recommends 4 spaces and is what most projects in the
>>>> > python world use.
>>>> >
>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>> > > effect is still visible in the Java SDK. However, I have the feeling
>>>> > > that this might be less of a problem with Python because the linter
>>>> has
>>>> > > more rules than Checkstyle had.
>>>> >
>>>> > Yes that’s the bad side effect but there are always tradeoffs we have
>>>> > to deal with.
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <mx...@apache.org>
>>>> wrote:
>>>> > >
>>>> > > I think the question is if it can be configured in a way to fit our
>>>> > > current linter's style. I don't think it is feasible to reformat the
>>>> > > entire Python SDK.
>>>> > >
>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>> > > effect is still visible in the Java SDK. However, I have the feeling
>>>> > > that this might be less of a problem with Python because the linter
>>>> has
>>>> > > more rules than Checkstyle had.
>>>> > >
>>>> > > -Max
>>>> > >
>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>> > > >> My concerns are:
>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>> > > >> - It looks like mostly a single person project. For the same
>>>> reason I also strongly prefer not using a fork for a specific setting. Fork
>>>> will only have less people looking at it.
>>>> > > >
>>>> > > > I suppose the project is marked as beta because it is recent, it
>>>> was
>>>> > > > presented in 2018’s pycon, and because some things can change
>>>> since
>>>> > > > auto-formatters are pretty tricky beasts, I think beta in that
>>>> case is
>>>> > > > like our own ‘@Experimental’. If you look at the contribution
>>>> page [1]
>>>> > > > you can notice that it is less and less a single person project,
>>>> there
>>>> > > > have been 93 independent contributions since the project became
>>>> > > > public, and the fact that it is hosted in the python organization
>>>> > > > github [2] gives some confidence on the project continuity.
>>>> > > >
>>>> > > > You are right however about the fact that the main author seems
>>>> to be
>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he can seem
>>>> > > > arbitrary, but he is just following pep8 style guide
>>>> recommendations
>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4 spaces
>>>> > > > recommendation of PEP-8 or even Google's own Python style guide
>>>> [4],
>>>> > > > So, probably it should be to us to reconsider the current policy
>>>> to
>>>> > > > adapt to the standards (and the tool).
>>>> > > >
>>>> > > > I did a quick run of black with python 2.7 compatibility on
>>>> > > > sdks/python and got only 4 parsing errors which is positive given
>>>> the
>>>> > > > size of our code base.
>>>> > > >
>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files failed to
>>>> reformat.
>>>> > > >
>>>> > > > error: cannot format
>>>> > > >
>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>>> > > > error: cannot format
>>>> > > >
>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>>> > > > error: cannot format
>>>> > > >
>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>> > > > Cannot parse: 160:34:       print(traceback_string,
>>>> file=sys.stderr)
>>>> > > > error: cannot format
>>>> > > >
>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc else '
>>>>  ',
>>>> > > > end=' ')
>>>> > > >
>>>> > > > I still think this can be positive for the project but well I am
>>>> > > > barely a contributor to the python code base so I let you the
>>>> python
>>>> > > > maintainers to reconsider this, in any case it seems like a good
>>>> > > > improvement for the project.
>>>> > > >
>>>> > > > [1] https://github.com/python/black/graphs/contributors
>>>> > > > [2] https://github.com/python
>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
>>>> > > > [4]
>>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>> > > >
>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <al...@google.com>
>>>> wrote:
>>>> > > >>
>>>> > > >> I am in the same boat with Robert, I am in favor of
>>>> autoformatters but I am not familiar with this one. My concerns are:
>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>> > > >> - It looks like mostly a single person project. For the same
>>>> reason I also strongly prefer not using a fork for a specific setting. Fork
>>>> will only have less people looking at it.
>>>> > > >>
>>>> > > >> IMO, this is in an early stage for us. That said lint issues are
>>>> real as pointed in the thread. If someone would like to give it a try and
>>>> see how it would look like for us that would be interesting.
>>>> > > >>
>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <
>>>> ka.kucharczyk@gmail.com> wrote:
>>>> > > >>>
>>>> > > >>> This sounds really good. A lot of Jenkins jobs failures are
>>>> caused by lint problems.
>>>> > > >>> I think it would be great to have something similar to Spotless
>>>> in Java SDK (I heard there is problem with configuring Black with IntelliJ).
>>>> > > >>>
>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <
>>>> robertwb@google.com> wrote:
>>>> > > >>>>
>>>> > > >>>> I'm generally in favor of autoformatters, though I haven't
>>>> looked at
>>>> > > >>>> how well this particular one works. We might have to go with
>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>>> > > >>>> https://github.com/python/black/issues/378 .
>>>> > > >>>>
>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <
>>>> pabloem@google.com> wrote:
>>>> > > >>>>>
>>>> > > >>>>> This looks pretty good:) I know at least a couple people
>>>> (myself included) who've been annoyed by having to take care of lint issues
>>>> that maybe a code formatter could save us.
>>>> > > >>>>> Thanks for sharing Ismael.
>>>> > > >>>>> -P.
>>>> > > >>>>>
>>>> > > >>>>>
>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <
>>>> iemejia@gmail.com> wrote:
>>>> > > >>>>>>
>>>> > > >>>>>> I stumbled by chance into Black [1] a python code auto
>>>> formatter that
>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for python, and
>>>> wanted to
>>>> > > >>>>>> bring to the ML Is there interest from the python people to
>>>> get this
>>>> > > >>>>>> into the build?
>>>> > > >>>>>>
>>>> > > >>>>>> The introduction of spotless for Java has been a good
>>>> improvement and
>>>> > > >>>>>> maybe the python code base may benefit of this too.
>>>> > > >>>>>>
>>>> > > >>>>>> WDYT?
>>>> > > >>>>>>
>>>> > > >>>>>> [1] https://github.com/python/black
>>>>
>>>

Re: [DISCUSS] Autoformat python code with Black

Posted by Yichi Zhang <zy...@google.com>.
+1 to yapf, I normally run yapf before submitting PR and it seems to be
able to solve most of my lint errors (indentations most of the time) with
the correct style file.
it will be great if yapf has an option to only autoformat the diff change,
but seems not officially supported. (I use a vim format-selected plugin
with yapf to avoid formatting the whole file, but it is annoying since not
fully automated)
https://github.com/google/yapf/issues/190 seems relevant.

On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <ie...@gmail.com> wrote:

> Last time we discussed this there seems not to be much progress into
> autoformatting.
> This tool looks more tweakable, so maybe it could be more appropriate for
> Beam's use case.
> https://github.com/google/yapf/
> WDYT?
>
>
> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <lg...@apache.org> wrote:
>
>> +1 for any autoformatter for Python SDK that does the job. My experience
>> is that since spotless in Java SDK I would never start a new Java project
>> without it. So many great benefits not only for one person coding but for
>> all community.
>>
>> It is a GitHub UI issue that you cannot easily browse past the reformat.
>> It is not actually that hard, but does take a couple extra clicks to get
>> GitHub to display blame before a reformat. It is easier with the command
>> line. I do a lot of code history digging and the global Java reformat is
>> not really a problem.
>>
>> It's actually one more click on Github but I agree it's not the best way
>> to search the history. The most convenient and clear one I've found so far
>> is in Jetbrains IDEs (Intelij) where you can:
>>
>> right click on line number -> "annotate" -> click again -> "annotate
>> previous revision" -> ...
>>
>> You can also use "compare with" to see the diff between two revisions.
>>
>> Łukasz
>>
>>
>>
>>
>>
>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org> napisał(a):
>>
>>> +1 pending good enough tooling (I can't quite tell - seems there are
>>> some issues?)
>>>
>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
>>> ka.kucharczyk@gmail.com> wrote:
>>>
>>>> What else actually we gain? My guess is faster PR review iteration. We
>>>> will skip some of conversations about code style.
>>>>
>>> ...
>>>
>>>> Last but not least, new contributor may be less discouraged. When I
>>>> started contribute I didn’t know how to format my code and I lost a lot of
>>>> time to add pylint and adjust IntelliJ. I eventually failed. Currently I
>>>> write code intuitively and when I don’t forget I rerun tox.
>>>>
>>>
>>> This is a huge benefit. This is why I supported it so much for Java. It
>>> is a community benefit. You do not have to be a contributor to the Python
>>> SDK to support this. That is why I am writing here. Just eliminate all
>>> discussion of formatting. It doesn't really matter what the resulting
>>> format is, if it is not crazy to read. I strongly oppose maintaining a
>>> non-default format.
>>>
>>> Reformating 20k lines or 200k is not hard. The Java global reformat
>>> touched 50k lines. It does not really matter how big it is. Definitely do
>>> it all at once if you think the tool is good enough. And you should pin a
>>> version, so churn is not a problem. You can upgrade the version and
>>> reformat in a PR later and that is also easy.
>>>
>>> It is a GitHub UI issue that you cannot easily browse past the reformat.
>>> It is not actually that hard, but does take a couple extra clicks to get
>>> GitHub to display blame before a reformat. It is easier with the command
>>> line. I do a lot of code history digging and the global Java reformat is
>>> not really a problem.
>>>
>>> Kenn
>>>
>>>
>>>
>>>> Also everything will be formatted in a same way, so eventually it would
>>>> be easier to read.
>>>>
>>>> Moreover, as it was mentioned in previous emails - a lot of Jenkins
>>>> failures won’t take place, so we save time and resources.
>>>>
>>>>
>>>> One of disadvantages is that our pipelines has custom syntax and after
>>>> formatting they looks a little bit weird, but maybe extending the only
>>>> configurable option in Black - lines, from 88 to 110 would be solution.
>>>>
>>>> Second one is that Black requires Python 3 to be run. I don’t know how
>>>> big obstacle it would be.
>>>>
>>>> I believe there are two options how it would be possible to introduce
>>>> Black. First: just do it, it will hurt but then it would be ok (same as a
>>>> dentist appointment). Of course it may require some work to adjust linters.
>>>> On the other hand we can do it gradually and start including sdk parts one
>>>> by one - maybe it will be less painful?
>>>>
>>>> As an example I can share one of projects [2] I know that uses Black
>>>> (they use also other cool checkers and pre-commit [3]). This is how looks
>>>> their build with all checks [4].
>>>>
>>>> To sum up I believe that if we want improve our coding experience, we
>>>> should improve our toolset. Black seems be recent and quite popular tool
>>>> what makes think they won’t stop developing it.
>>>>
>>>> [1]
>>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>>
>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>>
>>>> [3] https://pre-commit.com
>>>>
>>>> [4]
>>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>>
>>>>
>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <ro...@google.com>
>>>> wrote:
>>>>
>>>>> Reformatting to 4 spaces seems a non-starter to me, as it would change
>>>>> nearly every single line in the codebase (and the loss of all context as
>>>>> well as that particular line).
>>>>>
>>>>> This is probably why the 2-space fork exists. However, we don't
>>>>> conform to that either--we use 2 spaces for indentation, but 4 for
>>>>> continuation indentation. (As for the history of this, this goes back to
>>>>> Google's internal style guide, probably motivated by consistency with C++,
>>>>> Java, ... and the fact that with an indent level of 4 one ends up wrapping
>>>>> lines quite frequently (it's telling that black's default line length is
>>>>> 88)). This turns out to be an easy change to the codebase.
>>>>>
>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that this
>>>>> tool introduces a huge amount of vertical whitespace (e.g. closing
>>>>> parentheses on their own line), e.g.
>>>>>
>>>>> def foo(
>>>>>     args
>>>>> ):
>>>>>   if (
>>>>>       long expression)
>>>>>   ):
>>>>>     func(
>>>>>         args
>>>>>     )
>>>>>
>>>>> I wrote a simple post-processor to put closing parentheses on the same
>>>>> lines, as well as omit the newline after "if (", and disabling formatting
>>>>> of strings, which reduce the churn in our codebase to 15k lines (adding
>>>>> about 4k) out of 200k total.
>>>>>
>>>>> https://github.com/apache/beam/pull/8712/files
>>>>>
>>>>> It's still very opinionated, often in different ways then me, and
>>>>> doesn't understand the semantics of the code, but possibly something we
>>>>> could live with given the huge advantages of an autoformatter.
>>>>>
>>>>> An intermediate point would be to allow, but not require,
>>>>> autoformatting of changed lines.
>>>>>
>>>>> As for being beta quality, it looks like it's got a decent number of
>>>>> contributors and in my book being in the python github project is a strong
>>>>> positive signal. But, due to the above issues, I think we'd have to
>>>>> maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space issue
>>>>> is a 2-line change, and the rest implemented as a post-processing step (for
>>>>> now, incomplete), so it'd be easy to stay in sync with upstream.)
>>>>>
>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <ie...@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > > I think the question is if it can be configured in a way to fit our
>>>>> > > current linter's style. I don't think it is feasible to reformat
>>>>> the
>>>>> > > entire Python SDK.
>>>>> >
>>>>> > It cannot be configured to do what we actually do because Black is
>>>>> > configurable only to support the standard python codestyle guidelines
>>>>> > (PEP-8) which recommends 4 spaces and is what most projects in the
>>>>> > python world use.
>>>>> >
>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>>> > > effect is still visible in the Java SDK. However, I have the
>>>>> feeling
>>>>> > > that this might be less of a problem with Python because the
>>>>> linter has
>>>>> > > more rules than Checkstyle had.
>>>>> >
>>>>> > Yes that’s the bad side effect but there are always tradeoffs we have
>>>>> > to deal with.
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <mx...@apache.org>
>>>>> wrote:
>>>>> > >
>>>>> > > I think the question is if it can be configured in a way to fit our
>>>>> > > current linter's style. I don't think it is feasible to reformat
>>>>> the
>>>>> > > entire Python SDK.
>>>>> > >
>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>>> > > effect is still visible in the Java SDK. However, I have the
>>>>> feeling
>>>>> > > that this might be less of a problem with Python because the
>>>>> linter has
>>>>> > > more rules than Checkstyle had.
>>>>> > >
>>>>> > > -Max
>>>>> > >
>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>>> > > >> My concerns are:
>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>>> > > >> - It looks like mostly a single person project. For the same
>>>>> reason I also strongly prefer not using a fork for a specific setting. Fork
>>>>> will only have less people looking at it.
>>>>> > > >
>>>>> > > > I suppose the project is marked as beta because it is recent, it
>>>>> was
>>>>> > > > presented in 2018’s pycon, and because some things can change
>>>>> since
>>>>> > > > auto-formatters are pretty tricky beasts, I think beta in that
>>>>> case is
>>>>> > > > like our own ‘@Experimental’. If you look at the contribution
>>>>> page [1]
>>>>> > > > you can notice that it is less and less a single person project,
>>>>> there
>>>>> > > > have been 93 independent contributions since the project became
>>>>> > > > public, and the fact that it is hosted in the python organization
>>>>> > > > github [2] gives some confidence on the project continuity.
>>>>> > > >
>>>>> > > > You are right however about the fact that the main author seems
>>>>> to be
>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he can seem
>>>>> > > > arbitrary, but he is just following pep8 style guide
>>>>> recommendations
>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4 spaces
>>>>> > > > recommendation of PEP-8 or even Google's own Python style guide
>>>>> [4],
>>>>> > > > So, probably it should be to us to reconsider the current policy
>>>>> to
>>>>> > > > adapt to the standards (and the tool).
>>>>> > > >
>>>>> > > > I did a quick run of black with python 2.7 compatibility on
>>>>> > > > sdks/python and got only 4 parsing errors which is positive
>>>>> given the
>>>>> > > > size of our code base.
>>>>> > > >
>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files failed
>>>>> to reformat.
>>>>> > > >
>>>>> > > > error: cannot format
>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>>>> > > > error: cannot format
>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>>>> > > > error: cannot format
>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>>> > > > Cannot parse: 160:34:       print(traceback_string,
>>>>> file=sys.stderr)
>>>>> > > > error: cannot format
>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc else '
>>>>>    ',
>>>>> > > > end=' ')
>>>>> > > >
>>>>> > > > I still think this can be positive for the project but well I am
>>>>> > > > barely a contributor to the python code base so I let you the
>>>>> python
>>>>> > > > maintainers to reconsider this, in any case it seems like a good
>>>>> > > > improvement for the project.
>>>>> > > >
>>>>> > > > [1] https://github.com/python/black/graphs/contributors
>>>>> > > > [2] https://github.com/python
>>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
>>>>> > > > [4]
>>>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>>> > > >
>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <al...@google.com>
>>>>> wrote:
>>>>> > > >>
>>>>> > > >> I am in the same boat with Robert, I am in favor of
>>>>> autoformatters but I am not familiar with this one. My concerns are:
>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>>> > > >> - It looks like mostly a single person project. For the same
>>>>> reason I also strongly prefer not using a fork for a specific setting. Fork
>>>>> will only have less people looking at it.
>>>>> > > >>
>>>>> > > >> IMO, this is in an early stage for us. That said lint issues
>>>>> are real as pointed in the thread. If someone would like to give it a try
>>>>> and see how it would look like for us that would be interesting.
>>>>> > > >>
>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <
>>>>> ka.kucharczyk@gmail.com> wrote:
>>>>> > > >>>
>>>>> > > >>> This sounds really good. A lot of Jenkins jobs failures are
>>>>> caused by lint problems.
>>>>> > > >>> I think it would be great to have something similar to
>>>>> Spotless in Java SDK (I heard there is problem with configuring Black with
>>>>> IntelliJ).
>>>>> > > >>>
>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <
>>>>> robertwb@google.com> wrote:
>>>>> > > >>>>
>>>>> > > >>>> I'm generally in favor of autoformatters, though I haven't
>>>>> looked at
>>>>> > > >>>> how well this particular one works. We might have to go with
>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>>>> > > >>>>
>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <
>>>>> pabloem@google.com> wrote:
>>>>> > > >>>>>
>>>>> > > >>>>> This looks pretty good:) I know at least a couple people
>>>>> (myself included) who've been annoyed by having to take care of lint issues
>>>>> that maybe a code formatter could save us.
>>>>> > > >>>>> Thanks for sharing Ismael.
>>>>> > > >>>>> -P.
>>>>> > > >>>>>
>>>>> > > >>>>>
>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <
>>>>> iemejia@gmail.com> wrote:
>>>>> > > >>>>>>
>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code auto
>>>>> formatter that
>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for python, and
>>>>> wanted to
>>>>> > > >>>>>> bring to the ML Is there interest from the python people to
>>>>> get this
>>>>> > > >>>>>> into the build?
>>>>> > > >>>>>>
>>>>> > > >>>>>> The introduction of spotless for Java has been a good
>>>>> improvement and
>>>>> > > >>>>>> maybe the python code base may benefit of this too.
>>>>> > > >>>>>>
>>>>> > > >>>>>> WDYT?
>>>>> > > >>>>>>
>>>>> > > >>>>>> [1] https://github.com/python/black
>>>>>
>>>>

Re: [DISCUSS] Autoformat python code with Black

Posted by Chad Dombrova <ch...@gmail.com>.
I have a PR I'm working on to allow users to easily setup yapf to run on
pre-commit.  Is that something that interests people?

-chad



On Fri, Feb 7, 2020 at 12:29 PM Robert Bradshaw <ro...@google.com> wrote:

> Yeah, that's a lot worse. This looks like
> https://github.com/google/yapf/issues/530 . In the meantime,
> https://pypi.org/project/yapf/#potentially-frequently-asked-questions
>
> On Fri, Feb 7, 2020 at 12:17 PM David Yan <da...@google.com> wrote:
> >
> > Hi, I just tried out the yapf formatter and I noticed that sometimes
> it's making the original code a lot less readable.
> > In the below example, - is the original, + is after running the yapf
> formatter. Looks like the problem is with the method chaining pattern.
> > How feasible is it to have yapf identify such a pattern and format it
> better?
> > Before this can be fixed, Is it possible to have a directive in the code
> comment to bypass yapf?
> >
> > Thanks!
> >
> > -    test_stream = (TestStream()
> > -                   .advance_watermark_to(0)
> > -                   .add_elements(['a', 'b', 'c'])
> > -                   .advance_processing_time(1)
> > -                   .advance_processing_time(1)
> > -                   .advance_processing_time(1)
> > -                   .advance_processing_time(1)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(5)
> > -                   .add_elements(['1', '2', '3'])
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(6)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(7)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(8)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(9)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(10)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(11)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(12)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(13)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(14)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(15)
> > -                   .advance_processing_time(1)
> > -                   )
> > +    test_stream = (
> > +        TestStream().advance_watermark_to(0).add_elements(
> > +            ['a', 'b',
> 'c']).advance_processing_time(1).advance_processing_time(
> > +
> 1).advance_processing_time(1).advance_processing_time(1).
> > +        advance_processing_time(1).advance_watermark_to(5).add_elements(
> > +            ['1', '2',
> '3']).advance_processing_time(1).advance_watermark_to(
> > +                6).advance_processing_time(1).advance_watermark_to(
> > +                    7).advance_processing_time(1).advance_watermark_to(
> > +
> 8).advance_processing_time(1).advance_watermark_to(9).
> > +        advance_processing_time(1).advance_watermark_to(
> > +            10).advance_processing_time(1).advance_watermark_to(
> > +                11).advance_processing_time(1).advance_watermark_to(
> > +                    12).advance_processing_time(1).advance_watermark_to(
> > +
> 13).advance_processing_time(1).advance_watermark_to(
> > +
> 14).advance_processing_time(1).advance_watermark_to(
> > +                                15).advance_processing_time(1))
> >
> > On Thu, Feb 6, 2020 at 1:50 PM Robert Bradshaw <ro...@google.com>
> wrote:
> >>
> >> Thanks!
> >>
> >> On Thu, Feb 6, 2020 at 1:29 PM Ismaël Mejía <ie...@gmail.com> wrote:
> >>>
> >>> Thanks Kamil and Michał for taking care of this.
> >>> Excellent job!
> >>>
> >>> On Thu, Feb 6, 2020 at 1:45 PM Kamil Wasilewski <
> kamil.wasilewski@polidea.com> wrote:
> >>>>
> >>>> Thanks to everyone involved in the discussion.
> >>>>
> >>>> I've taken a look at the first 50 recently updated Pull Requests.
> Only few of them were affected. I hope it wouldn't be too hard to fix them.
> >>>>
> >>>> In any case, here you can find instructions on how to run formatter:
> https://cwiki.apache.org/confluence/display/BEAM/Python+Tips (section
> "Formatting").
> >>>>
> >>>> On Thu, Feb 6, 2020 at 12:42 PM Michał Walenia <
> michal.walenia@polidea.com> wrote:
> >>>>>
> >>>>> Hi,
> >>>>> the PR is merged, all checks were green :)
> >>>>> Enjoy prettier Python!
> >>>>>
> >>>>> On Thu, Feb 6, 2020 at 11:11 AM Ismaël Mejía <ie...@gmail.com>
> wrote:
> >>>>>>
> >>>>>> Agree no need for vote for this because the consensus is clear and
> the sole
> >>>>>> impact I can think of are pending PRs that will be broken. In the
> Java case
> >>>>>> what we did was to just notice every PR that was affected by the
> change.
> >>>>>> And clearly document how to validate and autoformat the code.
> >>>>>>
> >>>>>> So the earlier the better, go go autoformat!
> >>>>>>
> >>>>>> On Thu, Feb 6, 2020 at 1:38 AM Robert Bradshaw <ro...@google.com>
> wrote:
> >>>>>>>
> >>>>>>> No, perhaps not. I agree there's consensus, just wondering what the
> >>>>>>> next steps should be to get this in. (The presubmits look like
> they're
> >>>>>>> all passing, with the exception of some breakage in java that
> should
> >>>>>>> be completely unrelated. Of course there's already merge
> conflicts...)
> >>>>>>>
> >>>>>>> On Wed, Feb 5, 2020 at 3:55 PM Ahmet Altay <al...@google.com>
> wrote:
> >>>>>>> >
> >>>>>>> > Do we need a formal vote? There is consensus on this thread and
> on the PR.
> >>>>>>> >
> >>>>>>> > On Wed, Feb 5, 2020 at 3:37 PM Robert Bradshaw <
> robertwb@google.com> wrote:
> >>>>>>> >>
> >>>>>>> >> The PR is looking good. Should we call a vote?
> >>>>>>> >>
> >>>>>>> >> On Mon, Jan 27, 2020 at 11:03 AM Robert Bradshaw <
> robertwb@google.com> wrote:
> >>>>>>> >> >
> >>>>>>> >> > Thanks. I commented on the PR. I think if we're going this
> route we
> >>>>>>> >> > should add a pre-commit, plus instructions on how to run the
> tool
> >>>>>>> >> > (similar to spotless).
> >>>>>>> >> >
> >>>>>>> >> > On Mon, Jan 27, 2020 at 10:00 AM Udi Meiri <eh...@google.com>
> wrote:
> >>>>>>> >> > >
> >>>>>>> >> > > I've done a pass on the PR on code I'm familiar with.
> >>>>>>> >> > > Please make a pass and add your suggestions on the PR.
> >>>>>>> >> > >
> >>>>>>> >> > > On Fri, Jan 24, 2020 at 7:15 AM Ismaël Mejía <
> iemejia@gmail.com> wrote:
> >>>>>>> >> > >>
> >>>>>>> >> > >> Java build fails on any unformatted code so python
> probably should be like that.
> >>>>>>> >> > >> We have to ensure however that it fails early on that.
> >>>>>>> >> > >> As Robert said time to debate the knobs :)
> >>>>>>> >> > >>
> >>>>>>> >> > >> On Fri, Jan 24, 2020 at 3:19 PM Kamil Wasilewski <
> kamil.wasilewski@polidea.com> wrote:
> >>>>>>> >> > >>>
> >>>>>>> >> > >>> PR is ready: https://github.com/apache/beam/pull/10684.
> Please share your comments ;-) I've managed to reduce the impact a bit:
> >>>>>>> >> > >>> 501 files changed, 18245 insertions(+), 19495 deletions(-)
> >>>>>>> >> > >>>
> >>>>>>> >> > >>> We still need to consider how to enforce the usage of
> autoformatter. Pre-commit sounds like a nice addition, but it still needs
> to be installed manually by a developer. On the other hand, Jenkins
> precommit job that fails if any unformatted code is detected looks like too
> strict. What do you think?
> >>>>>>> >> > >>>
> >>>>>>> >> > >>> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw <
> robertwb@google.com> wrote:
> >>>>>>> >> > >>>>
> >>>>>>> >> > >>>> Thanks! Now we get to debate what knobs to twiddle :-P
> >>>>>>> >> > >>>>
> >>>>>>> >> > >>>> FYI, I did a simple run (just pushed to
> >>>>>>> >> > >>>>
> https://github.com/apache/beam/compare/master...robertwb:yapf) to see
> >>>>>>> >> > >>>> the impact. The diff is
> >>>>>>> >> > >>>>
> >>>>>>> >> > >>>>     $ git diff --stat master
> >>>>>>> >> > >>>>     ...
> >>>>>>> >> > >>>>      547 files changed, 22118 insertions(+), 21129
> deletions(-)
> >>>>>>> >> > >>>>
> >>>>>>> >> > >>>> For reference
> >>>>>>> >> > >>>>
> >>>>>>> >> > >>>>     $ find sdks/python/apache_beam -name '*.py' | xargs
> wc
> >>>>>>> >> > >>>>     ...
> >>>>>>> >> > >>>>     200424  612002 7431637 total
> >>>>>>> >> > >>>>
> >>>>>>> >> > >>>> which means a little over 10% of lines get touched. I
> think there are
> >>>>>>> >> > >>>> some options, such as
> SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES and
> >>>>>>> >> > >>>> COALESCE_BRACKETS, that will conform more to the style
> we are already
> >>>>>>> >> > >>>> (mostly) following.
> >>>>>>> >> > >>>>
> >>>>>>> >> > >>>>
> >>>>>>> >> > >>>> On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
> >>>>>>> >> > >>>> <ka...@polidea.com> wrote:
> >>>>>>> >> > >>>> >
> >>>>>>> >> > >>>> > Thank you Michał for creating the ticket. I have some
> free time and I'd like to volunteer myself for this task.
> >>>>>>> >> > >>>> > Indeed, it looks like there's consensus for `yapf`, so
> I'll try `yapf` first.
> >>>>>>> >> > >>>> >
> >>>>>>> >> > >>>> > Best,
> >>>>>>> >> > >>>> > Kamil
> >>>>>>> >> > >>>> >
> >>>>>>> >> > >>>> >
> >>>>>>> >> > >>>> > On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <
> michal.walenia@polidea.com> wrote:
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >> Hi all,
> >>>>>>> >> > >>>> >> I created a JIRA issue for this and summarized the
> available tools
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >> https://issues.apache.org/jira/browse/BEAM-9175
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >> Cheers,
> >>>>>>> >> > >>>> >> Michal
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <
> ehudm@google.com> wrote:
> >>>>>>> >> > >>>> >>>
> >>>>>>> >> > >>>> >>> Sorry, backing off on this due to time constraints.
> >>>>>>> >> > >>>> >>>
> >>>>>>> >> > >>>> >>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <
> ehudm@google.com> wrote:
> >>>>>>> >> > >>>> >>>>
> >>>>>>> >> > >>>> >>>> It sounds like there's a consensus for yapf. I
> volunteer to take this on
> >>>>>>> >> > >>>> >>>>
> >>>>>>> >> > >>>> >>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <
> ehudm@google.com> wrote:
> >>>>>>> >> > >>>> >>>>>
> >>>>>>> >> > >>>> >>>>> +1 to autoformatting
> >>>>>>> >> > >>>> >>>>>
> >>>>>>> >> > >>>> >>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <
> lcwik@google.com> wrote:
> >>>>>>> >> > >>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>> +1 to autoformatters. Also the Beam Java SDK went
> through a one time pass to apply the spotless formatting.
> >>>>>>> >> > >>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <
> altay@google.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>
> >>>>>>> >> > >>>> >>>>>>> +1 to autoformatters and yapf. It appears to be
> a well maintained project. I do support making a one time pass to apply
> formatting the whole code base.
> >>>>>>> >> > >>>> >>>>>>>
> >>>>>>> >> > >>>> >>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <
> chadrik@gmail.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> It'd be good if there was a way to only apply
> to violating (or at
> >>>>>>> >> > >>>> >>>>>>>>> least changed) lines.
> >>>>>>> >> > >>>> >>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>> I assumed the first thing we’d do is convert
> all of the code in one go, since it’s a very safe operation. Did you have
> something else in mind?
> >>>>>>> >> > >>>> >>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>> -chad
> >>>>>>> >> > >>>> >>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <
> chadrik@gmail.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > +1 to autoformatting
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > Let me add some nuance to that.
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > The way I see it there are 2 varieties of
> formatters:  those which take the original formatting into consideration
> (autopep8) and those which disregard it (yapf, black).
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > I much prefer yapf to black, because you
> have plenty of options to tweak with yapf (enough to make the output a
> pretty close match to the current Beam style), and you can mark areas to
> preserve the original formatting, which could be very useful with Pipeline
> building with pipe operators.  Please don't pick black.
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > autopep8 is more along the lines of spotless
> in Java -- it only corrects code that breaks the project's style rules.
> The big problem with Beam's current style is that it is so esoteric that
> autopep8 can't enforce it -- and I'm not just talking about 2-spaces, which
> I don't really have a problem with -- the problem is the use of either 2 or
> 4 spaces depending on context (expression start vs hanging indent, etc).
> This is my *biggest* gripe about the current style.  PyCharm doesn't have
> enough control either.  So, if we can choose a style that can be expressed
> by flake8 or pycodestyle then we can use autopep8 to enforce it.
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > I'd prefer autopep8 to yapf because I like
> having a little wiggle room to influence the style, but on a big project
> like Beam all that wiggle room ends up to minor but noticeable
> inconsistencies in style throughout the project.  yapf ensures completely
> consistent style, but the tradeoff is that it's sometimes ugly, especially
> in scenarios with similar repeated entries like argparse, where yapf might
> insert line breaks in visually inconsistent and unappealing ways depending
> on the lengths of the keywords and expressions involved.
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > Either way (but especially if we choose
> yapf) I think it'd be a nice addition to setup a pre-commit [1] config so
> that people can opt in to running *lightweight* autofixers prior to
> commit.  This will not only reduce dev frustration but will also reduce the
> amount of cpu cycles that Jenkins spends pointing out lint errors.
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > [1] https://pre-commit.com/
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > -chad
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël
> Mejía <ie...@gmail.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>
> >>>>>>> >> > >>>> >>>>>>>>> >> Last time we discussed this there seems not
> to be much progress into autoformatting.
> >>>>>>> >> > >>>> >>>>>>>>> >> This tool looks more tweakable, so maybe it
> could be more appropriate for Beam's use case.
> >>>>>>> >> > >>>> >>>>>>>>> >> https://github.com/google/yapf/
> >>>>>>> >> > >>>> >>>>>>>>> >> WDYT?
> >>>>>>> >> > >>>> >>>>>>>>> >>
> >>>>>>> >> > >>>> >>>>>>>>> >>
> >>>>>>> >> > >>>> >>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz
> Gajowy <lg...@apache.org> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>> +1 for any autoformatter for Python SDK
> that does the job. My experience is that since spotless in Java SDK I would
> never start a new Java project without it. So many great benefits not only
> for one person coding but for all community.
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>> It is a GitHub UI issue that you cannot
> easily browse past the reformat. It is not actually that hard, but does
> take a couple extra clicks to get GitHub to display blame before a
> reformat. It is easier with the command line. I do a lot of code history
> digging and the global Java reformat is not really a problem.
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>> It's actually one more click on Github but
> I agree it's not the best way to search the history. The most convenient
> and clear one I've found so far is in Jetbrains IDEs (Intelij) where you
> can:
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>> right click on line number -> "annotate"
> -> click again -> "annotate previous revision" -> ...
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>> You can also use "compare with" to see the
> diff between two revisions.
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>> Łukasz
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <
> kenn@apache.org> napisał(a):
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>> +1 pending good enough tooling (I can't
> quite tell - seems there are some issues?)
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna
> Kucharczyk <ka...@gmail.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> What else actually we gain? My guess is
> faster PR review iteration. We will skip some of conversations about code
> style.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>> ...
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> Last but not least, new contributor may
> be less discouraged. When I started contribute I didn’t know how to format
> my code and I lost a lot of time to add pylint and adjust IntelliJ. I
> eventually failed. Currently I write code intuitively and when I don’t
> forget I rerun tox.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>> This is a huge benefit. This is why I
> supported it so much for Java. It is a community benefit. You do not have
> to be a contributor to the Python SDK to support this. That is why I am
> writing here. Just eliminate all discussion of formatting. It doesn't
> really matter what the resulting format is, if it is not crazy to read. I
> strongly oppose maintaining a non-default format.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>> Reformating 20k lines or 200k is not
> hard. The Java global reformat touched 50k lines. It does not really matter
> how big it is. Definitely do it all at once if you think the tool is good
> enough. And you should pin a version, so churn is not a problem. You can
> upgrade the version and reformat in a PR later and that is also easy.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>> It is a GitHub UI issue that you cannot
> easily browse past the reformat. It is not actually that hard, but does
> take a couple extra clicks to get GitHub to display blame before a
> reformat. It is easier with the command line. I do a lot of code history
> digging and the global Java reformat is not really a problem.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>> Kenn
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> Also everything will be formatted in a
> same way, so eventually it would be easier to read.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> Moreover, as it was mentioned in
> previous emails - a lot of Jenkins failures won’t take place, so we save
> time and resources.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> One of disadvantages is that our
> pipelines has custom syntax and after formatting they looks a little bit
> weird, but maybe extending the only configurable option in Black - lines,
> from 88 to 110 would be solution.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> Second one is that Black requires Python
> 3 to be run. I don’t know how big obstacle it would be.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> I believe there are two options how it
> would be possible to introduce Black. First: just do it, it will hurt but
> then it would be ok (same as a dentist appointment). Of course it may
> require some work to adjust linters. On the other hand we can do it
> gradually and start including sdk parts one by one - maybe it will be less
> painful?
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> As an example I can share one of
> projects [2] I know that uses Black (they use also other cool checkers and
> pre-commit [3]). This is how looks their build with all checks [4].
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> To sum up I believe that if we want
> improve our coding experience, we should improve our toolset. Black seems
> be recent and quite popular tool what makes think they won’t stop
> developing it.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> [1]
> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> [2]
> https://github.com/GoogleCloudPlatform/oozie-to-airflow
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> [3] https://pre-commit.com
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> [4]
> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert
> Bradshaw <ro...@google.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> Reformatting to 4 spaces seems a
> non-starter to me, as it would change nearly every single line in the
> codebase (and the loss of all context as well as that particular line).
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> This is probably why the 2-space fork
> exists. However, we don't conform to that either--we use 2 spaces for
> indentation, but 4 for continuation indentation. (As for the history of
> this, this goes back to Google's internal style guide, probably motivated
> by consistency with C++, Java, ... and the fact that with an indent level
> of 4 one ends up wrapping lines quite frequently (it's telling that black's
> default line length is 88)). This turns out to be an easy change to the
> codebase.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4
> whitespace thing, I found that this tool introduces a huge amount of
> vertical whitespace (e.g. closing parentheses on their own line), e.g.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> def foo(
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>     args
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> ):
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>   if (
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>       long expression)
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>   ):
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>     func(
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>         args
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>     )
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> I wrote a simple post-processor to put
> closing parentheses on the same lines, as well as omit the newline after
> "if (", and disabling formatting of strings, which reduce the churn in our
> codebase to 15k lines (adding about 4k) out of 200k total.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> https://github.com/apache/beam/pull/8712/files
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> It's still very opinionated, often in
> different ways then me, and doesn't understand the semantics of the code,
> but possibly something we could live with given the huge advantages of an
> autoformatter.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> An intermediate point would be to
> allow, but not require, autoformatting of changed lines.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> As for being beta quality, it looks
> like it's got a decent number of contributors and in my book being in the
> python github project is a strong positive signal. But, due to the above
> issues, I think we'd have to maintain a fork. (The code is pretty
> lightweight, the 2 vs. 4 space issue is a 2-line change, and the rest
> implemented as a post-processing step (for now, incomplete), so it'd be
> easy to stay in sync with upstream.)
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël
> Mejía <ie...@gmail.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can
> be configured in a way to fit our
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't
> think it is feasible to reformat the
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > It cannot be configured to do what we
> actually do because Black is
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > configurable only to support the
> standard python codestyle guidelines
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and
> is what most projects in the
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > python world use.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick
> access to the Git history. This
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java
> SDK. However, I have the feeling
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a
> problem with Python because the linter has
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > Yes that’s the bad side effect but
> there are always tradeoffs we have
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > to deal with.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM
> Maximilian Michels <mx...@apache.org> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can
> be configured in a way to fit our
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't
> think it is feasible to reformat the
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick
> access to the Git history. This
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java
> SDK. However, I have the feeling
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a
> problem with Python because the linter has
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > -Max
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía
> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> My concerns are:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked
> as beta with a big warning.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single
> person project. For the same reason I also strongly prefer not using a fork
> for a specific setting. Fork will only have less people looking at it.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I suppose the project is marked
> as beta because it is recent, it was
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and
> because some things can change since
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky
> beasts, I think beta in that case is
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If
> you look at the contribution page [1]
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > you can notice that it is less
> and less a single person project, there
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > have been 93 independent
> contributions since the project became
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > public, and the fact that it is
> hosted in the python organization
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > github [2] gives some confidence
> on the project continuity.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > You are right however about the
> fact that the main author seems to be
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in
> the 2-spaces issue he can seem
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > arbitrary, but he is just
> following pep8 style guide recommendations
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [3]. I am curious of why we
> (Beam) do not follow the 4 spaces
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even
> Google's own Python style guide [4],
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > So, probably it should be to us
> to reconsider the current policy to
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > adapt to the standards (and the
> tool).
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I did a quick run of black with
> python 2.7 compatibility on
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > sdks/python and got only 4
> parsing errors which is positive given the
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > size of our code base.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files
> left unchanged, 4 files failed to reformat.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 47:22:
>  _display_progress = print
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 151:18:
>    file=sys.stderr)
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 160:34:
>  print(traceback_string, file=sys.stderr)
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 335:51:
>  print('-->' if pc == last_pc else '    ',
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > end=' ')
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I still think this can be
> positive for the project but well I am
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > barely a contributor to the
> python code base so I let you the python
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > maintainers to reconsider this,
> in any case it seems like a good
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > improvement for the project.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [1]
> https://github.com/python/black/graphs/contributors
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [2] https://github.com/python
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [3]
> https://www.python.org/dev/peps/pep-0008/#indentation
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [4]
> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM
> Ahmet Altay <al...@google.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> I am in the same boat with
> Robert, I am in favor of autoformatters but I am not familiar with this
> one. My concerns are:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked
> as beta with a big warning.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single
> person project. For the same reason I also strongly prefer not using a fork
> for a specific setting. Fork will only have less people looking at it.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> IMO, this is in an early stage
> for us. That said lint issues are real as pointed in the thread. If someone
> would like to give it a try and see how it would look like for us that
> would be interesting.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM
> Katarzyna Kucharczyk <ka...@gmail.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> This sounds really good. A lot
> of Jenkins jobs failures are caused by lint problems.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> I think it would be great to
> have something similar to Spotless in Java SDK (I heard there is problem
> with configuring Black with IntelliJ).
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52
> PM Robert Bradshaw <ro...@google.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> I'm generally in favor of
> autoformatters, though I haven't looked at
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> how well this particular one
> works. We might have to go with
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
> https://github.com/desbma/black-2spaces given
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
> https://github.com/python/black/issues/378 .
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43
> PM Pablo Estrada <pa...@google.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I
> know at least a couple people (myself included) who've been annoyed by
> having to take care of lint issues that maybe a code formatter could save
> us.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> -P.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24
> PM Ismaël Mejía <ie...@gmail.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into
> Black [1] a python code auto formatter that
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto'
> auto-formatter for python, and wanted to
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there
> interest from the python people to get this
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> into the build?
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless
> for Java has been a good improvement and
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> maybe the python code base
> may benefit of this too.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> WDYT?
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> [1]
> https://github.com/python/black
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >> --
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >> Michał Walenia
> >>>>>>> >> > >>>> >> Polidea | Software Engineer
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >> M: +48 791 432 002
> >>>>>>> >> > >>>> >> E: michal.walenia@polidea.com
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >> Unique Tech
> >>>>>>> >> > >>>> >> Check out our projects!
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Michał Walenia
> >>>>> Polidea | Software Engineer
> >>>>>
> >>>>> M: +48 791 432 002
> >>>>> E: michal.walenia@polidea.com
> >>>>>
> >>>>> Unique Tech
> >>>>> Check out our projects!
>
>

Re: [DISCUSS] Autoformat python code with Black

Posted by Udi Meiri <eh...@google.com>.
Chad: yes. I also noticed that it's not running on the Jenkins lint
precommit job.

On Fri, Feb 7, 2020 at 12:59 PM David Yan <da...@google.com> wrote:

> Thank you Robert.
>
> https://github.com/google/yapf/issues/530 has been open for 2 years, but
> we will use `yapf: disable` and `yapf: enable` as a workaround for now.
>
> David
>
> On Fri, Feb 7, 2020 at 12:29 PM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> Yeah, that's a lot worse. This looks like
>> https://github.com/google/yapf/issues/530 . In the meantime,
>> https://pypi.org/project/yapf/#potentially-frequently-asked-questions
>>
>> On Fri, Feb 7, 2020 at 12:17 PM David Yan <da...@google.com> wrote:
>> >
>> > Hi, I just tried out the yapf formatter and I noticed that sometimes
>> it's making the original code a lot less readable.
>> > In the below example, - is the original, + is after running the yapf
>> formatter. Looks like the problem is with the method chaining pattern.
>> > How feasible is it to have yapf identify such a pattern and format it
>> better?
>> > Before this can be fixed, Is it possible to have a directive in the
>> code comment to bypass yapf?
>> >
>> > Thanks!
>> >
>> > -    test_stream = (TestStream()
>> > -                   .advance_watermark_to(0)
>> > -                   .add_elements(['a', 'b', 'c'])
>> > -                   .advance_processing_time(1)
>> > -                   .advance_processing_time(1)
>> > -                   .advance_processing_time(1)
>> > -                   .advance_processing_time(1)
>> > -                   .advance_processing_time(1)
>> > -                   .advance_watermark_to(5)
>> > -                   .add_elements(['1', '2', '3'])
>> > -                   .advance_processing_time(1)
>> > -                   .advance_watermark_to(6)
>> > -                   .advance_processing_time(1)
>> > -                   .advance_watermark_to(7)
>> > -                   .advance_processing_time(1)
>> > -                   .advance_watermark_to(8)
>> > -                   .advance_processing_time(1)
>> > -                   .advance_watermark_to(9)
>> > -                   .advance_processing_time(1)
>> > -                   .advance_watermark_to(10)
>> > -                   .advance_processing_time(1)
>> > -                   .advance_watermark_to(11)
>> > -                   .advance_processing_time(1)
>> > -                   .advance_watermark_to(12)
>> > -                   .advance_processing_time(1)
>> > -                   .advance_watermark_to(13)
>> > -                   .advance_processing_time(1)
>> > -                   .advance_watermark_to(14)
>> > -                   .advance_processing_time(1)
>> > -                   .advance_watermark_to(15)
>> > -                   .advance_processing_time(1)
>> > -                   )
>> > +    test_stream = (
>> > +        TestStream().advance_watermark_to(0).add_elements(
>> > +            ['a', 'b',
>> 'c']).advance_processing_time(1).advance_processing_time(
>> > +
>> 1).advance_processing_time(1).advance_processing_time(1).
>> > +
>> advance_processing_time(1).advance_watermark_to(5).add_elements(
>> > +            ['1', '2',
>> '3']).advance_processing_time(1).advance_watermark_to(
>> > +                6).advance_processing_time(1).advance_watermark_to(
>> > +                    7).advance_processing_time(1).advance_watermark_to(
>> > +
>> 8).advance_processing_time(1).advance_watermark_to(9).
>> > +        advance_processing_time(1).advance_watermark_to(
>> > +            10).advance_processing_time(1).advance_watermark_to(
>> > +                11).advance_processing_time(1).advance_watermark_to(
>> > +
>> 12).advance_processing_time(1).advance_watermark_to(
>> > +
>> 13).advance_processing_time(1).advance_watermark_to(
>> > +
>> 14).advance_processing_time(1).advance_watermark_to(
>> > +                                15).advance_processing_time(1))
>> >
>> > On Thu, Feb 6, 2020 at 1:50 PM Robert Bradshaw <ro...@google.com>
>> wrote:
>> >>
>> >> Thanks!
>> >>
>> >> On Thu, Feb 6, 2020 at 1:29 PM Ismaël Mejía <ie...@gmail.com> wrote:
>> >>>
>> >>> Thanks Kamil and Michał for taking care of this.
>> >>> Excellent job!
>> >>>
>> >>> On Thu, Feb 6, 2020 at 1:45 PM Kamil Wasilewski <
>> kamil.wasilewski@polidea.com> wrote:
>> >>>>
>> >>>> Thanks to everyone involved in the discussion.
>> >>>>
>> >>>> I've taken a look at the first 50 recently updated Pull Requests.
>> Only few of them were affected. I hope it wouldn't be too hard to fix them.
>> >>>>
>> >>>> In any case, here you can find instructions on how to run formatter:
>> https://cwiki.apache.org/confluence/display/BEAM/Python+Tips (section
>> "Formatting").
>> >>>>
>> >>>> On Thu, Feb 6, 2020 at 12:42 PM Michał Walenia <
>> michal.walenia@polidea.com> wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>> the PR is merged, all checks were green :)
>> >>>>> Enjoy prettier Python!
>> >>>>>
>> >>>>> On Thu, Feb 6, 2020 at 11:11 AM Ismaël Mejía <ie...@gmail.com>
>> wrote:
>> >>>>>>
>> >>>>>> Agree no need for vote for this because the consensus is clear and
>> the sole
>> >>>>>> impact I can think of are pending PRs that will be broken. In the
>> Java case
>> >>>>>> what we did was to just notice every PR that was affected by the
>> change.
>> >>>>>> And clearly document how to validate and autoformat the code.
>> >>>>>>
>> >>>>>> So the earlier the better, go go autoformat!
>> >>>>>>
>> >>>>>> On Thu, Feb 6, 2020 at 1:38 AM Robert Bradshaw <
>> robertwb@google.com> wrote:
>> >>>>>>>
>> >>>>>>> No, perhaps not. I agree there's consensus, just wondering what
>> the
>> >>>>>>> next steps should be to get this in. (The presubmits look like
>> they're
>> >>>>>>> all passing, with the exception of some breakage in java that
>> should
>> >>>>>>> be completely unrelated. Of course there's already merge
>> conflicts...)
>> >>>>>>>
>> >>>>>>> On Wed, Feb 5, 2020 at 3:55 PM Ahmet Altay <al...@google.com>
>> wrote:
>> >>>>>>> >
>> >>>>>>> > Do we need a formal vote? There is consensus on this thread and
>> on the PR.
>> >>>>>>> >
>> >>>>>>> > On Wed, Feb 5, 2020 at 3:37 PM Robert Bradshaw <
>> robertwb@google.com> wrote:
>> >>>>>>> >>
>> >>>>>>> >> The PR is looking good. Should we call a vote?
>> >>>>>>> >>
>> >>>>>>> >> On Mon, Jan 27, 2020 at 11:03 AM Robert Bradshaw <
>> robertwb@google.com> wrote:
>> >>>>>>> >> >
>> >>>>>>> >> > Thanks. I commented on the PR. I think if we're going this
>> route we
>> >>>>>>> >> > should add a pre-commit, plus instructions on how to run the
>> tool
>> >>>>>>> >> > (similar to spotless).
>> >>>>>>> >> >
>> >>>>>>> >> > On Mon, Jan 27, 2020 at 10:00 AM Udi Meiri <eh...@google.com>
>> wrote:
>> >>>>>>> >> > >
>> >>>>>>> >> > > I've done a pass on the PR on code I'm familiar with.
>> >>>>>>> >> > > Please make a pass and add your suggestions on the PR.
>> >>>>>>> >> > >
>> >>>>>>> >> > > On Fri, Jan 24, 2020 at 7:15 AM Ismaël Mejía <
>> iemejia@gmail.com> wrote:
>> >>>>>>> >> > >>
>> >>>>>>> >> > >> Java build fails on any unformatted code so python
>> probably should be like that.
>> >>>>>>> >> > >> We have to ensure however that it fails early on that.
>> >>>>>>> >> > >> As Robert said time to debate the knobs :)
>> >>>>>>> >> > >>
>> >>>>>>> >> > >> On Fri, Jan 24, 2020 at 3:19 PM Kamil Wasilewski <
>> kamil.wasilewski@polidea.com> wrote:
>> >>>>>>> >> > >>>
>> >>>>>>> >> > >>> PR is ready: https://github.com/apache/beam/pull/10684.
>> Please share your comments ;-) I've managed to reduce the impact a bit:
>> >>>>>>> >> > >>> 501 files changed, 18245 insertions(+), 19495
>> deletions(-)
>> >>>>>>> >> > >>>
>> >>>>>>> >> > >>> We still need to consider how to enforce the usage of
>> autoformatter. Pre-commit sounds like a nice addition, but it still needs
>> to be installed manually by a developer. On the other hand, Jenkins
>> precommit job that fails if any unformatted code is detected looks like too
>> strict. What do you think?
>> >>>>>>> >> > >>>
>> >>>>>>> >> > >>> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw <
>> robertwb@google.com> wrote:
>> >>>>>>> >> > >>>>
>> >>>>>>> >> > >>>> Thanks! Now we get to debate what knobs to twiddle :-P
>> >>>>>>> >> > >>>>
>> >>>>>>> >> > >>>> FYI, I did a simple run (just pushed to
>> >>>>>>> >> > >>>>
>> https://github.com/apache/beam/compare/master...robertwb:yapf) to see
>> >>>>>>> >> > >>>> the impact. The diff is
>> >>>>>>> >> > >>>>
>> >>>>>>> >> > >>>>     $ git diff --stat master
>> >>>>>>> >> > >>>>     ...
>> >>>>>>> >> > >>>>      547 files changed, 22118 insertions(+), 21129
>> deletions(-)
>> >>>>>>> >> > >>>>
>> >>>>>>> >> > >>>> For reference
>> >>>>>>> >> > >>>>
>> >>>>>>> >> > >>>>     $ find sdks/python/apache_beam -name '*.py' | xargs
>> wc
>> >>>>>>> >> > >>>>     ...
>> >>>>>>> >> > >>>>     200424  612002 7431637 total
>> >>>>>>> >> > >>>>
>> >>>>>>> >> > >>>> which means a little over 10% of lines get touched. I
>> think there are
>> >>>>>>> >> > >>>> some options, such as
>> SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES and
>> >>>>>>> >> > >>>> COALESCE_BRACKETS, that will conform more to the style
>> we are already
>> >>>>>>> >> > >>>> (mostly) following.
>> >>>>>>> >> > >>>>
>> >>>>>>> >> > >>>>
>> >>>>>>> >> > >>>> On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
>> >>>>>>> >> > >>>> <ka...@polidea.com> wrote:
>> >>>>>>> >> > >>>> >
>> >>>>>>> >> > >>>> > Thank you Michał for creating the ticket. I have some
>> free time and I'd like to volunteer myself for this task.
>> >>>>>>> >> > >>>> > Indeed, it looks like there's consensus for `yapf`,
>> so I'll try `yapf` first.
>> >>>>>>> >> > >>>> >
>> >>>>>>> >> > >>>> > Best,
>> >>>>>>> >> > >>>> > Kamil
>> >>>>>>> >> > >>>> >
>> >>>>>>> >> > >>>> >
>> >>>>>>> >> > >>>> > On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <
>> michal.walenia@polidea.com> wrote:
>> >>>>>>> >> > >>>> >>
>> >>>>>>> >> > >>>> >> Hi all,
>> >>>>>>> >> > >>>> >> I created a JIRA issue for this and summarized the
>> available tools
>> >>>>>>> >> > >>>> >>
>> >>>>>>> >> > >>>> >> https://issues.apache.org/jira/browse/BEAM-9175
>> >>>>>>> >> > >>>> >>
>> >>>>>>> >> > >>>> >> Cheers,
>> >>>>>>> >> > >>>> >> Michal
>> >>>>>>> >> > >>>> >>
>> >>>>>>> >> > >>>> >> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <
>> ehudm@google.com> wrote:
>> >>>>>>> >> > >>>> >>>
>> >>>>>>> >> > >>>> >>> Sorry, backing off on this due to time constraints.
>> >>>>>>> >> > >>>> >>>
>> >>>>>>> >> > >>>> >>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <
>> ehudm@google.com> wrote:
>> >>>>>>> >> > >>>> >>>>
>> >>>>>>> >> > >>>> >>>> It sounds like there's a consensus for yapf. I
>> volunteer to take this on
>> >>>>>>> >> > >>>> >>>>
>> >>>>>>> >> > >>>> >>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <
>> ehudm@google.com> wrote:
>> >>>>>>> >> > >>>> >>>>>
>> >>>>>>> >> > >>>> >>>>> +1 to autoformatting
>> >>>>>>> >> > >>>> >>>>>
>> >>>>>>> >> > >>>> >>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <
>> lcwik@google.com> wrote:
>> >>>>>>> >> > >>>> >>>>>>
>> >>>>>>> >> > >>>> >>>>>> +1 to autoformatters. Also the Beam Java SDK
>> went through a one time pass to apply the spotless formatting.
>> >>>>>>> >> > >>>> >>>>>>
>> >>>>>>> >> > >>>> >>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <
>> altay@google.com> wrote:
>> >>>>>>> >> > >>>> >>>>>>>
>> >>>>>>> >> > >>>> >>>>>>> +1 to autoformatters and yapf. It appears to be
>> a well maintained project. I do support making a one time pass to apply
>> formatting the whole code base.
>> >>>>>>> >> > >>>> >>>>>>>
>> >>>>>>> >> > >>>> >>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <
>> chadrik@gmail.com> wrote:
>> >>>>>>> >> > >>>> >>>>>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> It'd be good if there was a way to only apply
>> to violating (or at
>> >>>>>>> >> > >>>> >>>>>>>>> least changed) lines.
>> >>>>>>> >> > >>>> >>>>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>
>> >>>>>>> >> > >>>> >>>>>>>> I assumed the first thing we’d do is convert
>> all of the code in one go, since it’s a very safe operation. Did you have
>> something else in mind?
>> >>>>>>> >> > >>>> >>>>>>>>
>> >>>>>>> >> > >>>> >>>>>>>> -chad
>> >>>>>>> >> > >>>> >>>>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova
>> <ch...@gmail.com> wrote:
>> >>>>>>> >> > >>>> >>>>>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> > +1 to autoformatting
>> >>>>>>> >> > >>>> >>>>>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> > Let me add some nuance to that.
>> >>>>>>> >> > >>>> >>>>>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> > The way I see it there are 2 varieties of
>> formatters:  those which take the original formatting into consideration
>> (autopep8) and those which disregard it (yapf, black).
>> >>>>>>> >> > >>>> >>>>>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> > I much prefer yapf to black, because you
>> have plenty of options to tweak with yapf (enough to make the output a
>> pretty close match to the current Beam style), and you can mark areas to
>> preserve the original formatting, which could be very useful with Pipeline
>> building with pipe operators.  Please don't pick black.
>> >>>>>>> >> > >>>> >>>>>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> > autopep8 is more along the lines of
>> spotless in Java -- it only corrects code that breaks the project's style
>> rules.  The big problem with Beam's current style is that it is so esoteric
>> that autopep8 can't enforce it -- and I'm not just talking about 2-spaces,
>> which I don't really have a problem with -- the problem is the use of
>> either 2 or 4 spaces depending on context (expression start vs hanging
>> indent, etc).  This is my *biggest* gripe about the current style.  PyCharm
>> doesn't have enough control either.  So, if we can choose a style that can
>> be expressed by flake8 or pycodestyle then we can use autopep8 to enforce
>> it.
>> >>>>>>> >> > >>>> >>>>>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> > I'd prefer autopep8 to yapf because I like
>> having a little wiggle room to influence the style, but on a big project
>> like Beam all that wiggle room ends up to minor but noticeable
>> inconsistencies in style throughout the project.  yapf ensures completely
>> consistent style, but the tradeoff is that it's sometimes ugly, especially
>> in scenarios with similar repeated entries like argparse, where yapf might
>> insert line breaks in visually inconsistent and unappealing ways depending
>> on the lengths of the keywords and expressions involved.
>> >>>>>>> >> > >>>> >>>>>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> > Either way (but especially if we choose
>> yapf) I think it'd be a nice addition to setup a pre-commit [1] config so
>> that people can opt in to running *lightweight* autofixers prior to
>> commit.  This will not only reduce dev frustration but will also reduce the
>> amount of cpu cycles that Jenkins spends pointing out lint errors.
>> >>>>>>> >> > >>>> >>>>>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> > [1] https://pre-commit.com/
>> >>>>>>> >> > >>>> >>>>>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> > -chad
>> >>>>>>> >> > >>>> >>>>>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël
>> Mejía <ie...@gmail.com> wrote:
>> >>>>>>> >> > >>>> >>>>>>>>> >>
>> >>>>>>> >> > >>>> >>>>>>>>> >> Last time we discussed this there seems
>> not to be much progress into autoformatting.
>> >>>>>>> >> > >>>> >>>>>>>>> >> This tool looks more tweakable, so maybe
>> it could be more appropriate for Beam's use case.
>> >>>>>>> >> > >>>> >>>>>>>>> >> https://github.com/google/yapf/
>> >>>>>>> >> > >>>> >>>>>>>>> >> WDYT?
>> >>>>>>> >> > >>>> >>>>>>>>> >>
>> >>>>>>> >> > >>>> >>>>>>>>> >>
>> >>>>>>> >> > >>>> >>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz
>> Gajowy <lg...@apache.org> wrote:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>> +1 for any autoformatter for Python SDK
>> that does the job. My experience is that since spotless in Java SDK I would
>> never start a new Java project without it. So many great benefits not only
>> for one person coding but for all community.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>> It is a GitHub UI issue that you cannot
>> easily browse past the reformat. It is not actually that hard, but does
>> take a couple extra clicks to get GitHub to display blame before a
>> reformat. It is easier with the command line. I do a lot of code history
>> digging and the global Java reformat is not really a problem.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>> It's actually one more click on Github
>> but I agree it's not the best way to search the history. The most
>> convenient and clear one I've found so far is in Jetbrains IDEs (Intelij)
>> where you can:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>> right click on line number -> "annotate"
>> -> click again -> "annotate previous revision" -> ...
>> >>>>>>> >> > >>>> >>>>>>>>> >>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>> You can also use "compare with" to see
>> the diff between two revisions.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>> Łukasz
>> >>>>>>> >> > >>>> >>>>>>>>> >>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles
>> <ke...@apache.org> napisał(a):
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>> +1 pending good enough tooling (I can't
>> quite tell - seems there are some issues?)
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM
>> Katarzyna Kucharczyk <ka...@gmail.com> wrote:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>> What else actually we gain? My guess is
>> faster PR review iteration. We will skip some of conversations about code
>> style.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>> ...
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>> Last but not least, new contributor may
>> be less discouraged. When I started contribute I didn’t know how to format
>> my code and I lost a lot of time to add pylint and adjust IntelliJ. I
>> eventually failed. Currently I write code intuitively and when I don’t
>> forget I rerun tox.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>> This is a huge benefit. This is why I
>> supported it so much for Java. It is a community benefit. You do not have
>> to be a contributor to the Python SDK to support this. That is why I am
>> writing here. Just eliminate all discussion of formatting. It doesn't
>> really matter what the resulting format is, if it is not crazy to read. I
>> strongly oppose maintaining a non-default format.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>> Reformating 20k lines or 200k is not
>> hard. The Java global reformat touched 50k lines. It does not really matter
>> how big it is. Definitely do it all at once if you think the tool is good
>> enough. And you should pin a version, so churn is not a problem. You can
>> upgrade the version and reformat in a PR later and that is also easy.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>> It is a GitHub UI issue that you cannot
>> easily browse past the reformat. It is not actually that hard, but does
>> take a couple extra clicks to get GitHub to display blame before a
>> reformat. It is easier with the command line. I do a lot of code history
>> digging and the global Java reformat is not really a problem.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>> Kenn
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>> Also everything will be formatted in a
>> same way, so eventually it would be easier to read.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>> Moreover, as it was mentioned in
>> previous emails - a lot of Jenkins failures won’t take place, so we save
>> time and resources.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>> One of disadvantages is that our
>> pipelines has custom syntax and after formatting they looks a little bit
>> weird, but maybe extending the only configurable option in Black - lines,
>> from 88 to 110 would be solution.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>> Second one is that Black requires
>> Python 3 to be run. I don’t know how big obstacle it would be.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>> I believe there are two options how it
>> would be possible to introduce Black. First: just do it, it will hurt but
>> then it would be ok (same as a dentist appointment). Of course it may
>> require some work to adjust linters. On the other hand we can do it
>> gradually and start including sdk parts one by one - maybe it will be less
>> painful?
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>> As an example I can share one of
>> projects [2] I know that uses Black (they use also other cool checkers and
>> pre-commit [3]). This is how looks their build with all checks [4].
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>> To sum up I believe that if we want
>> improve our coding experience, we should improve our toolset. Black seems
>> be recent and quite popular tool what makes think they won’t stop
>> developing it.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>> [1]
>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>> [2]
>> https://github.com/GoogleCloudPlatform/oozie-to-airflow
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>> [3] https://pre-commit.com
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>> [4]
>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert
>> Bradshaw <ro...@google.com> wrote:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> Reformatting to 4 spaces seems a
>> non-starter to me, as it would change nearly every single line in the
>> codebase (and the loss of all context as well as that particular line).
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> This is probably why the 2-space fork
>> exists. However, we don't conform to that either--we use 2 spaces for
>> indentation, but 4 for continuation indentation. (As for the history of
>> this, this goes back to Google's internal style guide, probably motivated
>> by consistency with C++, Java, ... and the fact that with an indent level
>> of 4 one ends up wrapping lines quite frequently (it's telling that black's
>> default line length is 88)). This turns out to be an easy change to the
>> codebase.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4
>> whitespace thing, I found that this tool introduces a huge amount of
>> vertical whitespace (e.g. closing parentheses on their own line), e.g.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> def foo(
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>     args
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> ):
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>   if (
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>       long expression)
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>   ):
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>     func(
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>         args
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>     )
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> I wrote a simple post-processor to put
>> closing parentheses on the same lines, as well as omit the newline after
>> "if (", and disabling formatting of strings, which reduce the churn in our
>> codebase to 15k lines (adding about 4k) out of 200k total.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>> https://github.com/apache/beam/pull/8712/files
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> It's still very opinionated, often in
>> different ways then me, and doesn't understand the semantics of the code,
>> but possibly something we could live with given the huge advantages of an
>> autoformatter.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> An intermediate point would be to
>> allow, but not require, autoformatting of changed lines.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> As for being beta quality, it looks
>> like it's got a decent number of contributors and in my book being in the
>> python github project is a strong positive signal. But, due to the above
>> issues, I think we'd have to maintain a fork. (The code is pretty
>> lightweight, the 2 vs. 4 space issue is a 2-line change, and the rest
>> implemented as a post-processing step (for now, incomplete), so it'd be
>> easy to stay in sync with upstream.)
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM
>> Ismaël Mejía <ie...@gmail.com> wrote:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can
>> be configured in a way to fit our
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't
>> think it is feasible to reformat the
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > It cannot be configured to do what
>> we actually do because Black is
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > configurable only to support the
>> standard python codestyle guidelines
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces
>> and is what most projects in the
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > python world use.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow
>> quick access to the Git history. This
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the
>> Java SDK. However, I have the feeling
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a
>> problem with Python because the linter has
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > Yes that’s the bad side effect but
>> there are always tradeoffs we have
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > to deal with.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM
>> Maximilian Michels <mx...@apache.org> wrote:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can
>> be configured in a way to fit our
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't
>> think it is feasible to reformat the
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow
>> quick access to the Git history. This
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the
>> Java SDK. However, I have the feeling
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a
>> problem with Python because the linter has
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > -Max
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía
>> wrote:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> My concerns are:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked
>> as beta with a big warning.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single
>> person project. For the same reason I also strongly prefer not using a fork
>> for a specific setting. Fork will only have less people looking at it.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I suppose the project is marked
>> as beta because it is recent, it was
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and
>> because some things can change since
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > auto-formatters are pretty
>> tricky beasts, I think beta in that case is
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If
>> you look at the contribution page [1]
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > you can notice that it is less
>> and less a single person project, there
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > have been 93 independent
>> contributions since the project became
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > public, and the fact that it is
>> hosted in the python organization
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > github [2] gives some confidence
>> on the project continuity.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > You are right however about the
>> fact that the main author seems to be
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and
>> in the 2-spaces issue he can seem
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > arbitrary, but he is just
>> following pep8 style guide recommendations
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [3]. I am curious of why we
>> (Beam) do not follow the 4 spaces
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even
>> Google's own Python style guide [4],
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > So, probably it should be to us
>> to reconsider the current policy to
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > adapt to the standards (and the
>> tool).
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I did a quick run of black with
>> python 2.7 compatibility on
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > sdks/python and got only 4
>> parsing errors which is positive given the
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > size of our code base.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files
>> left unchanged, 4 files failed to reformat.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 47:22:
>>  _display_progress = print
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 151:18:
>>    file=sys.stderr)
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 160:34:
>>  print(traceback_string, file=sys.stderr)
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 335:51:
>>  print('-->' if pc == last_pc else '    ',
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > end=' ')
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I still think this can be
>> positive for the project but well I am
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > barely a contributor to the
>> python code base so I let you the python
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > maintainers to reconsider this,
>> in any case it seems like a good
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > improvement for the project.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [1]
>> https://github.com/python/black/graphs/contributors
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [2] https://github.com/python
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [3]
>> https://www.python.org/dev/peps/pep-0008/#indentation
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [4]
>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM
>> Ahmet Altay <al...@google.com> wrote:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> I am in the same boat with
>> Robert, I am in favor of autoformatters but I am not familiar with this
>> one. My concerns are:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked
>> as beta with a big warning.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single
>> person project. For the same reason I also strongly prefer not using a fork
>> for a specific setting. Fork will only have less people looking at it.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> IMO, this is in an early stage
>> for us. That said lint issues are real as pointed in the thread. If someone
>> would like to give it a try and see how it would look like for us that
>> would be interesting.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM
>> Katarzyna Kucharczyk <ka...@gmail.com> wrote:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> This sounds really good. A lot
>> of Jenkins jobs failures are caused by lint problems.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> I think it would be great to
>> have something similar to Spotless in Java SDK (I heard there is problem
>> with configuring Black with IntelliJ).
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52
>> PM Robert Bradshaw <ro...@google.com> wrote:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> I'm generally in favor of
>> autoformatters, though I haven't looked at
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> how well this particular one
>> works. We might have to go with
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>> https://github.com/desbma/black-2spaces given
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>> https://github.com/python/black/issues/378 .
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43
>> PM Pablo Estrada <pa...@google.com> wrote:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I
>> know at least a couple people (myself included) who've been annoyed by
>> having to take care of lint issues that maybe a code formatter could save
>> us.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> -P.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24
>> PM Ismaël Mejía <ie...@gmail.com> wrote:
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into
>> Black [1] a python code auto formatter that
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto'
>> auto-formatter for python, and wanted to
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there
>> interest from the python people to get this
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> into the build?
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> The introduction of
>> spotless for Java has been a good improvement and
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> maybe the python code base
>> may benefit of this too.
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> WDYT?
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> [1]
>> https://github.com/python/black
>> >>>>>>> >> > >>>> >>
>> >>>>>>> >> > >>>> >>
>> >>>>>>> >> > >>>> >>
>> >>>>>>> >> > >>>> >> --
>> >>>>>>> >> > >>>> >>
>> >>>>>>> >> > >>>> >> Michał Walenia
>> >>>>>>> >> > >>>> >> Polidea | Software Engineer
>> >>>>>>> >> > >>>> >>
>> >>>>>>> >> > >>>> >> M: +48 791 432 002 <+48%20791%20432%20002>
>> >>>>>>> >> > >>>> >> E: michal.walenia@polidea.com
>> >>>>>>> >> > >>>> >>
>> >>>>>>> >> > >>>> >> Unique Tech
>> >>>>>>> >> > >>>> >> Check out our projects!
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>>
>> >>>>> Michał Walenia
>> >>>>> Polidea | Software Engineer
>> >>>>>
>> >>>>> M: +48 791 432 002 <+48%20791%20432%20002>
>> >>>>> E: michal.walenia@polidea.com
>> >>>>>
>> >>>>> Unique Tech
>> >>>>> Check out our projects!
>>
>>

Re: [DISCUSS] Autoformat python code with Black

Posted by David Yan <da...@google.com>.
Thank you Robert.

https://github.com/google/yapf/issues/530 has been open for 2 years, but we
will use `yapf: disable` and `yapf: enable` as a workaround for now.

David

On Fri, Feb 7, 2020 at 12:29 PM Robert Bradshaw <ro...@google.com> wrote:

> Yeah, that's a lot worse. This looks like
> https://github.com/google/yapf/issues/530 . In the meantime,
> https://pypi.org/project/yapf/#potentially-frequently-asked-questions
>
> On Fri, Feb 7, 2020 at 12:17 PM David Yan <da...@google.com> wrote:
> >
> > Hi, I just tried out the yapf formatter and I noticed that sometimes
> it's making the original code a lot less readable.
> > In the below example, - is the original, + is after running the yapf
> formatter. Looks like the problem is with the method chaining pattern.
> > How feasible is it to have yapf identify such a pattern and format it
> better?
> > Before this can be fixed, Is it possible to have a directive in the code
> comment to bypass yapf?
> >
> > Thanks!
> >
> > -    test_stream = (TestStream()
> > -                   .advance_watermark_to(0)
> > -                   .add_elements(['a', 'b', 'c'])
> > -                   .advance_processing_time(1)
> > -                   .advance_processing_time(1)
> > -                   .advance_processing_time(1)
> > -                   .advance_processing_time(1)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(5)
> > -                   .add_elements(['1', '2', '3'])
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(6)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(7)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(8)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(9)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(10)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(11)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(12)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(13)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(14)
> > -                   .advance_processing_time(1)
> > -                   .advance_watermark_to(15)
> > -                   .advance_processing_time(1)
> > -                   )
> > +    test_stream = (
> > +        TestStream().advance_watermark_to(0).add_elements(
> > +            ['a', 'b',
> 'c']).advance_processing_time(1).advance_processing_time(
> > +
> 1).advance_processing_time(1).advance_processing_time(1).
> > +        advance_processing_time(1).advance_watermark_to(5).add_elements(
> > +            ['1', '2',
> '3']).advance_processing_time(1).advance_watermark_to(
> > +                6).advance_processing_time(1).advance_watermark_to(
> > +                    7).advance_processing_time(1).advance_watermark_to(
> > +
> 8).advance_processing_time(1).advance_watermark_to(9).
> > +        advance_processing_time(1).advance_watermark_to(
> > +            10).advance_processing_time(1).advance_watermark_to(
> > +                11).advance_processing_time(1).advance_watermark_to(
> > +                    12).advance_processing_time(1).advance_watermark_to(
> > +
> 13).advance_processing_time(1).advance_watermark_to(
> > +
> 14).advance_processing_time(1).advance_watermark_to(
> > +                                15).advance_processing_time(1))
> >
> > On Thu, Feb 6, 2020 at 1:50 PM Robert Bradshaw <ro...@google.com>
> wrote:
> >>
> >> Thanks!
> >>
> >> On Thu, Feb 6, 2020 at 1:29 PM Ismaël Mejía <ie...@gmail.com> wrote:
> >>>
> >>> Thanks Kamil and Michał for taking care of this.
> >>> Excellent job!
> >>>
> >>> On Thu, Feb 6, 2020 at 1:45 PM Kamil Wasilewski <
> kamil.wasilewski@polidea.com> wrote:
> >>>>
> >>>> Thanks to everyone involved in the discussion.
> >>>>
> >>>> I've taken a look at the first 50 recently updated Pull Requests.
> Only few of them were affected. I hope it wouldn't be too hard to fix them.
> >>>>
> >>>> In any case, here you can find instructions on how to run formatter:
> https://cwiki.apache.org/confluence/display/BEAM/Python+Tips (section
> "Formatting").
> >>>>
> >>>> On Thu, Feb 6, 2020 at 12:42 PM Michał Walenia <
> michal.walenia@polidea.com> wrote:
> >>>>>
> >>>>> Hi,
> >>>>> the PR is merged, all checks were green :)
> >>>>> Enjoy prettier Python!
> >>>>>
> >>>>> On Thu, Feb 6, 2020 at 11:11 AM Ismaël Mejía <ie...@gmail.com>
> wrote:
> >>>>>>
> >>>>>> Agree no need for vote for this because the consensus is clear and
> the sole
> >>>>>> impact I can think of are pending PRs that will be broken. In the
> Java case
> >>>>>> what we did was to just notice every PR that was affected by the
> change.
> >>>>>> And clearly document how to validate and autoformat the code.
> >>>>>>
> >>>>>> So the earlier the better, go go autoformat!
> >>>>>>
> >>>>>> On Thu, Feb 6, 2020 at 1:38 AM Robert Bradshaw <ro...@google.com>
> wrote:
> >>>>>>>
> >>>>>>> No, perhaps not. I agree there's consensus, just wondering what the
> >>>>>>> next steps should be to get this in. (The presubmits look like
> they're
> >>>>>>> all passing, with the exception of some breakage in java that
> should
> >>>>>>> be completely unrelated. Of course there's already merge
> conflicts...)
> >>>>>>>
> >>>>>>> On Wed, Feb 5, 2020 at 3:55 PM Ahmet Altay <al...@google.com>
> wrote:
> >>>>>>> >
> >>>>>>> > Do we need a formal vote? There is consensus on this thread and
> on the PR.
> >>>>>>> >
> >>>>>>> > On Wed, Feb 5, 2020 at 3:37 PM Robert Bradshaw <
> robertwb@google.com> wrote:
> >>>>>>> >>
> >>>>>>> >> The PR is looking good. Should we call a vote?
> >>>>>>> >>
> >>>>>>> >> On Mon, Jan 27, 2020 at 11:03 AM Robert Bradshaw <
> robertwb@google.com> wrote:
> >>>>>>> >> >
> >>>>>>> >> > Thanks. I commented on the PR. I think if we're going this
> route we
> >>>>>>> >> > should add a pre-commit, plus instructions on how to run the
> tool
> >>>>>>> >> > (similar to spotless).
> >>>>>>> >> >
> >>>>>>> >> > On Mon, Jan 27, 2020 at 10:00 AM Udi Meiri <eh...@google.com>
> wrote:
> >>>>>>> >> > >
> >>>>>>> >> > > I've done a pass on the PR on code I'm familiar with.
> >>>>>>> >> > > Please make a pass and add your suggestions on the PR.
> >>>>>>> >> > >
> >>>>>>> >> > > On Fri, Jan 24, 2020 at 7:15 AM Ismaël Mejía <
> iemejia@gmail.com> wrote:
> >>>>>>> >> > >>
> >>>>>>> >> > >> Java build fails on any unformatted code so python
> probably should be like that.
> >>>>>>> >> > >> We have to ensure however that it fails early on that.
> >>>>>>> >> > >> As Robert said time to debate the knobs :)
> >>>>>>> >> > >>
> >>>>>>> >> > >> On Fri, Jan 24, 2020 at 3:19 PM Kamil Wasilewski <
> kamil.wasilewski@polidea.com> wrote:
> >>>>>>> >> > >>>
> >>>>>>> >> > >>> PR is ready: https://github.com/apache/beam/pull/10684.
> Please share your comments ;-) I've managed to reduce the impact a bit:
> >>>>>>> >> > >>> 501 files changed, 18245 insertions(+), 19495 deletions(-)
> >>>>>>> >> > >>>
> >>>>>>> >> > >>> We still need to consider how to enforce the usage of
> autoformatter. Pre-commit sounds like a nice addition, but it still needs
> to be installed manually by a developer. On the other hand, Jenkins
> precommit job that fails if any unformatted code is detected looks like too
> strict. What do you think?
> >>>>>>> >> > >>>
> >>>>>>> >> > >>> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw <
> robertwb@google.com> wrote:
> >>>>>>> >> > >>>>
> >>>>>>> >> > >>>> Thanks! Now we get to debate what knobs to twiddle :-P
> >>>>>>> >> > >>>>
> >>>>>>> >> > >>>> FYI, I did a simple run (just pushed to
> >>>>>>> >> > >>>>
> https://github.com/apache/beam/compare/master...robertwb:yapf) to see
> >>>>>>> >> > >>>> the impact. The diff is
> >>>>>>> >> > >>>>
> >>>>>>> >> > >>>>     $ git diff --stat master
> >>>>>>> >> > >>>>     ...
> >>>>>>> >> > >>>>      547 files changed, 22118 insertions(+), 21129
> deletions(-)
> >>>>>>> >> > >>>>
> >>>>>>> >> > >>>> For reference
> >>>>>>> >> > >>>>
> >>>>>>> >> > >>>>     $ find sdks/python/apache_beam -name '*.py' | xargs
> wc
> >>>>>>> >> > >>>>     ...
> >>>>>>> >> > >>>>     200424  612002 7431637 total
> >>>>>>> >> > >>>>
> >>>>>>> >> > >>>> which means a little over 10% of lines get touched. I
> think there are
> >>>>>>> >> > >>>> some options, such as
> SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES and
> >>>>>>> >> > >>>> COALESCE_BRACKETS, that will conform more to the style
> we are already
> >>>>>>> >> > >>>> (mostly) following.
> >>>>>>> >> > >>>>
> >>>>>>> >> > >>>>
> >>>>>>> >> > >>>> On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
> >>>>>>> >> > >>>> <ka...@polidea.com> wrote:
> >>>>>>> >> > >>>> >
> >>>>>>> >> > >>>> > Thank you Michał for creating the ticket. I have some
> free time and I'd like to volunteer myself for this task.
> >>>>>>> >> > >>>> > Indeed, it looks like there's consensus for `yapf`, so
> I'll try `yapf` first.
> >>>>>>> >> > >>>> >
> >>>>>>> >> > >>>> > Best,
> >>>>>>> >> > >>>> > Kamil
> >>>>>>> >> > >>>> >
> >>>>>>> >> > >>>> >
> >>>>>>> >> > >>>> > On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <
> michal.walenia@polidea.com> wrote:
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >> Hi all,
> >>>>>>> >> > >>>> >> I created a JIRA issue for this and summarized the
> available tools
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >> https://issues.apache.org/jira/browse/BEAM-9175
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >> Cheers,
> >>>>>>> >> > >>>> >> Michal
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <
> ehudm@google.com> wrote:
> >>>>>>> >> > >>>> >>>
> >>>>>>> >> > >>>> >>> Sorry, backing off on this due to time constraints.
> >>>>>>> >> > >>>> >>>
> >>>>>>> >> > >>>> >>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <
> ehudm@google.com> wrote:
> >>>>>>> >> > >>>> >>>>
> >>>>>>> >> > >>>> >>>> It sounds like there's a consensus for yapf. I
> volunteer to take this on
> >>>>>>> >> > >>>> >>>>
> >>>>>>> >> > >>>> >>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <
> ehudm@google.com> wrote:
> >>>>>>> >> > >>>> >>>>>
> >>>>>>> >> > >>>> >>>>> +1 to autoformatting
> >>>>>>> >> > >>>> >>>>>
> >>>>>>> >> > >>>> >>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <
> lcwik@google.com> wrote:
> >>>>>>> >> > >>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>> +1 to autoformatters. Also the Beam Java SDK went
> through a one time pass to apply the spotless formatting.
> >>>>>>> >> > >>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <
> altay@google.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>
> >>>>>>> >> > >>>> >>>>>>> +1 to autoformatters and yapf. It appears to be
> a well maintained project. I do support making a one time pass to apply
> formatting the whole code base.
> >>>>>>> >> > >>>> >>>>>>>
> >>>>>>> >> > >>>> >>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <
> chadrik@gmail.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> It'd be good if there was a way to only apply
> to violating (or at
> >>>>>>> >> > >>>> >>>>>>>>> least changed) lines.
> >>>>>>> >> > >>>> >>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>> I assumed the first thing we’d do is convert
> all of the code in one go, since it’s a very safe operation. Did you have
> something else in mind?
> >>>>>>> >> > >>>> >>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>> -chad
> >>>>>>> >> > >>>> >>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <
> chadrik@gmail.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > +1 to autoformatting
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > Let me add some nuance to that.
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > The way I see it there are 2 varieties of
> formatters:  those which take the original formatting into consideration
> (autopep8) and those which disregard it (yapf, black).
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > I much prefer yapf to black, because you
> have plenty of options to tweak with yapf (enough to make the output a
> pretty close match to the current Beam style), and you can mark areas to
> preserve the original formatting, which could be very useful with Pipeline
> building with pipe operators.  Please don't pick black.
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > autopep8 is more along the lines of spotless
> in Java -- it only corrects code that breaks the project's style rules.
> The big problem with Beam's current style is that it is so esoteric that
> autopep8 can't enforce it -- and I'm not just talking about 2-spaces, which
> I don't really have a problem with -- the problem is the use of either 2 or
> 4 spaces depending on context (expression start vs hanging indent, etc).
> This is my *biggest* gripe about the current style.  PyCharm doesn't have
> enough control either.  So, if we can choose a style that can be expressed
> by flake8 or pycodestyle then we can use autopep8 to enforce it.
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > I'd prefer autopep8 to yapf because I like
> having a little wiggle room to influence the style, but on a big project
> like Beam all that wiggle room ends up to minor but noticeable
> inconsistencies in style throughout the project.  yapf ensures completely
> consistent style, but the tradeoff is that it's sometimes ugly, especially
> in scenarios with similar repeated entries like argparse, where yapf might
> insert line breaks in visually inconsistent and unappealing ways depending
> on the lengths of the keywords and expressions involved.
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > Either way (but especially if we choose
> yapf) I think it'd be a nice addition to setup a pre-commit [1] config so
> that people can opt in to running *lightweight* autofixers prior to
> commit.  This will not only reduce dev frustration but will also reduce the
> amount of cpu cycles that Jenkins spends pointing out lint errors.
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > [1] https://pre-commit.com/
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > -chad
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël
> Mejía <ie...@gmail.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>
> >>>>>>> >> > >>>> >>>>>>>>> >> Last time we discussed this there seems not
> to be much progress into autoformatting.
> >>>>>>> >> > >>>> >>>>>>>>> >> This tool looks more tweakable, so maybe it
> could be more appropriate for Beam's use case.
> >>>>>>> >> > >>>> >>>>>>>>> >> https://github.com/google/yapf/
> >>>>>>> >> > >>>> >>>>>>>>> >> WDYT?
> >>>>>>> >> > >>>> >>>>>>>>> >>
> >>>>>>> >> > >>>> >>>>>>>>> >>
> >>>>>>> >> > >>>> >>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz
> Gajowy <lg...@apache.org> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>> +1 for any autoformatter for Python SDK
> that does the job. My experience is that since spotless in Java SDK I would
> never start a new Java project without it. So many great benefits not only
> for one person coding but for all community.
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>> It is a GitHub UI issue that you cannot
> easily browse past the reformat. It is not actually that hard, but does
> take a couple extra clicks to get GitHub to display blame before a
> reformat. It is easier with the command line. I do a lot of code history
> digging and the global Java reformat is not really a problem.
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>> It's actually one more click on Github but
> I agree it's not the best way to search the history. The most convenient
> and clear one I've found so far is in Jetbrains IDEs (Intelij) where you
> can:
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>> right click on line number -> "annotate"
> -> click again -> "annotate previous revision" -> ...
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>> You can also use "compare with" to see the
> diff between two revisions.
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>> Łukasz
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <
> kenn@apache.org> napisał(a):
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>> +1 pending good enough tooling (I can't
> quite tell - seems there are some issues?)
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna
> Kucharczyk <ka...@gmail.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> What else actually we gain? My guess is
> faster PR review iteration. We will skip some of conversations about code
> style.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>> ...
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> Last but not least, new contributor may
> be less discouraged. When I started contribute I didn’t know how to format
> my code and I lost a lot of time to add pylint and adjust IntelliJ. I
> eventually failed. Currently I write code intuitively and when I don’t
> forget I rerun tox.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>> This is a huge benefit. This is why I
> supported it so much for Java. It is a community benefit. You do not have
> to be a contributor to the Python SDK to support this. That is why I am
> writing here. Just eliminate all discussion of formatting. It doesn't
> really matter what the resulting format is, if it is not crazy to read. I
> strongly oppose maintaining a non-default format.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>> Reformating 20k lines or 200k is not
> hard. The Java global reformat touched 50k lines. It does not really matter
> how big it is. Definitely do it all at once if you think the tool is good
> enough. And you should pin a version, so churn is not a problem. You can
> upgrade the version and reformat in a PR later and that is also easy.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>> It is a GitHub UI issue that you cannot
> easily browse past the reformat. It is not actually that hard, but does
> take a couple extra clicks to get GitHub to display blame before a
> reformat. It is easier with the command line. I do a lot of code history
> digging and the global Java reformat is not really a problem.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>> Kenn
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> Also everything will be formatted in a
> same way, so eventually it would be easier to read.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> Moreover, as it was mentioned in
> previous emails - a lot of Jenkins failures won’t take place, so we save
> time and resources.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> One of disadvantages is that our
> pipelines has custom syntax and after formatting they looks a little bit
> weird, but maybe extending the only configurable option in Black - lines,
> from 88 to 110 would be solution.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> Second one is that Black requires Python
> 3 to be run. I don’t know how big obstacle it would be.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> I believe there are two options how it
> would be possible to introduce Black. First: just do it, it will hurt but
> then it would be ok (same as a dentist appointment). Of course it may
> require some work to adjust linters. On the other hand we can do it
> gradually and start including sdk parts one by one - maybe it will be less
> painful?
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> As an example I can share one of
> projects [2] I know that uses Black (they use also other cool checkers and
> pre-commit [3]). This is how looks their build with all checks [4].
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> To sum up I believe that if we want
> improve our coding experience, we should improve our toolset. Black seems
> be recent and quite popular tool what makes think they won’t stop
> developing it.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> [1]
> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> [2]
> https://github.com/GoogleCloudPlatform/oozie-to-airflow
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> [3] https://pre-commit.com
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> [4]
> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert
> Bradshaw <ro...@google.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> Reformatting to 4 spaces seems a
> non-starter to me, as it would change nearly every single line in the
> codebase (and the loss of all context as well as that particular line).
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> This is probably why the 2-space fork
> exists. However, we don't conform to that either--we use 2 spaces for
> indentation, but 4 for continuation indentation. (As for the history of
> this, this goes back to Google's internal style guide, probably motivated
> by consistency with C++, Java, ... and the fact that with an indent level
> of 4 one ends up wrapping lines quite frequently (it's telling that black's
> default line length is 88)). This turns out to be an easy change to the
> codebase.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4
> whitespace thing, I found that this tool introduces a huge amount of
> vertical whitespace (e.g. closing parentheses on their own line), e.g.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> def foo(
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>     args
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> ):
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>   if (
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>       long expression)
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>   ):
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>     func(
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>         args
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>     )
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> I wrote a simple post-processor to put
> closing parentheses on the same lines, as well as omit the newline after
> "if (", and disabling formatting of strings, which reduce the churn in our
> codebase to 15k lines (adding about 4k) out of 200k total.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> https://github.com/apache/beam/pull/8712/files
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> It's still very opinionated, often in
> different ways then me, and doesn't understand the semantics of the code,
> but possibly something we could live with given the huge advantages of an
> autoformatter.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> An intermediate point would be to
> allow, but not require, autoformatting of changed lines.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> As for being beta quality, it looks
> like it's got a decent number of contributors and in my book being in the
> python github project is a strong positive signal. But, due to the above
> issues, I think we'd have to maintain a fork. (The code is pretty
> lightweight, the 2 vs. 4 space issue is a 2-line change, and the rest
> implemented as a post-processing step (for now, incomplete), so it'd be
> easy to stay in sync with upstream.)
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël
> Mejía <ie...@gmail.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can
> be configured in a way to fit our
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't
> think it is feasible to reformat the
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > It cannot be configured to do what we
> actually do because Black is
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > configurable only to support the
> standard python codestyle guidelines
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and
> is what most projects in the
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > python world use.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick
> access to the Git history. This
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java
> SDK. However, I have the feeling
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a
> problem with Python because the linter has
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > Yes that’s the bad side effect but
> there are always tradeoffs we have
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > to deal with.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM
> Maximilian Michels <mx...@apache.org> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can
> be configured in a way to fit our
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't
> think it is feasible to reformat the
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick
> access to the Git history. This
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java
> SDK. However, I have the feeling
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a
> problem with Python because the linter has
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > -Max
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía
> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> My concerns are:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked
> as beta with a big warning.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single
> person project. For the same reason I also strongly prefer not using a fork
> for a specific setting. Fork will only have less people looking at it.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I suppose the project is marked
> as beta because it is recent, it was
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and
> because some things can change since
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky
> beasts, I think beta in that case is
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If
> you look at the contribution page [1]
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > you can notice that it is less
> and less a single person project, there
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > have been 93 independent
> contributions since the project became
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > public, and the fact that it is
> hosted in the python organization
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > github [2] gives some confidence
> on the project continuity.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > You are right however about the
> fact that the main author seems to be
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in
> the 2-spaces issue he can seem
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > arbitrary, but he is just
> following pep8 style guide recommendations
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [3]. I am curious of why we
> (Beam) do not follow the 4 spaces
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even
> Google's own Python style guide [4],
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > So, probably it should be to us
> to reconsider the current policy to
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > adapt to the standards (and the
> tool).
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I did a quick run of black with
> python 2.7 compatibility on
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > sdks/python and got only 4
> parsing errors which is positive given the
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > size of our code base.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files
> left unchanged, 4 files failed to reformat.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 47:22:
>  _display_progress = print
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 151:18:
>    file=sys.stderr)
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 160:34:
>  print(traceback_string, file=sys.stderr)
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 335:51:
>  print('-->' if pc == last_pc else '    ',
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > end=' ')
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I still think this can be
> positive for the project but well I am
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > barely a contributor to the
> python code base so I let you the python
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > maintainers to reconsider this,
> in any case it seems like a good
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > improvement for the project.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [1]
> https://github.com/python/black/graphs/contributors
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [2] https://github.com/python
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [3]
> https://www.python.org/dev/peps/pep-0008/#indentation
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [4]
> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM
> Ahmet Altay <al...@google.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> I am in the same boat with
> Robert, I am in favor of autoformatters but I am not familiar with this
> one. My concerns are:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked
> as beta with a big warning.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single
> person project. For the same reason I also strongly prefer not using a fork
> for a specific setting. Fork will only have less people looking at it.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> IMO, this is in an early stage
> for us. That said lint issues are real as pointed in the thread. If someone
> would like to give it a try and see how it would look like for us that
> would be interesting.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM
> Katarzyna Kucharczyk <ka...@gmail.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> This sounds really good. A lot
> of Jenkins jobs failures are caused by lint problems.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> I think it would be great to
> have something similar to Spotless in Java SDK (I heard there is problem
> with configuring Black with IntelliJ).
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52
> PM Robert Bradshaw <ro...@google.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> I'm generally in favor of
> autoformatters, though I haven't looked at
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> how well this particular one
> works. We might have to go with
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
> https://github.com/desbma/black-2spaces given
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
> https://github.com/python/black/issues/378 .
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43
> PM Pablo Estrada <pa...@google.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I
> know at least a couple people (myself included) who've been annoyed by
> having to take care of lint issues that maybe a code formatter could save
> us.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> -P.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24
> PM Ismaël Mejía <ie...@gmail.com> wrote:
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into
> Black [1] a python code auto formatter that
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto'
> auto-formatter for python, and wanted to
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there
> interest from the python people to get this
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> into the build?
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless
> for Java has been a good improvement and
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> maybe the python code base
> may benefit of this too.
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> WDYT?
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
> >>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> [1]
> https://github.com/python/black
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >> --
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >> Michał Walenia
> >>>>>>> >> > >>>> >> Polidea | Software Engineer
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >> M: +48 791 432 002 <+48%20791%20432%20002>
> >>>>>>> >> > >>>> >> E: michal.walenia@polidea.com
> >>>>>>> >> > >>>> >>
> >>>>>>> >> > >>>> >> Unique Tech
> >>>>>>> >> > >>>> >> Check out our projects!
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Michał Walenia
> >>>>> Polidea | Software Engineer
> >>>>>
> >>>>> M: +48 791 432 002 <+48%20791%20432%20002>
> >>>>> E: michal.walenia@polidea.com
> >>>>>
> >>>>> Unique Tech
> >>>>> Check out our projects!
>
>

Re: [DISCUSS] Autoformat python code with Black

Posted by Robert Bradshaw <ro...@google.com>.
Yeah, that's a lot worse. This looks like
https://github.com/google/yapf/issues/530 . In the meantime,
https://pypi.org/project/yapf/#potentially-frequently-asked-questions

On Fri, Feb 7, 2020 at 12:17 PM David Yan <da...@google.com> wrote:
>
> Hi, I just tried out the yapf formatter and I noticed that sometimes it's making the original code a lot less readable.
> In the below example, - is the original, + is after running the yapf formatter. Looks like the problem is with the method chaining pattern.
> How feasible is it to have yapf identify such a pattern and format it better?
> Before this can be fixed, Is it possible to have a directive in the code comment to bypass yapf?
>
> Thanks!
>
> -    test_stream = (TestStream()
> -                   .advance_watermark_to(0)
> -                   .add_elements(['a', 'b', 'c'])
> -                   .advance_processing_time(1)
> -                   .advance_processing_time(1)
> -                   .advance_processing_time(1)
> -                   .advance_processing_time(1)
> -                   .advance_processing_time(1)
> -                   .advance_watermark_to(5)
> -                   .add_elements(['1', '2', '3'])
> -                   .advance_processing_time(1)
> -                   .advance_watermark_to(6)
> -                   .advance_processing_time(1)
> -                   .advance_watermark_to(7)
> -                   .advance_processing_time(1)
> -                   .advance_watermark_to(8)
> -                   .advance_processing_time(1)
> -                   .advance_watermark_to(9)
> -                   .advance_processing_time(1)
> -                   .advance_watermark_to(10)
> -                   .advance_processing_time(1)
> -                   .advance_watermark_to(11)
> -                   .advance_processing_time(1)
> -                   .advance_watermark_to(12)
> -                   .advance_processing_time(1)
> -                   .advance_watermark_to(13)
> -                   .advance_processing_time(1)
> -                   .advance_watermark_to(14)
> -                   .advance_processing_time(1)
> -                   .advance_watermark_to(15)
> -                   .advance_processing_time(1)
> -                   )
> +    test_stream = (
> +        TestStream().advance_watermark_to(0).add_elements(
> +            ['a', 'b', 'c']).advance_processing_time(1).advance_processing_time(
> +                1).advance_processing_time(1).advance_processing_time(1).
> +        advance_processing_time(1).advance_watermark_to(5).add_elements(
> +            ['1', '2', '3']).advance_processing_time(1).advance_watermark_to(
> +                6).advance_processing_time(1).advance_watermark_to(
> +                    7).advance_processing_time(1).advance_watermark_to(
> +                        8).advance_processing_time(1).advance_watermark_to(9).
> +        advance_processing_time(1).advance_watermark_to(
> +            10).advance_processing_time(1).advance_watermark_to(
> +                11).advance_processing_time(1).advance_watermark_to(
> +                    12).advance_processing_time(1).advance_watermark_to(
> +                        13).advance_processing_time(1).advance_watermark_to(
> +                            14).advance_processing_time(1).advance_watermark_to(
> +                                15).advance_processing_time(1))
>
> On Thu, Feb 6, 2020 at 1:50 PM Robert Bradshaw <ro...@google.com> wrote:
>>
>> Thanks!
>>
>> On Thu, Feb 6, 2020 at 1:29 PM Ismaël Mejía <ie...@gmail.com> wrote:
>>>
>>> Thanks Kamil and Michał for taking care of this.
>>> Excellent job!
>>>
>>> On Thu, Feb 6, 2020 at 1:45 PM Kamil Wasilewski <ka...@polidea.com> wrote:
>>>>
>>>> Thanks to everyone involved in the discussion.
>>>>
>>>> I've taken a look at the first 50 recently updated Pull Requests. Only few of them were affected. I hope it wouldn't be too hard to fix them.
>>>>
>>>> In any case, here you can find instructions on how to run formatter: https://cwiki.apache.org/confluence/display/BEAM/Python+Tips (section "Formatting").
>>>>
>>>> On Thu, Feb 6, 2020 at 12:42 PM Michał Walenia <mi...@polidea.com> wrote:
>>>>>
>>>>> Hi,
>>>>> the PR is merged, all checks were green :)
>>>>> Enjoy prettier Python!
>>>>>
>>>>> On Thu, Feb 6, 2020 at 11:11 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>>>>>
>>>>>> Agree no need for vote for this because the consensus is clear and the sole
>>>>>> impact I can think of are pending PRs that will be broken. In the Java case
>>>>>> what we did was to just notice every PR that was affected by the change.
>>>>>> And clearly document how to validate and autoformat the code.
>>>>>>
>>>>>> So the earlier the better, go go autoformat!
>>>>>>
>>>>>> On Thu, Feb 6, 2020 at 1:38 AM Robert Bradshaw <ro...@google.com> wrote:
>>>>>>>
>>>>>>> No, perhaps not. I agree there's consensus, just wondering what the
>>>>>>> next steps should be to get this in. (The presubmits look like they're
>>>>>>> all passing, with the exception of some breakage in java that should
>>>>>>> be completely unrelated. Of course there's already merge conflicts...)
>>>>>>>
>>>>>>> On Wed, Feb 5, 2020 at 3:55 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>> >
>>>>>>> > Do we need a formal vote? There is consensus on this thread and on the PR.
>>>>>>> >
>>>>>>> > On Wed, Feb 5, 2020 at 3:37 PM Robert Bradshaw <ro...@google.com> wrote:
>>>>>>> >>
>>>>>>> >> The PR is looking good. Should we call a vote?
>>>>>>> >>
>>>>>>> >> On Mon, Jan 27, 2020 at 11:03 AM Robert Bradshaw <ro...@google.com> wrote:
>>>>>>> >> >
>>>>>>> >> > Thanks. I commented on the PR. I think if we're going this route we
>>>>>>> >> > should add a pre-commit, plus instructions on how to run the tool
>>>>>>> >> > (similar to spotless).
>>>>>>> >> >
>>>>>>> >> > On Mon, Jan 27, 2020 at 10:00 AM Udi Meiri <eh...@google.com> wrote:
>>>>>>> >> > >
>>>>>>> >> > > I've done a pass on the PR on code I'm familiar with.
>>>>>>> >> > > Please make a pass and add your suggestions on the PR.
>>>>>>> >> > >
>>>>>>> >> > > On Fri, Jan 24, 2020 at 7:15 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>>>>>> >> > >>
>>>>>>> >> > >> Java build fails on any unformatted code so python probably should be like that.
>>>>>>> >> > >> We have to ensure however that it fails early on that.
>>>>>>> >> > >> As Robert said time to debate the knobs :)
>>>>>>> >> > >>
>>>>>>> >> > >> On Fri, Jan 24, 2020 at 3:19 PM Kamil Wasilewski <ka...@polidea.com> wrote:
>>>>>>> >> > >>>
>>>>>>> >> > >>> PR is ready: https://github.com/apache/beam/pull/10684. Please share your comments ;-) I've managed to reduce the impact a bit:
>>>>>>> >> > >>> 501 files changed, 18245 insertions(+), 19495 deletions(-)
>>>>>>> >> > >>>
>>>>>>> >> > >>> We still need to consider how to enforce the usage of autoformatter. Pre-commit sounds like a nice addition, but it still needs to be installed manually by a developer. On the other hand, Jenkins precommit job that fails if any unformatted code is detected looks like too strict. What do you think?
>>>>>>> >> > >>>
>>>>>>> >> > >>> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw <ro...@google.com> wrote:
>>>>>>> >> > >>>>
>>>>>>> >> > >>>> Thanks! Now we get to debate what knobs to twiddle :-P
>>>>>>> >> > >>>>
>>>>>>> >> > >>>> FYI, I did a simple run (just pushed to
>>>>>>> >> > >>>> https://github.com/apache/beam/compare/master...robertwb:yapf) to see
>>>>>>> >> > >>>> the impact. The diff is
>>>>>>> >> > >>>>
>>>>>>> >> > >>>>     $ git diff --stat master
>>>>>>> >> > >>>>     ...
>>>>>>> >> > >>>>      547 files changed, 22118 insertions(+), 21129 deletions(-)
>>>>>>> >> > >>>>
>>>>>>> >> > >>>> For reference
>>>>>>> >> > >>>>
>>>>>>> >> > >>>>     $ find sdks/python/apache_beam -name '*.py' | xargs wc
>>>>>>> >> > >>>>     ...
>>>>>>> >> > >>>>     200424  612002 7431637 total
>>>>>>> >> > >>>>
>>>>>>> >> > >>>> which means a little over 10% of lines get touched. I think there are
>>>>>>> >> > >>>> some options, such as SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES and
>>>>>>> >> > >>>> COALESCE_BRACKETS, that will conform more to the style we are already
>>>>>>> >> > >>>> (mostly) following.
>>>>>>> >> > >>>>
>>>>>>> >> > >>>>
>>>>>>> >> > >>>> On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
>>>>>>> >> > >>>> <ka...@polidea.com> wrote:
>>>>>>> >> > >>>> >
>>>>>>> >> > >>>> > Thank you Michał for creating the ticket. I have some free time and I'd like to volunteer myself for this task.
>>>>>>> >> > >>>> > Indeed, it looks like there's consensus for `yapf`, so I'll try `yapf` first.
>>>>>>> >> > >>>> >
>>>>>>> >> > >>>> > Best,
>>>>>>> >> > >>>> > Kamil
>>>>>>> >> > >>>> >
>>>>>>> >> > >>>> >
>>>>>>> >> > >>>> > On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <mi...@polidea.com> wrote:
>>>>>>> >> > >>>> >>
>>>>>>> >> > >>>> >> Hi all,
>>>>>>> >> > >>>> >> I created a JIRA issue for this and summarized the available tools
>>>>>>> >> > >>>> >>
>>>>>>> >> > >>>> >> https://issues.apache.org/jira/browse/BEAM-9175
>>>>>>> >> > >>>> >>
>>>>>>> >> > >>>> >> Cheers,
>>>>>>> >> > >>>> >> Michal
>>>>>>> >> > >>>> >>
>>>>>>> >> > >>>> >> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <eh...@google.com> wrote:
>>>>>>> >> > >>>> >>>
>>>>>>> >> > >>>> >>> Sorry, backing off on this due to time constraints.
>>>>>>> >> > >>>> >>>
>>>>>>> >> > >>>> >>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <eh...@google.com> wrote:
>>>>>>> >> > >>>> >>>>
>>>>>>> >> > >>>> >>>> It sounds like there's a consensus for yapf. I volunteer to take this on
>>>>>>> >> > >>>> >>>>
>>>>>>> >> > >>>> >>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com> wrote:
>>>>>>> >> > >>>> >>>>>
>>>>>>> >> > >>>> >>>>> +1 to autoformatting
>>>>>>> >> > >>>> >>>>>
>>>>>>> >> > >>>> >>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <lc...@google.com> wrote:
>>>>>>> >> > >>>> >>>>>>
>>>>>>> >> > >>>> >>>>>> +1 to autoformatters. Also the Beam Java SDK went through a one time pass to apply the spotless formatting.
>>>>>>> >> > >>>> >>>>>>
>>>>>>> >> > >>>> >>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>> >> > >>>> >>>>>>>
>>>>>>> >> > >>>> >>>>>>> +1 to autoformatters and yapf. It appears to be a well maintained project. I do support making a one time pass to apply formatting the whole code base.
>>>>>>> >> > >>>> >>>>>>>
>>>>>>> >> > >>>> >>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <ch...@gmail.com> wrote:
>>>>>>> >> > >>>> >>>>>>>>>
>>>>>>> >> > >>>> >>>>>>>>>
>>>>>>> >> > >>>> >>>>>>>>> It'd be good if there was a way to only apply to violating (or at
>>>>>>> >> > >>>> >>>>>>>>> least changed) lines.
>>>>>>> >> > >>>> >>>>>>>>
>>>>>>> >> > >>>> >>>>>>>>
>>>>>>> >> > >>>> >>>>>>>> I assumed the first thing we’d do is convert all of the code in one go, since it’s a very safe operation. Did you have something else in mind?
>>>>>>> >> > >>>> >>>>>>>>
>>>>>>> >> > >>>> >>>>>>>> -chad
>>>>>>> >> > >>>> >>>>>>>>
>>>>>>> >> > >>>> >>>>>>>>
>>>>>>> >> > >>>> >>>>>>>>
>>>>>>> >> > >>>> >>>>>>>>
>>>>>>> >> > >>>> >>>>>>>>>
>>>>>>> >> > >>>> >>>>>>>>>
>>>>>>> >> > >>>> >>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <ch...@gmail.com> wrote:
>>>>>>> >> > >>>> >>>>>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> > +1 to autoformatting
>>>>>>> >> > >>>> >>>>>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> > Let me add some nuance to that.
>>>>>>> >> > >>>> >>>>>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> > The way I see it there are 2 varieties of formatters:  those which take the original formatting into consideration (autopep8) and those which disregard it (yapf, black).
>>>>>>> >> > >>>> >>>>>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> > I much prefer yapf to black, because you have plenty of options to tweak with yapf (enough to make the output a pretty close match to the current Beam style), and you can mark areas to preserve the original formatting, which could be very useful with Pipeline building with pipe operators.  Please don't pick black.
>>>>>>> >> > >>>> >>>>>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> > autopep8 is more along the lines of spotless in Java -- it only corrects code that breaks the project's style rules.  The big problem with Beam's current style is that it is so esoteric that autopep8 can't enforce it -- and I'm not just talking about 2-spaces, which I don't really have a problem with -- the problem is the use of either 2 or 4 spaces depending on context (expression start vs hanging indent, etc).  This is my *biggest* gripe about the current style.  PyCharm doesn't have enough control either.  So, if we can choose a style that can be expressed by flake8 or pycodestyle then we can use autopep8 to enforce it.
>>>>>>> >> > >>>> >>>>>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> > I'd prefer autopep8 to yapf because I like having a little wiggle room to influence the style, but on a big project like Beam all that wiggle room ends up to minor but noticeable inconsistencies in style throughout the project.  yapf ensures completely consistent style, but the tradeoff is that it's sometimes ugly, especially in scenarios with similar repeated entries like argparse, where yapf might insert line breaks in visually inconsistent and unappealing ways depending on the lengths of the keywords and expressions involved.
>>>>>>> >> > >>>> >>>>>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> > Either way (but especially if we choose yapf) I think it'd be a nice addition to setup a pre-commit [1] config so that people can opt in to running *lightweight* autofixers prior to commit.  This will not only reduce dev frustration but will also reduce the amount of cpu cycles that Jenkins spends pointing out lint errors.
>>>>>>> >> > >>>> >>>>>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> > [1] https://pre-commit.com/
>>>>>>> >> > >>>> >>>>>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> > -chad
>>>>>>> >> > >>>> >>>>>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <ie...@gmail.com> wrote:
>>>>>>> >> > >>>> >>>>>>>>> >>
>>>>>>> >> > >>>> >>>>>>>>> >> Last time we discussed this there seems not to be much progress into autoformatting.
>>>>>>> >> > >>>> >>>>>>>>> >> This tool looks more tweakable, so maybe it could be more appropriate for Beam's use case.
>>>>>>> >> > >>>> >>>>>>>>> >> https://github.com/google/yapf/
>>>>>>> >> > >>>> >>>>>>>>> >> WDYT?
>>>>>>> >> > >>>> >>>>>>>>> >>
>>>>>>> >> > >>>> >>>>>>>>> >>
>>>>>>> >> > >>>> >>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <lg...@apache.org> wrote:
>>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>>> >> > >>>> >>>>>>>>> >>> +1 for any autoformatter for Python SDK that does the job. My experience is that since spotless in Java SDK I would never start a new Java project without it. So many great benefits not only for one person coding but for all community.
>>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>>> >> > >>>> >>>>>>>>> >>> It is a GitHub UI issue that you cannot easily browse past the reformat. It is not actually that hard, but does take a couple extra clicks to get GitHub to display blame before a reformat. It is easier with the command line. I do a lot of code history digging and the global Java reformat is not really a problem.
>>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>>> >> > >>>> >>>>>>>>> >>> It's actually one more click on Github but I agree it's not the best way to search the history. The most convenient and clear one I've found so far is in Jetbrains IDEs (Intelij) where you can:
>>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>>> >> > >>>> >>>>>>>>> >>> right click on line number -> "annotate" -> click again -> "annotate previous revision" -> ...
>>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>>> >> > >>>> >>>>>>>>> >>> You can also use "compare with" to see the diff between two revisions.
>>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>>> >> > >>>> >>>>>>>>> >>> Łukasz
>>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>>> >> > >>>> >>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org> napisał(a):
>>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>> +1 pending good enough tooling (I can't quite tell - seems there are some issues?)
>>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <ka...@gmail.com> wrote:
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>> What else actually we gain? My guess is faster PR review iteration. We will skip some of conversations about code style.
>>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>> ...
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>> Last but not least, new contributor may be less discouraged. When I started contribute I didn’t know how to format my code and I lost a lot of time to add pylint and adjust IntelliJ. I eventually failed. Currently I write code intuitively and when I don’t forget I rerun tox.
>>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>> This is a huge benefit. This is why I supported it so much for Java. It is a community benefit. You do not have to be a contributor to the Python SDK to support this. That is why I am writing here. Just eliminate all discussion of formatting. It doesn't really matter what the resulting format is, if it is not crazy to read. I strongly oppose maintaining a non-default format.
>>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>> Reformating 20k lines or 200k is not hard. The Java global reformat touched 50k lines. It does not really matter how big it is. Definitely do it all at once if you think the tool is good enough. And you should pin a version, so churn is not a problem. You can upgrade the version and reformat in a PR later and that is also easy.
>>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>> It is a GitHub UI issue that you cannot easily browse past the reformat. It is not actually that hard, but does take a couple extra clicks to get GitHub to display blame before a reformat. It is easier with the command line. I do a lot of code history digging and the global Java reformat is not really a problem.
>>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>> Kenn
>>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>> Also everything will be formatted in a same way, so eventually it would be easier to read.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>> Moreover, as it was mentioned in previous emails - a lot of Jenkins failures won’t take place, so we save time and resources.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>> One of disadvantages is that our pipelines has custom syntax and after formatting they looks a little bit weird, but maybe extending the only configurable option in Black - lines, from 88 to 110 would be solution.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>> Second one is that Black requires Python 3 to be run. I don’t know how big obstacle it would be.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>> I believe there are two options how it would be possible to introduce Black. First: just do it, it will hurt but then it would be ok (same as a dentist appointment). Of course it may require some work to adjust linters. On the other hand we can do it gradually and start including sdk parts one by one - maybe it will be less painful?
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>> As an example I can share one of projects [2] I know that uses Black (they use also other cool checkers and pre-commit [3]). This is how looks their build with all checks [4].
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>> To sum up I believe that if we want improve our coding experience, we should improve our toolset. Black seems be recent and quite popular tool what makes think they won’t stop developing it.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>> [1] https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>> [3] https://pre-commit.com
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>> [4] https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <ro...@google.com> wrote:
>>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as it would change nearly every single line in the codebase (and the loss of all context as well as that particular line).
>>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> This is probably why the 2-space fork exists. However, we don't conform to that either--we use 2 spaces for indentation, but 4 for continuation indentation. (As for the history of this, this goes back to Google's internal style guide, probably motivated by consistency with C++, Java, ... and the fact that with an indent level of 4 one ends up wrapping lines quite frequently (it's telling that black's default line length is 88)). This turns out to be an easy change to the codebase.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that this tool introduces a huge amount of vertical whitespace (e.g. closing parentheses on their own line), e.g.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> def foo(
>>>>>>> >> > >>>> >>>>>>>>> >>>>>>     args
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> ):
>>>>>>> >> > >>>> >>>>>>>>> >>>>>>   if (
>>>>>>> >> > >>>> >>>>>>>>> >>>>>>       long expression)
>>>>>>> >> > >>>> >>>>>>>>> >>>>>>   ):
>>>>>>> >> > >>>> >>>>>>>>> >>>>>>     func(
>>>>>>> >> > >>>> >>>>>>>>> >>>>>>         args
>>>>>>> >> > >>>> >>>>>>>>> >>>>>>     )
>>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> I wrote a simple post-processor to put closing parentheses on the same lines, as well as omit the newline after "if (", and disabling formatting of strings, which reduce the churn in our codebase to 15k lines (adding about 4k) out of 200k total.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
>>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> It's still very opinionated, often in different ways then me, and doesn't understand the semantics of the code, but possibly something we could live with given the huge advantages of an autoformatter.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> An intermediate point would be to allow, but not require, autoformatting of changed lines.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> As for being beta quality, it looks like it's got a decent number of contributors and in my book being in the python github project is a strong positive signal. But, due to the above issues, I think we'd have to maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space issue is a 2-line change, and the rest implemented as a post-processing step (for now, incomplete), so it'd be easy to stay in sync with upstream.)
>>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be configured in a way to fit our
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to reformat the
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > It cannot be configured to do what we actually do because Black is
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > configurable only to support the standard python codestyle guidelines
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what most projects in the
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > python world use.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have the feeling
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with Python because the linter has
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > Yes that’s the bad side effect but there are always tradeoffs we have
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > to deal with.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <mx...@apache.org> wrote:
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be configured in a way to fit our
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to reformat the
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have the feeling
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with Python because the linter has
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > -Max
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> My concerns are:
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For the same reason I also strongly prefer not using a fork for a specific setting. Fork will only have less people looking at it.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I suppose the project is marked as beta because it is recent, it was
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and because some things can change since
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky beasts, I think beta in that case is
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you look at the contribution page [1]
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > you can notice that it is less and less a single person project, there
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > have been 93 independent contributions since the project became
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > public, and the fact that it is hosted in the python organization
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > github [2] gives some confidence on the project continuity.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > You are right however about the fact that the main author seems to be
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he can seem
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > arbitrary, but he is just following pep8 style guide recommendations
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4 spaces
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even Google's own Python style guide [4],
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > So, probably it should be to us to reconsider the current policy to
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > adapt to the standards (and the tool).
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I did a quick run of black with python 2.7 compatibility on
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing errors which is positive given the
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > size of our code base.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files failed to reformat.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 160:34:       print(traceback_string, file=sys.stderr)
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc else '    ',
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > end=' ')
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I still think this can be positive for the project but well I am
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > barely a contributor to the python code base so I let you the python
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > maintainers to reconsider this, in any case it seems like a good
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > improvement for the project.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [1] https://github.com/python/black/graphs/contributors
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [2] https://github.com/python
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [4] https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> I am in the same boat with Robert, I am in favor of autoformatters but I am not familiar with this one. My concerns are:
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For the same reason I also strongly prefer not using a fork for a specific setting. Fork will only have less people looking at it.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> IMO, this is in an early stage for us. That said lint issues are real as pointed in the thread. If someone would like to give it a try and see how it would look like for us that would be interesting.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <ka...@gmail.com> wrote:
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs failures are caused by lint problems.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> I think it would be great to have something similar to Spotless in Java SDK (I heard there is problem with configuring Black with IntelliJ).
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <ro...@google.com> wrote:
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> I'm generally in favor of autoformatters, though I haven't looked at
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> how well this particular one works. We might have to go with
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <pa...@google.com> wrote:
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at least a couple people (myself included) who've been annoyed by having to take care of lint issues that maybe a code formatter could save us.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> -P.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <ie...@gmail.com> wrote:
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code auto formatter that
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for python, and wanted to
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest from the python people to get this
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> into the build?
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless for Java has been a good improvement and
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> maybe the python code base may benefit of this too.
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> WDYT?
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
>>>>>>> >> > >>>> >>
>>>>>>> >> > >>>> >>
>>>>>>> >> > >>>> >>
>>>>>>> >> > >>>> >> --
>>>>>>> >> > >>>> >>
>>>>>>> >> > >>>> >> Michał Walenia
>>>>>>> >> > >>>> >> Polidea | Software Engineer
>>>>>>> >> > >>>> >>
>>>>>>> >> > >>>> >> M: +48 791 432 002
>>>>>>> >> > >>>> >> E: michal.walenia@polidea.com
>>>>>>> >> > >>>> >>
>>>>>>> >> > >>>> >> Unique Tech
>>>>>>> >> > >>>> >> Check out our projects!
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Michał Walenia
>>>>> Polidea | Software Engineer
>>>>>
>>>>> M: +48 791 432 002
>>>>> E: michal.walenia@polidea.com
>>>>>
>>>>> Unique Tech
>>>>> Check out our projects!


Re: [DISCUSS] Autoformat python code with Black

Posted by David Yan <da...@google.com>.
Hi, I just tried out the yapf formatter and I noticed that sometimes it's
making the original code a lot less readable.
In the below example, - is the original, + is after running the yapf
formatter. Looks like the problem is with the method chaining pattern.
How feasible is it to have yapf identify such a pattern and format it
better?
Before this can be fixed, Is it possible to have a directive in the code
comment to bypass yapf?

Thanks!

-    test_stream = (TestStream()
-                   .advance_watermark_to(0)
-                   .add_elements(['a', 'b', 'c'])
-                   .advance_processing_time(1)
-                   .advance_processing_time(1)
-                   .advance_processing_time(1)
-                   .advance_processing_time(1)
-                   .advance_processing_time(1)
-                   .advance_watermark_to(5)
-                   .add_elements(['1', '2', '3'])
-                   .advance_processing_time(1)
-                   .advance_watermark_to(6)
-                   .advance_processing_time(1)
-                   .advance_watermark_to(7)
-                   .advance_processing_time(1)
-                   .advance_watermark_to(8)
-                   .advance_processing_time(1)
-                   .advance_watermark_to(9)
-                   .advance_processing_time(1)
-                   .advance_watermark_to(10)
-                   .advance_processing_time(1)
-                   .advance_watermark_to(11)
-                   .advance_processing_time(1)
-                   .advance_watermark_to(12)
-                   .advance_processing_time(1)
-                   .advance_watermark_to(13)
-                   .advance_processing_time(1)
-                   .advance_watermark_to(14)
-                   .advance_processing_time(1)
-                   .advance_watermark_to(15)
-                   .advance_processing_time(1)
-                   )
+    test_stream = (
+        TestStream().advance_watermark_to(0).add_elements(
+            ['a', 'b',
'c']).advance_processing_time(1).advance_processing_time(
+                1).advance_processing_time(1).advance_processing_time(1).
+        advance_processing_time(1).advance_watermark_to(5).add_elements(
+            ['1', '2',
'3']).advance_processing_time(1).advance_watermark_to(
+                6).advance_processing_time(1).advance_watermark_to(
+                    7).advance_processing_time(1).advance_watermark_to(
+
 8).advance_processing_time(1).advance_watermark_to(9).
+        advance_processing_time(1).advance_watermark_to(
+            10).advance_processing_time(1).advance_watermark_to(
+                11).advance_processing_time(1).advance_watermark_to(
+                    12).advance_processing_time(1).advance_watermark_to(
+
 13).advance_processing_time(1).advance_watermark_to(
+
 14).advance_processing_time(1).advance_watermark_to(
+                                15).advance_processing_time(1))

On Thu, Feb 6, 2020 at 1:50 PM Robert Bradshaw <ro...@google.com> wrote:

> Thanks!
>
> On Thu, Feb 6, 2020 at 1:29 PM Ismaël Mejía <ie...@gmail.com> wrote:
>
>> Thanks Kamil and Michał for taking care of this.
>> Excellent job!
>>
>> On Thu, Feb 6, 2020 at 1:45 PM Kamil Wasilewski <
>> kamil.wasilewski@polidea.com> wrote:
>>
>>> Thanks to everyone involved in the discussion.
>>>
>>> I've taken a look at the first 50 recently updated Pull Requests. Only
>>> few of them were affected. I hope it wouldn't be too hard to fix them.
>>>
>>> In any case, here you can find instructions on how to run formatter:
>>> https://cwiki.apache.org/confluence/display/BEAM/Python+Tips (section
>>> "Formatting").
>>>
>>> On Thu, Feb 6, 2020 at 12:42 PM Michał Walenia <
>>> michal.walenia@polidea.com> wrote:
>>>
>>>> Hi,
>>>> the PR is merged, all checks were green :)
>>>> Enjoy prettier Python!
>>>>
>>>> On Thu, Feb 6, 2020 at 11:11 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>>>
>>>>> Agree no need for vote for this because the consensus is clear and the
>>>>> sole
>>>>> impact I can think of are pending PRs that will be broken. In the Java
>>>>> case
>>>>> what we did was to just notice every PR that was affected by the
>>>>> change.
>>>>> And clearly document how to validate and autoformat the code.
>>>>>
>>>>> So the earlier the better, go go autoformat!
>>>>>
>>>>> On Thu, Feb 6, 2020 at 1:38 AM Robert Bradshaw <ro...@google.com>
>>>>> wrote:
>>>>>
>>>>>> No, perhaps not. I agree there's consensus, just wondering what the
>>>>>> next steps should be to get this in. (The presubmits look like they're
>>>>>> all passing, with the exception of some breakage in java that should
>>>>>> be completely unrelated. Of course there's already merge conflicts...)
>>>>>>
>>>>>> On Wed, Feb 5, 2020 at 3:55 PM Ahmet Altay <al...@google.com> wrote:
>>>>>> >
>>>>>> > Do we need a formal vote? There is consensus on this thread and on
>>>>>> the PR.
>>>>>> >
>>>>>> > On Wed, Feb 5, 2020 at 3:37 PM Robert Bradshaw <ro...@google.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> The PR is looking good. Should we call a vote?
>>>>>> >>
>>>>>> >> On Mon, Jan 27, 2020 at 11:03 AM Robert Bradshaw <
>>>>>> robertwb@google.com> wrote:
>>>>>> >> >
>>>>>> >> > Thanks. I commented on the PR. I think if we're going this route
>>>>>> we
>>>>>> >> > should add a pre-commit, plus instructions on how to run the tool
>>>>>> >> > (similar to spotless).
>>>>>> >> >
>>>>>> >> > On Mon, Jan 27, 2020 at 10:00 AM Udi Meiri <eh...@google.com>
>>>>>> wrote:
>>>>>> >> > >
>>>>>> >> > > I've done a pass on the PR on code I'm familiar with.
>>>>>> >> > > Please make a pass and add your suggestions on the PR.
>>>>>> >> > >
>>>>>> >> > > On Fri, Jan 24, 2020 at 7:15 AM Ismaël Mejía <
>>>>>> iemejia@gmail.com> wrote:
>>>>>> >> > >>
>>>>>> >> > >> Java build fails on any unformatted code so python probably
>>>>>> should be like that.
>>>>>> >> > >> We have to ensure however that it fails early on that.
>>>>>> >> > >> As Robert said time to debate the knobs :)
>>>>>> >> > >>
>>>>>> >> > >> On Fri, Jan 24, 2020 at 3:19 PM Kamil Wasilewski <
>>>>>> kamil.wasilewski@polidea.com> wrote:
>>>>>> >> > >>>
>>>>>> >> > >>> PR is ready: https://github.com/apache/beam/pull/10684.
>>>>>> Please share your comments ;-) I've managed to reduce the impact a bit:
>>>>>> >> > >>> 501 files changed, 18245 insertions(+), 19495 deletions(-)
>>>>>> >> > >>>
>>>>>> >> > >>> We still need to consider how to enforce the usage of
>>>>>> autoformatter. Pre-commit sounds like a nice addition, but it still needs
>>>>>> to be installed manually by a developer. On the other hand, Jenkins
>>>>>> precommit job that fails if any unformatted code is detected looks like too
>>>>>> strict. What do you think?
>>>>>> >> > >>>
>>>>>> >> > >>> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw <
>>>>>> robertwb@google.com> wrote:
>>>>>> >> > >>>>
>>>>>> >> > >>>> Thanks! Now we get to debate what knobs to twiddle :-P
>>>>>> >> > >>>>
>>>>>> >> > >>>> FYI, I did a simple run (just pushed to
>>>>>> >> > >>>>
>>>>>> https://github.com/apache/beam/compare/master...robertwb:yapf) to see
>>>>>> >> > >>>> the impact. The diff is
>>>>>> >> > >>>>
>>>>>> >> > >>>>     $ git diff --stat master
>>>>>> >> > >>>>     ...
>>>>>> >> > >>>>      547 files changed, 22118 insertions(+), 21129
>>>>>> deletions(-)
>>>>>> >> > >>>>
>>>>>> >> > >>>> For reference
>>>>>> >> > >>>>
>>>>>> >> > >>>>     $ find sdks/python/apache_beam -name '*.py' | xargs wc
>>>>>> >> > >>>>     ...
>>>>>> >> > >>>>     200424  612002 7431637 total
>>>>>> >> > >>>>
>>>>>> >> > >>>> which means a little over 10% of lines get touched. I think
>>>>>> there are
>>>>>> >> > >>>> some options, such as
>>>>>> SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES and
>>>>>> >> > >>>> COALESCE_BRACKETS, that will conform more to the style we
>>>>>> are already
>>>>>> >> > >>>> (mostly) following.
>>>>>> >> > >>>>
>>>>>> >> > >>>>
>>>>>> >> > >>>> On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
>>>>>> >> > >>>> <ka...@polidea.com> wrote:
>>>>>> >> > >>>> >
>>>>>> >> > >>>> > Thank you Michał for creating the ticket. I have some
>>>>>> free time and I'd like to volunteer myself for this task.
>>>>>> >> > >>>> > Indeed, it looks like there's consensus for `yapf`, so
>>>>>> I'll try `yapf` first.
>>>>>> >> > >>>> >
>>>>>> >> > >>>> > Best,
>>>>>> >> > >>>> > Kamil
>>>>>> >> > >>>> >
>>>>>> >> > >>>> >
>>>>>> >> > >>>> > On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <
>>>>>> michal.walenia@polidea.com> wrote:
>>>>>> >> > >>>> >>
>>>>>> >> > >>>> >> Hi all,
>>>>>> >> > >>>> >> I created a JIRA issue for this and summarized the
>>>>>> available tools
>>>>>> >> > >>>> >>
>>>>>> >> > >>>> >> https://issues.apache.org/jira/browse/BEAM-9175
>>>>>> >> > >>>> >>
>>>>>> >> > >>>> >> Cheers,
>>>>>> >> > >>>> >> Michal
>>>>>> >> > >>>> >>
>>>>>> >> > >>>> >> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <
>>>>>> ehudm@google.com> wrote:
>>>>>> >> > >>>> >>>
>>>>>> >> > >>>> >>> Sorry, backing off on this due to time constraints.
>>>>>> >> > >>>> >>>
>>>>>> >> > >>>> >>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <
>>>>>> ehudm@google.com> wrote:
>>>>>> >> > >>>> >>>>
>>>>>> >> > >>>> >>>> It sounds like there's a consensus for yapf. I
>>>>>> volunteer to take this on
>>>>>> >> > >>>> >>>>
>>>>>> >> > >>>> >>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <
>>>>>> ehudm@google.com> wrote:
>>>>>> >> > >>>> >>>>>
>>>>>> >> > >>>> >>>>> +1 to autoformatting
>>>>>> >> > >>>> >>>>>
>>>>>> >> > >>>> >>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <
>>>>>> lcwik@google.com> wrote:
>>>>>> >> > >>>> >>>>>>
>>>>>> >> > >>>> >>>>>> +1 to autoformatters. Also the Beam Java SDK went
>>>>>> through a one time pass to apply the spotless formatting.
>>>>>> >> > >>>> >>>>>>
>>>>>> >> > >>>> >>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <
>>>>>> altay@google.com> wrote:
>>>>>> >> > >>>> >>>>>>>
>>>>>> >> > >>>> >>>>>>> +1 to autoformatters and yapf. It appears to be a
>>>>>> well maintained project. I do support making a one time pass to apply
>>>>>> formatting the whole code base.
>>>>>> >> > >>>> >>>>>>>
>>>>>> >> > >>>> >>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <
>>>>>> chadrik@gmail.com> wrote:
>>>>>> >> > >>>> >>>>>>>>>
>>>>>> >> > >>>> >>>>>>>>>
>>>>>> >> > >>>> >>>>>>>>> It'd be good if there was a way to only apply to
>>>>>> violating (or at
>>>>>> >> > >>>> >>>>>>>>> least changed) lines.
>>>>>> >> > >>>> >>>>>>>>
>>>>>> >> > >>>> >>>>>>>>
>>>>>> >> > >>>> >>>>>>>> I assumed the first thing we’d do is convert all
>>>>>> of the code in one go, since it’s a very safe operation. Did you have
>>>>>> something else in mind?
>>>>>> >> > >>>> >>>>>>>>
>>>>>> >> > >>>> >>>>>>>> -chad
>>>>>> >> > >>>> >>>>>>>>
>>>>>> >> > >>>> >>>>>>>>
>>>>>> >> > >>>> >>>>>>>>
>>>>>> >> > >>>> >>>>>>>>
>>>>>> >> > >>>> >>>>>>>>>
>>>>>> >> > >>>> >>>>>>>>>
>>>>>> >> > >>>> >>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <
>>>>>> chadrik@gmail.com> wrote:
>>>>>> >> > >>>> >>>>>>>>> >
>>>>>> >> > >>>> >>>>>>>>> > +1 to autoformatting
>>>>>> >> > >>>> >>>>>>>>> >
>>>>>> >> > >>>> >>>>>>>>> > Let me add some nuance to that.
>>>>>> >> > >>>> >>>>>>>>> >
>>>>>> >> > >>>> >>>>>>>>> > The way I see it there are 2 varieties of
>>>>>> formatters:  those which take the original formatting into consideration
>>>>>> (autopep8) and those which disregard it (yapf, black).
>>>>>> >> > >>>> >>>>>>>>> >
>>>>>> >> > >>>> >>>>>>>>> > I much prefer yapf to black, because you have
>>>>>> plenty of options to tweak with yapf (enough to make the output a pretty
>>>>>> close match to the current Beam style), and you can mark areas to preserve
>>>>>> the original formatting, which could be very useful with Pipeline building
>>>>>> with pipe operators.  Please don't pick black.
>>>>>> >> > >>>> >>>>>>>>> >
>>>>>> >> > >>>> >>>>>>>>> > autopep8 is more along the lines of spotless in
>>>>>> Java -- it only corrects code that breaks the project's style rules.  The
>>>>>> big problem with Beam's current style is that it is so esoteric that
>>>>>> autopep8 can't enforce it -- and I'm not just talking about 2-spaces, which
>>>>>> I don't really have a problem with -- the problem is the use of either 2 or
>>>>>> 4 spaces depending on context (expression start vs hanging indent, etc).
>>>>>> This is my *biggest* gripe about the current style.  PyCharm doesn't have
>>>>>> enough control either.  So, if we can choose a style that can be expressed
>>>>>> by flake8 or pycodestyle then we can use autopep8 to enforce it.
>>>>>> >> > >>>> >>>>>>>>> >
>>>>>> >> > >>>> >>>>>>>>> > I'd prefer autopep8 to yapf because I like
>>>>>> having a little wiggle room to influence the style, but on a big project
>>>>>> like Beam all that wiggle room ends up to minor but noticeable
>>>>>> inconsistencies in style throughout the project.  yapf ensures completely
>>>>>> consistent style, but the tradeoff is that it's sometimes ugly, especially
>>>>>> in scenarios with similar repeated entries like argparse, where yapf might
>>>>>> insert line breaks in visually inconsistent and unappealing ways depending
>>>>>> on the lengths of the keywords and expressions involved.
>>>>>> >> > >>>> >>>>>>>>> >
>>>>>> >> > >>>> >>>>>>>>> > Either way (but especially if we choose yapf) I
>>>>>> think it'd be a nice addition to setup a pre-commit [1] config so that
>>>>>> people can opt in to running *lightweight* autofixers prior to commit.
>>>>>> This will not only reduce dev frustration but will also reduce the amount
>>>>>> of cpu cycles that Jenkins spends pointing out lint errors.
>>>>>> >> > >>>> >>>>>>>>> >
>>>>>> >> > >>>> >>>>>>>>> > [1] https://pre-commit.com/
>>>>>> >> > >>>> >>>>>>>>> >
>>>>>> >> > >>>> >>>>>>>>> > -chad
>>>>>> >> > >>>> >>>>>>>>> >
>>>>>> >> > >>>> >>>>>>>>> >
>>>>>> >> > >>>> >>>>>>>>> >
>>>>>> >> > >>>> >>>>>>>>> >
>>>>>> >> > >>>> >>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <
>>>>>> iemejia@gmail.com> wrote:
>>>>>> >> > >>>> >>>>>>>>> >>
>>>>>> >> > >>>> >>>>>>>>> >> Last time we discussed this there seems not to
>>>>>> be much progress into autoformatting.
>>>>>> >> > >>>> >>>>>>>>> >> This tool looks more tweakable, so maybe it
>>>>>> could be more appropriate for Beam's use case.
>>>>>> >> > >>>> >>>>>>>>> >> https://github.com/google/yapf/
>>>>>> >> > >>>> >>>>>>>>> >> WDYT?
>>>>>> >> > >>>> >>>>>>>>> >>
>>>>>> >> > >>>> >>>>>>>>> >>
>>>>>> >> > >>>> >>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy
>>>>>> <lg...@apache.org> wrote:
>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>> >> > >>>> >>>>>>>>> >>> +1 for any autoformatter for Python SDK that
>>>>>> does the job. My experience is that since spotless in Java SDK I would
>>>>>> never start a new Java project without it. So many great benefits not only
>>>>>> for one person coding but for all community.
>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>> >> > >>>> >>>>>>>>> >>> It is a GitHub UI issue that you cannot
>>>>>> easily browse past the reformat. It is not actually that hard, but does
>>>>>> take a couple extra clicks to get GitHub to display blame before a
>>>>>> reformat. It is easier with the command line. I do a lot of code history
>>>>>> digging and the global Java reformat is not really a problem.
>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>> >> > >>>> >>>>>>>>> >>> It's actually one more click on Github but I
>>>>>> agree it's not the best way to search the history. The most convenient and
>>>>>> clear one I've found so far is in Jetbrains IDEs (Intelij) where you can:
>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>> >> > >>>> >>>>>>>>> >>> right click on line number -> "annotate" ->
>>>>>> click again -> "annotate previous revision" -> ...
>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>> >> > >>>> >>>>>>>>> >>> You can also use "compare with" to see the
>>>>>> diff between two revisions.
>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>> >> > >>>> >>>>>>>>> >>> Łukasz
>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>> >> > >>>> >>>>>>>>> >>>
>>>>>> >> > >>>> >>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <
>>>>>> kenn@apache.org> napisał(a):
>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>> >> > >>>> >>>>>>>>> >>>> +1 pending good enough tooling (I can't
>>>>>> quite tell - seems there are some issues?)
>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>> >> > >>>> >>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna
>>>>>> Kucharczyk <ka...@gmail.com> wrote:
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>> What else actually we gain? My guess is
>>>>>> faster PR review iteration. We will skip some of conversations about code
>>>>>> style.
>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>> >> > >>>> >>>>>>>>> >>>> ...
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>> Last but not least, new contributor may be
>>>>>> less discouraged. When I started contribute I didn’t know how to format my
>>>>>> code and I lost a lot of time to add pylint and adjust IntelliJ. I
>>>>>> eventually failed. Currently I write code intuitively and when I don’t
>>>>>> forget I rerun tox.
>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>> >> > >>>> >>>>>>>>> >>>> This is a huge benefit. This is why I
>>>>>> supported it so much for Java. It is a community benefit. You do not have
>>>>>> to be a contributor to the Python SDK to support this. That is why I am
>>>>>> writing here. Just eliminate all discussion of formatting. It doesn't
>>>>>> really matter what the resulting format is, if it is not crazy to read. I
>>>>>> strongly oppose maintaining a non-default format.
>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>> >> > >>>> >>>>>>>>> >>>> Reformating 20k lines or 200k is not hard.
>>>>>> The Java global reformat touched 50k lines. It does not really matter how
>>>>>> big it is. Definitely do it all at once if you think the tool is good
>>>>>> enough. And you should pin a version, so churn is not a problem. You can
>>>>>> upgrade the version and reformat in a PR later and that is also easy.
>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>> >> > >>>> >>>>>>>>> >>>> It is a GitHub UI issue that you cannot
>>>>>> easily browse past the reformat. It is not actually that hard, but does
>>>>>> take a couple extra clicks to get GitHub to display blame before a
>>>>>> reformat. It is easier with the command line. I do a lot of code history
>>>>>> digging and the global Java reformat is not really a problem.
>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>> >> > >>>> >>>>>>>>> >>>> Kenn
>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>> Also everything will be formatted in a same
>>>>>> way, so eventually it would be easier to read.
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>> Moreover, as it was mentioned in previous
>>>>>> emails - a lot of Jenkins failures won’t take place, so we save time and
>>>>>> resources.
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>> One of disadvantages is that our pipelines
>>>>>> has custom syntax and after formatting they looks a little bit weird, but
>>>>>> maybe extending the only configurable option in Black - lines, from 88 to
>>>>>> 110 would be solution.
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>> Second one is that Black requires Python 3
>>>>>> to be run. I don’t know how big obstacle it would be.
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>> I believe there are two options how it
>>>>>> would be possible to introduce Black. First: just do it, it will hurt but
>>>>>> then it would be ok (same as a dentist appointment). Of course it may
>>>>>> require some work to adjust linters. On the other hand we can do it
>>>>>> gradually and start including sdk parts one by one - maybe it will be less
>>>>>> painful?
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>> As an example I can share one of projects
>>>>>> [2] I know that uses Black (they use also other cool checkers and
>>>>>> pre-commit [3]). This is how looks their build with all checks [4].
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>> To sum up I believe that if we want improve
>>>>>> our coding experience, we should improve our toolset. Black seems be recent
>>>>>> and quite popular tool what makes think they won’t stop developing it.
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>> [1]
>>>>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>> [2]
>>>>>> https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>> [3] https://pre-commit.com
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>> [4]
>>>>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert
>>>>>> Bradshaw <ro...@google.com> wrote:
>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> Reformatting to 4 spaces seems a
>>>>>> non-starter to me, as it would change nearly every single line in the
>>>>>> codebase (and the loss of all context as well as that particular line).
>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> This is probably why the 2-space fork
>>>>>> exists. However, we don't conform to that either--we use 2 spaces for
>>>>>> indentation, but 4 for continuation indentation. (As for the history of
>>>>>> this, this goes back to Google's internal style guide, probably motivated
>>>>>> by consistency with C++, Java, ... and the fact that with an indent level
>>>>>> of 4 one ends up wrapping lines quite frequently (it's telling that black's
>>>>>> default line length is 88)). This turns out to be an easy change to the
>>>>>> codebase.
>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace
>>>>>> thing, I found that this tool introduces a huge amount of vertical
>>>>>> whitespace (e.g. closing parentheses on their own line), e.g.
>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> def foo(
>>>>>> >> > >>>> >>>>>>>>> >>>>>>     args
>>>>>> >> > >>>> >>>>>>>>> >>>>>> ):
>>>>>> >> > >>>> >>>>>>>>> >>>>>>   if (
>>>>>> >> > >>>> >>>>>>>>> >>>>>>       long expression)
>>>>>> >> > >>>> >>>>>>>>> >>>>>>   ):
>>>>>> >> > >>>> >>>>>>>>> >>>>>>     func(
>>>>>> >> > >>>> >>>>>>>>> >>>>>>         args
>>>>>> >> > >>>> >>>>>>>>> >>>>>>     )
>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> I wrote a simple post-processor to put
>>>>>> closing parentheses on the same lines, as well as omit the newline after
>>>>>> "if (", and disabling formatting of strings, which reduce the churn in our
>>>>>> codebase to 15k lines (adding about 4k) out of 200k total.
>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>> https://github.com/apache/beam/pull/8712/files
>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> It's still very opinionated, often in
>>>>>> different ways then me, and doesn't understand the semantics of the code,
>>>>>> but possibly something we could live with given the huge advantages of an
>>>>>> autoformatter.
>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> An intermediate point would be to allow,
>>>>>> but not require, autoformatting of changed lines.
>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> As for being beta quality, it looks like
>>>>>> it's got a decent number of contributors and in my book being in the python
>>>>>> github project is a strong positive signal. But, due to the above issues, I
>>>>>> think we'd have to maintain a fork. (The code is pretty lightweight, the 2
>>>>>> vs. 4 space issue is a 2-line change, and the rest implemented as a
>>>>>> post-processing step (for now, incomplete), so it'd be easy to stay in sync
>>>>>> with upstream.)
>>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël
>>>>>> Mejía <ie...@gmail.com> wrote:
>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be
>>>>>> configured in a way to fit our
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think
>>>>>> it is feasible to reformat the
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > It cannot be configured to do what we
>>>>>> actually do because Black is
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > configurable only to support the
>>>>>> standard python codestyle guidelines
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is
>>>>>> what most projects in the
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > python world use.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick
>>>>>> access to the Git history. This
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java
>>>>>> SDK. However, I have the feeling
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem
>>>>>> with Python because the linter has
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > Yes that’s the bad side effect but there
>>>>>> are always tradeoffs we have
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > to deal with.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM
>>>>>> Maximilian Michels <mx...@apache.org> wrote:
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be
>>>>>> configured in a way to fit our
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think
>>>>>> it is feasible to reformat the
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick
>>>>>> access to the Git history. This
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java
>>>>>> SDK. However, I have the feeling
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem
>>>>>> with Python because the linter has
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > -Max
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> My concerns are:
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as
>>>>>> beta with a big warning.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single
>>>>>> person project. For the same reason I also strongly prefer not using a fork
>>>>>> for a specific setting. Fork will only have less people looking at it.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I suppose the project is marked as
>>>>>> beta because it is recent, it was
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and
>>>>>> because some things can change since
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky
>>>>>> beasts, I think beta in that case is
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you
>>>>>> look at the contribution page [1]
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > you can notice that it is less and
>>>>>> less a single person project, there
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > have been 93 independent
>>>>>> contributions since the project became
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > public, and the fact that it is
>>>>>> hosted in the python organization
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > github [2] gives some confidence on
>>>>>> the project continuity.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > You are right however about the fact
>>>>>> that the main author seems to be
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in
>>>>>> the 2-spaces issue he can seem
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > arbitrary, but he is just following
>>>>>> pep8 style guide recommendations
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam)
>>>>>> do not follow the 4 spaces
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even
>>>>>> Google's own Python style guide [4],
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > So, probably it should be to us to
>>>>>> reconsider the current policy to
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > adapt to the standards (and the
>>>>>> tool).
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I did a quick run of black with
>>>>>> python 2.7 compatibility on
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing
>>>>>> errors which is positive given the
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > size of our code base.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left
>>>>>> unchanged, 4 files failed to reformat.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 47:22:
>>>>>>  _display_progress = print
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 151:18:
>>>>>>  file=sys.stderr)
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 160:34:
>>>>>>  print(traceback_string, file=sys.stderr)
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 335:51:
>>>>>>  print('-->' if pc == last_pc else '    ',
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > end=' ')
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I still think this can be positive
>>>>>> for the project but well I am
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > barely a contributor to the python
>>>>>> code base so I let you the python
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > maintainers to reconsider this, in
>>>>>> any case it seems like a good
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > improvement for the project.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [1]
>>>>>> https://github.com/python/black/graphs/contributors
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [2] https://github.com/python
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [3]
>>>>>> https://www.python.org/dev/peps/pep-0008/#indentation
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [4]
>>>>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM
>>>>>> Ahmet Altay <al...@google.com> wrote:
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> I am in the same boat with Robert,
>>>>>> I am in favor of autoformatters but I am not familiar with this one. My
>>>>>> concerns are:
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as
>>>>>> beta with a big warning.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single
>>>>>> person project. For the same reason I also strongly prefer not using a fork
>>>>>> for a specific setting. Fork will only have less people looking at it.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> IMO, this is in an early stage for
>>>>>> us. That said lint issues are real as pointed in the thread. If someone
>>>>>> would like to give it a try and see how it would look like for us that
>>>>>> would be interesting.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM
>>>>>> Katarzyna Kucharczyk <ka...@gmail.com> wrote:
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> This sounds really good. A lot of
>>>>>> Jenkins jobs failures are caused by lint problems.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> I think it would be great to have
>>>>>> something similar to Spotless in Java SDK (I heard there is problem with
>>>>>> configuring Black with IntelliJ).
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM
>>>>>> Robert Bradshaw <ro...@google.com> wrote:
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> I'm generally in favor of
>>>>>> autoformatters, though I haven't looked at
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> how well this particular one
>>>>>> works. We might have to go with
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>>>>> https://github.com/desbma/black-2spaces given
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>>>>> https://github.com/python/black/issues/378 .
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM
>>>>>> Pablo Estrada <pa...@google.com> wrote:
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know
>>>>>> at least a couple people (myself included) who've been annoyed by having to
>>>>>> take care of lint issues that maybe a code formatter could save us.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> -P.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM
>>>>>> Ismaël Mejía <ie...@gmail.com> wrote:
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black
>>>>>> [1] a python code auto formatter that
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto'
>>>>>> auto-formatter for python, and wanted to
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there
>>>>>> interest from the python people to get this
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> into the build?
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless
>>>>>> for Java has been a good improvement and
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> maybe the python code base may
>>>>>> benefit of this too.
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> WDYT?
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> [1]
>>>>>> https://github.com/python/black
>>>>>> >> > >>>> >>
>>>>>> >> > >>>> >>
>>>>>> >> > >>>> >>
>>>>>> >> > >>>> >> --
>>>>>> >> > >>>> >>
>>>>>> >> > >>>> >> Michał Walenia
>>>>>> >> > >>>> >> Polidea | Software Engineer
>>>>>> >> > >>>> >>
>>>>>> >> > >>>> >> M: +48 791 432 002 <+48%20791%20432%20002>
>>>>>> >> > >>>> >> E: michal.walenia@polidea.com
>>>>>> >> > >>>> >>
>>>>>> >> > >>>> >> Unique Tech
>>>>>> >> > >>>> >> Check out our projects!
>>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>> Michał Walenia
>>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>>
>>>> M: +48 791 432 002 <+48791432002>
>>>> E: michal.walenia@polidea.com
>>>>
>>>> Unique Tech
>>>> Check out our projects! <https://www.polidea.com/our-work>
>>>>
>>>

Re: [DISCUSS] Autoformat python code with Black

Posted by Robert Bradshaw <ro...@google.com>.
Thanks!

On Thu, Feb 6, 2020 at 1:29 PM Ismaël Mejía <ie...@gmail.com> wrote:

> Thanks Kamil and Michał for taking care of this.
> Excellent job!
>
> On Thu, Feb 6, 2020 at 1:45 PM Kamil Wasilewski <
> kamil.wasilewski@polidea.com> wrote:
>
>> Thanks to everyone involved in the discussion.
>>
>> I've taken a look at the first 50 recently updated Pull Requests. Only
>> few of them were affected. I hope it wouldn't be too hard to fix them.
>>
>> In any case, here you can find instructions on how to run formatter:
>> https://cwiki.apache.org/confluence/display/BEAM/Python+Tips (section
>> "Formatting").
>>
>> On Thu, Feb 6, 2020 at 12:42 PM Michał Walenia <
>> michal.walenia@polidea.com> wrote:
>>
>>> Hi,
>>> the PR is merged, all checks were green :)
>>> Enjoy prettier Python!
>>>
>>> On Thu, Feb 6, 2020 at 11:11 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>>
>>>> Agree no need for vote for this because the consensus is clear and the
>>>> sole
>>>> impact I can think of are pending PRs that will be broken. In the Java
>>>> case
>>>> what we did was to just notice every PR that was affected by the change.
>>>> And clearly document how to validate and autoformat the code.
>>>>
>>>> So the earlier the better, go go autoformat!
>>>>
>>>> On Thu, Feb 6, 2020 at 1:38 AM Robert Bradshaw <ro...@google.com>
>>>> wrote:
>>>>
>>>>> No, perhaps not. I agree there's consensus, just wondering what the
>>>>> next steps should be to get this in. (The presubmits look like they're
>>>>> all passing, with the exception of some breakage in java that should
>>>>> be completely unrelated. Of course there's already merge conflicts...)
>>>>>
>>>>> On Wed, Feb 5, 2020 at 3:55 PM Ahmet Altay <al...@google.com> wrote:
>>>>> >
>>>>> > Do we need a formal vote? There is consensus on this thread and on
>>>>> the PR.
>>>>> >
>>>>> > On Wed, Feb 5, 2020 at 3:37 PM Robert Bradshaw <ro...@google.com>
>>>>> wrote:
>>>>> >>
>>>>> >> The PR is looking good. Should we call a vote?
>>>>> >>
>>>>> >> On Mon, Jan 27, 2020 at 11:03 AM Robert Bradshaw <
>>>>> robertwb@google.com> wrote:
>>>>> >> >
>>>>> >> > Thanks. I commented on the PR. I think if we're going this route
>>>>> we
>>>>> >> > should add a pre-commit, plus instructions on how to run the tool
>>>>> >> > (similar to spotless).
>>>>> >> >
>>>>> >> > On Mon, Jan 27, 2020 at 10:00 AM Udi Meiri <eh...@google.com>
>>>>> wrote:
>>>>> >> > >
>>>>> >> > > I've done a pass on the PR on code I'm familiar with.
>>>>> >> > > Please make a pass and add your suggestions on the PR.
>>>>> >> > >
>>>>> >> > > On Fri, Jan 24, 2020 at 7:15 AM Ismaël Mejía <ie...@gmail.com>
>>>>> wrote:
>>>>> >> > >>
>>>>> >> > >> Java build fails on any unformatted code so python probably
>>>>> should be like that.
>>>>> >> > >> We have to ensure however that it fails early on that.
>>>>> >> > >> As Robert said time to debate the knobs :)
>>>>> >> > >>
>>>>> >> > >> On Fri, Jan 24, 2020 at 3:19 PM Kamil Wasilewski <
>>>>> kamil.wasilewski@polidea.com> wrote:
>>>>> >> > >>>
>>>>> >> > >>> PR is ready: https://github.com/apache/beam/pull/10684.
>>>>> Please share your comments ;-) I've managed to reduce the impact a bit:
>>>>> >> > >>> 501 files changed, 18245 insertions(+), 19495 deletions(-)
>>>>> >> > >>>
>>>>> >> > >>> We still need to consider how to enforce the usage of
>>>>> autoformatter. Pre-commit sounds like a nice addition, but it still needs
>>>>> to be installed manually by a developer. On the other hand, Jenkins
>>>>> precommit job that fails if any unformatted code is detected looks like too
>>>>> strict. What do you think?
>>>>> >> > >>>
>>>>> >> > >>> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw <
>>>>> robertwb@google.com> wrote:
>>>>> >> > >>>>
>>>>> >> > >>>> Thanks! Now we get to debate what knobs to twiddle :-P
>>>>> >> > >>>>
>>>>> >> > >>>> FYI, I did a simple run (just pushed to
>>>>> >> > >>>>
>>>>> https://github.com/apache/beam/compare/master...robertwb:yapf) to see
>>>>> >> > >>>> the impact. The diff is
>>>>> >> > >>>>
>>>>> >> > >>>>     $ git diff --stat master
>>>>> >> > >>>>     ...
>>>>> >> > >>>>      547 files changed, 22118 insertions(+), 21129
>>>>> deletions(-)
>>>>> >> > >>>>
>>>>> >> > >>>> For reference
>>>>> >> > >>>>
>>>>> >> > >>>>     $ find sdks/python/apache_beam -name '*.py' | xargs wc
>>>>> >> > >>>>     ...
>>>>> >> > >>>>     200424  612002 7431637 total
>>>>> >> > >>>>
>>>>> >> > >>>> which means a little over 10% of lines get touched. I think
>>>>> there are
>>>>> >> > >>>> some options, such as
>>>>> SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES and
>>>>> >> > >>>> COALESCE_BRACKETS, that will conform more to the style we
>>>>> are already
>>>>> >> > >>>> (mostly) following.
>>>>> >> > >>>>
>>>>> >> > >>>>
>>>>> >> > >>>> On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
>>>>> >> > >>>> <ka...@polidea.com> wrote:
>>>>> >> > >>>> >
>>>>> >> > >>>> > Thank you Michał for creating the ticket. I have some free
>>>>> time and I'd like to volunteer myself for this task.
>>>>> >> > >>>> > Indeed, it looks like there's consensus for `yapf`, so
>>>>> I'll try `yapf` first.
>>>>> >> > >>>> >
>>>>> >> > >>>> > Best,
>>>>> >> > >>>> > Kamil
>>>>> >> > >>>> >
>>>>> >> > >>>> >
>>>>> >> > >>>> > On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <
>>>>> michal.walenia@polidea.com> wrote:
>>>>> >> > >>>> >>
>>>>> >> > >>>> >> Hi all,
>>>>> >> > >>>> >> I created a JIRA issue for this and summarized the
>>>>> available tools
>>>>> >> > >>>> >>
>>>>> >> > >>>> >> https://issues.apache.org/jira/browse/BEAM-9175
>>>>> >> > >>>> >>
>>>>> >> > >>>> >> Cheers,
>>>>> >> > >>>> >> Michal
>>>>> >> > >>>> >>
>>>>> >> > >>>> >> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <
>>>>> ehudm@google.com> wrote:
>>>>> >> > >>>> >>>
>>>>> >> > >>>> >>> Sorry, backing off on this due to time constraints.
>>>>> >> > >>>> >>>
>>>>> >> > >>>> >>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <
>>>>> ehudm@google.com> wrote:
>>>>> >> > >>>> >>>>
>>>>> >> > >>>> >>>> It sounds like there's a consensus for yapf. I
>>>>> volunteer to take this on
>>>>> >> > >>>> >>>>
>>>>> >> > >>>> >>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com>
>>>>> wrote:
>>>>> >> > >>>> >>>>>
>>>>> >> > >>>> >>>>> +1 to autoformatting
>>>>> >> > >>>> >>>>>
>>>>> >> > >>>> >>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <
>>>>> lcwik@google.com> wrote:
>>>>> >> > >>>> >>>>>>
>>>>> >> > >>>> >>>>>> +1 to autoformatters. Also the Beam Java SDK went
>>>>> through a one time pass to apply the spotless formatting.
>>>>> >> > >>>> >>>>>>
>>>>> >> > >>>> >>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <
>>>>> altay@google.com> wrote:
>>>>> >> > >>>> >>>>>>>
>>>>> >> > >>>> >>>>>>> +1 to autoformatters and yapf. It appears to be a
>>>>> well maintained project. I do support making a one time pass to apply
>>>>> formatting the whole code base.
>>>>> >> > >>>> >>>>>>>
>>>>> >> > >>>> >>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <
>>>>> chadrik@gmail.com> wrote:
>>>>> >> > >>>> >>>>>>>>>
>>>>> >> > >>>> >>>>>>>>>
>>>>> >> > >>>> >>>>>>>>> It'd be good if there was a way to only apply to
>>>>> violating (or at
>>>>> >> > >>>> >>>>>>>>> least changed) lines.
>>>>> >> > >>>> >>>>>>>>
>>>>> >> > >>>> >>>>>>>>
>>>>> >> > >>>> >>>>>>>> I assumed the first thing we’d do is convert all of
>>>>> the code in one go, since it’s a very safe operation. Did you have
>>>>> something else in mind?
>>>>> >> > >>>> >>>>>>>>
>>>>> >> > >>>> >>>>>>>> -chad
>>>>> >> > >>>> >>>>>>>>
>>>>> >> > >>>> >>>>>>>>
>>>>> >> > >>>> >>>>>>>>
>>>>> >> > >>>> >>>>>>>>
>>>>> >> > >>>> >>>>>>>>>
>>>>> >> > >>>> >>>>>>>>>
>>>>> >> > >>>> >>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <
>>>>> chadrik@gmail.com> wrote:
>>>>> >> > >>>> >>>>>>>>> >
>>>>> >> > >>>> >>>>>>>>> > +1 to autoformatting
>>>>> >> > >>>> >>>>>>>>> >
>>>>> >> > >>>> >>>>>>>>> > Let me add some nuance to that.
>>>>> >> > >>>> >>>>>>>>> >
>>>>> >> > >>>> >>>>>>>>> > The way I see it there are 2 varieties of
>>>>> formatters:  those which take the original formatting into consideration
>>>>> (autopep8) and those which disregard it (yapf, black).
>>>>> >> > >>>> >>>>>>>>> >
>>>>> >> > >>>> >>>>>>>>> > I much prefer yapf to black, because you have
>>>>> plenty of options to tweak with yapf (enough to make the output a pretty
>>>>> close match to the current Beam style), and you can mark areas to preserve
>>>>> the original formatting, which could be very useful with Pipeline building
>>>>> with pipe operators.  Please don't pick black.
>>>>> >> > >>>> >>>>>>>>> >
>>>>> >> > >>>> >>>>>>>>> > autopep8 is more along the lines of spotless in
>>>>> Java -- it only corrects code that breaks the project's style rules.  The
>>>>> big problem with Beam's current style is that it is so esoteric that
>>>>> autopep8 can't enforce it -- and I'm not just talking about 2-spaces, which
>>>>> I don't really have a problem with -- the problem is the use of either 2 or
>>>>> 4 spaces depending on context (expression start vs hanging indent, etc).
>>>>> This is my *biggest* gripe about the current style.  PyCharm doesn't have
>>>>> enough control either.  So, if we can choose a style that can be expressed
>>>>> by flake8 or pycodestyle then we can use autopep8 to enforce it.
>>>>> >> > >>>> >>>>>>>>> >
>>>>> >> > >>>> >>>>>>>>> > I'd prefer autopep8 to yapf because I like
>>>>> having a little wiggle room to influence the style, but on a big project
>>>>> like Beam all that wiggle room ends up to minor but noticeable
>>>>> inconsistencies in style throughout the project.  yapf ensures completely
>>>>> consistent style, but the tradeoff is that it's sometimes ugly, especially
>>>>> in scenarios with similar repeated entries like argparse, where yapf might
>>>>> insert line breaks in visually inconsistent and unappealing ways depending
>>>>> on the lengths of the keywords and expressions involved.
>>>>> >> > >>>> >>>>>>>>> >
>>>>> >> > >>>> >>>>>>>>> > Either way (but especially if we choose yapf) I
>>>>> think it'd be a nice addition to setup a pre-commit [1] config so that
>>>>> people can opt in to running *lightweight* autofixers prior to commit.
>>>>> This will not only reduce dev frustration but will also reduce the amount
>>>>> of cpu cycles that Jenkins spends pointing out lint errors.
>>>>> >> > >>>> >>>>>>>>> >
>>>>> >> > >>>> >>>>>>>>> > [1] https://pre-commit.com/
>>>>> >> > >>>> >>>>>>>>> >
>>>>> >> > >>>> >>>>>>>>> > -chad
>>>>> >> > >>>> >>>>>>>>> >
>>>>> >> > >>>> >>>>>>>>> >
>>>>> >> > >>>> >>>>>>>>> >
>>>>> >> > >>>> >>>>>>>>> >
>>>>> >> > >>>> >>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <
>>>>> iemejia@gmail.com> wrote:
>>>>> >> > >>>> >>>>>>>>> >>
>>>>> >> > >>>> >>>>>>>>> >> Last time we discussed this there seems not to
>>>>> be much progress into autoformatting.
>>>>> >> > >>>> >>>>>>>>> >> This tool looks more tweakable, so maybe it
>>>>> could be more appropriate for Beam's use case.
>>>>> >> > >>>> >>>>>>>>> >> https://github.com/google/yapf/
>>>>> >> > >>>> >>>>>>>>> >> WDYT?
>>>>> >> > >>>> >>>>>>>>> >>
>>>>> >> > >>>> >>>>>>>>> >>
>>>>> >> > >>>> >>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <
>>>>> lgajowy@apache.org> wrote:
>>>>> >> > >>>> >>>>>>>>> >>>
>>>>> >> > >>>> >>>>>>>>> >>> +1 for any autoformatter for Python SDK that
>>>>> does the job. My experience is that since spotless in Java SDK I would
>>>>> never start a new Java project without it. So many great benefits not only
>>>>> for one person coding but for all community.
>>>>> >> > >>>> >>>>>>>>> >>>
>>>>> >> > >>>> >>>>>>>>> >>> It is a GitHub UI issue that you cannot easily
>>>>> browse past the reformat. It is not actually that hard, but does take a
>>>>> couple extra clicks to get GitHub to display blame before a reformat. It is
>>>>> easier with the command line. I do a lot of code history digging and the
>>>>> global Java reformat is not really a problem.
>>>>> >> > >>>> >>>>>>>>> >>>
>>>>> >> > >>>> >>>>>>>>> >>> It's actually one more click on Github but I
>>>>> agree it's not the best way to search the history. The most convenient and
>>>>> clear one I've found so far is in Jetbrains IDEs (Intelij) where you can:
>>>>> >> > >>>> >>>>>>>>> >>>
>>>>> >> > >>>> >>>>>>>>> >>> right click on line number -> "annotate" ->
>>>>> click again -> "annotate previous revision" -> ...
>>>>> >> > >>>> >>>>>>>>> >>>
>>>>> >> > >>>> >>>>>>>>> >>> You can also use "compare with" to see the
>>>>> diff between two revisions.
>>>>> >> > >>>> >>>>>>>>> >>>
>>>>> >> > >>>> >>>>>>>>> >>> Łukasz
>>>>> >> > >>>> >>>>>>>>> >>>
>>>>> >> > >>>> >>>>>>>>> >>>
>>>>> >> > >>>> >>>>>>>>> >>>
>>>>> >> > >>>> >>>>>>>>> >>>
>>>>> >> > >>>> >>>>>>>>> >>>
>>>>> >> > >>>> >>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <
>>>>> kenn@apache.org> napisał(a):
>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>> >> > >>>> >>>>>>>>> >>>> +1 pending good enough tooling (I can't quite
>>>>> tell - seems there are some issues?)
>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>> >> > >>>> >>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna
>>>>> Kucharczyk <ka...@gmail.com> wrote:
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>> What else actually we gain? My guess is
>>>>> faster PR review iteration. We will skip some of conversations about code
>>>>> style.
>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>> >> > >>>> >>>>>>>>> >>>> ...
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>> Last but not least, new contributor may be
>>>>> less discouraged. When I started contribute I didn’t know how to format my
>>>>> code and I lost a lot of time to add pylint and adjust IntelliJ. I
>>>>> eventually failed. Currently I write code intuitively and when I don’t
>>>>> forget I rerun tox.
>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>> >> > >>>> >>>>>>>>> >>>> This is a huge benefit. This is why I
>>>>> supported it so much for Java. It is a community benefit. You do not have
>>>>> to be a contributor to the Python SDK to support this. That is why I am
>>>>> writing here. Just eliminate all discussion of formatting. It doesn't
>>>>> really matter what the resulting format is, if it is not crazy to read. I
>>>>> strongly oppose maintaining a non-default format.
>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>> >> > >>>> >>>>>>>>> >>>> Reformating 20k lines or 200k is not hard.
>>>>> The Java global reformat touched 50k lines. It does not really matter how
>>>>> big it is. Definitely do it all at once if you think the tool is good
>>>>> enough. And you should pin a version, so churn is not a problem. You can
>>>>> upgrade the version and reformat in a PR later and that is also easy.
>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>> >> > >>>> >>>>>>>>> >>>> It is a GitHub UI issue that you cannot
>>>>> easily browse past the reformat. It is not actually that hard, but does
>>>>> take a couple extra clicks to get GitHub to display blame before a
>>>>> reformat. It is easier with the command line. I do a lot of code history
>>>>> digging and the global Java reformat is not really a problem.
>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>> >> > >>>> >>>>>>>>> >>>> Kenn
>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>> >> > >>>> >>>>>>>>> >>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>> Also everything will be formatted in a same
>>>>> way, so eventually it would be easier to read.
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>> Moreover, as it was mentioned in previous
>>>>> emails - a lot of Jenkins failures won’t take place, so we save time and
>>>>> resources.
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>> One of disadvantages is that our pipelines
>>>>> has custom syntax and after formatting they looks a little bit weird, but
>>>>> maybe extending the only configurable option in Black - lines, from 88 to
>>>>> 110 would be solution.
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>> Second one is that Black requires Python 3
>>>>> to be run. I don’t know how big obstacle it would be.
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>> I believe there are two options how it would
>>>>> be possible to introduce Black. First: just do it, it will hurt but then it
>>>>> would be ok (same as a dentist appointment). Of course it may require some
>>>>> work to adjust linters. On the other hand we can do it gradually and start
>>>>> including sdk parts one by one - maybe it will be less painful?
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>> As an example I can share one of projects
>>>>> [2] I know that uses Black (they use also other cool checkers and
>>>>> pre-commit [3]). This is how looks their build with all checks [4].
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>> To sum up I believe that if we want improve
>>>>> our coding experience, we should improve our toolset. Black seems be recent
>>>>> and quite popular tool what makes think they won’t stop developing it.
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>> [1]
>>>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>> [2]
>>>>> https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>> [3] https://pre-commit.com
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>> [4]
>>>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert
>>>>> Bradshaw <ro...@google.com> wrote:
>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> Reformatting to 4 spaces seems a
>>>>> non-starter to me, as it would change nearly every single line in the
>>>>> codebase (and the loss of all context as well as that particular line).
>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> This is probably why the 2-space fork
>>>>> exists. However, we don't conform to that either--we use 2 spaces for
>>>>> indentation, but 4 for continuation indentation. (As for the history of
>>>>> this, this goes back to Google's internal style guide, probably motivated
>>>>> by consistency with C++, Java, ... and the fact that with an indent level
>>>>> of 4 one ends up wrapping lines quite frequently (it's telling that black's
>>>>> default line length is 88)). This turns out to be an easy change to the
>>>>> codebase.
>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace
>>>>> thing, I found that this tool introduces a huge amount of vertical
>>>>> whitespace (e.g. closing parentheses on their own line), e.g.
>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> def foo(
>>>>> >> > >>>> >>>>>>>>> >>>>>>     args
>>>>> >> > >>>> >>>>>>>>> >>>>>> ):
>>>>> >> > >>>> >>>>>>>>> >>>>>>   if (
>>>>> >> > >>>> >>>>>>>>> >>>>>>       long expression)
>>>>> >> > >>>> >>>>>>>>> >>>>>>   ):
>>>>> >> > >>>> >>>>>>>>> >>>>>>     func(
>>>>> >> > >>>> >>>>>>>>> >>>>>>         args
>>>>> >> > >>>> >>>>>>>>> >>>>>>     )
>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> I wrote a simple post-processor to put
>>>>> closing parentheses on the same lines, as well as omit the newline after
>>>>> "if (", and disabling formatting of strings, which reduce the churn in our
>>>>> codebase to 15k lines (adding about 4k) out of 200k total.
>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>> https://github.com/apache/beam/pull/8712/files
>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> It's still very opinionated, often in
>>>>> different ways then me, and doesn't understand the semantics of the code,
>>>>> but possibly something we could live with given the huge advantages of an
>>>>> autoformatter.
>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> An intermediate point would be to allow,
>>>>> but not require, autoformatting of changed lines.
>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> As for being beta quality, it looks like
>>>>> it's got a decent number of contributors and in my book being in the python
>>>>> github project is a strong positive signal. But, due to the above issues, I
>>>>> think we'd have to maintain a fork. (The code is pretty lightweight, the 2
>>>>> vs. 4 space issue is a 2-line change, and the rest implemented as a
>>>>> post-processing step (for now, incomplete), so it'd be easy to stay in sync
>>>>> with upstream.)
>>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël
>>>>> Mejía <ie...@gmail.com> wrote:
>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be
>>>>> configured in a way to fit our
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think
>>>>> it is feasible to reformat the
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>> >> > >>>> >>>>>>>>> >>>>>> > It cannot be configured to do what we
>>>>> actually do because Black is
>>>>> >> > >>>> >>>>>>>>> >>>>>> > configurable only to support the standard
>>>>> python codestyle guidelines
>>>>> >> > >>>> >>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is
>>>>> what most projects in the
>>>>> >> > >>>> >>>>>>>>> >>>>>> > python world use.
>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick
>>>>> access to the Git history. This
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java
>>>>> SDK. However, I have the feeling
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem
>>>>> with Python because the linter has
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>> >> > >>>> >>>>>>>>> >>>>>> > Yes that’s the bad side effect but there
>>>>> are always tradeoffs we have
>>>>> >> > >>>> >>>>>>>>> >>>>>> > to deal with.
>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>>> >> > >>>> >>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM
>>>>> Maximilian Michels <mx...@apache.org> wrote:
>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be
>>>>> configured in a way to fit our
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think
>>>>> it is feasible to reformat the
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick
>>>>> access to the Git history. This
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java
>>>>> SDK. However, I have the feeling
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem
>>>>> with Python because the linter has
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > -Max
>>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> My concerns are:
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as
>>>>> beta with a big warning.
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single
>>>>> person project. For the same reason I also strongly prefer not using a fork
>>>>> for a specific setting. Fork will only have less people looking at it.
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I suppose the project is marked as
>>>>> beta because it is recent, it was
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and
>>>>> because some things can change since
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky
>>>>> beasts, I think beta in that case is
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you
>>>>> look at the contribution page [1]
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > you can notice that it is less and
>>>>> less a single person project, there
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > have been 93 independent
>>>>> contributions since the project became
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > public, and the fact that it is
>>>>> hosted in the python organization
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > github [2] gives some confidence on
>>>>> the project continuity.
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > You are right however about the fact
>>>>> that the main author seems to be
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the
>>>>> 2-spaces issue he can seem
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > arbitrary, but he is just following
>>>>> pep8 style guide recommendations
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do
>>>>> not follow the 4 spaces
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even
>>>>> Google's own Python style guide [4],
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > So, probably it should be to us to
>>>>> reconsider the current policy to
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > adapt to the standards (and the tool).
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I did a quick run of black with
>>>>> python 2.7 compatibility on
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing
>>>>> errors which is positive given the
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > size of our code base.
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left
>>>>> unchanged, 4 files failed to reformat.
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 47:22:
>>>>>  _display_progress = print
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 151:18:
>>>>>  file=sys.stderr)
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 160:34:
>>>>>  print(traceback_string, file=sys.stderr)
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 335:51:
>>>>>  print('-->' if pc == last_pc else '    ',
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > end=' ')
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I still think this can be positive
>>>>> for the project but well I am
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > barely a contributor to the python
>>>>> code base so I let you the python
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > maintainers to reconsider this, in
>>>>> any case it seems like a good
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > improvement for the project.
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [1]
>>>>> https://github.com/python/black/graphs/contributors
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [2] https://github.com/python
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [3]
>>>>> https://www.python.org/dev/peps/pep-0008/#indentation
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [4]
>>>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM
>>>>> Ahmet Altay <al...@google.com> wrote:
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> I am in the same boat with Robert, I
>>>>> am in favor of autoformatters but I am not familiar with this one. My
>>>>> concerns are:
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as
>>>>> beta with a big warning.
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single
>>>>> person project. For the same reason I also strongly prefer not using a fork
>>>>> for a specific setting. Fork will only have less people looking at it.
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> IMO, this is in an early stage for
>>>>> us. That said lint issues are real as pointed in the thread. If someone
>>>>> would like to give it a try and see how it would look like for us that
>>>>> would be interesting.
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM
>>>>> Katarzyna Kucharczyk <ka...@gmail.com> wrote:
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> This sounds really good. A lot of
>>>>> Jenkins jobs failures are caused by lint problems.
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> I think it would be great to have
>>>>> something similar to Spotless in Java SDK (I heard there is problem with
>>>>> configuring Black with IntelliJ).
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM
>>>>> Robert Bradshaw <ro...@google.com> wrote:
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> I'm generally in favor of
>>>>> autoformatters, though I haven't looked at
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> how well this particular one
>>>>> works. We might have to go with
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>>>> https://github.com/desbma/black-2spaces given
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>>>> https://github.com/python/black/issues/378 .
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM
>>>>> Pablo Estrada <pa...@google.com> wrote:
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know
>>>>> at least a couple people (myself included) who've been annoyed by having to
>>>>> take care of lint issues that maybe a code formatter could save us.
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> -P.
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM
>>>>> Ismaël Mejía <ie...@gmail.com> wrote:
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black
>>>>> [1] a python code auto formatter that
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto'
>>>>> auto-formatter for python, and wanted to
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there
>>>>> interest from the python people to get this
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> into the build?
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless for
>>>>> Java has been a good improvement and
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> maybe the python code base may
>>>>> benefit of this too.
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> WDYT?
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> [1]
>>>>> https://github.com/python/black
>>>>> >> > >>>> >>
>>>>> >> > >>>> >>
>>>>> >> > >>>> >>
>>>>> >> > >>>> >> --
>>>>> >> > >>>> >>
>>>>> >> > >>>> >> Michał Walenia
>>>>> >> > >>>> >> Polidea | Software Engineer
>>>>> >> > >>>> >>
>>>>> >> > >>>> >> M: +48 791 432 002 <+48%20791%20432%20002>
>>>>> >> > >>>> >> E: michal.walenia@polidea.com
>>>>> >> > >>>> >>
>>>>> >> > >>>> >> Unique Tech
>>>>> >> > >>>> >> Check out our projects!
>>>>>
>>>>
>>>
>>> --
>>>
>>> Michał Walenia
>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>
>>> M: +48 791 432 002 <+48791432002>
>>> E: michal.walenia@polidea.com
>>>
>>> Unique Tech
>>> Check out our projects! <https://www.polidea.com/our-work>
>>>
>>

Re: [DISCUSS] Autoformat python code with Black

Posted by Ismaël Mejía <ie...@gmail.com>.
Thanks Kamil and Michał for taking care of this.
Excellent job!

On Thu, Feb 6, 2020 at 1:45 PM Kamil Wasilewski <
kamil.wasilewski@polidea.com> wrote:

> Thanks to everyone involved in the discussion.
>
> I've taken a look at the first 50 recently updated Pull Requests. Only few
> of them were affected. I hope it wouldn't be too hard to fix them.
>
> In any case, here you can find instructions on how to run formatter:
> https://cwiki.apache.org/confluence/display/BEAM/Python+Tips (section
> "Formatting").
>
> On Thu, Feb 6, 2020 at 12:42 PM Michał Walenia <mi...@polidea.com>
> wrote:
>
>> Hi,
>> the PR is merged, all checks were green :)
>> Enjoy prettier Python!
>>
>> On Thu, Feb 6, 2020 at 11:11 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>
>>> Agree no need for vote for this because the consensus is clear and the
>>> sole
>>> impact I can think of are pending PRs that will be broken. In the Java
>>> case
>>> what we did was to just notice every PR that was affected by the change.
>>> And clearly document how to validate and autoformat the code.
>>>
>>> So the earlier the better, go go autoformat!
>>>
>>> On Thu, Feb 6, 2020 at 1:38 AM Robert Bradshaw <ro...@google.com>
>>> wrote:
>>>
>>>> No, perhaps not. I agree there's consensus, just wondering what the
>>>> next steps should be to get this in. (The presubmits look like they're
>>>> all passing, with the exception of some breakage in java that should
>>>> be completely unrelated. Of course there's already merge conflicts...)
>>>>
>>>> On Wed, Feb 5, 2020 at 3:55 PM Ahmet Altay <al...@google.com> wrote:
>>>> >
>>>> > Do we need a formal vote? There is consensus on this thread and on
>>>> the PR.
>>>> >
>>>> > On Wed, Feb 5, 2020 at 3:37 PM Robert Bradshaw <ro...@google.com>
>>>> wrote:
>>>> >>
>>>> >> The PR is looking good. Should we call a vote?
>>>> >>
>>>> >> On Mon, Jan 27, 2020 at 11:03 AM Robert Bradshaw <
>>>> robertwb@google.com> wrote:
>>>> >> >
>>>> >> > Thanks. I commented on the PR. I think if we're going this route we
>>>> >> > should add a pre-commit, plus instructions on how to run the tool
>>>> >> > (similar to spotless).
>>>> >> >
>>>> >> > On Mon, Jan 27, 2020 at 10:00 AM Udi Meiri <eh...@google.com>
>>>> wrote:
>>>> >> > >
>>>> >> > > I've done a pass on the PR on code I'm familiar with.
>>>> >> > > Please make a pass and add your suggestions on the PR.
>>>> >> > >
>>>> >> > > On Fri, Jan 24, 2020 at 7:15 AM Ismaël Mejía <ie...@gmail.com>
>>>> wrote:
>>>> >> > >>
>>>> >> > >> Java build fails on any unformatted code so python probably
>>>> should be like that.
>>>> >> > >> We have to ensure however that it fails early on that.
>>>> >> > >> As Robert said time to debate the knobs :)
>>>> >> > >>
>>>> >> > >> On Fri, Jan 24, 2020 at 3:19 PM Kamil Wasilewski <
>>>> kamil.wasilewski@polidea.com> wrote:
>>>> >> > >>>
>>>> >> > >>> PR is ready: https://github.com/apache/beam/pull/10684.
>>>> Please share your comments ;-) I've managed to reduce the impact a bit:
>>>> >> > >>> 501 files changed, 18245 insertions(+), 19495 deletions(-)
>>>> >> > >>>
>>>> >> > >>> We still need to consider how to enforce the usage of
>>>> autoformatter. Pre-commit sounds like a nice addition, but it still needs
>>>> to be installed manually by a developer. On the other hand, Jenkins
>>>> precommit job that fails if any unformatted code is detected looks like too
>>>> strict. What do you think?
>>>> >> > >>>
>>>> >> > >>> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw <
>>>> robertwb@google.com> wrote:
>>>> >> > >>>>
>>>> >> > >>>> Thanks! Now we get to debate what knobs to twiddle :-P
>>>> >> > >>>>
>>>> >> > >>>> FYI, I did a simple run (just pushed to
>>>> >> > >>>> https://github.com/apache/beam/compare/master...robertwb:yapf)
>>>> to see
>>>> >> > >>>> the impact. The diff is
>>>> >> > >>>>
>>>> >> > >>>>     $ git diff --stat master
>>>> >> > >>>>     ...
>>>> >> > >>>>      547 files changed, 22118 insertions(+), 21129
>>>> deletions(-)
>>>> >> > >>>>
>>>> >> > >>>> For reference
>>>> >> > >>>>
>>>> >> > >>>>     $ find sdks/python/apache_beam -name '*.py' | xargs wc
>>>> >> > >>>>     ...
>>>> >> > >>>>     200424  612002 7431637 total
>>>> >> > >>>>
>>>> >> > >>>> which means a little over 10% of lines get touched. I think
>>>> there are
>>>> >> > >>>> some options, such as
>>>> SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES and
>>>> >> > >>>> COALESCE_BRACKETS, that will conform more to the style we are
>>>> already
>>>> >> > >>>> (mostly) following.
>>>> >> > >>>>
>>>> >> > >>>>
>>>> >> > >>>> On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
>>>> >> > >>>> <ka...@polidea.com> wrote:
>>>> >> > >>>> >
>>>> >> > >>>> > Thank you Michał for creating the ticket. I have some free
>>>> time and I'd like to volunteer myself for this task.
>>>> >> > >>>> > Indeed, it looks like there's consensus for `yapf`, so I'll
>>>> try `yapf` first.
>>>> >> > >>>> >
>>>> >> > >>>> > Best,
>>>> >> > >>>> > Kamil
>>>> >> > >>>> >
>>>> >> > >>>> >
>>>> >> > >>>> > On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <
>>>> michal.walenia@polidea.com> wrote:
>>>> >> > >>>> >>
>>>> >> > >>>> >> Hi all,
>>>> >> > >>>> >> I created a JIRA issue for this and summarized the
>>>> available tools
>>>> >> > >>>> >>
>>>> >> > >>>> >> https://issues.apache.org/jira/browse/BEAM-9175
>>>> >> > >>>> >>
>>>> >> > >>>> >> Cheers,
>>>> >> > >>>> >> Michal
>>>> >> > >>>> >>
>>>> >> > >>>> >> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <
>>>> ehudm@google.com> wrote:
>>>> >> > >>>> >>>
>>>> >> > >>>> >>> Sorry, backing off on this due to time constraints.
>>>> >> > >>>> >>>
>>>> >> > >>>> >>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <
>>>> ehudm@google.com> wrote:
>>>> >> > >>>> >>>>
>>>> >> > >>>> >>>> It sounds like there's a consensus for yapf. I volunteer
>>>> to take this on
>>>> >> > >>>> >>>>
>>>> >> > >>>> >>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com>
>>>> wrote:
>>>> >> > >>>> >>>>>
>>>> >> > >>>> >>>>> +1 to autoformatting
>>>> >> > >>>> >>>>>
>>>> >> > >>>> >>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <
>>>> lcwik@google.com> wrote:
>>>> >> > >>>> >>>>>>
>>>> >> > >>>> >>>>>> +1 to autoformatters. Also the Beam Java SDK went
>>>> through a one time pass to apply the spotless formatting.
>>>> >> > >>>> >>>>>>
>>>> >> > >>>> >>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <
>>>> altay@google.com> wrote:
>>>> >> > >>>> >>>>>>>
>>>> >> > >>>> >>>>>>> +1 to autoformatters and yapf. It appears to be a
>>>> well maintained project. I do support making a one time pass to apply
>>>> formatting the whole code base.
>>>> >> > >>>> >>>>>>>
>>>> >> > >>>> >>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <
>>>> chadrik@gmail.com> wrote:
>>>> >> > >>>> >>>>>>>>>
>>>> >> > >>>> >>>>>>>>>
>>>> >> > >>>> >>>>>>>>> It'd be good if there was a way to only apply to
>>>> violating (or at
>>>> >> > >>>> >>>>>>>>> least changed) lines.
>>>> >> > >>>> >>>>>>>>
>>>> >> > >>>> >>>>>>>>
>>>> >> > >>>> >>>>>>>> I assumed the first thing we’d do is convert all of
>>>> the code in one go, since it’s a very safe operation. Did you have
>>>> something else in mind?
>>>> >> > >>>> >>>>>>>>
>>>> >> > >>>> >>>>>>>> -chad
>>>> >> > >>>> >>>>>>>>
>>>> >> > >>>> >>>>>>>>
>>>> >> > >>>> >>>>>>>>
>>>> >> > >>>> >>>>>>>>
>>>> >> > >>>> >>>>>>>>>
>>>> >> > >>>> >>>>>>>>>
>>>> >> > >>>> >>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <
>>>> chadrik@gmail.com> wrote:
>>>> >> > >>>> >>>>>>>>> >
>>>> >> > >>>> >>>>>>>>> > +1 to autoformatting
>>>> >> > >>>> >>>>>>>>> >
>>>> >> > >>>> >>>>>>>>> > Let me add some nuance to that.
>>>> >> > >>>> >>>>>>>>> >
>>>> >> > >>>> >>>>>>>>> > The way I see it there are 2 varieties of
>>>> formatters:  those which take the original formatting into consideration
>>>> (autopep8) and those which disregard it (yapf, black).
>>>> >> > >>>> >>>>>>>>> >
>>>> >> > >>>> >>>>>>>>> > I much prefer yapf to black, because you have
>>>> plenty of options to tweak with yapf (enough to make the output a pretty
>>>> close match to the current Beam style), and you can mark areas to preserve
>>>> the original formatting, which could be very useful with Pipeline building
>>>> with pipe operators.  Please don't pick black.
>>>> >> > >>>> >>>>>>>>> >
>>>> >> > >>>> >>>>>>>>> > autopep8 is more along the lines of spotless in
>>>> Java -- it only corrects code that breaks the project's style rules.  The
>>>> big problem with Beam's current style is that it is so esoteric that
>>>> autopep8 can't enforce it -- and I'm not just talking about 2-spaces, which
>>>> I don't really have a problem with -- the problem is the use of either 2 or
>>>> 4 spaces depending on context (expression start vs hanging indent, etc).
>>>> This is my *biggest* gripe about the current style.  PyCharm doesn't have
>>>> enough control either.  So, if we can choose a style that can be expressed
>>>> by flake8 or pycodestyle then we can use autopep8 to enforce it.
>>>> >> > >>>> >>>>>>>>> >
>>>> >> > >>>> >>>>>>>>> > I'd prefer autopep8 to yapf because I like having
>>>> a little wiggle room to influence the style, but on a big project like Beam
>>>> all that wiggle room ends up to minor but noticeable inconsistencies in
>>>> style throughout the project.  yapf ensures completely consistent style,
>>>> but the tradeoff is that it's sometimes ugly, especially in scenarios with
>>>> similar repeated entries like argparse, where yapf might insert line breaks
>>>> in visually inconsistent and unappealing ways depending on the lengths of
>>>> the keywords and expressions involved.
>>>> >> > >>>> >>>>>>>>> >
>>>> >> > >>>> >>>>>>>>> > Either way (but especially if we choose yapf) I
>>>> think it'd be a nice addition to setup a pre-commit [1] config so that
>>>> people can opt in to running *lightweight* autofixers prior to commit.
>>>> This will not only reduce dev frustration but will also reduce the amount
>>>> of cpu cycles that Jenkins spends pointing out lint errors.
>>>> >> > >>>> >>>>>>>>> >
>>>> >> > >>>> >>>>>>>>> > [1] https://pre-commit.com/
>>>> >> > >>>> >>>>>>>>> >
>>>> >> > >>>> >>>>>>>>> > -chad
>>>> >> > >>>> >>>>>>>>> >
>>>> >> > >>>> >>>>>>>>> >
>>>> >> > >>>> >>>>>>>>> >
>>>> >> > >>>> >>>>>>>>> >
>>>> >> > >>>> >>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <
>>>> iemejia@gmail.com> wrote:
>>>> >> > >>>> >>>>>>>>> >>
>>>> >> > >>>> >>>>>>>>> >> Last time we discussed this there seems not to
>>>> be much progress into autoformatting.
>>>> >> > >>>> >>>>>>>>> >> This tool looks more tweakable, so maybe it
>>>> could be more appropriate for Beam's use case.
>>>> >> > >>>> >>>>>>>>> >> https://github.com/google/yapf/
>>>> >> > >>>> >>>>>>>>> >> WDYT?
>>>> >> > >>>> >>>>>>>>> >>
>>>> >> > >>>> >>>>>>>>> >>
>>>> >> > >>>> >>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <
>>>> lgajowy@apache.org> wrote:
>>>> >> > >>>> >>>>>>>>> >>>
>>>> >> > >>>> >>>>>>>>> >>> +1 for any autoformatter for Python SDK that
>>>> does the job. My experience is that since spotless in Java SDK I would
>>>> never start a new Java project without it. So many great benefits not only
>>>> for one person coding but for all community.
>>>> >> > >>>> >>>>>>>>> >>>
>>>> >> > >>>> >>>>>>>>> >>> It is a GitHub UI issue that you cannot easily
>>>> browse past the reformat. It is not actually that hard, but does take a
>>>> couple extra clicks to get GitHub to display blame before a reformat. It is
>>>> easier with the command line. I do a lot of code history digging and the
>>>> global Java reformat is not really a problem.
>>>> >> > >>>> >>>>>>>>> >>>
>>>> >> > >>>> >>>>>>>>> >>> It's actually one more click on Github but I
>>>> agree it's not the best way to search the history. The most convenient and
>>>> clear one I've found so far is in Jetbrains IDEs (Intelij) where you can:
>>>> >> > >>>> >>>>>>>>> >>>
>>>> >> > >>>> >>>>>>>>> >>> right click on line number -> "annotate" ->
>>>> click again -> "annotate previous revision" -> ...
>>>> >> > >>>> >>>>>>>>> >>>
>>>> >> > >>>> >>>>>>>>> >>> You can also use "compare with" to see the diff
>>>> between two revisions.
>>>> >> > >>>> >>>>>>>>> >>>
>>>> >> > >>>> >>>>>>>>> >>> Łukasz
>>>> >> > >>>> >>>>>>>>> >>>
>>>> >> > >>>> >>>>>>>>> >>>
>>>> >> > >>>> >>>>>>>>> >>>
>>>> >> > >>>> >>>>>>>>> >>>
>>>> >> > >>>> >>>>>>>>> >>>
>>>> >> > >>>> >>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <
>>>> kenn@apache.org> napisał(a):
>>>> >> > >>>> >>>>>>>>> >>>>
>>>> >> > >>>> >>>>>>>>> >>>> +1 pending good enough tooling (I can't quite
>>>> tell - seems there are some issues?)
>>>> >> > >>>> >>>>>>>>> >>>>
>>>> >> > >>>> >>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna
>>>> Kucharczyk <ka...@gmail.com> wrote:
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>> What else actually we gain? My guess is
>>>> faster PR review iteration. We will skip some of conversations about code
>>>> style.
>>>> >> > >>>> >>>>>>>>> >>>>
>>>> >> > >>>> >>>>>>>>> >>>> ...
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>> Last but not least, new contributor may be
>>>> less discouraged. When I started contribute I didn’t know how to format my
>>>> code and I lost a lot of time to add pylint and adjust IntelliJ. I
>>>> eventually failed. Currently I write code intuitively and when I don’t
>>>> forget I rerun tox.
>>>> >> > >>>> >>>>>>>>> >>>>
>>>> >> > >>>> >>>>>>>>> >>>>
>>>> >> > >>>> >>>>>>>>> >>>> This is a huge benefit. This is why I
>>>> supported it so much for Java. It is a community benefit. You do not have
>>>> to be a contributor to the Python SDK to support this. That is why I am
>>>> writing here. Just eliminate all discussion of formatting. It doesn't
>>>> really matter what the resulting format is, if it is not crazy to read. I
>>>> strongly oppose maintaining a non-default format.
>>>> >> > >>>> >>>>>>>>> >>>>
>>>> >> > >>>> >>>>>>>>> >>>> Reformating 20k lines or 200k is not hard. The
>>>> Java global reformat touched 50k lines. It does not really matter how big
>>>> it is. Definitely do it all at once if you think the tool is good enough.
>>>> And you should pin a version, so churn is not a problem. You can upgrade
>>>> the version and reformat in a PR later and that is also easy.
>>>> >> > >>>> >>>>>>>>> >>>>
>>>> >> > >>>> >>>>>>>>> >>>> It is a GitHub UI issue that you cannot easily
>>>> browse past the reformat. It is not actually that hard, but does take a
>>>> couple extra clicks to get GitHub to display blame before a reformat. It is
>>>> easier with the command line. I do a lot of code history digging and the
>>>> global Java reformat is not really a problem.
>>>> >> > >>>> >>>>>>>>> >>>>
>>>> >> > >>>> >>>>>>>>> >>>> Kenn
>>>> >> > >>>> >>>>>>>>> >>>>
>>>> >> > >>>> >>>>>>>>> >>>>
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>> Also everything will be formatted in a same
>>>> way, so eventually it would be easier to read.
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>> Moreover, as it was mentioned in previous
>>>> emails - a lot of Jenkins failures won’t take place, so we save time and
>>>> resources.
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>> One of disadvantages is that our pipelines
>>>> has custom syntax and after formatting they looks a little bit weird, but
>>>> maybe extending the only configurable option in Black - lines, from 88 to
>>>> 110 would be solution.
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>> Second one is that Black requires Python 3 to
>>>> be run. I don’t know how big obstacle it would be.
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>> I believe there are two options how it would
>>>> be possible to introduce Black. First: just do it, it will hurt but then it
>>>> would be ok (same as a dentist appointment). Of course it may require some
>>>> work to adjust linters. On the other hand we can do it gradually and start
>>>> including sdk parts one by one - maybe it will be less painful?
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>> As an example I can share one of projects [2]
>>>> I know that uses Black (they use also other cool checkers and pre-commit
>>>> [3]). This is how looks their build with all checks [4].
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>> To sum up I believe that if we want improve
>>>> our coding experience, we should improve our toolset. Black seems be recent
>>>> and quite popular tool what makes think they won’t stop developing it.
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>> [1]
>>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>> [2]
>>>> https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>> [3] https://pre-commit.com
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>> [4]
>>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert
>>>> Bradshaw <ro...@google.com> wrote:
>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter
>>>> to me, as it would change nearly every single line in the codebase (and the
>>>> loss of all context as well as that particular line).
>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> This is probably why the 2-space fork
>>>> exists. However, we don't conform to that either--we use 2 spaces for
>>>> indentation, but 4 for continuation indentation. (As for the history of
>>>> this, this goes back to Google's internal style guide, probably motivated
>>>> by consistency with C++, Java, ... and the fact that with an indent level
>>>> of 4 one ends up wrapping lines quite frequently (it's telling that black's
>>>> default line length is 88)). This turns out to be an easy change to the
>>>> codebase.
>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace
>>>> thing, I found that this tool introduces a huge amount of vertical
>>>> whitespace (e.g. closing parentheses on their own line), e.g.
>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> def foo(
>>>> >> > >>>> >>>>>>>>> >>>>>>     args
>>>> >> > >>>> >>>>>>>>> >>>>>> ):
>>>> >> > >>>> >>>>>>>>> >>>>>>   if (
>>>> >> > >>>> >>>>>>>>> >>>>>>       long expression)
>>>> >> > >>>> >>>>>>>>> >>>>>>   ):
>>>> >> > >>>> >>>>>>>>> >>>>>>     func(
>>>> >> > >>>> >>>>>>>>> >>>>>>         args
>>>> >> > >>>> >>>>>>>>> >>>>>>     )
>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> I wrote a simple post-processor to put
>>>> closing parentheses on the same lines, as well as omit the newline after
>>>> "if (", and disabling formatting of strings, which reduce the churn in our
>>>> codebase to 15k lines (adding about 4k) out of 200k total.
>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>> https://github.com/apache/beam/pull/8712/files
>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> It's still very opinionated, often in
>>>> different ways then me, and doesn't understand the semantics of the code,
>>>> but possibly something we could live with given the huge advantages of an
>>>> autoformatter.
>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> An intermediate point would be to allow, but
>>>> not require, autoformatting of changed lines.
>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> As for being beta quality, it looks like
>>>> it's got a decent number of contributors and in my book being in the python
>>>> github project is a strong positive signal. But, due to the above issues, I
>>>> think we'd have to maintain a fork. (The code is pretty lightweight, the 2
>>>> vs. 4 space issue is a 2-line change, and the rest implemented as a
>>>> post-processing step (for now, incomplete), so it'd be easy to stay in sync
>>>> with upstream.)
>>>> >> > >>>> >>>>>>>>> >>>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël
>>>> Mejía <ie...@gmail.com> wrote:
>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be
>>>> configured in a way to fit our
>>>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it
>>>> is feasible to reformat the
>>>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>> >> > >>>> >>>>>>>>> >>>>>> > It cannot be configured to do what we
>>>> actually do because Black is
>>>> >> > >>>> >>>>>>>>> >>>>>> > configurable only to support the standard
>>>> python codestyle guidelines
>>>> >> > >>>> >>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is
>>>> what most projects in the
>>>> >> > >>>> >>>>>>>>> >>>>>> > python world use.
>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick
>>>> access to the Git history. This
>>>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK.
>>>> However, I have the feeling
>>>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem
>>>> with Python because the linter has
>>>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>> >> > >>>> >>>>>>>>> >>>>>> > Yes that’s the bad side effect but there
>>>> are always tradeoffs we have
>>>> >> > >>>> >>>>>>>>> >>>>>> > to deal with.
>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>> >> > >>>> >>>>>>>>> >>>>>> >
>>>> >> > >>>> >>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM
>>>> Maximilian Michels <mx...@apache.org> wrote:
>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be
>>>> configured in a way to fit our
>>>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it
>>>> is feasible to reformat the
>>>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick
>>>> access to the Git history. This
>>>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK.
>>>> However, I have the feeling
>>>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem
>>>> with Python because the linter has
>>>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>>> >> > >>>> >>>>>>>>> >>>>>> > > -Max
>>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>>> >> > >>>> >>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> My concerns are:
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as
>>>> beta with a big warning.
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single
>>>> person project. For the same reason I also strongly prefer not using a fork
>>>> for a specific setting. Fork will only have less people looking at it.
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I suppose the project is marked as
>>>> beta because it is recent, it was
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and because
>>>> some things can change since
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky
>>>> beasts, I think beta in that case is
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you
>>>> look at the contribution page [1]
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > you can notice that it is less and
>>>> less a single person project, there
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > have been 93 independent contributions
>>>> since the project became
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > public, and the fact that it is hosted
>>>> in the python organization
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > github [2] gives some confidence on
>>>> the project continuity.
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > You are right however about the fact
>>>> that the main author seems to be
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the
>>>> 2-spaces issue he can seem
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > arbitrary, but he is just following
>>>> pep8 style guide recommendations
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do
>>>> not follow the 4 spaces
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even
>>>> Google's own Python style guide [4],
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > So, probably it should be to us to
>>>> reconsider the current policy to
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > adapt to the standards (and the tool).
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I did a quick run of black with python
>>>> 2.7 compatibility on
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing
>>>> errors which is positive given the
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > size of our code base.
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left
>>>> unchanged, 4 files failed to reformat.
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 47:22:
>>>>  _display_progress = print
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 151:18:
>>>>  file=sys.stderr)
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 160:34:
>>>>  print(traceback_string, file=sys.stderr)
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 335:51:
>>>>  print('-->' if pc == last_pc else '    ',
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > end=' ')
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > I still think this can be positive for
>>>> the project but well I am
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > barely a contributor to the python
>>>> code base so I let you the python
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > maintainers to reconsider this, in any
>>>> case it seems like a good
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > improvement for the project.
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [1]
>>>> https://github.com/python/black/graphs/contributors
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [2] https://github.com/python
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [3]
>>>> https://www.python.org/dev/peps/pep-0008/#indentation
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > [4]
>>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>>> >> > >>>> >>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet
>>>> Altay <al...@google.com> wrote:
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> I am in the same boat with Robert, I
>>>> am in favor of autoformatters but I am not familiar with this one. My
>>>> concerns are:
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as
>>>> beta with a big warning.
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single
>>>> person project. For the same reason I also strongly prefer not using a fork
>>>> for a specific setting. Fork will only have less people looking at it.
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> IMO, this is in an early stage for
>>>> us. That said lint issues are real as pointed in the thread. If someone
>>>> would like to give it a try and see how it would look like for us that
>>>> would be interesting.
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM
>>>> Katarzyna Kucharczyk <ka...@gmail.com> wrote:
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> This sounds really good. A lot of
>>>> Jenkins jobs failures are caused by lint problems.
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> I think it would be great to have
>>>> something similar to Spotless in Java SDK (I heard there is problem with
>>>> configuring Black with IntelliJ).
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM
>>>> Robert Bradshaw <ro...@google.com> wrote:
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> I'm generally in favor of
>>>> autoformatters, though I haven't looked at
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> how well this particular one works.
>>>> We might have to go with
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>>> https://github.com/desbma/black-2spaces given
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>>> https://github.com/python/black/issues/378 .
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM
>>>> Pablo Estrada <pa...@google.com> wrote:
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at
>>>> least a couple people (myself included) who've been annoyed by having to
>>>> take care of lint issues that maybe a code formatter could save us.
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> -P.
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM
>>>> Ismaël Mejía <ie...@gmail.com> wrote:
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black
>>>> [1] a python code auto formatter that
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto'
>>>> auto-formatter for python, and wanted to
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest
>>>> from the python people to get this
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> into the build?
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless for
>>>> Java has been a good improvement and
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> maybe the python code base may
>>>> benefit of this too.
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> WDYT?
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> [1]
>>>> https://github.com/python/black
>>>> >> > >>>> >>
>>>> >> > >>>> >>
>>>> >> > >>>> >>
>>>> >> > >>>> >> --
>>>> >> > >>>> >>
>>>> >> > >>>> >> Michał Walenia
>>>> >> > >>>> >> Polidea | Software Engineer
>>>> >> > >>>> >>
>>>> >> > >>>> >> M: +48 791 432 002
>>>> >> > >>>> >> E: michal.walenia@polidea.com
>>>> >> > >>>> >>
>>>> >> > >>>> >> Unique Tech
>>>> >> > >>>> >> Check out our projects!
>>>>
>>>
>>
>> --
>>
>> Michał Walenia
>> Polidea <https://www.polidea.com/> | Software Engineer
>>
>> M: +48 791 432 002 <+48791432002>
>> E: michal.walenia@polidea.com
>>
>> Unique Tech
>> Check out our projects! <https://www.polidea.com/our-work>
>>
>

Re: [DISCUSS] Autoformat python code with Black

Posted by Kamil Wasilewski <ka...@polidea.com>.
Thanks to everyone involved in the discussion.

I've taken a look at the first 50 recently updated Pull Requests. Only few
of them were affected. I hope it wouldn't be too hard to fix them.

In any case, here you can find instructions on how to run formatter:
https://cwiki.apache.org/confluence/display/BEAM/Python+Tips (section
"Formatting").

On Thu, Feb 6, 2020 at 12:42 PM Michał Walenia <mi...@polidea.com>
wrote:

> Hi,
> the PR is merged, all checks were green :)
> Enjoy prettier Python!
>
> On Thu, Feb 6, 2020 at 11:11 AM Ismaël Mejía <ie...@gmail.com> wrote:
>
>> Agree no need for vote for this because the consensus is clear and the
>> sole
>> impact I can think of are pending PRs that will be broken. In the Java
>> case
>> what we did was to just notice every PR that was affected by the change.
>> And clearly document how to validate and autoformat the code.
>>
>> So the earlier the better, go go autoformat!
>>
>> On Thu, Feb 6, 2020 at 1:38 AM Robert Bradshaw <ro...@google.com>
>> wrote:
>>
>>> No, perhaps not. I agree there's consensus, just wondering what the
>>> next steps should be to get this in. (The presubmits look like they're
>>> all passing, with the exception of some breakage in java that should
>>> be completely unrelated. Of course there's already merge conflicts...)
>>>
>>> On Wed, Feb 5, 2020 at 3:55 PM Ahmet Altay <al...@google.com> wrote:
>>> >
>>> > Do we need a formal vote? There is consensus on this thread and on the
>>> PR.
>>> >
>>> > On Wed, Feb 5, 2020 at 3:37 PM Robert Bradshaw <ro...@google.com>
>>> wrote:
>>> >>
>>> >> The PR is looking good. Should we call a vote?
>>> >>
>>> >> On Mon, Jan 27, 2020 at 11:03 AM Robert Bradshaw <ro...@google.com>
>>> wrote:
>>> >> >
>>> >> > Thanks. I commented on the PR. I think if we're going this route we
>>> >> > should add a pre-commit, plus instructions on how to run the tool
>>> >> > (similar to spotless).
>>> >> >
>>> >> > On Mon, Jan 27, 2020 at 10:00 AM Udi Meiri <eh...@google.com>
>>> wrote:
>>> >> > >
>>> >> > > I've done a pass on the PR on code I'm familiar with.
>>> >> > > Please make a pass and add your suggestions on the PR.
>>> >> > >
>>> >> > > On Fri, Jan 24, 2020 at 7:15 AM Ismaël Mejía <ie...@gmail.com>
>>> wrote:
>>> >> > >>
>>> >> > >> Java build fails on any unformatted code so python probably
>>> should be like that.
>>> >> > >> We have to ensure however that it fails early on that.
>>> >> > >> As Robert said time to debate the knobs :)
>>> >> > >>
>>> >> > >> On Fri, Jan 24, 2020 at 3:19 PM Kamil Wasilewski <
>>> kamil.wasilewski@polidea.com> wrote:
>>> >> > >>>
>>> >> > >>> PR is ready: https://github.com/apache/beam/pull/10684. Please
>>> share your comments ;-) I've managed to reduce the impact a bit:
>>> >> > >>> 501 files changed, 18245 insertions(+), 19495 deletions(-)
>>> >> > >>>
>>> >> > >>> We still need to consider how to enforce the usage of
>>> autoformatter. Pre-commit sounds like a nice addition, but it still needs
>>> to be installed manually by a developer. On the other hand, Jenkins
>>> precommit job that fails if any unformatted code is detected looks like too
>>> strict. What do you think?
>>> >> > >>>
>>> >> > >>> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw <
>>> robertwb@google.com> wrote:
>>> >> > >>>>
>>> >> > >>>> Thanks! Now we get to debate what knobs to twiddle :-P
>>> >> > >>>>
>>> >> > >>>> FYI, I did a simple run (just pushed to
>>> >> > >>>> https://github.com/apache/beam/compare/master...robertwb:yapf)
>>> to see
>>> >> > >>>> the impact. The diff is
>>> >> > >>>>
>>> >> > >>>>     $ git diff --stat master
>>> >> > >>>>     ...
>>> >> > >>>>      547 files changed, 22118 insertions(+), 21129 deletions(-)
>>> >> > >>>>
>>> >> > >>>> For reference
>>> >> > >>>>
>>> >> > >>>>     $ find sdks/python/apache_beam -name '*.py' | xargs wc
>>> >> > >>>>     ...
>>> >> > >>>>     200424  612002 7431637 total
>>> >> > >>>>
>>> >> > >>>> which means a little over 10% of lines get touched. I think
>>> there are
>>> >> > >>>> some options, such as
>>> SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES and
>>> >> > >>>> COALESCE_BRACKETS, that will conform more to the style we are
>>> already
>>> >> > >>>> (mostly) following.
>>> >> > >>>>
>>> >> > >>>>
>>> >> > >>>> On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
>>> >> > >>>> <ka...@polidea.com> wrote:
>>> >> > >>>> >
>>> >> > >>>> > Thank you Michał for creating the ticket. I have some free
>>> time and I'd like to volunteer myself for this task.
>>> >> > >>>> > Indeed, it looks like there's consensus for `yapf`, so I'll
>>> try `yapf` first.
>>> >> > >>>> >
>>> >> > >>>> > Best,
>>> >> > >>>> > Kamil
>>> >> > >>>> >
>>> >> > >>>> >
>>> >> > >>>> > On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <
>>> michal.walenia@polidea.com> wrote:
>>> >> > >>>> >>
>>> >> > >>>> >> Hi all,
>>> >> > >>>> >> I created a JIRA issue for this and summarized the
>>> available tools
>>> >> > >>>> >>
>>> >> > >>>> >> https://issues.apache.org/jira/browse/BEAM-9175
>>> >> > >>>> >>
>>> >> > >>>> >> Cheers,
>>> >> > >>>> >> Michal
>>> >> > >>>> >>
>>> >> > >>>> >> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <eh...@google.com>
>>> wrote:
>>> >> > >>>> >>>
>>> >> > >>>> >>> Sorry, backing off on this due to time constraints.
>>> >> > >>>> >>>
>>> >> > >>>> >>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <
>>> ehudm@google.com> wrote:
>>> >> > >>>> >>>>
>>> >> > >>>> >>>> It sounds like there's a consensus for yapf. I volunteer
>>> to take this on
>>> >> > >>>> >>>>
>>> >> > >>>> >>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com>
>>> wrote:
>>> >> > >>>> >>>>>
>>> >> > >>>> >>>>> +1 to autoformatting
>>> >> > >>>> >>>>>
>>> >> > >>>> >>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <
>>> lcwik@google.com> wrote:
>>> >> > >>>> >>>>>>
>>> >> > >>>> >>>>>> +1 to autoformatters. Also the Beam Java SDK went
>>> through a one time pass to apply the spotless formatting.
>>> >> > >>>> >>>>>>
>>> >> > >>>> >>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <
>>> altay@google.com> wrote:
>>> >> > >>>> >>>>>>>
>>> >> > >>>> >>>>>>> +1 to autoformatters and yapf. It appears to be a well
>>> maintained project. I do support making a one time pass to apply formatting
>>> the whole code base.
>>> >> > >>>> >>>>>>>
>>> >> > >>>> >>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <
>>> chadrik@gmail.com> wrote:
>>> >> > >>>> >>>>>>>>>
>>> >> > >>>> >>>>>>>>>
>>> >> > >>>> >>>>>>>>> It'd be good if there was a way to only apply to
>>> violating (or at
>>> >> > >>>> >>>>>>>>> least changed) lines.
>>> >> > >>>> >>>>>>>>
>>> >> > >>>> >>>>>>>>
>>> >> > >>>> >>>>>>>> I assumed the first thing we’d do is convert all of
>>> the code in one go, since it’s a very safe operation. Did you have
>>> something else in mind?
>>> >> > >>>> >>>>>>>>
>>> >> > >>>> >>>>>>>> -chad
>>> >> > >>>> >>>>>>>>
>>> >> > >>>> >>>>>>>>
>>> >> > >>>> >>>>>>>>
>>> >> > >>>> >>>>>>>>
>>> >> > >>>> >>>>>>>>>
>>> >> > >>>> >>>>>>>>>
>>> >> > >>>> >>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <
>>> chadrik@gmail.com> wrote:
>>> >> > >>>> >>>>>>>>> >
>>> >> > >>>> >>>>>>>>> > +1 to autoformatting
>>> >> > >>>> >>>>>>>>> >
>>> >> > >>>> >>>>>>>>> > Let me add some nuance to that.
>>> >> > >>>> >>>>>>>>> >
>>> >> > >>>> >>>>>>>>> > The way I see it there are 2 varieties of
>>> formatters:  those which take the original formatting into consideration
>>> (autopep8) and those which disregard it (yapf, black).
>>> >> > >>>> >>>>>>>>> >
>>> >> > >>>> >>>>>>>>> > I much prefer yapf to black, because you have
>>> plenty of options to tweak with yapf (enough to make the output a pretty
>>> close match to the current Beam style), and you can mark areas to preserve
>>> the original formatting, which could be very useful with Pipeline building
>>> with pipe operators.  Please don't pick black.
>>> >> > >>>> >>>>>>>>> >
>>> >> > >>>> >>>>>>>>> > autopep8 is more along the lines of spotless in
>>> Java -- it only corrects code that breaks the project's style rules.  The
>>> big problem with Beam's current style is that it is so esoteric that
>>> autopep8 can't enforce it -- and I'm not just talking about 2-spaces, which
>>> I don't really have a problem with -- the problem is the use of either 2 or
>>> 4 spaces depending on context (expression start vs hanging indent, etc).
>>> This is my *biggest* gripe about the current style.  PyCharm doesn't have
>>> enough control either.  So, if we can choose a style that can be expressed
>>> by flake8 or pycodestyle then we can use autopep8 to enforce it.
>>> >> > >>>> >>>>>>>>> >
>>> >> > >>>> >>>>>>>>> > I'd prefer autopep8 to yapf because I like having
>>> a little wiggle room to influence the style, but on a big project like Beam
>>> all that wiggle room ends up to minor but noticeable inconsistencies in
>>> style throughout the project.  yapf ensures completely consistent style,
>>> but the tradeoff is that it's sometimes ugly, especially in scenarios with
>>> similar repeated entries like argparse, where yapf might insert line breaks
>>> in visually inconsistent and unappealing ways depending on the lengths of
>>> the keywords and expressions involved.
>>> >> > >>>> >>>>>>>>> >
>>> >> > >>>> >>>>>>>>> > Either way (but especially if we choose yapf) I
>>> think it'd be a nice addition to setup a pre-commit [1] config so that
>>> people can opt in to running *lightweight* autofixers prior to commit.
>>> This will not only reduce dev frustration but will also reduce the amount
>>> of cpu cycles that Jenkins spends pointing out lint errors.
>>> >> > >>>> >>>>>>>>> >
>>> >> > >>>> >>>>>>>>> > [1] https://pre-commit.com/
>>> >> > >>>> >>>>>>>>> >
>>> >> > >>>> >>>>>>>>> > -chad
>>> >> > >>>> >>>>>>>>> >
>>> >> > >>>> >>>>>>>>> >
>>> >> > >>>> >>>>>>>>> >
>>> >> > >>>> >>>>>>>>> >
>>> >> > >>>> >>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <
>>> iemejia@gmail.com> wrote:
>>> >> > >>>> >>>>>>>>> >>
>>> >> > >>>> >>>>>>>>> >> Last time we discussed this there seems not to be
>>> much progress into autoformatting.
>>> >> > >>>> >>>>>>>>> >> This tool looks more tweakable, so maybe it could
>>> be more appropriate for Beam's use case.
>>> >> > >>>> >>>>>>>>> >> https://github.com/google/yapf/
>>> >> > >>>> >>>>>>>>> >> WDYT?
>>> >> > >>>> >>>>>>>>> >>
>>> >> > >>>> >>>>>>>>> >>
>>> >> > >>>> >>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <
>>> lgajowy@apache.org> wrote:
>>> >> > >>>> >>>>>>>>> >>>
>>> >> > >>>> >>>>>>>>> >>> +1 for any autoformatter for Python SDK that
>>> does the job. My experience is that since spotless in Java SDK I would
>>> never start a new Java project without it. So many great benefits not only
>>> for one person coding but for all community.
>>> >> > >>>> >>>>>>>>> >>>
>>> >> > >>>> >>>>>>>>> >>> It is a GitHub UI issue that you cannot easily
>>> browse past the reformat. It is not actually that hard, but does take a
>>> couple extra clicks to get GitHub to display blame before a reformat. It is
>>> easier with the command line. I do a lot of code history digging and the
>>> global Java reformat is not really a problem.
>>> >> > >>>> >>>>>>>>> >>>
>>> >> > >>>> >>>>>>>>> >>> It's actually one more click on Github but I
>>> agree it's not the best way to search the history. The most convenient and
>>> clear one I've found so far is in Jetbrains IDEs (Intelij) where you can:
>>> >> > >>>> >>>>>>>>> >>>
>>> >> > >>>> >>>>>>>>> >>> right click on line number -> "annotate" ->
>>> click again -> "annotate previous revision" -> ...
>>> >> > >>>> >>>>>>>>> >>>
>>> >> > >>>> >>>>>>>>> >>> You can also use "compare with" to see the diff
>>> between two revisions.
>>> >> > >>>> >>>>>>>>> >>>
>>> >> > >>>> >>>>>>>>> >>> Łukasz
>>> >> > >>>> >>>>>>>>> >>>
>>> >> > >>>> >>>>>>>>> >>>
>>> >> > >>>> >>>>>>>>> >>>
>>> >> > >>>> >>>>>>>>> >>>
>>> >> > >>>> >>>>>>>>> >>>
>>> >> > >>>> >>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <
>>> kenn@apache.org> napisał(a):
>>> >> > >>>> >>>>>>>>> >>>>
>>> >> > >>>> >>>>>>>>> >>>> +1 pending good enough tooling (I can't quite
>>> tell - seems there are some issues?)
>>> >> > >>>> >>>>>>>>> >>>>
>>> >> > >>>> >>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna
>>> Kucharczyk <ka...@gmail.com> wrote:
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>> What else actually we gain? My guess is faster
>>> PR review iteration. We will skip some of conversations about code style.
>>> >> > >>>> >>>>>>>>> >>>>
>>> >> > >>>> >>>>>>>>> >>>> ...
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>> Last but not least, new contributor may be
>>> less discouraged. When I started contribute I didn’t know how to format my
>>> code and I lost a lot of time to add pylint and adjust IntelliJ. I
>>> eventually failed. Currently I write code intuitively and when I don’t
>>> forget I rerun tox.
>>> >> > >>>> >>>>>>>>> >>>>
>>> >> > >>>> >>>>>>>>> >>>>
>>> >> > >>>> >>>>>>>>> >>>> This is a huge benefit. This is why I supported
>>> it so much for Java. It is a community benefit. You do not have to be a
>>> contributor to the Python SDK to support this. That is why I am writing
>>> here. Just eliminate all discussion of formatting. It doesn't really matter
>>> what the resulting format is, if it is not crazy to read. I strongly oppose
>>> maintaining a non-default format.
>>> >> > >>>> >>>>>>>>> >>>>
>>> >> > >>>> >>>>>>>>> >>>> Reformating 20k lines or 200k is not hard. The
>>> Java global reformat touched 50k lines. It does not really matter how big
>>> it is. Definitely do it all at once if you think the tool is good enough.
>>> And you should pin a version, so churn is not a problem. You can upgrade
>>> the version and reformat in a PR later and that is also easy.
>>> >> > >>>> >>>>>>>>> >>>>
>>> >> > >>>> >>>>>>>>> >>>> It is a GitHub UI issue that you cannot easily
>>> browse past the reformat. It is not actually that hard, but does take a
>>> couple extra clicks to get GitHub to display blame before a reformat. It is
>>> easier with the command line. I do a lot of code history digging and the
>>> global Java reformat is not really a problem.
>>> >> > >>>> >>>>>>>>> >>>>
>>> >> > >>>> >>>>>>>>> >>>> Kenn
>>> >> > >>>> >>>>>>>>> >>>>
>>> >> > >>>> >>>>>>>>> >>>>
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>> Also everything will be formatted in a same
>>> way, so eventually it would be easier to read.
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>> Moreover, as it was mentioned in previous
>>> emails - a lot of Jenkins failures won’t take place, so we save time and
>>> resources.
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>> One of disadvantages is that our pipelines has
>>> custom syntax and after formatting they looks a little bit weird, but maybe
>>> extending the only configurable option in Black - lines, from 88 to 110
>>> would be solution.
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>> Second one is that Black requires Python 3 to
>>> be run. I don’t know how big obstacle it would be.
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>> I believe there are two options how it would
>>> be possible to introduce Black. First: just do it, it will hurt but then it
>>> would be ok (same as a dentist appointment). Of course it may require some
>>> work to adjust linters. On the other hand we can do it gradually and start
>>> including sdk parts one by one - maybe it will be less painful?
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>> As an example I can share one of projects [2]
>>> I know that uses Black (they use also other cool checkers and pre-commit
>>> [3]). This is how looks their build with all checks [4].
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>> To sum up I believe that if we want improve
>>> our coding experience, we should improve our toolset. Black seems be recent
>>> and quite popular tool what makes think they won’t stop developing it.
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>> [1]
>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>> [2]
>>> https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>> [3] https://pre-commit.com
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>> [4]
>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>>
>>> >> > >>>> >>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert
>>> Bradshaw <ro...@google.com> wrote:
>>> >> > >>>> >>>>>>>>> >>>>>>
>>> >> > >>>> >>>>>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter
>>> to me, as it would change nearly every single line in the codebase (and the
>>> loss of all context as well as that particular line).
>>> >> > >>>> >>>>>>>>> >>>>>>
>>> >> > >>>> >>>>>>>>> >>>>>> This is probably why the 2-space fork exists.
>>> However, we don't conform to that either--we use 2 spaces for indentation,
>>> but 4 for continuation indentation. (As for the history of this, this goes
>>> back to Google's internal style guide, probably motivated by consistency
>>> with C++, Java, ... and the fact that with an indent level of 4 one ends up
>>> wrapping lines quite frequently (it's telling that black's default line
>>> length is 88)). This turns out to be an easy change to the codebase.
>>> >> > >>>> >>>>>>>>> >>>>>>
>>> >> > >>>> >>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace
>>> thing, I found that this tool introduces a huge amount of vertical
>>> whitespace (e.g. closing parentheses on their own line), e.g.
>>> >> > >>>> >>>>>>>>> >>>>>>
>>> >> > >>>> >>>>>>>>> >>>>>> def foo(
>>> >> > >>>> >>>>>>>>> >>>>>>     args
>>> >> > >>>> >>>>>>>>> >>>>>> ):
>>> >> > >>>> >>>>>>>>> >>>>>>   if (
>>> >> > >>>> >>>>>>>>> >>>>>>       long expression)
>>> >> > >>>> >>>>>>>>> >>>>>>   ):
>>> >> > >>>> >>>>>>>>> >>>>>>     func(
>>> >> > >>>> >>>>>>>>> >>>>>>         args
>>> >> > >>>> >>>>>>>>> >>>>>>     )
>>> >> > >>>> >>>>>>>>> >>>>>>
>>> >> > >>>> >>>>>>>>> >>>>>> I wrote a simple post-processor to put
>>> closing parentheses on the same lines, as well as omit the newline after
>>> "if (", and disabling formatting of strings, which reduce the churn in our
>>> codebase to 15k lines (adding about 4k) out of 200k total.
>>> >> > >>>> >>>>>>>>> >>>>>>
>>> >> > >>>> >>>>>>>>> >>>>>>
>>> https://github.com/apache/beam/pull/8712/files
>>> >> > >>>> >>>>>>>>> >>>>>>
>>> >> > >>>> >>>>>>>>> >>>>>> It's still very opinionated, often in
>>> different ways then me, and doesn't understand the semantics of the code,
>>> but possibly something we could live with given the huge advantages of an
>>> autoformatter.
>>> >> > >>>> >>>>>>>>> >>>>>>
>>> >> > >>>> >>>>>>>>> >>>>>> An intermediate point would be to allow, but
>>> not require, autoformatting of changed lines.
>>> >> > >>>> >>>>>>>>> >>>>>>
>>> >> > >>>> >>>>>>>>> >>>>>> As for being beta quality, it looks like it's
>>> got a decent number of contributors and in my book being in the python
>>> github project is a strong positive signal. But, due to the above issues, I
>>> think we'd have to maintain a fork. (The code is pretty lightweight, the 2
>>> vs. 4 space issue is a 2-line change, and the rest implemented as a
>>> post-processing step (for now, incomplete), so it'd be easy to stay in sync
>>> with upstream.)
>>> >> > >>>> >>>>>>>>> >>>>>>
>>> >> > >>>> >>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía
>>> <ie...@gmail.com> wrote:
>>> >> > >>>> >>>>>>>>> >>>>>> >
>>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be
>>> configured in a way to fit our
>>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it
>>> is feasible to reformat the
>>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>>> >> > >>>> >>>>>>>>> >>>>>> >
>>> >> > >>>> >>>>>>>>> >>>>>> > It cannot be configured to do what we
>>> actually do because Black is
>>> >> > >>>> >>>>>>>>> >>>>>> > configurable only to support the standard
>>> python codestyle guidelines
>>> >> > >>>> >>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is
>>> what most projects in the
>>> >> > >>>> >>>>>>>>> >>>>>> > python world use.
>>> >> > >>>> >>>>>>>>> >>>>>> >
>>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick
>>> access to the Git history. This
>>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK.
>>> However, I have the feeling
>>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with
>>> Python because the linter has
>>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>> >> > >>>> >>>>>>>>> >>>>>> >
>>> >> > >>>> >>>>>>>>> >>>>>> > Yes that’s the bad side effect but there
>>> are always tradeoffs we have
>>> >> > >>>> >>>>>>>>> >>>>>> > to deal with.
>>> >> > >>>> >>>>>>>>> >>>>>> >
>>> >> > >>>> >>>>>>>>> >>>>>> >
>>> >> > >>>> >>>>>>>>> >>>>>> >
>>> >> > >>>> >>>>>>>>> >>>>>> >
>>> >> > >>>> >>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian
>>> Michels <mx...@apache.org> wrote:
>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be
>>> configured in a way to fit our
>>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it
>>> is feasible to reformat the
>>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick
>>> access to the Git history. This
>>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK.
>>> However, I have the feeling
>>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with
>>> Python because the linter has
>>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>> >> > >>>> >>>>>>>>> >>>>>> > > -Max
>>> >> > >>>> >>>>>>>>> >>>>>> > >
>>> >> > >>>> >>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>> >> > >>>> >>>>>>>>> >>>>>> > > >> My concerns are:
>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as
>>> beta with a big warning.
>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person
>>> project. For the same reason I also strongly prefer not using a fork for a
>>> specific setting. Fork will only have less people looking at it.
>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>> >> > >>>> >>>>>>>>> >>>>>> > > > I suppose the project is marked as beta
>>> because it is recent, it was
>>> >> > >>>> >>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and because
>>> some things can change since
>>> >> > >>>> >>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky
>>> beasts, I think beta in that case is
>>> >> > >>>> >>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you
>>> look at the contribution page [1]
>>> >> > >>>> >>>>>>>>> >>>>>> > > > you can notice that it is less and less
>>> a single person project, there
>>> >> > >>>> >>>>>>>>> >>>>>> > > > have been 93 independent contributions
>>> since the project became
>>> >> > >>>> >>>>>>>>> >>>>>> > > > public, and the fact that it is hosted
>>> in the python organization
>>> >> > >>>> >>>>>>>>> >>>>>> > > > github [2] gives some confidence on the
>>> project continuity.
>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>> >> > >>>> >>>>>>>>> >>>>>> > > > You are right however about the fact
>>> that the main author seems to be
>>> >> > >>>> >>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the
>>> 2-spaces issue he can seem
>>> >> > >>>> >>>>>>>>> >>>>>> > > > arbitrary, but he is just following
>>> pep8 style guide recommendations
>>> >> > >>>> >>>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do
>>> not follow the 4 spaces
>>> >> > >>>> >>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even
>>> Google's own Python style guide [4],
>>> >> > >>>> >>>>>>>>> >>>>>> > > > So, probably it should be to us to
>>> reconsider the current policy to
>>> >> > >>>> >>>>>>>>> >>>>>> > > > adapt to the standards (and the tool).
>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>> >> > >>>> >>>>>>>>> >>>>>> > > > I did a quick run of black with python
>>> 2.7 compatibility on
>>> >> > >>>> >>>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing
>>> errors which is positive given the
>>> >> > >>>> >>>>>>>>> >>>>>> > > > size of our code base.
>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>> >> > >>>> >>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left
>>> unchanged, 4 files failed to reformat.
>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 47:22:
>>>  _display_progress = print
>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 151:18:
>>>  file=sys.stderr)
>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 160:34:
>>>  print(traceback_string, file=sys.stderr)
>>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->'
>>> if pc == last_pc else '    ',
>>> >> > >>>> >>>>>>>>> >>>>>> > > > end=' ')
>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>> >> > >>>> >>>>>>>>> >>>>>> > > > I still think this can be positive for
>>> the project but well I am
>>> >> > >>>> >>>>>>>>> >>>>>> > > > barely a contributor to the python code
>>> base so I let you the python
>>> >> > >>>> >>>>>>>>> >>>>>> > > > maintainers to reconsider this, in any
>>> case it seems like a good
>>> >> > >>>> >>>>>>>>> >>>>>> > > > improvement for the project.
>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>> >> > >>>> >>>>>>>>> >>>>>> > > > [1]
>>> https://github.com/python/black/graphs/contributors
>>> >> > >>>> >>>>>>>>> >>>>>> > > > [2] https://github.com/python
>>> >> > >>>> >>>>>>>>> >>>>>> > > > [3]
>>> https://www.python.org/dev/peps/pep-0008/#indentation
>>> >> > >>>> >>>>>>>>> >>>>>> > > > [4]
>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>> >> > >>>> >>>>>>>>> >>>>>> > > >
>>> >> > >>>> >>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet
>>> Altay <al...@google.com> wrote:
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>>> >> > >>>> >>>>>>>>> >>>>>> > > >> I am in the same boat with Robert, I
>>> am in favor of autoformatters but I am not familiar with this one. My
>>> concerns are:
>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as
>>> beta with a big warning.
>>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person
>>> project. For the same reason I also strongly prefer not using a fork for a
>>> specific setting. Fork will only have less people looking at it.
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>>> >> > >>>> >>>>>>>>> >>>>>> > > >> IMO, this is in an early stage for us.
>>> That said lint issues are real as pointed in the thread. If someone would
>>> like to give it a try and see how it would look like for us that would be
>>> interesting.
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>>> >> > >>>> >>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM
>>> Katarzyna Kucharczyk <ka...@gmail.com> wrote:
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> This sounds really good. A lot of
>>> Jenkins jobs failures are caused by lint problems.
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> I think it would be great to have
>>> something similar to Spotless in Java SDK (I heard there is problem with
>>> configuring Black with IntelliJ).
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM
>>> Robert Bradshaw <ro...@google.com> wrote:
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> I'm generally in favor of
>>> autoformatters, though I haven't looked at
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> how well this particular one works.
>>> We might have to go with
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>> https://github.com/desbma/black-2spaces given
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>> https://github.com/python/black/issues/378 .
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM
>>> Pablo Estrada <pa...@google.com> wrote:
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at
>>> least a couple people (myself included) who've been annoyed by having to
>>> take care of lint issues that maybe a code formatter could save us.
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> -P.
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM
>>> Ismaël Mejía <ie...@gmail.com> wrote:
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black
>>> [1] a python code auto formatter that
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto'
>>> auto-formatter for python, and wanted to
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest
>>> from the python people to get this
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> into the build?
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless for
>>> Java has been a good improvement and
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> maybe the python code base may
>>> benefit of this too.
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> WDYT?
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> [1]
>>> https://github.com/python/black
>>> >> > >>>> >>
>>> >> > >>>> >>
>>> >> > >>>> >>
>>> >> > >>>> >> --
>>> >> > >>>> >>
>>> >> > >>>> >> Michał Walenia
>>> >> > >>>> >> Polidea | Software Engineer
>>> >> > >>>> >>
>>> >> > >>>> >> M: +48 791 432 002
>>> >> > >>>> >> E: michal.walenia@polidea.com
>>> >> > >>>> >>
>>> >> > >>>> >> Unique Tech
>>> >> > >>>> >> Check out our projects!
>>>
>>
>
> --
>
> Michał Walenia
> Polidea <https://www.polidea.com/> | Software Engineer
>
> M: +48 791 432 002 <+48791432002>
> E: michal.walenia@polidea.com
>
> Unique Tech
> Check out our projects! <https://www.polidea.com/our-work>
>

Re: [DISCUSS] Autoformat python code with Black

Posted by Michał Walenia <mi...@polidea.com>.
Hi,
the PR is merged, all checks were green :)
Enjoy prettier Python!

On Thu, Feb 6, 2020 at 11:11 AM Ismaël Mejía <ie...@gmail.com> wrote:

> Agree no need for vote for this because the consensus is clear and the sole
> impact I can think of are pending PRs that will be broken. In the Java case
> what we did was to just notice every PR that was affected by the change.
> And clearly document how to validate and autoformat the code.
>
> So the earlier the better, go go autoformat!
>
> On Thu, Feb 6, 2020 at 1:38 AM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> No, perhaps not. I agree there's consensus, just wondering what the
>> next steps should be to get this in. (The presubmits look like they're
>> all passing, with the exception of some breakage in java that should
>> be completely unrelated. Of course there's already merge conflicts...)
>>
>> On Wed, Feb 5, 2020 at 3:55 PM Ahmet Altay <al...@google.com> wrote:
>> >
>> > Do we need a formal vote? There is consensus on this thread and on the
>> PR.
>> >
>> > On Wed, Feb 5, 2020 at 3:37 PM Robert Bradshaw <ro...@google.com>
>> wrote:
>> >>
>> >> The PR is looking good. Should we call a vote?
>> >>
>> >> On Mon, Jan 27, 2020 at 11:03 AM Robert Bradshaw <ro...@google.com>
>> wrote:
>> >> >
>> >> > Thanks. I commented on the PR. I think if we're going this route we
>> >> > should add a pre-commit, plus instructions on how to run the tool
>> >> > (similar to spotless).
>> >> >
>> >> > On Mon, Jan 27, 2020 at 10:00 AM Udi Meiri <eh...@google.com> wrote:
>> >> > >
>> >> > > I've done a pass on the PR on code I'm familiar with.
>> >> > > Please make a pass and add your suggestions on the PR.
>> >> > >
>> >> > > On Fri, Jan 24, 2020 at 7:15 AM Ismaël Mejía <ie...@gmail.com>
>> wrote:
>> >> > >>
>> >> > >> Java build fails on any unformatted code so python probably
>> should be like that.
>> >> > >> We have to ensure however that it fails early on that.
>> >> > >> As Robert said time to debate the knobs :)
>> >> > >>
>> >> > >> On Fri, Jan 24, 2020 at 3:19 PM Kamil Wasilewski <
>> kamil.wasilewski@polidea.com> wrote:
>> >> > >>>
>> >> > >>> PR is ready: https://github.com/apache/beam/pull/10684. Please
>> share your comments ;-) I've managed to reduce the impact a bit:
>> >> > >>> 501 files changed, 18245 insertions(+), 19495 deletions(-)
>> >> > >>>
>> >> > >>> We still need to consider how to enforce the usage of
>> autoformatter. Pre-commit sounds like a nice addition, but it still needs
>> to be installed manually by a developer. On the other hand, Jenkins
>> precommit job that fails if any unformatted code is detected looks like too
>> strict. What do you think?
>> >> > >>>
>> >> > >>> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw <
>> robertwb@google.com> wrote:
>> >> > >>>>
>> >> > >>>> Thanks! Now we get to debate what knobs to twiddle :-P
>> >> > >>>>
>> >> > >>>> FYI, I did a simple run (just pushed to
>> >> > >>>> https://github.com/apache/beam/compare/master...robertwb:yapf)
>> to see
>> >> > >>>> the impact. The diff is
>> >> > >>>>
>> >> > >>>>     $ git diff --stat master
>> >> > >>>>     ...
>> >> > >>>>      547 files changed, 22118 insertions(+), 21129 deletions(-)
>> >> > >>>>
>> >> > >>>> For reference
>> >> > >>>>
>> >> > >>>>     $ find sdks/python/apache_beam -name '*.py' | xargs wc
>> >> > >>>>     ...
>> >> > >>>>     200424  612002 7431637 total
>> >> > >>>>
>> >> > >>>> which means a little over 10% of lines get touched. I think
>> there are
>> >> > >>>> some options, such as
>> SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES and
>> >> > >>>> COALESCE_BRACKETS, that will conform more to the style we are
>> already
>> >> > >>>> (mostly) following.
>> >> > >>>>
>> >> > >>>>
>> >> > >>>> On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
>> >> > >>>> <ka...@polidea.com> wrote:
>> >> > >>>> >
>> >> > >>>> > Thank you Michał for creating the ticket. I have some free
>> time and I'd like to volunteer myself for this task.
>> >> > >>>> > Indeed, it looks like there's consensus for `yapf`, so I'll
>> try `yapf` first.
>> >> > >>>> >
>> >> > >>>> > Best,
>> >> > >>>> > Kamil
>> >> > >>>> >
>> >> > >>>> >
>> >> > >>>> > On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <
>> michal.walenia@polidea.com> wrote:
>> >> > >>>> >>
>> >> > >>>> >> Hi all,
>> >> > >>>> >> I created a JIRA issue for this and summarized the available
>> tools
>> >> > >>>> >>
>> >> > >>>> >> https://issues.apache.org/jira/browse/BEAM-9175
>> >> > >>>> >>
>> >> > >>>> >> Cheers,
>> >> > >>>> >> Michal
>> >> > >>>> >>
>> >> > >>>> >> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <eh...@google.com>
>> wrote:
>> >> > >>>> >>>
>> >> > >>>> >>> Sorry, backing off on this due to time constraints.
>> >> > >>>> >>>
>> >> > >>>> >>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <eh...@google.com>
>> wrote:
>> >> > >>>> >>>>
>> >> > >>>> >>>> It sounds like there's a consensus for yapf. I volunteer
>> to take this on
>> >> > >>>> >>>>
>> >> > >>>> >>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com>
>> wrote:
>> >> > >>>> >>>>>
>> >> > >>>> >>>>> +1 to autoformatting
>> >> > >>>> >>>>>
>> >> > >>>> >>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <
>> lcwik@google.com> wrote:
>> >> > >>>> >>>>>>
>> >> > >>>> >>>>>> +1 to autoformatters. Also the Beam Java SDK went
>> through a one time pass to apply the spotless formatting.
>> >> > >>>> >>>>>>
>> >> > >>>> >>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <
>> altay@google.com> wrote:
>> >> > >>>> >>>>>>>
>> >> > >>>> >>>>>>> +1 to autoformatters and yapf. It appears to be a well
>> maintained project. I do support making a one time pass to apply formatting
>> the whole code base.
>> >> > >>>> >>>>>>>
>> >> > >>>> >>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <
>> chadrik@gmail.com> wrote:
>> >> > >>>> >>>>>>>>>
>> >> > >>>> >>>>>>>>>
>> >> > >>>> >>>>>>>>> It'd be good if there was a way to only apply to
>> violating (or at
>> >> > >>>> >>>>>>>>> least changed) lines.
>> >> > >>>> >>>>>>>>
>> >> > >>>> >>>>>>>>
>> >> > >>>> >>>>>>>> I assumed the first thing we’d do is convert all of
>> the code in one go, since it’s a very safe operation. Did you have
>> something else in mind?
>> >> > >>>> >>>>>>>>
>> >> > >>>> >>>>>>>> -chad
>> >> > >>>> >>>>>>>>
>> >> > >>>> >>>>>>>>
>> >> > >>>> >>>>>>>>
>> >> > >>>> >>>>>>>>
>> >> > >>>> >>>>>>>>>
>> >> > >>>> >>>>>>>>>
>> >> > >>>> >>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <
>> chadrik@gmail.com> wrote:
>> >> > >>>> >>>>>>>>> >
>> >> > >>>> >>>>>>>>> > +1 to autoformatting
>> >> > >>>> >>>>>>>>> >
>> >> > >>>> >>>>>>>>> > Let me add some nuance to that.
>> >> > >>>> >>>>>>>>> >
>> >> > >>>> >>>>>>>>> > The way I see it there are 2 varieties of
>> formatters:  those which take the original formatting into consideration
>> (autopep8) and those which disregard it (yapf, black).
>> >> > >>>> >>>>>>>>> >
>> >> > >>>> >>>>>>>>> > I much prefer yapf to black, because you have
>> plenty of options to tweak with yapf (enough to make the output a pretty
>> close match to the current Beam style), and you can mark areas to preserve
>> the original formatting, which could be very useful with Pipeline building
>> with pipe operators.  Please don't pick black.
>> >> > >>>> >>>>>>>>> >
>> >> > >>>> >>>>>>>>> > autopep8 is more along the lines of spotless in
>> Java -- it only corrects code that breaks the project's style rules.  The
>> big problem with Beam's current style is that it is so esoteric that
>> autopep8 can't enforce it -- and I'm not just talking about 2-spaces, which
>> I don't really have a problem with -- the problem is the use of either 2 or
>> 4 spaces depending on context (expression start vs hanging indent, etc).
>> This is my *biggest* gripe about the current style.  PyCharm doesn't have
>> enough control either.  So, if we can choose a style that can be expressed
>> by flake8 or pycodestyle then we can use autopep8 to enforce it.
>> >> > >>>> >>>>>>>>> >
>> >> > >>>> >>>>>>>>> > I'd prefer autopep8 to yapf because I like having a
>> little wiggle room to influence the style, but on a big project like Beam
>> all that wiggle room ends up to minor but noticeable inconsistencies in
>> style throughout the project.  yapf ensures completely consistent style,
>> but the tradeoff is that it's sometimes ugly, especially in scenarios with
>> similar repeated entries like argparse, where yapf might insert line breaks
>> in visually inconsistent and unappealing ways depending on the lengths of
>> the keywords and expressions involved.
>> >> > >>>> >>>>>>>>> >
>> >> > >>>> >>>>>>>>> > Either way (but especially if we choose yapf) I
>> think it'd be a nice addition to setup a pre-commit [1] config so that
>> people can opt in to running *lightweight* autofixers prior to commit.
>> This will not only reduce dev frustration but will also reduce the amount
>> of cpu cycles that Jenkins spends pointing out lint errors.
>> >> > >>>> >>>>>>>>> >
>> >> > >>>> >>>>>>>>> > [1] https://pre-commit.com/
>> >> > >>>> >>>>>>>>> >
>> >> > >>>> >>>>>>>>> > -chad
>> >> > >>>> >>>>>>>>> >
>> >> > >>>> >>>>>>>>> >
>> >> > >>>> >>>>>>>>> >
>> >> > >>>> >>>>>>>>> >
>> >> > >>>> >>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <
>> iemejia@gmail.com> wrote:
>> >> > >>>> >>>>>>>>> >>
>> >> > >>>> >>>>>>>>> >> Last time we discussed this there seems not to be
>> much progress into autoformatting.
>> >> > >>>> >>>>>>>>> >> This tool looks more tweakable, so maybe it could
>> be more appropriate for Beam's use case.
>> >> > >>>> >>>>>>>>> >> https://github.com/google/yapf/
>> >> > >>>> >>>>>>>>> >> WDYT?
>> >> > >>>> >>>>>>>>> >>
>> >> > >>>> >>>>>>>>> >>
>> >> > >>>> >>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <
>> lgajowy@apache.org> wrote:
>> >> > >>>> >>>>>>>>> >>>
>> >> > >>>> >>>>>>>>> >>> +1 for any autoformatter for Python SDK that does
>> the job. My experience is that since spotless in Java SDK I would never
>> start a new Java project without it. So many great benefits not only for
>> one person coding but for all community.
>> >> > >>>> >>>>>>>>> >>>
>> >> > >>>> >>>>>>>>> >>> It is a GitHub UI issue that you cannot easily
>> browse past the reformat. It is not actually that hard, but does take a
>> couple extra clicks to get GitHub to display blame before a reformat. It is
>> easier with the command line. I do a lot of code history digging and the
>> global Java reformat is not really a problem.
>> >> > >>>> >>>>>>>>> >>>
>> >> > >>>> >>>>>>>>> >>> It's actually one more click on Github but I
>> agree it's not the best way to search the history. The most convenient and
>> clear one I've found so far is in Jetbrains IDEs (Intelij) where you can:
>> >> > >>>> >>>>>>>>> >>>
>> >> > >>>> >>>>>>>>> >>> right click on line number -> "annotate" -> click
>> again -> "annotate previous revision" -> ...
>> >> > >>>> >>>>>>>>> >>>
>> >> > >>>> >>>>>>>>> >>> You can also use "compare with" to see the diff
>> between two revisions.
>> >> > >>>> >>>>>>>>> >>>
>> >> > >>>> >>>>>>>>> >>> Łukasz
>> >> > >>>> >>>>>>>>> >>>
>> >> > >>>> >>>>>>>>> >>>
>> >> > >>>> >>>>>>>>> >>>
>> >> > >>>> >>>>>>>>> >>>
>> >> > >>>> >>>>>>>>> >>>
>> >> > >>>> >>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <
>> kenn@apache.org> napisał(a):
>> >> > >>>> >>>>>>>>> >>>>
>> >> > >>>> >>>>>>>>> >>>> +1 pending good enough tooling (I can't quite
>> tell - seems there are some issues?)
>> >> > >>>> >>>>>>>>> >>>>
>> >> > >>>> >>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna
>> Kucharczyk <ka...@gmail.com> wrote:
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>> What else actually we gain? My guess is faster
>> PR review iteration. We will skip some of conversations about code style.
>> >> > >>>> >>>>>>>>> >>>>
>> >> > >>>> >>>>>>>>> >>>> ...
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>> Last but not least, new contributor may be less
>> discouraged. When I started contribute I didn’t know how to format my code
>> and I lost a lot of time to add pylint and adjust IntelliJ. I eventually
>> failed. Currently I write code intuitively and when I don’t forget I rerun
>> tox.
>> >> > >>>> >>>>>>>>> >>>>
>> >> > >>>> >>>>>>>>> >>>>
>> >> > >>>> >>>>>>>>> >>>> This is a huge benefit. This is why I supported
>> it so much for Java. It is a community benefit. You do not have to be a
>> contributor to the Python SDK to support this. That is why I am writing
>> here. Just eliminate all discussion of formatting. It doesn't really matter
>> what the resulting format is, if it is not crazy to read. I strongly oppose
>> maintaining a non-default format.
>> >> > >>>> >>>>>>>>> >>>>
>> >> > >>>> >>>>>>>>> >>>> Reformating 20k lines or 200k is not hard. The
>> Java global reformat touched 50k lines. It does not really matter how big
>> it is. Definitely do it all at once if you think the tool is good enough.
>> And you should pin a version, so churn is not a problem. You can upgrade
>> the version and reformat in a PR later and that is also easy.
>> >> > >>>> >>>>>>>>> >>>>
>> >> > >>>> >>>>>>>>> >>>> It is a GitHub UI issue that you cannot easily
>> browse past the reformat. It is not actually that hard, but does take a
>> couple extra clicks to get GitHub to display blame before a reformat. It is
>> easier with the command line. I do a lot of code history digging and the
>> global Java reformat is not really a problem.
>> >> > >>>> >>>>>>>>> >>>>
>> >> > >>>> >>>>>>>>> >>>> Kenn
>> >> > >>>> >>>>>>>>> >>>>
>> >> > >>>> >>>>>>>>> >>>>
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>> Also everything will be formatted in a same
>> way, so eventually it would be easier to read.
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>> Moreover, as it was mentioned in previous
>> emails - a lot of Jenkins failures won’t take place, so we save time and
>> resources.
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>> One of disadvantages is that our pipelines has
>> custom syntax and after formatting they looks a little bit weird, but maybe
>> extending the only configurable option in Black - lines, from 88 to 110
>> would be solution.
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>> Second one is that Black requires Python 3 to
>> be run. I don’t know how big obstacle it would be.
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>> I believe there are two options how it would be
>> possible to introduce Black. First: just do it, it will hurt but then it
>> would be ok (same as a dentist appointment). Of course it may require some
>> work to adjust linters. On the other hand we can do it gradually and start
>> including sdk parts one by one - maybe it will be less painful?
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>> As an example I can share one of projects [2] I
>> know that uses Black (they use also other cool checkers and pre-commit
>> [3]). This is how looks their build with all checks [4].
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>> To sum up I believe that if we want improve our
>> coding experience, we should improve our toolset. Black seems be recent and
>> quite popular tool what makes think they won’t stop developing it.
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>> [1]
>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>> [2]
>> https://github.com/GoogleCloudPlatform/oozie-to-airflow
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>> [3] https://pre-commit.com
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>> [4]
>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>>
>> >> > >>>> >>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw
>> <ro...@google.com> wrote:
>> >> > >>>> >>>>>>>>> >>>>>>
>> >> > >>>> >>>>>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter
>> to me, as it would change nearly every single line in the codebase (and the
>> loss of all context as well as that particular line).
>> >> > >>>> >>>>>>>>> >>>>>>
>> >> > >>>> >>>>>>>>> >>>>>> This is probably why the 2-space fork exists.
>> However, we don't conform to that either--we use 2 spaces for indentation,
>> but 4 for continuation indentation. (As for the history of this, this goes
>> back to Google's internal style guide, probably motivated by consistency
>> with C++, Java, ... and the fact that with an indent level of 4 one ends up
>> wrapping lines quite frequently (it's telling that black's default line
>> length is 88)). This turns out to be an easy change to the codebase.
>> >> > >>>> >>>>>>>>> >>>>>>
>> >> > >>>> >>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace
>> thing, I found that this tool introduces a huge amount of vertical
>> whitespace (e.g. closing parentheses on their own line), e.g.
>> >> > >>>> >>>>>>>>> >>>>>>
>> >> > >>>> >>>>>>>>> >>>>>> def foo(
>> >> > >>>> >>>>>>>>> >>>>>>     args
>> >> > >>>> >>>>>>>>> >>>>>> ):
>> >> > >>>> >>>>>>>>> >>>>>>   if (
>> >> > >>>> >>>>>>>>> >>>>>>       long expression)
>> >> > >>>> >>>>>>>>> >>>>>>   ):
>> >> > >>>> >>>>>>>>> >>>>>>     func(
>> >> > >>>> >>>>>>>>> >>>>>>         args
>> >> > >>>> >>>>>>>>> >>>>>>     )
>> >> > >>>> >>>>>>>>> >>>>>>
>> >> > >>>> >>>>>>>>> >>>>>> I wrote a simple post-processor to put closing
>> parentheses on the same lines, as well as omit the newline after "if (",
>> and disabling formatting of strings, which reduce the churn in our codebase
>> to 15k lines (adding about 4k) out of 200k total.
>> >> > >>>> >>>>>>>>> >>>>>>
>> >> > >>>> >>>>>>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
>> >> > >>>> >>>>>>>>> >>>>>>
>> >> > >>>> >>>>>>>>> >>>>>> It's still very opinionated, often in
>> different ways then me, and doesn't understand the semantics of the code,
>> but possibly something we could live with given the huge advantages of an
>> autoformatter.
>> >> > >>>> >>>>>>>>> >>>>>>
>> >> > >>>> >>>>>>>>> >>>>>> An intermediate point would be to allow, but
>> not require, autoformatting of changed lines.
>> >> > >>>> >>>>>>>>> >>>>>>
>> >> > >>>> >>>>>>>>> >>>>>> As for being beta quality, it looks like it's
>> got a decent number of contributors and in my book being in the python
>> github project is a strong positive signal. But, due to the above issues, I
>> think we'd have to maintain a fork. (The code is pretty lightweight, the 2
>> vs. 4 space issue is a 2-line change, and the rest implemented as a
>> post-processing step (for now, incomplete), so it'd be easy to stay in sync
>> with upstream.)
>> >> > >>>> >>>>>>>>> >>>>>>
>> >> > >>>> >>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <
>> iemejia@gmail.com> wrote:
>> >> > >>>> >>>>>>>>> >>>>>> >
>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be
>> configured in a way to fit our
>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it
>> is feasible to reformat the
>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>> >> > >>>> >>>>>>>>> >>>>>> >
>> >> > >>>> >>>>>>>>> >>>>>> > It cannot be configured to do what we
>> actually do because Black is
>> >> > >>>> >>>>>>>>> >>>>>> > configurable only to support the standard
>> python codestyle guidelines
>> >> > >>>> >>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is
>> what most projects in the
>> >> > >>>> >>>>>>>>> >>>>>> > python world use.
>> >> > >>>> >>>>>>>>> >>>>>> >
>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access
>> to the Git history. This
>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK.
>> However, I have the feeling
>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with
>> Python because the linter has
>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>> >> > >>>> >>>>>>>>> >>>>>> >
>> >> > >>>> >>>>>>>>> >>>>>> > Yes that’s the bad side effect but there are
>> always tradeoffs we have
>> >> > >>>> >>>>>>>>> >>>>>> > to deal with.
>> >> > >>>> >>>>>>>>> >>>>>> >
>> >> > >>>> >>>>>>>>> >>>>>> >
>> >> > >>>> >>>>>>>>> >>>>>> >
>> >> > >>>> >>>>>>>>> >>>>>> >
>> >> > >>>> >>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian
>> Michels <mx...@apache.org> wrote:
>> >> > >>>> >>>>>>>>> >>>>>> > >
>> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be
>> configured in a way to fit our
>> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it
>> is feasible to reformat the
>> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>> >> > >>>> >>>>>>>>> >>>>>> > >
>> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access
>> to the Git history. This
>> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK.
>> However, I have the feeling
>> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with
>> Python because the linter has
>> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>> >> > >>>> >>>>>>>>> >>>>>> > >
>> >> > >>>> >>>>>>>>> >>>>>> > > -Max
>> >> > >>>> >>>>>>>>> >>>>>> > >
>> >> > >>>> >>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>> >> > >>>> >>>>>>>>> >>>>>> > > >> My concerns are:
>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta
>> with a big warning.
>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person
>> project. For the same reason I also strongly prefer not using a fork for a
>> specific setting. Fork will only have less people looking at it.
>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> >> > >>>> >>>>>>>>> >>>>>> > > > I suppose the project is marked as beta
>> because it is recent, it was
>> >> > >>>> >>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and because
>> some things can change since
>> >> > >>>> >>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky
>> beasts, I think beta in that case is
>> >> > >>>> >>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you
>> look at the contribution page [1]
>> >> > >>>> >>>>>>>>> >>>>>> > > > you can notice that it is less and less
>> a single person project, there
>> >> > >>>> >>>>>>>>> >>>>>> > > > have been 93 independent contributions
>> since the project became
>> >> > >>>> >>>>>>>>> >>>>>> > > > public, and the fact that it is hosted
>> in the python organization
>> >> > >>>> >>>>>>>>> >>>>>> > > > github [2] gives some confidence on the
>> project continuity.
>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> >> > >>>> >>>>>>>>> >>>>>> > > > You are right however about the fact
>> that the main author seems to be
>> >> > >>>> >>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the
>> 2-spaces issue he can seem
>> >> > >>>> >>>>>>>>> >>>>>> > > > arbitrary, but he is just following pep8
>> style guide recommendations
>> >> > >>>> >>>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do
>> not follow the 4 spaces
>> >> > >>>> >>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even Google's
>> own Python style guide [4],
>> >> > >>>> >>>>>>>>> >>>>>> > > > So, probably it should be to us to
>> reconsider the current policy to
>> >> > >>>> >>>>>>>>> >>>>>> > > > adapt to the standards (and the tool).
>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> >> > >>>> >>>>>>>>> >>>>>> > > > I did a quick run of black with python
>> 2.7 compatibility on
>> >> > >>>> >>>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing
>> errors which is positive given the
>> >> > >>>> >>>>>>>>> >>>>>> > > > size of our code base.
>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> >> > >>>> >>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left
>> unchanged, 4 files failed to reformat.
>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress
>> = print
>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 151:18:
>>  file=sys.stderr)
>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 160:34:
>>  print(traceback_string, file=sys.stderr)
>> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->'
>> if pc == last_pc else '    ',
>> >> > >>>> >>>>>>>>> >>>>>> > > > end=' ')
>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> >> > >>>> >>>>>>>>> >>>>>> > > > I still think this can be positive for
>> the project but well I am
>> >> > >>>> >>>>>>>>> >>>>>> > > > barely a contributor to the python code
>> base so I let you the python
>> >> > >>>> >>>>>>>>> >>>>>> > > > maintainers to reconsider this, in any
>> case it seems like a good
>> >> > >>>> >>>>>>>>> >>>>>> > > > improvement for the project.
>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> >> > >>>> >>>>>>>>> >>>>>> > > > [1]
>> https://github.com/python/black/graphs/contributors
>> >> > >>>> >>>>>>>>> >>>>>> > > > [2] https://github.com/python
>> >> > >>>> >>>>>>>>> >>>>>> > > > [3]
>> https://www.python.org/dev/peps/pep-0008/#indentation
>> >> > >>>> >>>>>>>>> >>>>>> > > > [4]
>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>> >> > >>>> >>>>>>>>> >>>>>> > > >
>> >> > >>>> >>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet
>> Altay <al...@google.com> wrote:
>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>> >> > >>>> >>>>>>>>> >>>>>> > > >> I am in the same boat with Robert, I am
>> in favor of autoformatters but I am not familiar with this one. My concerns
>> are:
>> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta
>> with a big warning.
>> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person
>> project. For the same reason I also strongly prefer not using a fork for a
>> specific setting. Fork will only have less people looking at it.
>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>> >> > >>>> >>>>>>>>> >>>>>> > > >> IMO, this is in an early stage for us.
>> That said lint issues are real as pointed in the thread. If someone would
>> like to give it a try and see how it would look like for us that would be
>> interesting.
>> >> > >>>> >>>>>>>>> >>>>>> > > >>
>> >> > >>>> >>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM
>> Katarzyna Kucharczyk <ka...@gmail.com> wrote:
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
>> >> > >>>> >>>>>>>>> >>>>>> > > >>> This sounds really good. A lot of
>> Jenkins jobs failures are caused by lint problems.
>> >> > >>>> >>>>>>>>> >>>>>> > > >>> I think it would be great to have
>> something similar to Spotless in Java SDK (I heard there is problem with
>> configuring Black with IntelliJ).
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>
>> >> > >>>> >>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM
>> Robert Bradshaw <ro...@google.com> wrote:
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> I'm generally in favor of
>> autoformatters, though I haven't looked at
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> how well this particular one works.
>> We might have to go with
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>> https://github.com/desbma/black-2spaces given
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>> https://github.com/python/black/issues/378 .
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM
>> Pablo Estrada <pa...@google.com> wrote:
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at
>> least a couple people (myself included) who've been annoyed by having to
>> take care of lint issues that maybe a code formatter could save us.
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> -P.
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM
>> Ismaël Mejía <ie...@gmail.com> wrote:
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1]
>> a python code auto formatter that
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto'
>> auto-formatter for python, and wanted to
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest
>> from the python people to get this
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> into the build?
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless for
>> Java has been a good improvement and
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> maybe the python code base may
>> benefit of this too.
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> WDYT?
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
>> >> > >>>> >>
>> >> > >>>> >>
>> >> > >>>> >>
>> >> > >>>> >> --
>> >> > >>>> >>
>> >> > >>>> >> Michał Walenia
>> >> > >>>> >> Polidea | Software Engineer
>> >> > >>>> >>
>> >> > >>>> >> M: +48 791 432 002
>> >> > >>>> >> E: michal.walenia@polidea.com
>> >> > >>>> >>
>> >> > >>>> >> Unique Tech
>> >> > >>>> >> Check out our projects!
>>
>

-- 

Michał Walenia
Polidea <https://www.polidea.com/> | Software Engineer

M: +48 791 432 002 <+48791432002>
E: michal.walenia@polidea.com

Unique Tech
Check out our projects! <https://www.polidea.com/our-work>

Re: [DISCUSS] Autoformat python code with Black

Posted by Ismaël Mejía <ie...@gmail.com>.
Agree no need for vote for this because the consensus is clear and the sole
impact I can think of are pending PRs that will be broken. In the Java case
what we did was to just notice every PR that was affected by the change.
And clearly document how to validate and autoformat the code.

So the earlier the better, go go autoformat!

On Thu, Feb 6, 2020 at 1:38 AM Robert Bradshaw <ro...@google.com> wrote:

> No, perhaps not. I agree there's consensus, just wondering what the
> next steps should be to get this in. (The presubmits look like they're
> all passing, with the exception of some breakage in java that should
> be completely unrelated. Of course there's already merge conflicts...)
>
> On Wed, Feb 5, 2020 at 3:55 PM Ahmet Altay <al...@google.com> wrote:
> >
> > Do we need a formal vote? There is consensus on this thread and on the
> PR.
> >
> > On Wed, Feb 5, 2020 at 3:37 PM Robert Bradshaw <ro...@google.com>
> wrote:
> >>
> >> The PR is looking good. Should we call a vote?
> >>
> >> On Mon, Jan 27, 2020 at 11:03 AM Robert Bradshaw <ro...@google.com>
> wrote:
> >> >
> >> > Thanks. I commented on the PR. I think if we're going this route we
> >> > should add a pre-commit, plus instructions on how to run the tool
> >> > (similar to spotless).
> >> >
> >> > On Mon, Jan 27, 2020 at 10:00 AM Udi Meiri <eh...@google.com> wrote:
> >> > >
> >> > > I've done a pass on the PR on code I'm familiar with.
> >> > > Please make a pass and add your suggestions on the PR.
> >> > >
> >> > > On Fri, Jan 24, 2020 at 7:15 AM Ismaël Mejía <ie...@gmail.com>
> wrote:
> >> > >>
> >> > >> Java build fails on any unformatted code so python probably should
> be like that.
> >> > >> We have to ensure however that it fails early on that.
> >> > >> As Robert said time to debate the knobs :)
> >> > >>
> >> > >> On Fri, Jan 24, 2020 at 3:19 PM Kamil Wasilewski <
> kamil.wasilewski@polidea.com> wrote:
> >> > >>>
> >> > >>> PR is ready: https://github.com/apache/beam/pull/10684. Please
> share your comments ;-) I've managed to reduce the impact a bit:
> >> > >>> 501 files changed, 18245 insertions(+), 19495 deletions(-)
> >> > >>>
> >> > >>> We still need to consider how to enforce the usage of
> autoformatter. Pre-commit sounds like a nice addition, but it still needs
> to be installed manually by a developer. On the other hand, Jenkins
> precommit job that fails if any unformatted code is detected looks like too
> strict. What do you think?
> >> > >>>
> >> > >>> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw <
> robertwb@google.com> wrote:
> >> > >>>>
> >> > >>>> Thanks! Now we get to debate what knobs to twiddle :-P
> >> > >>>>
> >> > >>>> FYI, I did a simple run (just pushed to
> >> > >>>> https://github.com/apache/beam/compare/master...robertwb:yapf)
> to see
> >> > >>>> the impact. The diff is
> >> > >>>>
> >> > >>>>     $ git diff --stat master
> >> > >>>>     ...
> >> > >>>>      547 files changed, 22118 insertions(+), 21129 deletions(-)
> >> > >>>>
> >> > >>>> For reference
> >> > >>>>
> >> > >>>>     $ find sdks/python/apache_beam -name '*.py' | xargs wc
> >> > >>>>     ...
> >> > >>>>     200424  612002 7431637 total
> >> > >>>>
> >> > >>>> which means a little over 10% of lines get touched. I think
> there are
> >> > >>>> some options, such as SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES
> and
> >> > >>>> COALESCE_BRACKETS, that will conform more to the style we are
> already
> >> > >>>> (mostly) following.
> >> > >>>>
> >> > >>>>
> >> > >>>> On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
> >> > >>>> <ka...@polidea.com> wrote:
> >> > >>>> >
> >> > >>>> > Thank you Michał for creating the ticket. I have some free
> time and I'd like to volunteer myself for this task.
> >> > >>>> > Indeed, it looks like there's consensus for `yapf`, so I'll
> try `yapf` first.
> >> > >>>> >
> >> > >>>> > Best,
> >> > >>>> > Kamil
> >> > >>>> >
> >> > >>>> >
> >> > >>>> > On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <
> michal.walenia@polidea.com> wrote:
> >> > >>>> >>
> >> > >>>> >> Hi all,
> >> > >>>> >> I created a JIRA issue for this and summarized the available
> tools
> >> > >>>> >>
> >> > >>>> >> https://issues.apache.org/jira/browse/BEAM-9175
> >> > >>>> >>
> >> > >>>> >> Cheers,
> >> > >>>> >> Michal
> >> > >>>> >>
> >> > >>>> >> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <eh...@google.com>
> wrote:
> >> > >>>> >>>
> >> > >>>> >>> Sorry, backing off on this due to time constraints.
> >> > >>>> >>>
> >> > >>>> >>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <eh...@google.com>
> wrote:
> >> > >>>> >>>>
> >> > >>>> >>>> It sounds like there's a consensus for yapf. I volunteer to
> take this on
> >> > >>>> >>>>
> >> > >>>> >>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com>
> wrote:
> >> > >>>> >>>>>
> >> > >>>> >>>>> +1 to autoformatting
> >> > >>>> >>>>>
> >> > >>>> >>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <
> lcwik@google.com> wrote:
> >> > >>>> >>>>>>
> >> > >>>> >>>>>> +1 to autoformatters. Also the Beam Java SDK went through
> a one time pass to apply the spotless formatting.
> >> > >>>> >>>>>>
> >> > >>>> >>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <
> altay@google.com> wrote:
> >> > >>>> >>>>>>>
> >> > >>>> >>>>>>> +1 to autoformatters and yapf. It appears to be a well
> maintained project. I do support making a one time pass to apply formatting
> the whole code base.
> >> > >>>> >>>>>>>
> >> > >>>> >>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <
> chadrik@gmail.com> wrote:
> >> > >>>> >>>>>>>>>
> >> > >>>> >>>>>>>>>
> >> > >>>> >>>>>>>>> It'd be good if there was a way to only apply to
> violating (or at
> >> > >>>> >>>>>>>>> least changed) lines.
> >> > >>>> >>>>>>>>
> >> > >>>> >>>>>>>>
> >> > >>>> >>>>>>>> I assumed the first thing we’d do is convert all of the
> code in one go, since it’s a very safe operation. Did you have something
> else in mind?
> >> > >>>> >>>>>>>>
> >> > >>>> >>>>>>>> -chad
> >> > >>>> >>>>>>>>
> >> > >>>> >>>>>>>>
> >> > >>>> >>>>>>>>
> >> > >>>> >>>>>>>>
> >> > >>>> >>>>>>>>>
> >> > >>>> >>>>>>>>>
> >> > >>>> >>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <
> chadrik@gmail.com> wrote:
> >> > >>>> >>>>>>>>> >
> >> > >>>> >>>>>>>>> > +1 to autoformatting
> >> > >>>> >>>>>>>>> >
> >> > >>>> >>>>>>>>> > Let me add some nuance to that.
> >> > >>>> >>>>>>>>> >
> >> > >>>> >>>>>>>>> > The way I see it there are 2 varieties of
> formatters:  those which take the original formatting into consideration
> (autopep8) and those which disregard it (yapf, black).
> >> > >>>> >>>>>>>>> >
> >> > >>>> >>>>>>>>> > I much prefer yapf to black, because you have plenty
> of options to tweak with yapf (enough to make the output a pretty close
> match to the current Beam style), and you can mark areas to preserve the
> original formatting, which could be very useful with Pipeline building with
> pipe operators.  Please don't pick black.
> >> > >>>> >>>>>>>>> >
> >> > >>>> >>>>>>>>> > autopep8 is more along the lines of spotless in Java
> -- it only corrects code that breaks the project's style rules.  The big
> problem with Beam's current style is that it is so esoteric that autopep8
> can't enforce it -- and I'm not just talking about 2-spaces, which I don't
> really have a problem with -- the problem is the use of either 2 or 4
> spaces depending on context (expression start vs hanging indent, etc).
> This is my *biggest* gripe about the current style.  PyCharm doesn't have
> enough control either.  So, if we can choose a style that can be expressed
> by flake8 or pycodestyle then we can use autopep8 to enforce it.
> >> > >>>> >>>>>>>>> >
> >> > >>>> >>>>>>>>> > I'd prefer autopep8 to yapf because I like having a
> little wiggle room to influence the style, but on a big project like Beam
> all that wiggle room ends up to minor but noticeable inconsistencies in
> style throughout the project.  yapf ensures completely consistent style,
> but the tradeoff is that it's sometimes ugly, especially in scenarios with
> similar repeated entries like argparse, where yapf might insert line breaks
> in visually inconsistent and unappealing ways depending on the lengths of
> the keywords and expressions involved.
> >> > >>>> >>>>>>>>> >
> >> > >>>> >>>>>>>>> > Either way (but especially if we choose yapf) I
> think it'd be a nice addition to setup a pre-commit [1] config so that
> people can opt in to running *lightweight* autofixers prior to commit.
> This will not only reduce dev frustration but will also reduce the amount
> of cpu cycles that Jenkins spends pointing out lint errors.
> >> > >>>> >>>>>>>>> >
> >> > >>>> >>>>>>>>> > [1] https://pre-commit.com/
> >> > >>>> >>>>>>>>> >
> >> > >>>> >>>>>>>>> > -chad
> >> > >>>> >>>>>>>>> >
> >> > >>>> >>>>>>>>> >
> >> > >>>> >>>>>>>>> >
> >> > >>>> >>>>>>>>> >
> >> > >>>> >>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <
> iemejia@gmail.com> wrote:
> >> > >>>> >>>>>>>>> >>
> >> > >>>> >>>>>>>>> >> Last time we discussed this there seems not to be
> much progress into autoformatting.
> >> > >>>> >>>>>>>>> >> This tool looks more tweakable, so maybe it could
> be more appropriate for Beam's use case.
> >> > >>>> >>>>>>>>> >> https://github.com/google/yapf/
> >> > >>>> >>>>>>>>> >> WDYT?
> >> > >>>> >>>>>>>>> >>
> >> > >>>> >>>>>>>>> >>
> >> > >>>> >>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <
> lgajowy@apache.org> wrote:
> >> > >>>> >>>>>>>>> >>>
> >> > >>>> >>>>>>>>> >>> +1 for any autoformatter for Python SDK that does
> the job. My experience is that since spotless in Java SDK I would never
> start a new Java project without it. So many great benefits not only for
> one person coding but for all community.
> >> > >>>> >>>>>>>>> >>>
> >> > >>>> >>>>>>>>> >>> It is a GitHub UI issue that you cannot easily
> browse past the reformat. It is not actually that hard, but does take a
> couple extra clicks to get GitHub to display blame before a reformat. It is
> easier with the command line. I do a lot of code history digging and the
> global Java reformat is not really a problem.
> >> > >>>> >>>>>>>>> >>>
> >> > >>>> >>>>>>>>> >>> It's actually one more click on Github but I agree
> it's not the best way to search the history. The most convenient and clear
> one I've found so far is in Jetbrains IDEs (Intelij) where you can:
> >> > >>>> >>>>>>>>> >>>
> >> > >>>> >>>>>>>>> >>> right click on line number -> "annotate" -> click
> again -> "annotate previous revision" -> ...
> >> > >>>> >>>>>>>>> >>>
> >> > >>>> >>>>>>>>> >>> You can also use "compare with" to see the diff
> between two revisions.
> >> > >>>> >>>>>>>>> >>>
> >> > >>>> >>>>>>>>> >>> Łukasz
> >> > >>>> >>>>>>>>> >>>
> >> > >>>> >>>>>>>>> >>>
> >> > >>>> >>>>>>>>> >>>
> >> > >>>> >>>>>>>>> >>>
> >> > >>>> >>>>>>>>> >>>
> >> > >>>> >>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <
> kenn@apache.org> napisał(a):
> >> > >>>> >>>>>>>>> >>>>
> >> > >>>> >>>>>>>>> >>>> +1 pending good enough tooling (I can't quite
> tell - seems there are some issues?)
> >> > >>>> >>>>>>>>> >>>>
> >> > >>>> >>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna
> Kucharczyk <ka...@gmail.com> wrote:
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>> What else actually we gain? My guess is faster
> PR review iteration. We will skip some of conversations about code style.
> >> > >>>> >>>>>>>>> >>>>
> >> > >>>> >>>>>>>>> >>>> ...
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>> Last but not least, new contributor may be less
> discouraged. When I started contribute I didn’t know how to format my code
> and I lost a lot of time to add pylint and adjust IntelliJ. I eventually
> failed. Currently I write code intuitively and when I don’t forget I rerun
> tox.
> >> > >>>> >>>>>>>>> >>>>
> >> > >>>> >>>>>>>>> >>>>
> >> > >>>> >>>>>>>>> >>>> This is a huge benefit. This is why I supported
> it so much for Java. It is a community benefit. You do not have to be a
> contributor to the Python SDK to support this. That is why I am writing
> here. Just eliminate all discussion of formatting. It doesn't really matter
> what the resulting format is, if it is not crazy to read. I strongly oppose
> maintaining a non-default format.
> >> > >>>> >>>>>>>>> >>>>
> >> > >>>> >>>>>>>>> >>>> Reformating 20k lines or 200k is not hard. The
> Java global reformat touched 50k lines. It does not really matter how big
> it is. Definitely do it all at once if you think the tool is good enough.
> And you should pin a version, so churn is not a problem. You can upgrade
> the version and reformat in a PR later and that is also easy.
> >> > >>>> >>>>>>>>> >>>>
> >> > >>>> >>>>>>>>> >>>> It is a GitHub UI issue that you cannot easily
> browse past the reformat. It is not actually that hard, but does take a
> couple extra clicks to get GitHub to display blame before a reformat. It is
> easier with the command line. I do a lot of code history digging and the
> global Java reformat is not really a problem.
> >> > >>>> >>>>>>>>> >>>>
> >> > >>>> >>>>>>>>> >>>> Kenn
> >> > >>>> >>>>>>>>> >>>>
> >> > >>>> >>>>>>>>> >>>>
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>> Also everything will be formatted in a same way,
> so eventually it would be easier to read.
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>> Moreover, as it was mentioned in previous emails
> - a lot of Jenkins failures won’t take place, so we save time and resources.
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>> One of disadvantages is that our pipelines has
> custom syntax and after formatting they looks a little bit weird, but maybe
> extending the only configurable option in Black - lines, from 88 to 110
> would be solution.
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>> Second one is that Black requires Python 3 to be
> run. I don’t know how big obstacle it would be.
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>> I believe there are two options how it would be
> possible to introduce Black. First: just do it, it will hurt but then it
> would be ok (same as a dentist appointment). Of course it may require some
> work to adjust linters. On the other hand we can do it gradually and start
> including sdk parts one by one - maybe it will be less painful?
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>> As an example I can share one of projects [2] I
> know that uses Black (they use also other cool checkers and pre-commit
> [3]). This is how looks their build with all checks [4].
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>> To sum up I believe that if we want improve our
> coding experience, we should improve our toolset. Black seems be recent and
> quite popular tool what makes think they won’t stop developing it.
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>> [1]
> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>> [2]
> https://github.com/GoogleCloudPlatform/oozie-to-airflow
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>> [3] https://pre-commit.com
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>> [4]
> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>>
> >> > >>>> >>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <
> robertwb@google.com> wrote:
> >> > >>>> >>>>>>>>> >>>>>>
> >> > >>>> >>>>>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter to
> me, as it would change nearly every single line in the codebase (and the
> loss of all context as well as that particular line).
> >> > >>>> >>>>>>>>> >>>>>>
> >> > >>>> >>>>>>>>> >>>>>> This is probably why the 2-space fork exists.
> However, we don't conform to that either--we use 2 spaces for indentation,
> but 4 for continuation indentation. (As for the history of this, this goes
> back to Google's internal style guide, probably motivated by consistency
> with C++, Java, ... and the fact that with an indent level of 4 one ends up
> wrapping lines quite frequently (it's telling that black's default line
> length is 88)). This turns out to be an easy change to the codebase.
> >> > >>>> >>>>>>>>> >>>>>>
> >> > >>>> >>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace
> thing, I found that this tool introduces a huge amount of vertical
> whitespace (e.g. closing parentheses on their own line), e.g.
> >> > >>>> >>>>>>>>> >>>>>>
> >> > >>>> >>>>>>>>> >>>>>> def foo(
> >> > >>>> >>>>>>>>> >>>>>>     args
> >> > >>>> >>>>>>>>> >>>>>> ):
> >> > >>>> >>>>>>>>> >>>>>>   if (
> >> > >>>> >>>>>>>>> >>>>>>       long expression)
> >> > >>>> >>>>>>>>> >>>>>>   ):
> >> > >>>> >>>>>>>>> >>>>>>     func(
> >> > >>>> >>>>>>>>> >>>>>>         args
> >> > >>>> >>>>>>>>> >>>>>>     )
> >> > >>>> >>>>>>>>> >>>>>>
> >> > >>>> >>>>>>>>> >>>>>> I wrote a simple post-processor to put closing
> parentheses on the same lines, as well as omit the newline after "if (",
> and disabling formatting of strings, which reduce the churn in our codebase
> to 15k lines (adding about 4k) out of 200k total.
> >> > >>>> >>>>>>>>> >>>>>>
> >> > >>>> >>>>>>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
> >> > >>>> >>>>>>>>> >>>>>>
> >> > >>>> >>>>>>>>> >>>>>> It's still very opinionated, often in different
> ways then me, and doesn't understand the semantics of the code, but
> possibly something we could live with given the huge advantages of an
> autoformatter.
> >> > >>>> >>>>>>>>> >>>>>>
> >> > >>>> >>>>>>>>> >>>>>> An intermediate point would be to allow, but
> not require, autoformatting of changed lines.
> >> > >>>> >>>>>>>>> >>>>>>
> >> > >>>> >>>>>>>>> >>>>>> As for being beta quality, it looks like it's
> got a decent number of contributors and in my book being in the python
> github project is a strong positive signal. But, due to the above issues, I
> think we'd have to maintain a fork. (The code is pretty lightweight, the 2
> vs. 4 space issue is a 2-line change, and the rest implemented as a
> post-processing step (for now, incomplete), so it'd be easy to stay in sync
> with upstream.)
> >> > >>>> >>>>>>>>> >>>>>>
> >> > >>>> >>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <
> iemejia@gmail.com> wrote:
> >> > >>>> >>>>>>>>> >>>>>> >
> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be
> configured in a way to fit our
> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is
> feasible to reformat the
> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
> >> > >>>> >>>>>>>>> >>>>>> >
> >> > >>>> >>>>>>>>> >>>>>> > It cannot be configured to do what we
> actually do because Black is
> >> > >>>> >>>>>>>>> >>>>>> > configurable only to support the standard
> python codestyle guidelines
> >> > >>>> >>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what
> most projects in the
> >> > >>>> >>>>>>>>> >>>>>> > python world use.
> >> > >>>> >>>>>>>>> >>>>>> >
> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access
> to the Git history. This
> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK.
> However, I have the feeling
> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with
> Python because the linter has
> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
> >> > >>>> >>>>>>>>> >>>>>> >
> >> > >>>> >>>>>>>>> >>>>>> > Yes that’s the bad side effect but there are
> always tradeoffs we have
> >> > >>>> >>>>>>>>> >>>>>> > to deal with.
> >> > >>>> >>>>>>>>> >>>>>> >
> >> > >>>> >>>>>>>>> >>>>>> >
> >> > >>>> >>>>>>>>> >>>>>> >
> >> > >>>> >>>>>>>>> >>>>>> >
> >> > >>>> >>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian
> Michels <mx...@apache.org> wrote:
> >> > >>>> >>>>>>>>> >>>>>> > >
> >> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be
> configured in a way to fit our
> >> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is
> feasible to reformat the
> >> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
> >> > >>>> >>>>>>>>> >>>>>> > >
> >> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access
> to the Git history. This
> >> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK.
> However, I have the feeling
> >> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with
> Python because the linter has
> >> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
> >> > >>>> >>>>>>>>> >>>>>> > >
> >> > >>>> >>>>>>>>> >>>>>> > > -Max
> >> > >>>> >>>>>>>>> >>>>>> > >
> >> > >>>> >>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
> >> > >>>> >>>>>>>>> >>>>>> > > >> My concerns are:
> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta
> with a big warning.
> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person
> project. For the same reason I also strongly prefer not using a fork for a
> specific setting. Fork will only have less people looking at it.
> >> > >>>> >>>>>>>>> >>>>>> > > >
> >> > >>>> >>>>>>>>> >>>>>> > > > I suppose the project is marked as beta
> because it is recent, it was
> >> > >>>> >>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and because
> some things can change since
> >> > >>>> >>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky beasts,
> I think beta in that case is
> >> > >>>> >>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you look
> at the contribution page [1]
> >> > >>>> >>>>>>>>> >>>>>> > > > you can notice that it is less and less a
> single person project, there
> >> > >>>> >>>>>>>>> >>>>>> > > > have been 93 independent contributions
> since the project became
> >> > >>>> >>>>>>>>> >>>>>> > > > public, and the fact that it is hosted in
> the python organization
> >> > >>>> >>>>>>>>> >>>>>> > > > github [2] gives some confidence on the
> project continuity.
> >> > >>>> >>>>>>>>> >>>>>> > > >
> >> > >>>> >>>>>>>>> >>>>>> > > > You are right however about the fact that
> the main author seems to be
> >> > >>>> >>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the
> 2-spaces issue he can seem
> >> > >>>> >>>>>>>>> >>>>>> > > > arbitrary, but he is just following pep8
> style guide recommendations
> >> > >>>> >>>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not
> follow the 4 spaces
> >> > >>>> >>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even Google's
> own Python style guide [4],
> >> > >>>> >>>>>>>>> >>>>>> > > > So, probably it should be to us to
> reconsider the current policy to
> >> > >>>> >>>>>>>>> >>>>>> > > > adapt to the standards (and the tool).
> >> > >>>> >>>>>>>>> >>>>>> > > >
> >> > >>>> >>>>>>>>> >>>>>> > > > I did a quick run of black with python
> 2.7 compatibility on
> >> > >>>> >>>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing errors
> which is positive given the
> >> > >>>> >>>>>>>>> >>>>>> > > > size of our code base.
> >> > >>>> >>>>>>>>> >>>>>> > > >
> >> > >>>> >>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left
> unchanged, 4 files failed to reformat.
> >> > >>>> >>>>>>>>> >>>>>> > > >
> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> >> > >>>> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress
> = print
> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> >> > >>>> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 151:18:
>  file=sys.stderr)
> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> >> > >>>> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 160:34:
>  print(traceback_string, file=sys.stderr)
> >> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> >> > >>>> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
> >> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->'
> if pc == last_pc else '    ',
> >> > >>>> >>>>>>>>> >>>>>> > > > end=' ')
> >> > >>>> >>>>>>>>> >>>>>> > > >
> >> > >>>> >>>>>>>>> >>>>>> > > > I still think this can be positive for
> the project but well I am
> >> > >>>> >>>>>>>>> >>>>>> > > > barely a contributor to the python code
> base so I let you the python
> >> > >>>> >>>>>>>>> >>>>>> > > > maintainers to reconsider this, in any
> case it seems like a good
> >> > >>>> >>>>>>>>> >>>>>> > > > improvement for the project.
> >> > >>>> >>>>>>>>> >>>>>> > > >
> >> > >>>> >>>>>>>>> >>>>>> > > > [1]
> https://github.com/python/black/graphs/contributors
> >> > >>>> >>>>>>>>> >>>>>> > > > [2] https://github.com/python
> >> > >>>> >>>>>>>>> >>>>>> > > > [3]
> https://www.python.org/dev/peps/pep-0008/#indentation
> >> > >>>> >>>>>>>>> >>>>>> > > > [4]
> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
> >> > >>>> >>>>>>>>> >>>>>> > > >
> >> > >>>> >>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet
> Altay <al...@google.com> wrote:
> >> > >>>> >>>>>>>>> >>>>>> > > >>
> >> > >>>> >>>>>>>>> >>>>>> > > >> I am in the same boat with Robert, I am
> in favor of autoformatters but I am not familiar with this one. My concerns
> are:
> >> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta
> with a big warning.
> >> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person
> project. For the same reason I also strongly prefer not using a fork for a
> specific setting. Fork will only have less people looking at it.
> >> > >>>> >>>>>>>>> >>>>>> > > >>
> >> > >>>> >>>>>>>>> >>>>>> > > >> IMO, this is in an early stage for us.
> That said lint issues are real as pointed in the thread. If someone would
> like to give it a try and see how it would look like for us that would be
> interesting.
> >> > >>>> >>>>>>>>> >>>>>> > > >>
> >> > >>>> >>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM
> Katarzyna Kucharczyk <ka...@gmail.com> wrote:
> >> > >>>> >>>>>>>>> >>>>>> > > >>>
> >> > >>>> >>>>>>>>> >>>>>> > > >>> This sounds really good. A lot of
> Jenkins jobs failures are caused by lint problems.
> >> > >>>> >>>>>>>>> >>>>>> > > >>> I think it would be great to have
> something similar to Spotless in Java SDK (I heard there is problem with
> configuring Black with IntelliJ).
> >> > >>>> >>>>>>>>> >>>>>> > > >>>
> >> > >>>> >>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert
> Bradshaw <ro...@google.com> wrote:
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
> >> > >>>> >>>>>>>>> >>>>>> > > >>>> I'm generally in favor of
> autoformatters, though I haven't looked at
> >> > >>>> >>>>>>>>> >>>>>> > > >>>> how well this particular one works. We
> might have to go with
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
> https://github.com/desbma/black-2spaces given
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
> https://github.com/python/black/issues/378 .
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>
> >> > >>>> >>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo
> Estrada <pa...@google.com> wrote:
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at
> least a couple people (myself included) who've been annoyed by having to
> take care of lint issues that maybe a code formatter could save us.
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> -P.
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël
> Mejía <ie...@gmail.com> wrote:
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1]
> a python code auto formatter that
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto'
> auto-formatter for python, and wanted to
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest
> from the python people to get this
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> into the build?
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless for
> Java has been a good improvement and
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> maybe the python code base may
> benefit of this too.
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> WDYT?
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
> >> > >>>> >>>>>>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
> >> > >>>> >>
> >> > >>>> >>
> >> > >>>> >>
> >> > >>>> >> --
> >> > >>>> >>
> >> > >>>> >> Michał Walenia
> >> > >>>> >> Polidea | Software Engineer
> >> > >>>> >>
> >> > >>>> >> M: +48 791 432 002
> >> > >>>> >> E: michal.walenia@polidea.com
> >> > >>>> >>
> >> > >>>> >> Unique Tech
> >> > >>>> >> Check out our projects!
>

Re: [DISCUSS] Autoformat python code with Black

Posted by Robert Bradshaw <ro...@google.com>.
No, perhaps not. I agree there's consensus, just wondering what the
next steps should be to get this in. (The presubmits look like they're
all passing, with the exception of some breakage in java that should
be completely unrelated. Of course there's already merge conflicts...)

On Wed, Feb 5, 2020 at 3:55 PM Ahmet Altay <al...@google.com> wrote:
>
> Do we need a formal vote? There is consensus on this thread and on the PR.
>
> On Wed, Feb 5, 2020 at 3:37 PM Robert Bradshaw <ro...@google.com> wrote:
>>
>> The PR is looking good. Should we call a vote?
>>
>> On Mon, Jan 27, 2020 at 11:03 AM Robert Bradshaw <ro...@google.com> wrote:
>> >
>> > Thanks. I commented on the PR. I think if we're going this route we
>> > should add a pre-commit, plus instructions on how to run the tool
>> > (similar to spotless).
>> >
>> > On Mon, Jan 27, 2020 at 10:00 AM Udi Meiri <eh...@google.com> wrote:
>> > >
>> > > I've done a pass on the PR on code I'm familiar with.
>> > > Please make a pass and add your suggestions on the PR.
>> > >
>> > > On Fri, Jan 24, 2020 at 7:15 AM Ismaël Mejía <ie...@gmail.com> wrote:
>> > >>
>> > >> Java build fails on any unformatted code so python probably should be like that.
>> > >> We have to ensure however that it fails early on that.
>> > >> As Robert said time to debate the knobs :)
>> > >>
>> > >> On Fri, Jan 24, 2020 at 3:19 PM Kamil Wasilewski <ka...@polidea.com> wrote:
>> > >>>
>> > >>> PR is ready: https://github.com/apache/beam/pull/10684. Please share your comments ;-) I've managed to reduce the impact a bit:
>> > >>> 501 files changed, 18245 insertions(+), 19495 deletions(-)
>> > >>>
>> > >>> We still need to consider how to enforce the usage of autoformatter. Pre-commit sounds like a nice addition, but it still needs to be installed manually by a developer. On the other hand, Jenkins precommit job that fails if any unformatted code is detected looks like too strict. What do you think?
>> > >>>
>> > >>> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw <ro...@google.com> wrote:
>> > >>>>
>> > >>>> Thanks! Now we get to debate what knobs to twiddle :-P
>> > >>>>
>> > >>>> FYI, I did a simple run (just pushed to
>> > >>>> https://github.com/apache/beam/compare/master...robertwb:yapf) to see
>> > >>>> the impact. The diff is
>> > >>>>
>> > >>>>     $ git diff --stat master
>> > >>>>     ...
>> > >>>>      547 files changed, 22118 insertions(+), 21129 deletions(-)
>> > >>>>
>> > >>>> For reference
>> > >>>>
>> > >>>>     $ find sdks/python/apache_beam -name '*.py' | xargs wc
>> > >>>>     ...
>> > >>>>     200424  612002 7431637 total
>> > >>>>
>> > >>>> which means a little over 10% of lines get touched. I think there are
>> > >>>> some options, such as SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES and
>> > >>>> COALESCE_BRACKETS, that will conform more to the style we are already
>> > >>>> (mostly) following.
>> > >>>>
>> > >>>>
>> > >>>> On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
>> > >>>> <ka...@polidea.com> wrote:
>> > >>>> >
>> > >>>> > Thank you Michał for creating the ticket. I have some free time and I'd like to volunteer myself for this task.
>> > >>>> > Indeed, it looks like there's consensus for `yapf`, so I'll try `yapf` first.
>> > >>>> >
>> > >>>> > Best,
>> > >>>> > Kamil
>> > >>>> >
>> > >>>> >
>> > >>>> > On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <mi...@polidea.com> wrote:
>> > >>>> >>
>> > >>>> >> Hi all,
>> > >>>> >> I created a JIRA issue for this and summarized the available tools
>> > >>>> >>
>> > >>>> >> https://issues.apache.org/jira/browse/BEAM-9175
>> > >>>> >>
>> > >>>> >> Cheers,
>> > >>>> >> Michal
>> > >>>> >>
>> > >>>> >> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <eh...@google.com> wrote:
>> > >>>> >>>
>> > >>>> >>> Sorry, backing off on this due to time constraints.
>> > >>>> >>>
>> > >>>> >>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <eh...@google.com> wrote:
>> > >>>> >>>>
>> > >>>> >>>> It sounds like there's a consensus for yapf. I volunteer to take this on
>> > >>>> >>>>
>> > >>>> >>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com> wrote:
>> > >>>> >>>>>
>> > >>>> >>>>> +1 to autoformatting
>> > >>>> >>>>>
>> > >>>> >>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <lc...@google.com> wrote:
>> > >>>> >>>>>>
>> > >>>> >>>>>> +1 to autoformatters. Also the Beam Java SDK went through a one time pass to apply the spotless formatting.
>> > >>>> >>>>>>
>> > >>>> >>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <al...@google.com> wrote:
>> > >>>> >>>>>>>
>> > >>>> >>>>>>> +1 to autoformatters and yapf. It appears to be a well maintained project. I do support making a one time pass to apply formatting the whole code base.
>> > >>>> >>>>>>>
>> > >>>> >>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <ch...@gmail.com> wrote:
>> > >>>> >>>>>>>>>
>> > >>>> >>>>>>>>>
>> > >>>> >>>>>>>>> It'd be good if there was a way to only apply to violating (or at
>> > >>>> >>>>>>>>> least changed) lines.
>> > >>>> >>>>>>>>
>> > >>>> >>>>>>>>
>> > >>>> >>>>>>>> I assumed the first thing we’d do is convert all of the code in one go, since it’s a very safe operation. Did you have something else in mind?
>> > >>>> >>>>>>>>
>> > >>>> >>>>>>>> -chad
>> > >>>> >>>>>>>>
>> > >>>> >>>>>>>>
>> > >>>> >>>>>>>>
>> > >>>> >>>>>>>>
>> > >>>> >>>>>>>>>
>> > >>>> >>>>>>>>>
>> > >>>> >>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <ch...@gmail.com> wrote:
>> > >>>> >>>>>>>>> >
>> > >>>> >>>>>>>>> > +1 to autoformatting
>> > >>>> >>>>>>>>> >
>> > >>>> >>>>>>>>> > Let me add some nuance to that.
>> > >>>> >>>>>>>>> >
>> > >>>> >>>>>>>>> > The way I see it there are 2 varieties of formatters:  those which take the original formatting into consideration (autopep8) and those which disregard it (yapf, black).
>> > >>>> >>>>>>>>> >
>> > >>>> >>>>>>>>> > I much prefer yapf to black, because you have plenty of options to tweak with yapf (enough to make the output a pretty close match to the current Beam style), and you can mark areas to preserve the original formatting, which could be very useful with Pipeline building with pipe operators.  Please don't pick black.
>> > >>>> >>>>>>>>> >
>> > >>>> >>>>>>>>> > autopep8 is more along the lines of spotless in Java -- it only corrects code that breaks the project's style rules.  The big problem with Beam's current style is that it is so esoteric that autopep8 can't enforce it -- and I'm not just talking about 2-spaces, which I don't really have a problem with -- the problem is the use of either 2 or 4 spaces depending on context (expression start vs hanging indent, etc).  This is my *biggest* gripe about the current style.  PyCharm doesn't have enough control either.  So, if we can choose a style that can be expressed by flake8 or pycodestyle then we can use autopep8 to enforce it.
>> > >>>> >>>>>>>>> >
>> > >>>> >>>>>>>>> > I'd prefer autopep8 to yapf because I like having a little wiggle room to influence the style, but on a big project like Beam all that wiggle room ends up to minor but noticeable inconsistencies in style throughout the project.  yapf ensures completely consistent style, but the tradeoff is that it's sometimes ugly, especially in scenarios with similar repeated entries like argparse, where yapf might insert line breaks in visually inconsistent and unappealing ways depending on the lengths of the keywords and expressions involved.
>> > >>>> >>>>>>>>> >
>> > >>>> >>>>>>>>> > Either way (but especially if we choose yapf) I think it'd be a nice addition to setup a pre-commit [1] config so that people can opt in to running *lightweight* autofixers prior to commit.  This will not only reduce dev frustration but will also reduce the amount of cpu cycles that Jenkins spends pointing out lint errors.
>> > >>>> >>>>>>>>> >
>> > >>>> >>>>>>>>> > [1] https://pre-commit.com/
>> > >>>> >>>>>>>>> >
>> > >>>> >>>>>>>>> > -chad
>> > >>>> >>>>>>>>> >
>> > >>>> >>>>>>>>> >
>> > >>>> >>>>>>>>> >
>> > >>>> >>>>>>>>> >
>> > >>>> >>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <ie...@gmail.com> wrote:
>> > >>>> >>>>>>>>> >>
>> > >>>> >>>>>>>>> >> Last time we discussed this there seems not to be much progress into autoformatting.
>> > >>>> >>>>>>>>> >> This tool looks more tweakable, so maybe it could be more appropriate for Beam's use case.
>> > >>>> >>>>>>>>> >> https://github.com/google/yapf/
>> > >>>> >>>>>>>>> >> WDYT?
>> > >>>> >>>>>>>>> >>
>> > >>>> >>>>>>>>> >>
>> > >>>> >>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <lg...@apache.org> wrote:
>> > >>>> >>>>>>>>> >>>
>> > >>>> >>>>>>>>> >>> +1 for any autoformatter for Python SDK that does the job. My experience is that since spotless in Java SDK I would never start a new Java project without it. So many great benefits not only for one person coding but for all community.
>> > >>>> >>>>>>>>> >>>
>> > >>>> >>>>>>>>> >>> It is a GitHub UI issue that you cannot easily browse past the reformat. It is not actually that hard, but does take a couple extra clicks to get GitHub to display blame before a reformat. It is easier with the command line. I do a lot of code history digging and the global Java reformat is not really a problem.
>> > >>>> >>>>>>>>> >>>
>> > >>>> >>>>>>>>> >>> It's actually one more click on Github but I agree it's not the best way to search the history. The most convenient and clear one I've found so far is in Jetbrains IDEs (Intelij) where you can:
>> > >>>> >>>>>>>>> >>>
>> > >>>> >>>>>>>>> >>> right click on line number -> "annotate" -> click again -> "annotate previous revision" -> ...
>> > >>>> >>>>>>>>> >>>
>> > >>>> >>>>>>>>> >>> You can also use "compare with" to see the diff between two revisions.
>> > >>>> >>>>>>>>> >>>
>> > >>>> >>>>>>>>> >>> Łukasz
>> > >>>> >>>>>>>>> >>>
>> > >>>> >>>>>>>>> >>>
>> > >>>> >>>>>>>>> >>>
>> > >>>> >>>>>>>>> >>>
>> > >>>> >>>>>>>>> >>>
>> > >>>> >>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org> napisał(a):
>> > >>>> >>>>>>>>> >>>>
>> > >>>> >>>>>>>>> >>>> +1 pending good enough tooling (I can't quite tell - seems there are some issues?)
>> > >>>> >>>>>>>>> >>>>
>> > >>>> >>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <ka...@gmail.com> wrote:
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>> What else actually we gain? My guess is faster PR review iteration. We will skip some of conversations about code style.
>> > >>>> >>>>>>>>> >>>>
>> > >>>> >>>>>>>>> >>>> ...
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>> Last but not least, new contributor may be less discouraged. When I started contribute I didn’t know how to format my code and I lost a lot of time to add pylint and adjust IntelliJ. I eventually failed. Currently I write code intuitively and when I don’t forget I rerun tox.
>> > >>>> >>>>>>>>> >>>>
>> > >>>> >>>>>>>>> >>>>
>> > >>>> >>>>>>>>> >>>> This is a huge benefit. This is why I supported it so much for Java. It is a community benefit. You do not have to be a contributor to the Python SDK to support this. That is why I am writing here. Just eliminate all discussion of formatting. It doesn't really matter what the resulting format is, if it is not crazy to read. I strongly oppose maintaining a non-default format.
>> > >>>> >>>>>>>>> >>>>
>> > >>>> >>>>>>>>> >>>> Reformating 20k lines or 200k is not hard. The Java global reformat touched 50k lines. It does not really matter how big it is. Definitely do it all at once if you think the tool is good enough. And you should pin a version, so churn is not a problem. You can upgrade the version and reformat in a PR later and that is also easy.
>> > >>>> >>>>>>>>> >>>>
>> > >>>> >>>>>>>>> >>>> It is a GitHub UI issue that you cannot easily browse past the reformat. It is not actually that hard, but does take a couple extra clicks to get GitHub to display blame before a reformat. It is easier with the command line. I do a lot of code history digging and the global Java reformat is not really a problem.
>> > >>>> >>>>>>>>> >>>>
>> > >>>> >>>>>>>>> >>>> Kenn
>> > >>>> >>>>>>>>> >>>>
>> > >>>> >>>>>>>>> >>>>
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>> Also everything will be formatted in a same way, so eventually it would be easier to read.
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>> Moreover, as it was mentioned in previous emails - a lot of Jenkins failures won’t take place, so we save time and resources.
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>> One of disadvantages is that our pipelines has custom syntax and after formatting they looks a little bit weird, but maybe extending the only configurable option in Black - lines, from 88 to 110 would be solution.
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>> Second one is that Black requires Python 3 to be run. I don’t know how big obstacle it would be.
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>> I believe there are two options how it would be possible to introduce Black. First: just do it, it will hurt but then it would be ok (same as a dentist appointment). Of course it may require some work to adjust linters. On the other hand we can do it gradually and start including sdk parts one by one - maybe it will be less painful?
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>> As an example I can share one of projects [2] I know that uses Black (they use also other cool checkers and pre-commit [3]). This is how looks their build with all checks [4].
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>> To sum up I believe that if we want improve our coding experience, we should improve our toolset. Black seems be recent and quite popular tool what makes think they won’t stop developing it.
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>> [1] https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>> [3] https://pre-commit.com
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>> [4] https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>>
>> > >>>> >>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <ro...@google.com> wrote:
>> > >>>> >>>>>>>>> >>>>>>
>> > >>>> >>>>>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as it would change nearly every single line in the codebase (and the loss of all context as well as that particular line).
>> > >>>> >>>>>>>>> >>>>>>
>> > >>>> >>>>>>>>> >>>>>> This is probably why the 2-space fork exists. However, we don't conform to that either--we use 2 spaces for indentation, but 4 for continuation indentation. (As for the history of this, this goes back to Google's internal style guide, probably motivated by consistency with C++, Java, ... and the fact that with an indent level of 4 one ends up wrapping lines quite frequently (it's telling that black's default line length is 88)). This turns out to be an easy change to the codebase.
>> > >>>> >>>>>>>>> >>>>>>
>> > >>>> >>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that this tool introduces a huge amount of vertical whitespace (e.g. closing parentheses on their own line), e.g.
>> > >>>> >>>>>>>>> >>>>>>
>> > >>>> >>>>>>>>> >>>>>> def foo(
>> > >>>> >>>>>>>>> >>>>>>     args
>> > >>>> >>>>>>>>> >>>>>> ):
>> > >>>> >>>>>>>>> >>>>>>   if (
>> > >>>> >>>>>>>>> >>>>>>       long expression)
>> > >>>> >>>>>>>>> >>>>>>   ):
>> > >>>> >>>>>>>>> >>>>>>     func(
>> > >>>> >>>>>>>>> >>>>>>         args
>> > >>>> >>>>>>>>> >>>>>>     )
>> > >>>> >>>>>>>>> >>>>>>
>> > >>>> >>>>>>>>> >>>>>> I wrote a simple post-processor to put closing parentheses on the same lines, as well as omit the newline after "if (", and disabling formatting of strings, which reduce the churn in our codebase to 15k lines (adding about 4k) out of 200k total.
>> > >>>> >>>>>>>>> >>>>>>
>> > >>>> >>>>>>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
>> > >>>> >>>>>>>>> >>>>>>
>> > >>>> >>>>>>>>> >>>>>> It's still very opinionated, often in different ways then me, and doesn't understand the semantics of the code, but possibly something we could live with given the huge advantages of an autoformatter.
>> > >>>> >>>>>>>>> >>>>>>
>> > >>>> >>>>>>>>> >>>>>> An intermediate point would be to allow, but not require, autoformatting of changed lines.
>> > >>>> >>>>>>>>> >>>>>>
>> > >>>> >>>>>>>>> >>>>>> As for being beta quality, it looks like it's got a decent number of contributors and in my book being in the python github project is a strong positive signal. But, due to the above issues, I think we'd have to maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space issue is a 2-line change, and the rest implemented as a post-processing step (for now, incomplete), so it'd be easy to stay in sync with upstream.)
>> > >>>> >>>>>>>>> >>>>>>
>> > >>>> >>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <ie...@gmail.com> wrote:
>> > >>>> >>>>>>>>> >>>>>> >
>> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be configured in a way to fit our
>> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to reformat the
>> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>> > >>>> >>>>>>>>> >>>>>> >
>> > >>>> >>>>>>>>> >>>>>> > It cannot be configured to do what we actually do because Black is
>> > >>>> >>>>>>>>> >>>>>> > configurable only to support the standard python codestyle guidelines
>> > >>>> >>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what most projects in the
>> > >>>> >>>>>>>>> >>>>>> > python world use.
>> > >>>> >>>>>>>>> >>>>>> >
>> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have the feeling
>> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with Python because the linter has
>> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>> > >>>> >>>>>>>>> >>>>>> >
>> > >>>> >>>>>>>>> >>>>>> > Yes that’s the bad side effect but there are always tradeoffs we have
>> > >>>> >>>>>>>>> >>>>>> > to deal with.
>> > >>>> >>>>>>>>> >>>>>> >
>> > >>>> >>>>>>>>> >>>>>> >
>> > >>>> >>>>>>>>> >>>>>> >
>> > >>>> >>>>>>>>> >>>>>> >
>> > >>>> >>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <mx...@apache.org> wrote:
>> > >>>> >>>>>>>>> >>>>>> > >
>> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be configured in a way to fit our
>> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to reformat the
>> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>> > >>>> >>>>>>>>> >>>>>> > >
>> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have the feeling
>> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with Python because the linter has
>> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>> > >>>> >>>>>>>>> >>>>>> > >
>> > >>>> >>>>>>>>> >>>>>> > > -Max
>> > >>>> >>>>>>>>> >>>>>> > >
>> > >>>> >>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>> > >>>> >>>>>>>>> >>>>>> > > >> My concerns are:
>> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big warning.
>> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For the same reason I also strongly prefer not using a fork for a specific setting. Fork will only have less people looking at it.
>> > >>>> >>>>>>>>> >>>>>> > > >
>> > >>>> >>>>>>>>> >>>>>> > > > I suppose the project is marked as beta because it is recent, it was
>> > >>>> >>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and because some things can change since
>> > >>>> >>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky beasts, I think beta in that case is
>> > >>>> >>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you look at the contribution page [1]
>> > >>>> >>>>>>>>> >>>>>> > > > you can notice that it is less and less a single person project, there
>> > >>>> >>>>>>>>> >>>>>> > > > have been 93 independent contributions since the project became
>> > >>>> >>>>>>>>> >>>>>> > > > public, and the fact that it is hosted in the python organization
>> > >>>> >>>>>>>>> >>>>>> > > > github [2] gives some confidence on the project continuity.
>> > >>>> >>>>>>>>> >>>>>> > > >
>> > >>>> >>>>>>>>> >>>>>> > > > You are right however about the fact that the main author seems to be
>> > >>>> >>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he can seem
>> > >>>> >>>>>>>>> >>>>>> > > > arbitrary, but he is just following pep8 style guide recommendations
>> > >>>> >>>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4 spaces
>> > >>>> >>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even Google's own Python style guide [4],
>> > >>>> >>>>>>>>> >>>>>> > > > So, probably it should be to us to reconsider the current policy to
>> > >>>> >>>>>>>>> >>>>>> > > > adapt to the standards (and the tool).
>> > >>>> >>>>>>>>> >>>>>> > > >
>> > >>>> >>>>>>>>> >>>>>> > > > I did a quick run of black with python 2.7 compatibility on
>> > >>>> >>>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing errors which is positive given the
>> > >>>> >>>>>>>>> >>>>>> > > > size of our code base.
>> > >>>> >>>>>>>>> >>>>>> > > >
>> > >>>> >>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files failed to reformat.
>> > >>>> >>>>>>>>> >>>>>> > > >
>> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>> > >>>> >>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>> > >>>> >>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>> > >>>> >>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 160:34:       print(traceback_string, file=sys.stderr)
>> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
>> > >>>> >>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc else '    ',
>> > >>>> >>>>>>>>> >>>>>> > > > end=' ')
>> > >>>> >>>>>>>>> >>>>>> > > >
>> > >>>> >>>>>>>>> >>>>>> > > > I still think this can be positive for the project but well I am
>> > >>>> >>>>>>>>> >>>>>> > > > barely a contributor to the python code base so I let you the python
>> > >>>> >>>>>>>>> >>>>>> > > > maintainers to reconsider this, in any case it seems like a good
>> > >>>> >>>>>>>>> >>>>>> > > > improvement for the project.
>> > >>>> >>>>>>>>> >>>>>> > > >
>> > >>>> >>>>>>>>> >>>>>> > > > [1] https://github.com/python/black/graphs/contributors
>> > >>>> >>>>>>>>> >>>>>> > > > [2] https://github.com/python
>> > >>>> >>>>>>>>> >>>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
>> > >>>> >>>>>>>>> >>>>>> > > > [4] https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>> > >>>> >>>>>>>>> >>>>>> > > >
>> > >>>> >>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <al...@google.com> wrote:
>> > >>>> >>>>>>>>> >>>>>> > > >>
>> > >>>> >>>>>>>>> >>>>>> > > >> I am in the same boat with Robert, I am in favor of autoformatters but I am not familiar with this one. My concerns are:
>> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big warning.
>> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For the same reason I also strongly prefer not using a fork for a specific setting. Fork will only have less people looking at it.
>> > >>>> >>>>>>>>> >>>>>> > > >>
>> > >>>> >>>>>>>>> >>>>>> > > >> IMO, this is in an early stage for us. That said lint issues are real as pointed in the thread. If someone would like to give it a try and see how it would look like for us that would be interesting.
>> > >>>> >>>>>>>>> >>>>>> > > >>
>> > >>>> >>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <ka...@gmail.com> wrote:
>> > >>>> >>>>>>>>> >>>>>> > > >>>
>> > >>>> >>>>>>>>> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs failures are caused by lint problems.
>> > >>>> >>>>>>>>> >>>>>> > > >>> I think it would be great to have something similar to Spotless in Java SDK (I heard there is problem with configuring Black with IntelliJ).
>> > >>>> >>>>>>>>> >>>>>> > > >>>
>> > >>>> >>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <ro...@google.com> wrote:
>> > >>>> >>>>>>>>> >>>>>> > > >>>>
>> > >>>> >>>>>>>>> >>>>>> > > >>>> I'm generally in favor of autoformatters, though I haven't looked at
>> > >>>> >>>>>>>>> >>>>>> > > >>>> how well this particular one works. We might have to go with
>> > >>>> >>>>>>>>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>> > >>>> >>>>>>>>> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
>> > >>>> >>>>>>>>> >>>>>> > > >>>>
>> > >>>> >>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <pa...@google.com> wrote:
>> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>> > >>>> >>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at least a couple people (myself included) who've been annoyed by having to take care of lint issues that maybe a code formatter could save us.
>> > >>>> >>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>> > >>>> >>>>>>>>> >>>>>> > > >>>>> -P.
>> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>> > >>>> >>>>>>>>> >>>>>> > > >>>>>
>> > >>>> >>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <ie...@gmail.com> wrote:
>> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>> > >>>> >>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code auto formatter that
>> > >>>> >>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for python, and wanted to
>> > >>>> >>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest from the python people to get this
>> > >>>> >>>>>>>>> >>>>>> > > >>>>>> into the build?
>> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>> > >>>> >>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless for Java has been a good improvement and
>> > >>>> >>>>>>>>> >>>>>> > > >>>>>> maybe the python code base may benefit of this too.
>> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>> > >>>> >>>>>>>>> >>>>>> > > >>>>>> WDYT?
>> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
>> > >>>> >>>>>>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
>> > >>>> >>
>> > >>>> >>
>> > >>>> >>
>> > >>>> >> --
>> > >>>> >>
>> > >>>> >> Michał Walenia
>> > >>>> >> Polidea | Software Engineer
>> > >>>> >>
>> > >>>> >> M: +48 791 432 002
>> > >>>> >> E: michal.walenia@polidea.com
>> > >>>> >>
>> > >>>> >> Unique Tech
>> > >>>> >> Check out our projects!

Re: [DISCUSS] Autoformat python code with Black

Posted by Ahmet Altay <al...@google.com>.
Do we need a formal vote? There is consensus on this thread and on the PR.

On Wed, Feb 5, 2020 at 3:37 PM Robert Bradshaw <ro...@google.com> wrote:

> The PR is looking good. Should we call a vote?
>
> On Mon, Jan 27, 2020 at 11:03 AM Robert Bradshaw <ro...@google.com>
> wrote:
> >
> > Thanks. I commented on the PR. I think if we're going this route we
> > should add a pre-commit, plus instructions on how to run the tool
> > (similar to spotless).
> >
> > On Mon, Jan 27, 2020 at 10:00 AM Udi Meiri <eh...@google.com> wrote:
> > >
> > > I've done a pass on the PR on code I'm familiar with.
> > > Please make a pass and add your suggestions on the PR.
> > >
> > > On Fri, Jan 24, 2020 at 7:15 AM Ismaël Mejía <ie...@gmail.com>
> wrote:
> > >>
> > >> Java build fails on any unformatted code so python probably should be
> like that.
> > >> We have to ensure however that it fails early on that.
> > >> As Robert said time to debate the knobs :)
> > >>
> > >> On Fri, Jan 24, 2020 at 3:19 PM Kamil Wasilewski <
> kamil.wasilewski@polidea.com> wrote:
> > >>>
> > >>> PR is ready: https://github.com/apache/beam/pull/10684. Please
> share your comments ;-) I've managed to reduce the impact a bit:
> > >>> 501 files changed, 18245 insertions(+), 19495 deletions(-)
> > >>>
> > >>> We still need to consider how to enforce the usage of autoformatter.
> Pre-commit sounds like a nice addition, but it still needs to be installed
> manually by a developer. On the other hand, Jenkins precommit job that
> fails if any unformatted code is detected looks like too strict. What do
> you think?
> > >>>
> > >>> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw <ro...@google.com>
> wrote:
> > >>>>
> > >>>> Thanks! Now we get to debate what knobs to twiddle :-P
> > >>>>
> > >>>> FYI, I did a simple run (just pushed to
> > >>>> https://github.com/apache/beam/compare/master...robertwb:yapf) to
> see
> > >>>> the impact. The diff is
> > >>>>
> > >>>>     $ git diff --stat master
> > >>>>     ...
> > >>>>      547 files changed, 22118 insertions(+), 21129 deletions(-)
> > >>>>
> > >>>> For reference
> > >>>>
> > >>>>     $ find sdks/python/apache_beam -name '*.py' | xargs wc
> > >>>>     ...
> > >>>>     200424  612002 7431637 total
> > >>>>
> > >>>> which means a little over 10% of lines get touched. I think there
> are
> > >>>> some options, such as SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES and
> > >>>> COALESCE_BRACKETS, that will conform more to the style we are
> already
> > >>>> (mostly) following.
> > >>>>
> > >>>>
> > >>>> On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
> > >>>> <ka...@polidea.com> wrote:
> > >>>> >
> > >>>> > Thank you Michał for creating the ticket. I have some free time
> and I'd like to volunteer myself for this task.
> > >>>> > Indeed, it looks like there's consensus for `yapf`, so I'll try
> `yapf` first.
> > >>>> >
> > >>>> > Best,
> > >>>> > Kamil
> > >>>> >
> > >>>> >
> > >>>> > On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <
> michal.walenia@polidea.com> wrote:
> > >>>> >>
> > >>>> >> Hi all,
> > >>>> >> I created a JIRA issue for this and summarized the available
> tools
> > >>>> >>
> > >>>> >> https://issues.apache.org/jira/browse/BEAM-9175
> > >>>> >>
> > >>>> >> Cheers,
> > >>>> >> Michal
> > >>>> >>
> > >>>> >> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <eh...@google.com>
> wrote:
> > >>>> >>>
> > >>>> >>> Sorry, backing off on this due to time constraints.
> > >>>> >>>
> > >>>> >>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <eh...@google.com>
> wrote:
> > >>>> >>>>
> > >>>> >>>> It sounds like there's a consensus for yapf. I volunteer to
> take this on
> > >>>> >>>>
> > >>>> >>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com>
> wrote:
> > >>>> >>>>>
> > >>>> >>>>> +1 to autoformatting
> > >>>> >>>>>
> > >>>> >>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <lc...@google.com>
> wrote:
> > >>>> >>>>>>
> > >>>> >>>>>> +1 to autoformatters. Also the Beam Java SDK went through a
> one time pass to apply the spotless formatting.
> > >>>> >>>>>>
> > >>>> >>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <
> altay@google.com> wrote:
> > >>>> >>>>>>>
> > >>>> >>>>>>> +1 to autoformatters and yapf. It appears to be a well
> maintained project. I do support making a one time pass to apply formatting
> the whole code base.
> > >>>> >>>>>>>
> > >>>> >>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <
> chadrik@gmail.com> wrote:
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> It'd be good if there was a way to only apply to
> violating (or at
> > >>>> >>>>>>>>> least changed) lines.
> > >>>> >>>>>>>>
> > >>>> >>>>>>>>
> > >>>> >>>>>>>> I assumed the first thing we’d do is convert all of the
> code in one go, since it’s a very safe operation. Did you have something
> else in mind?
> > >>>> >>>>>>>>
> > >>>> >>>>>>>> -chad
> > >>>> >>>>>>>>
> > >>>> >>>>>>>>
> > >>>> >>>>>>>>
> > >>>> >>>>>>>>
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <
> chadrik@gmail.com> wrote:
> > >>>> >>>>>>>>> >
> > >>>> >>>>>>>>> > +1 to autoformatting
> > >>>> >>>>>>>>> >
> > >>>> >>>>>>>>> > Let me add some nuance to that.
> > >>>> >>>>>>>>> >
> > >>>> >>>>>>>>> > The way I see it there are 2 varieties of formatters:
> those which take the original formatting into consideration (autopep8) and
> those which disregard it (yapf, black).
> > >>>> >>>>>>>>> >
> > >>>> >>>>>>>>> > I much prefer yapf to black, because you have plenty of
> options to tweak with yapf (enough to make the output a pretty close match
> to the current Beam style), and you can mark areas to preserve the original
> formatting, which could be very useful with Pipeline building with pipe
> operators.  Please don't pick black.
> > >>>> >>>>>>>>> >
> > >>>> >>>>>>>>> > autopep8 is more along the lines of spotless in Java --
> it only corrects code that breaks the project's style rules.  The big
> problem with Beam's current style is that it is so esoteric that autopep8
> can't enforce it -- and I'm not just talking about 2-spaces, which I don't
> really have a problem with -- the problem is the use of either 2 or 4
> spaces depending on context (expression start vs hanging indent, etc).
> This is my *biggest* gripe about the current style.  PyCharm doesn't have
> enough control either.  So, if we can choose a style that can be expressed
> by flake8 or pycodestyle then we can use autopep8 to enforce it.
> > >>>> >>>>>>>>> >
> > >>>> >>>>>>>>> > I'd prefer autopep8 to yapf because I like having a
> little wiggle room to influence the style, but on a big project like Beam
> all that wiggle room ends up to minor but noticeable inconsistencies in
> style throughout the project.  yapf ensures completely consistent style,
> but the tradeoff is that it's sometimes ugly, especially in scenarios with
> similar repeated entries like argparse, where yapf might insert line breaks
> in visually inconsistent and unappealing ways depending on the lengths of
> the keywords and expressions involved.
> > >>>> >>>>>>>>> >
> > >>>> >>>>>>>>> > Either way (but especially if we choose yapf) I think
> it'd be a nice addition to setup a pre-commit [1] config so that people can
> opt in to running *lightweight* autofixers prior to commit.  This will not
> only reduce dev frustration but will also reduce the amount of cpu cycles
> that Jenkins spends pointing out lint errors.
> > >>>> >>>>>>>>> >
> > >>>> >>>>>>>>> > [1] https://pre-commit.com/
> > >>>> >>>>>>>>> >
> > >>>> >>>>>>>>> > -chad
> > >>>> >>>>>>>>> >
> > >>>> >>>>>>>>> >
> > >>>> >>>>>>>>> >
> > >>>> >>>>>>>>> >
> > >>>> >>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <
> iemejia@gmail.com> wrote:
> > >>>> >>>>>>>>> >>
> > >>>> >>>>>>>>> >> Last time we discussed this there seems not to be much
> progress into autoformatting.
> > >>>> >>>>>>>>> >> This tool looks more tweakable, so maybe it could be
> more appropriate for Beam's use case.
> > >>>> >>>>>>>>> >> https://github.com/google/yapf/
> > >>>> >>>>>>>>> >> WDYT?
> > >>>> >>>>>>>>> >>
> > >>>> >>>>>>>>> >>
> > >>>> >>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <
> lgajowy@apache.org> wrote:
> > >>>> >>>>>>>>> >>>
> > >>>> >>>>>>>>> >>> +1 for any autoformatter for Python SDK that does the
> job. My experience is that since spotless in Java SDK I would never start a
> new Java project without it. So many great benefits not only for one person
> coding but for all community.
> > >>>> >>>>>>>>> >>>
> > >>>> >>>>>>>>> >>> It is a GitHub UI issue that you cannot easily browse
> past the reformat. It is not actually that hard, but does take a couple
> extra clicks to get GitHub to display blame before a reformat. It is easier
> with the command line. I do a lot of code history digging and the global
> Java reformat is not really a problem.
> > >>>> >>>>>>>>> >>>
> > >>>> >>>>>>>>> >>> It's actually one more click on Github but I agree
> it's not the best way to search the history. The most convenient and clear
> one I've found so far is in Jetbrains IDEs (Intelij) where you can:
> > >>>> >>>>>>>>> >>>
> > >>>> >>>>>>>>> >>> right click on line number -> "annotate" -> click
> again -> "annotate previous revision" -> ...
> > >>>> >>>>>>>>> >>>
> > >>>> >>>>>>>>> >>> You can also use "compare with" to see the diff
> between two revisions.
> > >>>> >>>>>>>>> >>>
> > >>>> >>>>>>>>> >>> Łukasz
> > >>>> >>>>>>>>> >>>
> > >>>> >>>>>>>>> >>>
> > >>>> >>>>>>>>> >>>
> > >>>> >>>>>>>>> >>>
> > >>>> >>>>>>>>> >>>
> > >>>> >>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <
> kenn@apache.org> napisał(a):
> > >>>> >>>>>>>>> >>>>
> > >>>> >>>>>>>>> >>>> +1 pending good enough tooling (I can't quite tell -
> seems there are some issues?)
> > >>>> >>>>>>>>> >>>>
> > >>>> >>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk
> <ka...@gmail.com> wrote:
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>> What else actually we gain? My guess is faster PR
> review iteration. We will skip some of conversations about code style.
> > >>>> >>>>>>>>> >>>>
> > >>>> >>>>>>>>> >>>> ...
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>> Last but not least, new contributor may be less
> discouraged. When I started contribute I didn’t know how to format my code
> and I lost a lot of time to add pylint and adjust IntelliJ. I eventually
> failed. Currently I write code intuitively and when I don’t forget I rerun
> tox.
> > >>>> >>>>>>>>> >>>>
> > >>>> >>>>>>>>> >>>>
> > >>>> >>>>>>>>> >>>> This is a huge benefit. This is why I supported it
> so much for Java. It is a community benefit. You do not have to be a
> contributor to the Python SDK to support this. That is why I am writing
> here. Just eliminate all discussion of formatting. It doesn't really matter
> what the resulting format is, if it is not crazy to read. I strongly oppose
> maintaining a non-default format.
> > >>>> >>>>>>>>> >>>>
> > >>>> >>>>>>>>> >>>> Reformating 20k lines or 200k is not hard. The Java
> global reformat touched 50k lines. It does not really matter how big it is.
> Definitely do it all at once if you think the tool is good enough. And you
> should pin a version, so churn is not a problem. You can upgrade the
> version and reformat in a PR later and that is also easy.
> > >>>> >>>>>>>>> >>>>
> > >>>> >>>>>>>>> >>>> It is a GitHub UI issue that you cannot easily
> browse past the reformat. It is not actually that hard, but does take a
> couple extra clicks to get GitHub to display blame before a reformat. It is
> easier with the command line. I do a lot of code history digging and the
> global Java reformat is not really a problem.
> > >>>> >>>>>>>>> >>>>
> > >>>> >>>>>>>>> >>>> Kenn
> > >>>> >>>>>>>>> >>>>
> > >>>> >>>>>>>>> >>>>
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>> Also everything will be formatted in a same way, so
> eventually it would be easier to read.
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>> Moreover, as it was mentioned in previous emails -
> a lot of Jenkins failures won’t take place, so we save time and resources.
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>> One of disadvantages is that our pipelines has
> custom syntax and after formatting they looks a little bit weird, but maybe
> extending the only configurable option in Black - lines, from 88 to 110
> would be solution.
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>> Second one is that Black requires Python 3 to be
> run. I don’t know how big obstacle it would be.
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>> I believe there are two options how it would be
> possible to introduce Black. First: just do it, it will hurt but then it
> would be ok (same as a dentist appointment). Of course it may require some
> work to adjust linters. On the other hand we can do it gradually and start
> including sdk parts one by one - maybe it will be less painful?
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>> As an example I can share one of projects [2] I
> know that uses Black (they use also other cool checkers and pre-commit
> [3]). This is how looks their build with all checks [4].
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>> To sum up I believe that if we want improve our
> coding experience, we should improve our toolset. Black seems be recent and
> quite popular tool what makes think they won’t stop developing it.
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>> [1]
> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>> [2]
> https://github.com/GoogleCloudPlatform/oozie-to-airflow
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>> [3] https://pre-commit.com
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>> [4]
> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>>
> > >>>> >>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <
> robertwb@google.com> wrote:
> > >>>> >>>>>>>>> >>>>>>
> > >>>> >>>>>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter to
> me, as it would change nearly every single line in the codebase (and the
> loss of all context as well as that particular line).
> > >>>> >>>>>>>>> >>>>>>
> > >>>> >>>>>>>>> >>>>>> This is probably why the 2-space fork exists.
> However, we don't conform to that either--we use 2 spaces for indentation,
> but 4 for continuation indentation. (As for the history of this, this goes
> back to Google's internal style guide, probably motivated by consistency
> with C++, Java, ... and the fact that with an indent level of 4 one ends up
> wrapping lines quite frequently (it's telling that black's default line
> length is 88)). This turns out to be an easy change to the codebase.
> > >>>> >>>>>>>>> >>>>>>
> > >>>> >>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing,
> I found that this tool introduces a huge amount of vertical whitespace
> (e.g. closing parentheses on their own line), e.g.
> > >>>> >>>>>>>>> >>>>>>
> > >>>> >>>>>>>>> >>>>>> def foo(
> > >>>> >>>>>>>>> >>>>>>     args
> > >>>> >>>>>>>>> >>>>>> ):
> > >>>> >>>>>>>>> >>>>>>   if (
> > >>>> >>>>>>>>> >>>>>>       long expression)
> > >>>> >>>>>>>>> >>>>>>   ):
> > >>>> >>>>>>>>> >>>>>>     func(
> > >>>> >>>>>>>>> >>>>>>         args
> > >>>> >>>>>>>>> >>>>>>     )
> > >>>> >>>>>>>>> >>>>>>
> > >>>> >>>>>>>>> >>>>>> I wrote a simple post-processor to put closing
> parentheses on the same lines, as well as omit the newline after "if (",
> and disabling formatting of strings, which reduce the churn in our codebase
> to 15k lines (adding about 4k) out of 200k total.
> > >>>> >>>>>>>>> >>>>>>
> > >>>> >>>>>>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
> > >>>> >>>>>>>>> >>>>>>
> > >>>> >>>>>>>>> >>>>>> It's still very opinionated, often in different
> ways then me, and doesn't understand the semantics of the code, but
> possibly something we could live with given the huge advantages of an
> autoformatter.
> > >>>> >>>>>>>>> >>>>>>
> > >>>> >>>>>>>>> >>>>>> An intermediate point would be to allow, but not
> require, autoformatting of changed lines.
> > >>>> >>>>>>>>> >>>>>>
> > >>>> >>>>>>>>> >>>>>> As for being beta quality, it looks like it's got
> a decent number of contributors and in my book being in the python github
> project is a strong positive signal. But, due to the above issues, I think
> we'd have to maintain a fork. (The code is pretty lightweight, the 2 vs. 4
> space issue is a 2-line change, and the rest implemented as a
> post-processing step (for now, incomplete), so it'd be easy to stay in sync
> with upstream.)
> > >>>> >>>>>>>>> >>>>>>
> > >>>> >>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <
> iemejia@gmail.com> wrote:
> > >>>> >>>>>>>>> >>>>>> >
> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be
> configured in a way to fit our
> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is
> feasible to reformat the
> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
> > >>>> >>>>>>>>> >>>>>> >
> > >>>> >>>>>>>>> >>>>>> > It cannot be configured to do what we actually
> do because Black is
> > >>>> >>>>>>>>> >>>>>> > configurable only to support the standard python
> codestyle guidelines
> > >>>> >>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what
> most projects in the
> > >>>> >>>>>>>>> >>>>>> > python world use.
> > >>>> >>>>>>>>> >>>>>> >
> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to
> the Git history. This
> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK.
> However, I have the feeling
> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with
> Python because the linter has
> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
> > >>>> >>>>>>>>> >>>>>> >
> > >>>> >>>>>>>>> >>>>>> > Yes that’s the bad side effect but there are
> always tradeoffs we have
> > >>>> >>>>>>>>> >>>>>> > to deal with.
> > >>>> >>>>>>>>> >>>>>> >
> > >>>> >>>>>>>>> >>>>>> >
> > >>>> >>>>>>>>> >>>>>> >
> > >>>> >>>>>>>>> >>>>>> >
> > >>>> >>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian
> Michels <mx...@apache.org> wrote:
> > >>>> >>>>>>>>> >>>>>> > >
> > >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be
> configured in a way to fit our
> > >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is
> feasible to reformat the
> > >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
> > >>>> >>>>>>>>> >>>>>> > >
> > >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to
> the Git history. This
> > >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK.
> However, I have the feeling
> > >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with
> Python because the linter has
> > >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
> > >>>> >>>>>>>>> >>>>>> > >
> > >>>> >>>>>>>>> >>>>>> > > -Max
> > >>>> >>>>>>>>> >>>>>> > >
> > >>>> >>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
> > >>>> >>>>>>>>> >>>>>> > > >> My concerns are:
> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta
> with a big warning.
> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person
> project. For the same reason I also strongly prefer not using a fork for a
> specific setting. Fork will only have less people looking at it.
> > >>>> >>>>>>>>> >>>>>> > > >
> > >>>> >>>>>>>>> >>>>>> > > > I suppose the project is marked as beta
> because it is recent, it was
> > >>>> >>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and because some
> things can change since
> > >>>> >>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky beasts, I
> think beta in that case is
> > >>>> >>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you look at
> the contribution page [1]
> > >>>> >>>>>>>>> >>>>>> > > > you can notice that it is less and less a
> single person project, there
> > >>>> >>>>>>>>> >>>>>> > > > have been 93 independent contributions since
> the project became
> > >>>> >>>>>>>>> >>>>>> > > > public, and the fact that it is hosted in
> the python organization
> > >>>> >>>>>>>>> >>>>>> > > > github [2] gives some confidence on the
> project continuity.
> > >>>> >>>>>>>>> >>>>>> > > >
> > >>>> >>>>>>>>> >>>>>> > > > You are right however about the fact that
> the main author seems to be
> > >>>> >>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the
> 2-spaces issue he can seem
> > >>>> >>>>>>>>> >>>>>> > > > arbitrary, but he is just following pep8
> style guide recommendations
> > >>>> >>>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not
> follow the 4 spaces
> > >>>> >>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even Google's own
> Python style guide [4],
> > >>>> >>>>>>>>> >>>>>> > > > So, probably it should be to us to
> reconsider the current policy to
> > >>>> >>>>>>>>> >>>>>> > > > adapt to the standards (and the tool).
> > >>>> >>>>>>>>> >>>>>> > > >
> > >>>> >>>>>>>>> >>>>>> > > > I did a quick run of black with python 2.7
> compatibility on
> > >>>> >>>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing errors
> which is positive given the
> > >>>> >>>>>>>>> >>>>>> > > > size of our code base.
> > >>>> >>>>>>>>> >>>>>> > > >
> > >>>> >>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left
> unchanged, 4 files failed to reformat.
> > >>>> >>>>>>>>> >>>>>> > > >
> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> > >>>> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress =
> print
> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> > >>>> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 151:18:
>  file=sys.stderr)
> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> > >>>> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 160:34:
>  print(traceback_string, file=sys.stderr)
> > >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> > >>>> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
> > >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if
> pc == last_pc else '    ',
> > >>>> >>>>>>>>> >>>>>> > > > end=' ')
> > >>>> >>>>>>>>> >>>>>> > > >
> > >>>> >>>>>>>>> >>>>>> > > > I still think this can be positive for the
> project but well I am
> > >>>> >>>>>>>>> >>>>>> > > > barely a contributor to the python code base
> so I let you the python
> > >>>> >>>>>>>>> >>>>>> > > > maintainers to reconsider this, in any case
> it seems like a good
> > >>>> >>>>>>>>> >>>>>> > > > improvement for the project.
> > >>>> >>>>>>>>> >>>>>> > > >
> > >>>> >>>>>>>>> >>>>>> > > > [1]
> https://github.com/python/black/graphs/contributors
> > >>>> >>>>>>>>> >>>>>> > > > [2] https://github.com/python
> > >>>> >>>>>>>>> >>>>>> > > > [3]
> https://www.python.org/dev/peps/pep-0008/#indentation
> > >>>> >>>>>>>>> >>>>>> > > > [4]
> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
> > >>>> >>>>>>>>> >>>>>> > > >
> > >>>> >>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay
> <al...@google.com> wrote:
> > >>>> >>>>>>>>> >>>>>> > > >>
> > >>>> >>>>>>>>> >>>>>> > > >> I am in the same boat with Robert, I am in
> favor of autoformatters but I am not familiar with this one. My concerns
> are:
> > >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta
> with a big warning.
> > >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person
> project. For the same reason I also strongly prefer not using a fork for a
> specific setting. Fork will only have less people looking at it.
> > >>>> >>>>>>>>> >>>>>> > > >>
> > >>>> >>>>>>>>> >>>>>> > > >> IMO, this is in an early stage for us. That
> said lint issues are real as pointed in the thread. If someone would like
> to give it a try and see how it would look like for us that would be
> interesting.
> > >>>> >>>>>>>>> >>>>>> > > >>
> > >>>> >>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna
> Kucharczyk <ka...@gmail.com> wrote:
> > >>>> >>>>>>>>> >>>>>> > > >>>
> > >>>> >>>>>>>>> >>>>>> > > >>> This sounds really good. A lot of Jenkins
> jobs failures are caused by lint problems.
> > >>>> >>>>>>>>> >>>>>> > > >>> I think it would be great to have
> something similar to Spotless in Java SDK (I heard there is problem with
> configuring Black with IntelliJ).
> > >>>> >>>>>>>>> >>>>>> > > >>>
> > >>>> >>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert
> Bradshaw <ro...@google.com> wrote:
> > >>>> >>>>>>>>> >>>>>> > > >>>>
> > >>>> >>>>>>>>> >>>>>> > > >>>> I'm generally in favor of autoformatters,
> though I haven't looked at
> > >>>> >>>>>>>>> >>>>>> > > >>>> how well this particular one works. We
> might have to go with
> > >>>> >>>>>>>>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces
> given
> > >>>> >>>>>>>>> >>>>>> > > >>>>
> https://github.com/python/black/issues/378 .
> > >>>> >>>>>>>>> >>>>>> > > >>>>
> > >>>> >>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo
> Estrada <pa...@google.com> wrote:
> > >>>> >>>>>>>>> >>>>>> > > >>>>>
> > >>>> >>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at least
> a couple people (myself included) who've been annoyed by having to take
> care of lint issues that maybe a code formatter could save us.
> > >>>> >>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
> > >>>> >>>>>>>>> >>>>>> > > >>>>> -P.
> > >>>> >>>>>>>>> >>>>>> > > >>>>>
> > >>>> >>>>>>>>> >>>>>> > > >>>>>
> > >>>> >>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël
> Mejía <ie...@gmail.com> wrote:
> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
> > >>>> >>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a
> python code auto formatter that
> > >>>> >>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto'
> auto-formatter for python, and wanted to
> > >>>> >>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest from
> the python people to get this
> > >>>> >>>>>>>>> >>>>>> > > >>>>>> into the build?
> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
> > >>>> >>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless for Java
> has been a good improvement and
> > >>>> >>>>>>>>> >>>>>> > > >>>>>> maybe the python code base may benefit
> of this too.
> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
> > >>>> >>>>>>>>> >>>>>> > > >>>>>> WDYT?
> > >>>> >>>>>>>>> >>>>>> > > >>>>>>
> > >>>> >>>>>>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
> > >>>> >>
> > >>>> >>
> > >>>> >>
> > >>>> >> --
> > >>>> >>
> > >>>> >> Michał Walenia
> > >>>> >> Polidea | Software Engineer
> > >>>> >>
> > >>>> >> M: +48 791 432 002 <+48%20791%20432%20002>
> > >>>> >> E: michal.walenia@polidea.com
> > >>>> >>
> > >>>> >> Unique Tech
> > >>>> >> Check out our projects!
>

Re: [DISCUSS] Autoformat python code with Black

Posted by Robert Bradshaw <ro...@google.com>.
The PR is looking good. Should we call a vote?

On Mon, Jan 27, 2020 at 11:03 AM Robert Bradshaw <ro...@google.com> wrote:
>
> Thanks. I commented on the PR. I think if we're going this route we
> should add a pre-commit, plus instructions on how to run the tool
> (similar to spotless).
>
> On Mon, Jan 27, 2020 at 10:00 AM Udi Meiri <eh...@google.com> wrote:
> >
> > I've done a pass on the PR on code I'm familiar with.
> > Please make a pass and add your suggestions on the PR.
> >
> > On Fri, Jan 24, 2020 at 7:15 AM Ismaël Mejía <ie...@gmail.com> wrote:
> >>
> >> Java build fails on any unformatted code so python probably should be like that.
> >> We have to ensure however that it fails early on that.
> >> As Robert said time to debate the knobs :)
> >>
> >> On Fri, Jan 24, 2020 at 3:19 PM Kamil Wasilewski <ka...@polidea.com> wrote:
> >>>
> >>> PR is ready: https://github.com/apache/beam/pull/10684. Please share your comments ;-) I've managed to reduce the impact a bit:
> >>> 501 files changed, 18245 insertions(+), 19495 deletions(-)
> >>>
> >>> We still need to consider how to enforce the usage of autoformatter. Pre-commit sounds like a nice addition, but it still needs to be installed manually by a developer. On the other hand, Jenkins precommit job that fails if any unformatted code is detected looks like too strict. What do you think?
> >>>
> >>> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw <ro...@google.com> wrote:
> >>>>
> >>>> Thanks! Now we get to debate what knobs to twiddle :-P
> >>>>
> >>>> FYI, I did a simple run (just pushed to
> >>>> https://github.com/apache/beam/compare/master...robertwb:yapf) to see
> >>>> the impact. The diff is
> >>>>
> >>>>     $ git diff --stat master
> >>>>     ...
> >>>>      547 files changed, 22118 insertions(+), 21129 deletions(-)
> >>>>
> >>>> For reference
> >>>>
> >>>>     $ find sdks/python/apache_beam -name '*.py' | xargs wc
> >>>>     ...
> >>>>     200424  612002 7431637 total
> >>>>
> >>>> which means a little over 10% of lines get touched. I think there are
> >>>> some options, such as SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES and
> >>>> COALESCE_BRACKETS, that will conform more to the style we are already
> >>>> (mostly) following.
> >>>>
> >>>>
> >>>> On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
> >>>> <ka...@polidea.com> wrote:
> >>>> >
> >>>> > Thank you Michał for creating the ticket. I have some free time and I'd like to volunteer myself for this task.
> >>>> > Indeed, it looks like there's consensus for `yapf`, so I'll try `yapf` first.
> >>>> >
> >>>> > Best,
> >>>> > Kamil
> >>>> >
> >>>> >
> >>>> > On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <mi...@polidea.com> wrote:
> >>>> >>
> >>>> >> Hi all,
> >>>> >> I created a JIRA issue for this and summarized the available tools
> >>>> >>
> >>>> >> https://issues.apache.org/jira/browse/BEAM-9175
> >>>> >>
> >>>> >> Cheers,
> >>>> >> Michal
> >>>> >>
> >>>> >> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <eh...@google.com> wrote:
> >>>> >>>
> >>>> >>> Sorry, backing off on this due to time constraints.
> >>>> >>>
> >>>> >>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <eh...@google.com> wrote:
> >>>> >>>>
> >>>> >>>> It sounds like there's a consensus for yapf. I volunteer to take this on
> >>>> >>>>
> >>>> >>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com> wrote:
> >>>> >>>>>
> >>>> >>>>> +1 to autoformatting
> >>>> >>>>>
> >>>> >>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <lc...@google.com> wrote:
> >>>> >>>>>>
> >>>> >>>>>> +1 to autoformatters. Also the Beam Java SDK went through a one time pass to apply the spotless formatting.
> >>>> >>>>>>
> >>>> >>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <al...@google.com> wrote:
> >>>> >>>>>>>
> >>>> >>>>>>> +1 to autoformatters and yapf. It appears to be a well maintained project. I do support making a one time pass to apply formatting the whole code base.
> >>>> >>>>>>>
> >>>> >>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <ch...@gmail.com> wrote:
> >>>> >>>>>>>>>
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> It'd be good if there was a way to only apply to violating (or at
> >>>> >>>>>>>>> least changed) lines.
> >>>> >>>>>>>>
> >>>> >>>>>>>>
> >>>> >>>>>>>> I assumed the first thing we’d do is convert all of the code in one go, since it’s a very safe operation. Did you have something else in mind?
> >>>> >>>>>>>>
> >>>> >>>>>>>> -chad
> >>>> >>>>>>>>
> >>>> >>>>>>>>
> >>>> >>>>>>>>
> >>>> >>>>>>>>
> >>>> >>>>>>>>>
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <ch...@gmail.com> wrote:
> >>>> >>>>>>>>> >
> >>>> >>>>>>>>> > +1 to autoformatting
> >>>> >>>>>>>>> >
> >>>> >>>>>>>>> > Let me add some nuance to that.
> >>>> >>>>>>>>> >
> >>>> >>>>>>>>> > The way I see it there are 2 varieties of formatters:  those which take the original formatting into consideration (autopep8) and those which disregard it (yapf, black).
> >>>> >>>>>>>>> >
> >>>> >>>>>>>>> > I much prefer yapf to black, because you have plenty of options to tweak with yapf (enough to make the output a pretty close match to the current Beam style), and you can mark areas to preserve the original formatting, which could be very useful with Pipeline building with pipe operators.  Please don't pick black.
> >>>> >>>>>>>>> >
> >>>> >>>>>>>>> > autopep8 is more along the lines of spotless in Java -- it only corrects code that breaks the project's style rules.  The big problem with Beam's current style is that it is so esoteric that autopep8 can't enforce it -- and I'm not just talking about 2-spaces, which I don't really have a problem with -- the problem is the use of either 2 or 4 spaces depending on context (expression start vs hanging indent, etc).  This is my *biggest* gripe about the current style.  PyCharm doesn't have enough control either.  So, if we can choose a style that can be expressed by flake8 or pycodestyle then we can use autopep8 to enforce it.
> >>>> >>>>>>>>> >
> >>>> >>>>>>>>> > I'd prefer autopep8 to yapf because I like having a little wiggle room to influence the style, but on a big project like Beam all that wiggle room ends up to minor but noticeable inconsistencies in style throughout the project.  yapf ensures completely consistent style, but the tradeoff is that it's sometimes ugly, especially in scenarios with similar repeated entries like argparse, where yapf might insert line breaks in visually inconsistent and unappealing ways depending on the lengths of the keywords and expressions involved.
> >>>> >>>>>>>>> >
> >>>> >>>>>>>>> > Either way (but especially if we choose yapf) I think it'd be a nice addition to setup a pre-commit [1] config so that people can opt in to running *lightweight* autofixers prior to commit.  This will not only reduce dev frustration but will also reduce the amount of cpu cycles that Jenkins spends pointing out lint errors.
> >>>> >>>>>>>>> >
> >>>> >>>>>>>>> > [1] https://pre-commit.com/
> >>>> >>>>>>>>> >
> >>>> >>>>>>>>> > -chad
> >>>> >>>>>>>>> >
> >>>> >>>>>>>>> >
> >>>> >>>>>>>>> >
> >>>> >>>>>>>>> >
> >>>> >>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <ie...@gmail.com> wrote:
> >>>> >>>>>>>>> >>
> >>>> >>>>>>>>> >> Last time we discussed this there seems not to be much progress into autoformatting.
> >>>> >>>>>>>>> >> This tool looks more tweakable, so maybe it could be more appropriate for Beam's use case.
> >>>> >>>>>>>>> >> https://github.com/google/yapf/
> >>>> >>>>>>>>> >> WDYT?
> >>>> >>>>>>>>> >>
> >>>> >>>>>>>>> >>
> >>>> >>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <lg...@apache.org> wrote:
> >>>> >>>>>>>>> >>>
> >>>> >>>>>>>>> >>> +1 for any autoformatter for Python SDK that does the job. My experience is that since spotless in Java SDK I would never start a new Java project without it. So many great benefits not only for one person coding but for all community.
> >>>> >>>>>>>>> >>>
> >>>> >>>>>>>>> >>> It is a GitHub UI issue that you cannot easily browse past the reformat. It is not actually that hard, but does take a couple extra clicks to get GitHub to display blame before a reformat. It is easier with the command line. I do a lot of code history digging and the global Java reformat is not really a problem.
> >>>> >>>>>>>>> >>>
> >>>> >>>>>>>>> >>> It's actually one more click on Github but I agree it's not the best way to search the history. The most convenient and clear one I've found so far is in Jetbrains IDEs (Intelij) where you can:
> >>>> >>>>>>>>> >>>
> >>>> >>>>>>>>> >>> right click on line number -> "annotate" -> click again -> "annotate previous revision" -> ...
> >>>> >>>>>>>>> >>>
> >>>> >>>>>>>>> >>> You can also use "compare with" to see the diff between two revisions.
> >>>> >>>>>>>>> >>>
> >>>> >>>>>>>>> >>> Łukasz
> >>>> >>>>>>>>> >>>
> >>>> >>>>>>>>> >>>
> >>>> >>>>>>>>> >>>
> >>>> >>>>>>>>> >>>
> >>>> >>>>>>>>> >>>
> >>>> >>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org> napisał(a):
> >>>> >>>>>>>>> >>>>
> >>>> >>>>>>>>> >>>> +1 pending good enough tooling (I can't quite tell - seems there are some issues?)
> >>>> >>>>>>>>> >>>>
> >>>> >>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <ka...@gmail.com> wrote:
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>> What else actually we gain? My guess is faster PR review iteration. We will skip some of conversations about code style.
> >>>> >>>>>>>>> >>>>
> >>>> >>>>>>>>> >>>> ...
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>> Last but not least, new contributor may be less discouraged. When I started contribute I didn’t know how to format my code and I lost a lot of time to add pylint and adjust IntelliJ. I eventually failed. Currently I write code intuitively and when I don’t forget I rerun tox.
> >>>> >>>>>>>>> >>>>
> >>>> >>>>>>>>> >>>>
> >>>> >>>>>>>>> >>>> This is a huge benefit. This is why I supported it so much for Java. It is a community benefit. You do not have to be a contributor to the Python SDK to support this. That is why I am writing here. Just eliminate all discussion of formatting. It doesn't really matter what the resulting format is, if it is not crazy to read. I strongly oppose maintaining a non-default format.
> >>>> >>>>>>>>> >>>>
> >>>> >>>>>>>>> >>>> Reformating 20k lines or 200k is not hard. The Java global reformat touched 50k lines. It does not really matter how big it is. Definitely do it all at once if you think the tool is good enough. And you should pin a version, so churn is not a problem. You can upgrade the version and reformat in a PR later and that is also easy.
> >>>> >>>>>>>>> >>>>
> >>>> >>>>>>>>> >>>> It is a GitHub UI issue that you cannot easily browse past the reformat. It is not actually that hard, but does take a couple extra clicks to get GitHub to display blame before a reformat. It is easier with the command line. I do a lot of code history digging and the global Java reformat is not really a problem.
> >>>> >>>>>>>>> >>>>
> >>>> >>>>>>>>> >>>> Kenn
> >>>> >>>>>>>>> >>>>
> >>>> >>>>>>>>> >>>>
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>> Also everything will be formatted in a same way, so eventually it would be easier to read.
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>> Moreover, as it was mentioned in previous emails - a lot of Jenkins failures won’t take place, so we save time and resources.
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>> One of disadvantages is that our pipelines has custom syntax and after formatting they looks a little bit weird, but maybe extending the only configurable option in Black - lines, from 88 to 110 would be solution.
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>> Second one is that Black requires Python 3 to be run. I don’t know how big obstacle it would be.
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>> I believe there are two options how it would be possible to introduce Black. First: just do it, it will hurt but then it would be ok (same as a dentist appointment). Of course it may require some work to adjust linters. On the other hand we can do it gradually and start including sdk parts one by one - maybe it will be less painful?
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>> As an example I can share one of projects [2] I know that uses Black (they use also other cool checkers and pre-commit [3]). This is how looks their build with all checks [4].
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>> To sum up I believe that if we want improve our coding experience, we should improve our toolset. Black seems be recent and quite popular tool what makes think they won’t stop developing it.
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>> [1] https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>> [3] https://pre-commit.com
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>> [4] https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>>
> >>>> >>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <ro...@google.com> wrote:
> >>>> >>>>>>>>> >>>>>>
> >>>> >>>>>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as it would change nearly every single line in the codebase (and the loss of all context as well as that particular line).
> >>>> >>>>>>>>> >>>>>>
> >>>> >>>>>>>>> >>>>>> This is probably why the 2-space fork exists. However, we don't conform to that either--we use 2 spaces for indentation, but 4 for continuation indentation. (As for the history of this, this goes back to Google's internal style guide, probably motivated by consistency with C++, Java, ... and the fact that with an indent level of 4 one ends up wrapping lines quite frequently (it's telling that black's default line length is 88)). This turns out to be an easy change to the codebase.
> >>>> >>>>>>>>> >>>>>>
> >>>> >>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that this tool introduces a huge amount of vertical whitespace (e.g. closing parentheses on their own line), e.g.
> >>>> >>>>>>>>> >>>>>>
> >>>> >>>>>>>>> >>>>>> def foo(
> >>>> >>>>>>>>> >>>>>>     args
> >>>> >>>>>>>>> >>>>>> ):
> >>>> >>>>>>>>> >>>>>>   if (
> >>>> >>>>>>>>> >>>>>>       long expression)
> >>>> >>>>>>>>> >>>>>>   ):
> >>>> >>>>>>>>> >>>>>>     func(
> >>>> >>>>>>>>> >>>>>>         args
> >>>> >>>>>>>>> >>>>>>     )
> >>>> >>>>>>>>> >>>>>>
> >>>> >>>>>>>>> >>>>>> I wrote a simple post-processor to put closing parentheses on the same lines, as well as omit the newline after "if (", and disabling formatting of strings, which reduce the churn in our codebase to 15k lines (adding about 4k) out of 200k total.
> >>>> >>>>>>>>> >>>>>>
> >>>> >>>>>>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
> >>>> >>>>>>>>> >>>>>>
> >>>> >>>>>>>>> >>>>>> It's still very opinionated, often in different ways then me, and doesn't understand the semantics of the code, but possibly something we could live with given the huge advantages of an autoformatter.
> >>>> >>>>>>>>> >>>>>>
> >>>> >>>>>>>>> >>>>>> An intermediate point would be to allow, but not require, autoformatting of changed lines.
> >>>> >>>>>>>>> >>>>>>
> >>>> >>>>>>>>> >>>>>> As for being beta quality, it looks like it's got a decent number of contributors and in my book being in the python github project is a strong positive signal. But, due to the above issues, I think we'd have to maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space issue is a 2-line change, and the rest implemented as a post-processing step (for now, incomplete), so it'd be easy to stay in sync with upstream.)
> >>>> >>>>>>>>> >>>>>>
> >>>> >>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <ie...@gmail.com> wrote:
> >>>> >>>>>>>>> >>>>>> >
> >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be configured in a way to fit our
> >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to reformat the
> >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
> >>>> >>>>>>>>> >>>>>> >
> >>>> >>>>>>>>> >>>>>> > It cannot be configured to do what we actually do because Black is
> >>>> >>>>>>>>> >>>>>> > configurable only to support the standard python codestyle guidelines
> >>>> >>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what most projects in the
> >>>> >>>>>>>>> >>>>>> > python world use.
> >>>> >>>>>>>>> >>>>>> >
> >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git history. This
> >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have the feeling
> >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with Python because the linter has
> >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
> >>>> >>>>>>>>> >>>>>> >
> >>>> >>>>>>>>> >>>>>> > Yes that’s the bad side effect but there are always tradeoffs we have
> >>>> >>>>>>>>> >>>>>> > to deal with.
> >>>> >>>>>>>>> >>>>>> >
> >>>> >>>>>>>>> >>>>>> >
> >>>> >>>>>>>>> >>>>>> >
> >>>> >>>>>>>>> >>>>>> >
> >>>> >>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <mx...@apache.org> wrote:
> >>>> >>>>>>>>> >>>>>> > >
> >>>> >>>>>>>>> >>>>>> > > I think the question is if it can be configured in a way to fit our
> >>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to reformat the
> >>>> >>>>>>>>> >>>>>> > > entire Python SDK.
> >>>> >>>>>>>>> >>>>>> > >
> >>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git history. This
> >>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have the feeling
> >>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with Python because the linter has
> >>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
> >>>> >>>>>>>>> >>>>>> > >
> >>>> >>>>>>>>> >>>>>> > > -Max
> >>>> >>>>>>>>> >>>>>> > >
> >>>> >>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
> >>>> >>>>>>>>> >>>>>> > > >> My concerns are:
> >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big warning.
> >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For the same reason I also strongly prefer not using a fork for a specific setting. Fork will only have less people looking at it.
> >>>> >>>>>>>>> >>>>>> > > >
> >>>> >>>>>>>>> >>>>>> > > > I suppose the project is marked as beta because it is recent, it was
> >>>> >>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and because some things can change since
> >>>> >>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky beasts, I think beta in that case is
> >>>> >>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you look at the contribution page [1]
> >>>> >>>>>>>>> >>>>>> > > > you can notice that it is less and less a single person project, there
> >>>> >>>>>>>>> >>>>>> > > > have been 93 independent contributions since the project became
> >>>> >>>>>>>>> >>>>>> > > > public, and the fact that it is hosted in the python organization
> >>>> >>>>>>>>> >>>>>> > > > github [2] gives some confidence on the project continuity.
> >>>> >>>>>>>>> >>>>>> > > >
> >>>> >>>>>>>>> >>>>>> > > > You are right however about the fact that the main author seems to be
> >>>> >>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he can seem
> >>>> >>>>>>>>> >>>>>> > > > arbitrary, but he is just following pep8 style guide recommendations
> >>>> >>>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4 spaces
> >>>> >>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even Google's own Python style guide [4],
> >>>> >>>>>>>>> >>>>>> > > > So, probably it should be to us to reconsider the current policy to
> >>>> >>>>>>>>> >>>>>> > > > adapt to the standards (and the tool).
> >>>> >>>>>>>>> >>>>>> > > >
> >>>> >>>>>>>>> >>>>>> > > > I did a quick run of black with python 2.7 compatibility on
> >>>> >>>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing errors which is positive given the
> >>>> >>>>>>>>> >>>>>> > > > size of our code base.
> >>>> >>>>>>>>> >>>>>> > > >
> >>>> >>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files failed to reformat.
> >>>> >>>>>>>>> >>>>>> > > >
> >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> >>>> >>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
> >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
> >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> >>>> >>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
> >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
> >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> >>>> >>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
> >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 160:34:       print(traceback_string, file=sys.stderr)
> >>>> >>>>>>>>> >>>>>> > > > error: cannot format
> >>>> >>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
> >>>> >>>>>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc else '    ',
> >>>> >>>>>>>>> >>>>>> > > > end=' ')
> >>>> >>>>>>>>> >>>>>> > > >
> >>>> >>>>>>>>> >>>>>> > > > I still think this can be positive for the project but well I am
> >>>> >>>>>>>>> >>>>>> > > > barely a contributor to the python code base so I let you the python
> >>>> >>>>>>>>> >>>>>> > > > maintainers to reconsider this, in any case it seems like a good
> >>>> >>>>>>>>> >>>>>> > > > improvement for the project.
> >>>> >>>>>>>>> >>>>>> > > >
> >>>> >>>>>>>>> >>>>>> > > > [1] https://github.com/python/black/graphs/contributors
> >>>> >>>>>>>>> >>>>>> > > > [2] https://github.com/python
> >>>> >>>>>>>>> >>>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
> >>>> >>>>>>>>> >>>>>> > > > [4] https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
> >>>> >>>>>>>>> >>>>>> > > >
> >>>> >>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <al...@google.com> wrote:
> >>>> >>>>>>>>> >>>>>> > > >>
> >>>> >>>>>>>>> >>>>>> > > >> I am in the same boat with Robert, I am in favor of autoformatters but I am not familiar with this one. My concerns are:
> >>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big warning.
> >>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For the same reason I also strongly prefer not using a fork for a specific setting. Fork will only have less people looking at it.
> >>>> >>>>>>>>> >>>>>> > > >>
> >>>> >>>>>>>>> >>>>>> > > >> IMO, this is in an early stage for us. That said lint issues are real as pointed in the thread. If someone would like to give it a try and see how it would look like for us that would be interesting.
> >>>> >>>>>>>>> >>>>>> > > >>
> >>>> >>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <ka...@gmail.com> wrote:
> >>>> >>>>>>>>> >>>>>> > > >>>
> >>>> >>>>>>>>> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs failures are caused by lint problems.
> >>>> >>>>>>>>> >>>>>> > > >>> I think it would be great to have something similar to Spotless in Java SDK (I heard there is problem with configuring Black with IntelliJ).
> >>>> >>>>>>>>> >>>>>> > > >>>
> >>>> >>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <ro...@google.com> wrote:
> >>>> >>>>>>>>> >>>>>> > > >>>>
> >>>> >>>>>>>>> >>>>>> > > >>>> I'm generally in favor of autoformatters, though I haven't looked at
> >>>> >>>>>>>>> >>>>>> > > >>>> how well this particular one works. We might have to go with
> >>>> >>>>>>>>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
> >>>> >>>>>>>>> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
> >>>> >>>>>>>>> >>>>>> > > >>>>
> >>>> >>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <pa...@google.com> wrote:
> >>>> >>>>>>>>> >>>>>> > > >>>>>
> >>>> >>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at least a couple people (myself included) who've been annoyed by having to take care of lint issues that maybe a code formatter could save us.
> >>>> >>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
> >>>> >>>>>>>>> >>>>>> > > >>>>> -P.
> >>>> >>>>>>>>> >>>>>> > > >>>>>
> >>>> >>>>>>>>> >>>>>> > > >>>>>
> >>>> >>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <ie...@gmail.com> wrote:
> >>>> >>>>>>>>> >>>>>> > > >>>>>>
> >>>> >>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code auto formatter that
> >>>> >>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for python, and wanted to
> >>>> >>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest from the python people to get this
> >>>> >>>>>>>>> >>>>>> > > >>>>>> into the build?
> >>>> >>>>>>>>> >>>>>> > > >>>>>>
> >>>> >>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless for Java has been a good improvement and
> >>>> >>>>>>>>> >>>>>> > > >>>>>> maybe the python code base may benefit of this too.
> >>>> >>>>>>>>> >>>>>> > > >>>>>>
> >>>> >>>>>>>>> >>>>>> > > >>>>>> WDYT?
> >>>> >>>>>>>>> >>>>>> > > >>>>>>
> >>>> >>>>>>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >> --
> >>>> >>
> >>>> >> Michał Walenia
> >>>> >> Polidea | Software Engineer
> >>>> >>
> >>>> >> M: +48 791 432 002
> >>>> >> E: michal.walenia@polidea.com
> >>>> >>
> >>>> >> Unique Tech
> >>>> >> Check out our projects!

Re: [DISCUSS] Autoformat python code with Black

Posted by Robert Bradshaw <ro...@google.com>.
Thanks. I commented on the PR. I think if we're going this route we
should add a pre-commit, plus instructions on how to run the tool
(similar to spotless).

On Mon, Jan 27, 2020 at 10:00 AM Udi Meiri <eh...@google.com> wrote:
>
> I've done a pass on the PR on code I'm familiar with.
> Please make a pass and add your suggestions on the PR.
>
> On Fri, Jan 24, 2020 at 7:15 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>
>> Java build fails on any unformatted code so python probably should be like that.
>> We have to ensure however that it fails early on that.
>> As Robert said time to debate the knobs :)
>>
>> On Fri, Jan 24, 2020 at 3:19 PM Kamil Wasilewski <ka...@polidea.com> wrote:
>>>
>>> PR is ready: https://github.com/apache/beam/pull/10684. Please share your comments ;-) I've managed to reduce the impact a bit:
>>> 501 files changed, 18245 insertions(+), 19495 deletions(-)
>>>
>>> We still need to consider how to enforce the usage of autoformatter. Pre-commit sounds like a nice addition, but it still needs to be installed manually by a developer. On the other hand, Jenkins precommit job that fails if any unformatted code is detected looks like too strict. What do you think?
>>>
>>> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw <ro...@google.com> wrote:
>>>>
>>>> Thanks! Now we get to debate what knobs to twiddle :-P
>>>>
>>>> FYI, I did a simple run (just pushed to
>>>> https://github.com/apache/beam/compare/master...robertwb:yapf) to see
>>>> the impact. The diff is
>>>>
>>>>     $ git diff --stat master
>>>>     ...
>>>>      547 files changed, 22118 insertions(+), 21129 deletions(-)
>>>>
>>>> For reference
>>>>
>>>>     $ find sdks/python/apache_beam -name '*.py' | xargs wc
>>>>     ...
>>>>     200424  612002 7431637 total
>>>>
>>>> which means a little over 10% of lines get touched. I think there are
>>>> some options, such as SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES and
>>>> COALESCE_BRACKETS, that will conform more to the style we are already
>>>> (mostly) following.
>>>>
>>>>
>>>> On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
>>>> <ka...@polidea.com> wrote:
>>>> >
>>>> > Thank you Michał for creating the ticket. I have some free time and I'd like to volunteer myself for this task.
>>>> > Indeed, it looks like there's consensus for `yapf`, so I'll try `yapf` first.
>>>> >
>>>> > Best,
>>>> > Kamil
>>>> >
>>>> >
>>>> > On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <mi...@polidea.com> wrote:
>>>> >>
>>>> >> Hi all,
>>>> >> I created a JIRA issue for this and summarized the available tools
>>>> >>
>>>> >> https://issues.apache.org/jira/browse/BEAM-9175
>>>> >>
>>>> >> Cheers,
>>>> >> Michal
>>>> >>
>>>> >> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <eh...@google.com> wrote:
>>>> >>>
>>>> >>> Sorry, backing off on this due to time constraints.
>>>> >>>
>>>> >>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <eh...@google.com> wrote:
>>>> >>>>
>>>> >>>> It sounds like there's a consensus for yapf. I volunteer to take this on
>>>> >>>>
>>>> >>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com> wrote:
>>>> >>>>>
>>>> >>>>> +1 to autoformatting
>>>> >>>>>
>>>> >>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <lc...@google.com> wrote:
>>>> >>>>>>
>>>> >>>>>> +1 to autoformatters. Also the Beam Java SDK went through a one time pass to apply the spotless formatting.
>>>> >>>>>>
>>>> >>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <al...@google.com> wrote:
>>>> >>>>>>>
>>>> >>>>>>> +1 to autoformatters and yapf. It appears to be a well maintained project. I do support making a one time pass to apply formatting the whole code base.
>>>> >>>>>>>
>>>> >>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <ch...@gmail.com> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>> It'd be good if there was a way to only apply to violating (or at
>>>> >>>>>>>>> least changed) lines.
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>> I assumed the first thing we’d do is convert all of the code in one go, since it’s a very safe operation. Did you have something else in mind?
>>>> >>>>>>>>
>>>> >>>>>>>> -chad
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <ch...@gmail.com> wrote:
>>>> >>>>>>>>> >
>>>> >>>>>>>>> > +1 to autoformatting
>>>> >>>>>>>>> >
>>>> >>>>>>>>> > Let me add some nuance to that.
>>>> >>>>>>>>> >
>>>> >>>>>>>>> > The way I see it there are 2 varieties of formatters:  those which take the original formatting into consideration (autopep8) and those which disregard it (yapf, black).
>>>> >>>>>>>>> >
>>>> >>>>>>>>> > I much prefer yapf to black, because you have plenty of options to tweak with yapf (enough to make the output a pretty close match to the current Beam style), and you can mark areas to preserve the original formatting, which could be very useful with Pipeline building with pipe operators.  Please don't pick black.
>>>> >>>>>>>>> >
>>>> >>>>>>>>> > autopep8 is more along the lines of spotless in Java -- it only corrects code that breaks the project's style rules.  The big problem with Beam's current style is that it is so esoteric that autopep8 can't enforce it -- and I'm not just talking about 2-spaces, which I don't really have a problem with -- the problem is the use of either 2 or 4 spaces depending on context (expression start vs hanging indent, etc).  This is my *biggest* gripe about the current style.  PyCharm doesn't have enough control either.  So, if we can choose a style that can be expressed by flake8 or pycodestyle then we can use autopep8 to enforce it.
>>>> >>>>>>>>> >
>>>> >>>>>>>>> > I'd prefer autopep8 to yapf because I like having a little wiggle room to influence the style, but on a big project like Beam all that wiggle room ends up to minor but noticeable inconsistencies in style throughout the project.  yapf ensures completely consistent style, but the tradeoff is that it's sometimes ugly, especially in scenarios with similar repeated entries like argparse, where yapf might insert line breaks in visually inconsistent and unappealing ways depending on the lengths of the keywords and expressions involved.
>>>> >>>>>>>>> >
>>>> >>>>>>>>> > Either way (but especially if we choose yapf) I think it'd be a nice addition to setup a pre-commit [1] config so that people can opt in to running *lightweight* autofixers prior to commit.  This will not only reduce dev frustration but will also reduce the amount of cpu cycles that Jenkins spends pointing out lint errors.
>>>> >>>>>>>>> >
>>>> >>>>>>>>> > [1] https://pre-commit.com/
>>>> >>>>>>>>> >
>>>> >>>>>>>>> > -chad
>>>> >>>>>>>>> >
>>>> >>>>>>>>> >
>>>> >>>>>>>>> >
>>>> >>>>>>>>> >
>>>> >>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <ie...@gmail.com> wrote:
>>>> >>>>>>>>> >>
>>>> >>>>>>>>> >> Last time we discussed this there seems not to be much progress into autoformatting.
>>>> >>>>>>>>> >> This tool looks more tweakable, so maybe it could be more appropriate for Beam's use case.
>>>> >>>>>>>>> >> https://github.com/google/yapf/
>>>> >>>>>>>>> >> WDYT?
>>>> >>>>>>>>> >>
>>>> >>>>>>>>> >>
>>>> >>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <lg...@apache.org> wrote:
>>>> >>>>>>>>> >>>
>>>> >>>>>>>>> >>> +1 for any autoformatter for Python SDK that does the job. My experience is that since spotless in Java SDK I would never start a new Java project without it. So many great benefits not only for one person coding but for all community.
>>>> >>>>>>>>> >>>
>>>> >>>>>>>>> >>> It is a GitHub UI issue that you cannot easily browse past the reformat. It is not actually that hard, but does take a couple extra clicks to get GitHub to display blame before a reformat. It is easier with the command line. I do a lot of code history digging and the global Java reformat is not really a problem.
>>>> >>>>>>>>> >>>
>>>> >>>>>>>>> >>> It's actually one more click on Github but I agree it's not the best way to search the history. The most convenient and clear one I've found so far is in Jetbrains IDEs (Intelij) where you can:
>>>> >>>>>>>>> >>>
>>>> >>>>>>>>> >>> right click on line number -> "annotate" -> click again -> "annotate previous revision" -> ...
>>>> >>>>>>>>> >>>
>>>> >>>>>>>>> >>> You can also use "compare with" to see the diff between two revisions.
>>>> >>>>>>>>> >>>
>>>> >>>>>>>>> >>> Łukasz
>>>> >>>>>>>>> >>>
>>>> >>>>>>>>> >>>
>>>> >>>>>>>>> >>>
>>>> >>>>>>>>> >>>
>>>> >>>>>>>>> >>>
>>>> >>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org> napisał(a):
>>>> >>>>>>>>> >>>>
>>>> >>>>>>>>> >>>> +1 pending good enough tooling (I can't quite tell - seems there are some issues?)
>>>> >>>>>>>>> >>>>
>>>> >>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <ka...@gmail.com> wrote:
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>> What else actually we gain? My guess is faster PR review iteration. We will skip some of conversations about code style.
>>>> >>>>>>>>> >>>>
>>>> >>>>>>>>> >>>> ...
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>> Last but not least, new contributor may be less discouraged. When I started contribute I didn’t know how to format my code and I lost a lot of time to add pylint and adjust IntelliJ. I eventually failed. Currently I write code intuitively and when I don’t forget I rerun tox.
>>>> >>>>>>>>> >>>>
>>>> >>>>>>>>> >>>>
>>>> >>>>>>>>> >>>> This is a huge benefit. This is why I supported it so much for Java. It is a community benefit. You do not have to be a contributor to the Python SDK to support this. That is why I am writing here. Just eliminate all discussion of formatting. It doesn't really matter what the resulting format is, if it is not crazy to read. I strongly oppose maintaining a non-default format.
>>>> >>>>>>>>> >>>>
>>>> >>>>>>>>> >>>> Reformating 20k lines or 200k is not hard. The Java global reformat touched 50k lines. It does not really matter how big it is. Definitely do it all at once if you think the tool is good enough. And you should pin a version, so churn is not a problem. You can upgrade the version and reformat in a PR later and that is also easy.
>>>> >>>>>>>>> >>>>
>>>> >>>>>>>>> >>>> It is a GitHub UI issue that you cannot easily browse past the reformat. It is not actually that hard, but does take a couple extra clicks to get GitHub to display blame before a reformat. It is easier with the command line. I do a lot of code history digging and the global Java reformat is not really a problem.
>>>> >>>>>>>>> >>>>
>>>> >>>>>>>>> >>>> Kenn
>>>> >>>>>>>>> >>>>
>>>> >>>>>>>>> >>>>
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>> Also everything will be formatted in a same way, so eventually it would be easier to read.
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>> Moreover, as it was mentioned in previous emails - a lot of Jenkins failures won’t take place, so we save time and resources.
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>> One of disadvantages is that our pipelines has custom syntax and after formatting they looks a little bit weird, but maybe extending the only configurable option in Black - lines, from 88 to 110 would be solution.
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>> Second one is that Black requires Python 3 to be run. I don’t know how big obstacle it would be.
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>> I believe there are two options how it would be possible to introduce Black. First: just do it, it will hurt but then it would be ok (same as a dentist appointment). Of course it may require some work to adjust linters. On the other hand we can do it gradually and start including sdk parts one by one - maybe it will be less painful?
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>> As an example I can share one of projects [2] I know that uses Black (they use also other cool checkers and pre-commit [3]). This is how looks their build with all checks [4].
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>> To sum up I believe that if we want improve our coding experience, we should improve our toolset. Black seems be recent and quite popular tool what makes think they won’t stop developing it.
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>> [1] https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>> [3] https://pre-commit.com
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>> [4] https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>>
>>>> >>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <ro...@google.com> wrote:
>>>> >>>>>>>>> >>>>>>
>>>> >>>>>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as it would change nearly every single line in the codebase (and the loss of all context as well as that particular line).
>>>> >>>>>>>>> >>>>>>
>>>> >>>>>>>>> >>>>>> This is probably why the 2-space fork exists. However, we don't conform to that either--we use 2 spaces for indentation, but 4 for continuation indentation. (As for the history of this, this goes back to Google's internal style guide, probably motivated by consistency with C++, Java, ... and the fact that with an indent level of 4 one ends up wrapping lines quite frequently (it's telling that black's default line length is 88)). This turns out to be an easy change to the codebase.
>>>> >>>>>>>>> >>>>>>
>>>> >>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that this tool introduces a huge amount of vertical whitespace (e.g. closing parentheses on their own line), e.g.
>>>> >>>>>>>>> >>>>>>
>>>> >>>>>>>>> >>>>>> def foo(
>>>> >>>>>>>>> >>>>>>     args
>>>> >>>>>>>>> >>>>>> ):
>>>> >>>>>>>>> >>>>>>   if (
>>>> >>>>>>>>> >>>>>>       long expression)
>>>> >>>>>>>>> >>>>>>   ):
>>>> >>>>>>>>> >>>>>>     func(
>>>> >>>>>>>>> >>>>>>         args
>>>> >>>>>>>>> >>>>>>     )
>>>> >>>>>>>>> >>>>>>
>>>> >>>>>>>>> >>>>>> I wrote a simple post-processor to put closing parentheses on the same lines, as well as omit the newline after "if (", and disabling formatting of strings, which reduce the churn in our codebase to 15k lines (adding about 4k) out of 200k total.
>>>> >>>>>>>>> >>>>>>
>>>> >>>>>>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
>>>> >>>>>>>>> >>>>>>
>>>> >>>>>>>>> >>>>>> It's still very opinionated, often in different ways then me, and doesn't understand the semantics of the code, but possibly something we could live with given the huge advantages of an autoformatter.
>>>> >>>>>>>>> >>>>>>
>>>> >>>>>>>>> >>>>>> An intermediate point would be to allow, but not require, autoformatting of changed lines.
>>>> >>>>>>>>> >>>>>>
>>>> >>>>>>>>> >>>>>> As for being beta quality, it looks like it's got a decent number of contributors and in my book being in the python github project is a strong positive signal. But, due to the above issues, I think we'd have to maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space issue is a 2-line change, and the rest implemented as a post-processing step (for now, incomplete), so it'd be easy to stay in sync with upstream.)
>>>> >>>>>>>>> >>>>>>
>>>> >>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>>> >>>>>>>>> >>>>>> >
>>>> >>>>>>>>> >>>>>> > > I think the question is if it can be configured in a way to fit our
>>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to reformat the
>>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>>>> >>>>>>>>> >>>>>> >
>>>> >>>>>>>>> >>>>>> > It cannot be configured to do what we actually do because Black is
>>>> >>>>>>>>> >>>>>> > configurable only to support the standard python codestyle guidelines
>>>> >>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what most projects in the
>>>> >>>>>>>>> >>>>>> > python world use.
>>>> >>>>>>>>> >>>>>> >
>>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have the feeling
>>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with Python because the linter has
>>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>> >>>>>>>>> >>>>>> >
>>>> >>>>>>>>> >>>>>> > Yes that’s the bad side effect but there are always tradeoffs we have
>>>> >>>>>>>>> >>>>>> > to deal with.
>>>> >>>>>>>>> >>>>>> >
>>>> >>>>>>>>> >>>>>> >
>>>> >>>>>>>>> >>>>>> >
>>>> >>>>>>>>> >>>>>> >
>>>> >>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <mx...@apache.org> wrote:
>>>> >>>>>>>>> >>>>>> > >
>>>> >>>>>>>>> >>>>>> > > I think the question is if it can be configured in a way to fit our
>>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to reformat the
>>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>>>> >>>>>>>>> >>>>>> > >
>>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have the feeling
>>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with Python because the linter has
>>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>> >>>>>>>>> >>>>>> > >
>>>> >>>>>>>>> >>>>>> > > -Max
>>>> >>>>>>>>> >>>>>> > >
>>>> >>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>> >>>>>>>>> >>>>>> > > >> My concerns are:
>>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For the same reason I also strongly prefer not using a fork for a specific setting. Fork will only have less people looking at it.
>>>> >>>>>>>>> >>>>>> > > >
>>>> >>>>>>>>> >>>>>> > > > I suppose the project is marked as beta because it is recent, it was
>>>> >>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and because some things can change since
>>>> >>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky beasts, I think beta in that case is
>>>> >>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you look at the contribution page [1]
>>>> >>>>>>>>> >>>>>> > > > you can notice that it is less and less a single person project, there
>>>> >>>>>>>>> >>>>>> > > > have been 93 independent contributions since the project became
>>>> >>>>>>>>> >>>>>> > > > public, and the fact that it is hosted in the python organization
>>>> >>>>>>>>> >>>>>> > > > github [2] gives some confidence on the project continuity.
>>>> >>>>>>>>> >>>>>> > > >
>>>> >>>>>>>>> >>>>>> > > > You are right however about the fact that the main author seems to be
>>>> >>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he can seem
>>>> >>>>>>>>> >>>>>> > > > arbitrary, but he is just following pep8 style guide recommendations
>>>> >>>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4 spaces
>>>> >>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even Google's own Python style guide [4],
>>>> >>>>>>>>> >>>>>> > > > So, probably it should be to us to reconsider the current policy to
>>>> >>>>>>>>> >>>>>> > > > adapt to the standards (and the tool).
>>>> >>>>>>>>> >>>>>> > > >
>>>> >>>>>>>>> >>>>>> > > > I did a quick run of black with python 2.7 compatibility on
>>>> >>>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing errors which is positive given the
>>>> >>>>>>>>> >>>>>> > > > size of our code base.
>>>> >>>>>>>>> >>>>>> > > >
>>>> >>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files failed to reformat.
>>>> >>>>>>>>> >>>>>> > > >
>>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>> >>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>> >>>>>>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>> >>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>> >>>>>>>>> >>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>> >>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>> >>>>>>>>> >>>>>> > > > Cannot parse: 160:34:       print(traceback_string, file=sys.stderr)
>>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>>> >>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>> >>>>>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc else '    ',
>>>> >>>>>>>>> >>>>>> > > > end=' ')
>>>> >>>>>>>>> >>>>>> > > >
>>>> >>>>>>>>> >>>>>> > > > I still think this can be positive for the project but well I am
>>>> >>>>>>>>> >>>>>> > > > barely a contributor to the python code base so I let you the python
>>>> >>>>>>>>> >>>>>> > > > maintainers to reconsider this, in any case it seems like a good
>>>> >>>>>>>>> >>>>>> > > > improvement for the project.
>>>> >>>>>>>>> >>>>>> > > >
>>>> >>>>>>>>> >>>>>> > > > [1] https://github.com/python/black/graphs/contributors
>>>> >>>>>>>>> >>>>>> > > > [2] https://github.com/python
>>>> >>>>>>>>> >>>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
>>>> >>>>>>>>> >>>>>> > > > [4] https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>> >>>>>>>>> >>>>>> > > >
>>>> >>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <al...@google.com> wrote:
>>>> >>>>>>>>> >>>>>> > > >>
>>>> >>>>>>>>> >>>>>> > > >> I am in the same boat with Robert, I am in favor of autoformatters but I am not familiar with this one. My concerns are:
>>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For the same reason I also strongly prefer not using a fork for a specific setting. Fork will only have less people looking at it.
>>>> >>>>>>>>> >>>>>> > > >>
>>>> >>>>>>>>> >>>>>> > > >> IMO, this is in an early stage for us. That said lint issues are real as pointed in the thread. If someone would like to give it a try and see how it would look like for us that would be interesting.
>>>> >>>>>>>>> >>>>>> > > >>
>>>> >>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <ka...@gmail.com> wrote:
>>>> >>>>>>>>> >>>>>> > > >>>
>>>> >>>>>>>>> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs failures are caused by lint problems.
>>>> >>>>>>>>> >>>>>> > > >>> I think it would be great to have something similar to Spotless in Java SDK (I heard there is problem with configuring Black with IntelliJ).
>>>> >>>>>>>>> >>>>>> > > >>>
>>>> >>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <ro...@google.com> wrote:
>>>> >>>>>>>>> >>>>>> > > >>>>
>>>> >>>>>>>>> >>>>>> > > >>>> I'm generally in favor of autoformatters, though I haven't looked at
>>>> >>>>>>>>> >>>>>> > > >>>> how well this particular one works. We might have to go with
>>>> >>>>>>>>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>>> >>>>>>>>> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>>> >>>>>>>>> >>>>>> > > >>>>
>>>> >>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <pa...@google.com> wrote:
>>>> >>>>>>>>> >>>>>> > > >>>>>
>>>> >>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at least a couple people (myself included) who've been annoyed by having to take care of lint issues that maybe a code formatter could save us.
>>>> >>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>>>> >>>>>>>>> >>>>>> > > >>>>> -P.
>>>> >>>>>>>>> >>>>>> > > >>>>>
>>>> >>>>>>>>> >>>>>> > > >>>>>
>>>> >>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <ie...@gmail.com> wrote:
>>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>> >>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code auto formatter that
>>>> >>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for python, and wanted to
>>>> >>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest from the python people to get this
>>>> >>>>>>>>> >>>>>> > > >>>>>> into the build?
>>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>> >>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless for Java has been a good improvement and
>>>> >>>>>>>>> >>>>>> > > >>>>>> maybe the python code base may benefit of this too.
>>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>> >>>>>>>>> >>>>>> > > >>>>>> WDYT?
>>>> >>>>>>>>> >>>>>> > > >>>>>>
>>>> >>>>>>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >>
>>>> >> Michał Walenia
>>>> >> Polidea | Software Engineer
>>>> >>
>>>> >> M: +48 791 432 002
>>>> >> E: michal.walenia@polidea.com
>>>> >>
>>>> >> Unique Tech
>>>> >> Check out our projects!

Re: [DISCUSS] Autoformat python code with Black

Posted by Udi Meiri <eh...@google.com>.
I've done a pass on the PR on code I'm familiar with.
Please make a pass and add your suggestions on the PR.

On Fri, Jan 24, 2020 at 7:15 AM Ismaël Mejía <ie...@gmail.com> wrote:

> Java build fails on any unformatted code so python probably should be like
> that.
> We have to ensure however that it fails early on that.
> As Robert said time to debate the knobs :)
>
> On Fri, Jan 24, 2020 at 3:19 PM Kamil Wasilewski <
> kamil.wasilewski@polidea.com> wrote:
>
>> PR is ready: https://github.com/apache/beam/pull/10684. Please share
>> your comments ;-) I've managed to reduce the impact a bit:
>> 501 files changed, 18245 insertions(+), 19495 deletions(-)
>>
>> We still need to consider how to enforce the usage of autoformatter.
>> Pre-commit sounds like a nice addition, but it still needs to be installed
>> manually by a developer. On the other hand, Jenkins precommit job that
>> fails if any unformatted code is detected looks like too strict. What do
>> you think?
>>
>> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw <ro...@google.com>
>> wrote:
>>
>>> Thanks! Now we get to debate what knobs to twiddle :-P
>>>
>>> FYI, I did a simple run (just pushed to
>>> https://github.com/apache/beam/compare/master...robertwb:yapf) to see
>>> the impact. The diff is
>>>
>>>     $ git diff --stat master
>>>     ...
>>>      547 files changed, 22118 insertions(+), 21129 deletions(-)
>>>
>>> For reference
>>>
>>>     $ find sdks/python/apache_beam -name '*.py' | xargs wc
>>>     ...
>>>     200424  612002 7431637 total
>>>
>>> which means a little over 10% of lines get touched. I think there are
>>> some options, such as SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES and
>>> COALESCE_BRACKETS, that will conform more to the style we are already
>>> (mostly) following.
>>>
>>>
>>> On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
>>> <ka...@polidea.com> wrote:
>>> >
>>> > Thank you Michał for creating the ticket. I have some free time and
>>> I'd like to volunteer myself for this task.
>>> > Indeed, it looks like there's consensus for `yapf`, so I'll try `yapf`
>>> first.
>>> >
>>> > Best,
>>> > Kamil
>>> >
>>> >
>>> > On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <
>>> michal.walenia@polidea.com> wrote:
>>> >>
>>> >> Hi all,
>>> >> I created a JIRA issue for this and summarized the available tools
>>> >>
>>> >> https://issues.apache.org/jira/browse/BEAM-9175
>>> >>
>>> >> Cheers,
>>> >> Michal
>>> >>
>>> >> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <eh...@google.com> wrote:
>>> >>>
>>> >>> Sorry, backing off on this due to time constraints.
>>> >>>
>>> >>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <eh...@google.com> wrote:
>>> >>>>
>>> >>>> It sounds like there's a consensus for yapf. I volunteer to take
>>> this on
>>> >>>>
>>> >>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com> wrote:
>>> >>>>>
>>> >>>>> +1 to autoformatting
>>> >>>>>
>>> >>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <lc...@google.com>
>>> wrote:
>>> >>>>>>
>>> >>>>>> +1 to autoformatters. Also the Beam Java SDK went through a one
>>> time pass to apply the spotless formatting.
>>> >>>>>>
>>> >>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <al...@google.com>
>>> wrote:
>>> >>>>>>>
>>> >>>>>>> +1 to autoformatters and yapf. It appears to be a well
>>> maintained project. I do support making a one time pass to apply formatting
>>> the whole code base.
>>> >>>>>>>
>>> >>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <ch...@gmail.com>
>>> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> It'd be good if there was a way to only apply to violating (or
>>> at
>>> >>>>>>>>> least changed) lines.
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> I assumed the first thing we’d do is convert all of the code in
>>> one go, since it’s a very safe operation. Did you have something else in
>>> mind?
>>> >>>>>>>>
>>> >>>>>>>> -chad
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <
>>> chadrik@gmail.com> wrote:
>>> >>>>>>>>> >
>>> >>>>>>>>> > +1 to autoformatting
>>> >>>>>>>>> >
>>> >>>>>>>>> > Let me add some nuance to that.
>>> >>>>>>>>> >
>>> >>>>>>>>> > The way I see it there are 2 varieties of formatters:  those
>>> which take the original formatting into consideration (autopep8) and those
>>> which disregard it (yapf, black).
>>> >>>>>>>>> >
>>> >>>>>>>>> > I much prefer yapf to black, because you have plenty of
>>> options to tweak with yapf (enough to make the output a pretty close match
>>> to the current Beam style), and you can mark areas to preserve the original
>>> formatting, which could be very useful with Pipeline building with pipe
>>> operators.  Please don't pick black.
>>> >>>>>>>>> >
>>> >>>>>>>>> > autopep8 is more along the lines of spotless in Java -- it
>>> only corrects code that breaks the project's style rules.  The big problem
>>> with Beam's current style is that it is so esoteric that autopep8 can't
>>> enforce it -- and I'm not just talking about 2-spaces, which I don't really
>>> have a problem with -- the problem is the use of either 2 or 4 spaces
>>> depending on context (expression start vs hanging indent, etc).  This is my
>>> *biggest* gripe about the current style.  PyCharm doesn't have enough
>>> control either.  So, if we can choose a style that can be expressed by
>>> flake8 or pycodestyle then we can use autopep8 to enforce it.
>>> >>>>>>>>> >
>>> >>>>>>>>> > I'd prefer autopep8 to yapf because I like having a little
>>> wiggle room to influence the style, but on a big project like Beam all that
>>> wiggle room ends up to minor but noticeable inconsistencies in style
>>> throughout the project.  yapf ensures completely consistent style, but the
>>> tradeoff is that it's sometimes ugly, especially in scenarios with similar
>>> repeated entries like argparse, where yapf might insert line breaks in
>>> visually inconsistent and unappealing ways depending on the lengths of the
>>> keywords and expressions involved.
>>> >>>>>>>>> >
>>> >>>>>>>>> > Either way (but especially if we choose yapf) I think it'd
>>> be a nice addition to setup a pre-commit [1] config so that people can opt
>>> in to running *lightweight* autofixers prior to commit.  This will not only
>>> reduce dev frustration but will also reduce the amount of cpu cycles that
>>> Jenkins spends pointing out lint errors.
>>> >>>>>>>>> >
>>> >>>>>>>>> > [1] https://pre-commit.com/
>>> >>>>>>>>> >
>>> >>>>>>>>> > -chad
>>> >>>>>>>>> >
>>> >>>>>>>>> >
>>> >>>>>>>>> >
>>> >>>>>>>>> >
>>> >>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <
>>> iemejia@gmail.com> wrote:
>>> >>>>>>>>> >>
>>> >>>>>>>>> >> Last time we discussed this there seems not to be much
>>> progress into autoformatting.
>>> >>>>>>>>> >> This tool looks more tweakable, so maybe it could be more
>>> appropriate for Beam's use case.
>>> >>>>>>>>> >> https://github.com/google/yapf/
>>> >>>>>>>>> >> WDYT?
>>> >>>>>>>>> >>
>>> >>>>>>>>> >>
>>> >>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <
>>> lgajowy@apache.org> wrote:
>>> >>>>>>>>> >>>
>>> >>>>>>>>> >>> +1 for any autoformatter for Python SDK that does the job.
>>> My experience is that since spotless in Java SDK I would never start a new
>>> Java project without it. So many great benefits not only for one person
>>> coding but for all community.
>>> >>>>>>>>> >>>
>>> >>>>>>>>> >>> It is a GitHub UI issue that you cannot easily browse past
>>> the reformat. It is not actually that hard, but does take a couple extra
>>> clicks to get GitHub to display blame before a reformat. It is easier with
>>> the command line. I do a lot of code history digging and the global Java
>>> reformat is not really a problem.
>>> >>>>>>>>> >>>
>>> >>>>>>>>> >>> It's actually one more click on Github but I agree it's
>>> not the best way to search the history. The most convenient and clear one
>>> I've found so far is in Jetbrains IDEs (Intelij) where you can:
>>> >>>>>>>>> >>>
>>> >>>>>>>>> >>> right click on line number -> "annotate" -> click again ->
>>> "annotate previous revision" -> ...
>>> >>>>>>>>> >>>
>>> >>>>>>>>> >>> You can also use "compare with" to see the diff between
>>> two revisions.
>>> >>>>>>>>> >>>
>>> >>>>>>>>> >>> Łukasz
>>> >>>>>>>>> >>>
>>> >>>>>>>>> >>>
>>> >>>>>>>>> >>>
>>> >>>>>>>>> >>>
>>> >>>>>>>>> >>>
>>> >>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org>
>>> napisał(a):
>>> >>>>>>>>> >>>>
>>> >>>>>>>>> >>>> +1 pending good enough tooling (I can't quite tell -
>>> seems there are some issues?)
>>> >>>>>>>>> >>>>
>>> >>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
>>> ka.kucharczyk@gmail.com> wrote:
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>> What else actually we gain? My guess is faster PR review
>>> iteration. We will skip some of conversations about code style.
>>> >>>>>>>>> >>>>
>>> >>>>>>>>> >>>> ...
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>> Last but not least, new contributor may be less
>>> discouraged. When I started contribute I didn’t know how to format my code
>>> and I lost a lot of time to add pylint and adjust IntelliJ. I eventually
>>> failed. Currently I write code intuitively and when I don’t forget I rerun
>>> tox.
>>> >>>>>>>>> >>>>
>>> >>>>>>>>> >>>>
>>> >>>>>>>>> >>>> This is a huge benefit. This is why I supported it so
>>> much for Java. It is a community benefit. You do not have to be a
>>> contributor to the Python SDK to support this. That is why I am writing
>>> here. Just eliminate all discussion of formatting. It doesn't really matter
>>> what the resulting format is, if it is not crazy to read. I strongly oppose
>>> maintaining a non-default format.
>>> >>>>>>>>> >>>>
>>> >>>>>>>>> >>>> Reformating 20k lines or 200k is not hard. The Java
>>> global reformat touched 50k lines. It does not really matter how big it is.
>>> Definitely do it all at once if you think the tool is good enough. And you
>>> should pin a version, so churn is not a problem. You can upgrade the
>>> version and reformat in a PR later and that is also easy.
>>> >>>>>>>>> >>>>
>>> >>>>>>>>> >>>> It is a GitHub UI issue that you cannot easily browse
>>> past the reformat. It is not actually that hard, but does take a couple
>>> extra clicks to get GitHub to display blame before a reformat. It is easier
>>> with the command line. I do a lot of code history digging and the global
>>> Java reformat is not really a problem.
>>> >>>>>>>>> >>>>
>>> >>>>>>>>> >>>> Kenn
>>> >>>>>>>>> >>>>
>>> >>>>>>>>> >>>>
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>> Also everything will be formatted in a same way, so
>>> eventually it would be easier to read.
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>> Moreover, as it was mentioned in previous emails - a lot
>>> of Jenkins failures won’t take place, so we save time and resources.
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>> One of disadvantages is that our pipelines has custom
>>> syntax and after formatting they looks a little bit weird, but maybe
>>> extending the only configurable option in Black - lines, from 88 to 110
>>> would be solution.
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>> Second one is that Black requires Python 3 to be run. I
>>> don’t know how big obstacle it would be.
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>> I believe there are two options how it would be possible
>>> to introduce Black. First: just do it, it will hurt but then it would be ok
>>> (same as a dentist appointment). Of course it may require some work to
>>> adjust linters. On the other hand we can do it gradually and start
>>> including sdk parts one by one - maybe it will be less painful?
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>> As an example I can share one of projects [2] I know
>>> that uses Black (they use also other cool checkers and pre-commit [3]).
>>> This is how looks their build with all checks [4].
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>> To sum up I believe that if we want improve our coding
>>> experience, we should improve our toolset. Black seems be recent and quite
>>> popular tool what makes think they won’t stop developing it.
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>> [1]
>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>> [2]
>>> https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>> [3] https://pre-commit.com
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>> [4]
>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>>
>>> >>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <
>>> robertwb@google.com> wrote:
>>> >>>>>>>>> >>>>>>
>>> >>>>>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as
>>> it would change nearly every single line in the codebase (and the loss of
>>> all context as well as that particular line).
>>> >>>>>>>>> >>>>>>
>>> >>>>>>>>> >>>>>> This is probably why the 2-space fork exists. However,
>>> we don't conform to that either--we use 2 spaces for indentation, but 4 for
>>> continuation indentation. (As for the history of this, this goes back to
>>> Google's internal style guide, probably motivated by consistency with C++,
>>> Java, ... and the fact that with an indent level of 4 one ends up wrapping
>>> lines quite frequently (it's telling that black's default line length is
>>> 88)). This turns out to be an easy change to the codebase.
>>> >>>>>>>>> >>>>>>
>>> >>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I
>>> found that this tool introduces a huge amount of vertical whitespace (e.g.
>>> closing parentheses on their own line), e.g.
>>> >>>>>>>>> >>>>>>
>>> >>>>>>>>> >>>>>> def foo(
>>> >>>>>>>>> >>>>>>     args
>>> >>>>>>>>> >>>>>> ):
>>> >>>>>>>>> >>>>>>   if (
>>> >>>>>>>>> >>>>>>       long expression)
>>> >>>>>>>>> >>>>>>   ):
>>> >>>>>>>>> >>>>>>     func(
>>> >>>>>>>>> >>>>>>         args
>>> >>>>>>>>> >>>>>>     )
>>> >>>>>>>>> >>>>>>
>>> >>>>>>>>> >>>>>> I wrote a simple post-processor to put closing
>>> parentheses on the same lines, as well as omit the newline after "if (",
>>> and disabling formatting of strings, which reduce the churn in our codebase
>>> to 15k lines (adding about 4k) out of 200k total.
>>> >>>>>>>>> >>>>>>
>>> >>>>>>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
>>> >>>>>>>>> >>>>>>
>>> >>>>>>>>> >>>>>> It's still very opinionated, often in different ways
>>> then me, and doesn't understand the semantics of the code, but possibly
>>> something we could live with given the huge advantages of an autoformatter.
>>> >>>>>>>>> >>>>>>
>>> >>>>>>>>> >>>>>> An intermediate point would be to allow, but not
>>> require, autoformatting of changed lines.
>>> >>>>>>>>> >>>>>>
>>> >>>>>>>>> >>>>>> As for being beta quality, it looks like it's got a
>>> decent number of contributors and in my book being in the python github
>>> project is a strong positive signal. But, due to the above issues, I think
>>> we'd have to maintain a fork. (The code is pretty lightweight, the 2 vs. 4
>>> space issue is a 2-line change, and the rest implemented as a
>>> post-processing step (for now, incomplete), so it'd be easy to stay in sync
>>> with upstream.)
>>> >>>>>>>>> >>>>>>
>>> >>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <
>>> iemejia@gmail.com> wrote:
>>> >>>>>>>>> >>>>>> >
>>> >>>>>>>>> >>>>>> > > I think the question is if it can be configured in
>>> a way to fit our
>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is
>>> feasible to reformat the
>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>>> >>>>>>>>> >>>>>> >
>>> >>>>>>>>> >>>>>> > It cannot be configured to do what we actually do
>>> because Black is
>>> >>>>>>>>> >>>>>> > configurable only to support the standard python
>>> codestyle guidelines
>>> >>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what most
>>> projects in the
>>> >>>>>>>>> >>>>>> > python world use.
>>> >>>>>>>>> >>>>>> >
>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the
>>> Git history. This
>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I
>>> have the feeling
>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with Python
>>> because the linter has
>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>> >>>>>>>>> >>>>>> >
>>> >>>>>>>>> >>>>>> > Yes that’s the bad side effect but there are always
>>> tradeoffs we have
>>> >>>>>>>>> >>>>>> > to deal with.
>>> >>>>>>>>> >>>>>> >
>>> >>>>>>>>> >>>>>> >
>>> >>>>>>>>> >>>>>> >
>>> >>>>>>>>> >>>>>> >
>>> >>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <
>>> mxm@apache.org> wrote:
>>> >>>>>>>>> >>>>>> > >
>>> >>>>>>>>> >>>>>> > > I think the question is if it can be configured in
>>> a way to fit our
>>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is
>>> feasible to reformat the
>>> >>>>>>>>> >>>>>> > > entire Python SDK.
>>> >>>>>>>>> >>>>>> > >
>>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the
>>> Git history. This
>>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I
>>> have the feeling
>>> >>>>>>>>> >>>>>> > > that this might be less of a problem with Python
>>> because the linter has
>>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>> >>>>>>>>> >>>>>> > >
>>> >>>>>>>>> >>>>>> > > -Max
>>> >>>>>>>>> >>>>>> > >
>>> >>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>> >>>>>>>>> >>>>>> > > >> My concerns are:
>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a
>>> big warning.
>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project.
>>> For the same reason I also strongly prefer not using a fork for a specific
>>> setting. Fork will only have less people looking at it.
>>> >>>>>>>>> >>>>>> > > >
>>> >>>>>>>>> >>>>>> > > > I suppose the project is marked as beta because
>>> it is recent, it was
>>> >>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and because some
>>> things can change since
>>> >>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky beasts, I think
>>> beta in that case is
>>> >>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you look at the
>>> contribution page [1]
>>> >>>>>>>>> >>>>>> > > > you can notice that it is less and less a single
>>> person project, there
>>> >>>>>>>>> >>>>>> > > > have been 93 independent contributions since the
>>> project became
>>> >>>>>>>>> >>>>>> > > > public, and the fact that it is hosted in the
>>> python organization
>>> >>>>>>>>> >>>>>> > > > github [2] gives some confidence on the project
>>> continuity.
>>> >>>>>>>>> >>>>>> > > >
>>> >>>>>>>>> >>>>>> > > > You are right however about the fact that the
>>> main author seems to be
>>> >>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces
>>> issue he can seem
>>> >>>>>>>>> >>>>>> > > > arbitrary, but he is just following pep8 style
>>> guide recommendations
>>> >>>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow
>>> the 4 spaces
>>> >>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even Google's own
>>> Python style guide [4],
>>> >>>>>>>>> >>>>>> > > > So, probably it should be to us to reconsider the
>>> current policy to
>>> >>>>>>>>> >>>>>> > > > adapt to the standards (and the tool).
>>> >>>>>>>>> >>>>>> > > >
>>> >>>>>>>>> >>>>>> > > > I did a quick run of black with python 2.7
>>> compatibility on
>>> >>>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing errors which
>>> is positive given the
>>> >>>>>>>>> >>>>>> > > > size of our code base.
>>> >>>>>>>>> >>>>>> > > >
>>> >>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4
>>> files failed to reformat.
>>> >>>>>>>>> >>>>>> > > >
>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>> >>>>>>>>> >>>>>> > > >
>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>> >>>>>>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>> >>>>>>>>> >>>>>> > > >
>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>> >>>>>>>>> >>>>>> > > > Cannot parse: 151:18:
>>>  file=sys.stderr)
>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>> >>>>>>>>> >>>>>> > > >
>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>> >>>>>>>>> >>>>>> > > > Cannot parse: 160:34:
>>>  print(traceback_string, file=sys.stderr)
>>> >>>>>>>>> >>>>>> > > > error: cannot format
>>> >>>>>>>>> >>>>>> > > >
>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>> >>>>>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc ==
>>> last_pc else '    ',
>>> >>>>>>>>> >>>>>> > > > end=' ')
>>> >>>>>>>>> >>>>>> > > >
>>> >>>>>>>>> >>>>>> > > > I still think this can be positive for the
>>> project but well I am
>>> >>>>>>>>> >>>>>> > > > barely a contributor to the python code base so I
>>> let you the python
>>> >>>>>>>>> >>>>>> > > > maintainers to reconsider this, in any case it
>>> seems like a good
>>> >>>>>>>>> >>>>>> > > > improvement for the project.
>>> >>>>>>>>> >>>>>> > > >
>>> >>>>>>>>> >>>>>> > > > [1]
>>> https://github.com/python/black/graphs/contributors
>>> >>>>>>>>> >>>>>> > > > [2] https://github.com/python
>>> >>>>>>>>> >>>>>> > > > [3]
>>> https://www.python.org/dev/peps/pep-0008/#indentation
>>> >>>>>>>>> >>>>>> > > > [4]
>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>> >>>>>>>>> >>>>>> > > >
>>> >>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <
>>> altay@google.com> wrote:
>>> >>>>>>>>> >>>>>> > > >>
>>> >>>>>>>>> >>>>>> > > >> I am in the same boat with Robert, I am in favor
>>> of autoformatters but I am not familiar with this one. My concerns are:
>>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a
>>> big warning.
>>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project.
>>> For the same reason I also strongly prefer not using a fork for a specific
>>> setting. Fork will only have less people looking at it.
>>> >>>>>>>>> >>>>>> > > >>
>>> >>>>>>>>> >>>>>> > > >> IMO, this is in an early stage for us. That said
>>> lint issues are real as pointed in the thread. If someone would like to
>>> give it a try and see how it would look like for us that would be
>>> interesting.
>>> >>>>>>>>> >>>>>> > > >>
>>> >>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna
>>> Kucharczyk <ka...@gmail.com> wrote:
>>> >>>>>>>>> >>>>>> > > >>>
>>> >>>>>>>>> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs
>>> failures are caused by lint problems.
>>> >>>>>>>>> >>>>>> > > >>> I think it would be great to have something
>>> similar to Spotless in Java SDK (I heard there is problem with configuring
>>> Black with IntelliJ).
>>> >>>>>>>>> >>>>>> > > >>>
>>> >>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert
>>> Bradshaw <ro...@google.com> wrote:
>>> >>>>>>>>> >>>>>> > > >>>>
>>> >>>>>>>>> >>>>>> > > >>>> I'm generally in favor of autoformatters,
>>> though I haven't looked at
>>> >>>>>>>>> >>>>>> > > >>>> how well this particular one works. We might
>>> have to go with
>>> >>>>>>>>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>> >>>>>>>>> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>> >>>>>>>>> >>>>>> > > >>>>
>>> >>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada
>>> <pa...@google.com> wrote:
>>> >>>>>>>>> >>>>>> > > >>>>>
>>> >>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at least a
>>> couple people (myself included) who've been annoyed by having to take care
>>> of lint issues that maybe a code formatter could save us.
>>> >>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>>> >>>>>>>>> >>>>>> > > >>>>> -P.
>>> >>>>>>>>> >>>>>> > > >>>>>
>>> >>>>>>>>> >>>>>> > > >>>>>
>>> >>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <
>>> iemejia@gmail.com> wrote:
>>> >>>>>>>>> >>>>>> > > >>>>>>
>>> >>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python
>>> code auto formatter that
>>> >>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter
>>> for python, and wanted to
>>> >>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest from the
>>> python people to get this
>>> >>>>>>>>> >>>>>> > > >>>>>> into the build?
>>> >>>>>>>>> >>>>>> > > >>>>>>
>>> >>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless for Java has
>>> been a good improvement and
>>> >>>>>>>>> >>>>>> > > >>>>>> maybe the python code base may benefit of
>>> this too.
>>> >>>>>>>>> >>>>>> > > >>>>>>
>>> >>>>>>>>> >>>>>> > > >>>>>> WDYT?
>>> >>>>>>>>> >>>>>> > > >>>>>>
>>> >>>>>>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >>
>>> >> Michał Walenia
>>> >> Polidea | Software Engineer
>>> >>
>>> >> M: +48 791 432 002 <+48%20791%20432%20002>
>>> >> E: michal.walenia@polidea.com
>>> >>
>>> >> Unique Tech
>>> >> Check out our projects!
>>>
>>

Re: [DISCUSS] Autoformat python code with Black

Posted by Ismaël Mejía <ie...@gmail.com>.
Java build fails on any unformatted code so python probably should be like
that.
We have to ensure however that it fails early on that.
As Robert said time to debate the knobs :)

On Fri, Jan 24, 2020 at 3:19 PM Kamil Wasilewski <
kamil.wasilewski@polidea.com> wrote:

> PR is ready: https://github.com/apache/beam/pull/10684. Please share your
> comments ;-) I've managed to reduce the impact a bit:
> 501 files changed, 18245 insertions(+), 19495 deletions(-)
>
> We still need to consider how to enforce the usage of autoformatter.
> Pre-commit sounds like a nice addition, but it still needs to be installed
> manually by a developer. On the other hand, Jenkins precommit job that
> fails if any unformatted code is detected looks like too strict. What do
> you think?
>
> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> Thanks! Now we get to debate what knobs to twiddle :-P
>>
>> FYI, I did a simple run (just pushed to
>> https://github.com/apache/beam/compare/master...robertwb:yapf) to see
>> the impact. The diff is
>>
>>     $ git diff --stat master
>>     ...
>>      547 files changed, 22118 insertions(+), 21129 deletions(-)
>>
>> For reference
>>
>>     $ find sdks/python/apache_beam -name '*.py' | xargs wc
>>     ...
>>     200424  612002 7431637 total
>>
>> which means a little over 10% of lines get touched. I think there are
>> some options, such as SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES and
>> COALESCE_BRACKETS, that will conform more to the style we are already
>> (mostly) following.
>>
>>
>> On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
>> <ka...@polidea.com> wrote:
>> >
>> > Thank you Michał for creating the ticket. I have some free time and I'd
>> like to volunteer myself for this task.
>> > Indeed, it looks like there's consensus for `yapf`, so I'll try `yapf`
>> first.
>> >
>> > Best,
>> > Kamil
>> >
>> >
>> > On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <
>> michal.walenia@polidea.com> wrote:
>> >>
>> >> Hi all,
>> >> I created a JIRA issue for this and summarized the available tools
>> >>
>> >> https://issues.apache.org/jira/browse/BEAM-9175
>> >>
>> >> Cheers,
>> >> Michal
>> >>
>> >> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <eh...@google.com> wrote:
>> >>>
>> >>> Sorry, backing off on this due to time constraints.
>> >>>
>> >>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <eh...@google.com> wrote:
>> >>>>
>> >>>> It sounds like there's a consensus for yapf. I volunteer to take
>> this on
>> >>>>
>> >>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com> wrote:
>> >>>>>
>> >>>>> +1 to autoformatting
>> >>>>>
>> >>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <lc...@google.com> wrote:
>> >>>>>>
>> >>>>>> +1 to autoformatters. Also the Beam Java SDK went through a one
>> time pass to apply the spotless formatting.
>> >>>>>>
>> >>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <al...@google.com>
>> wrote:
>> >>>>>>>
>> >>>>>>> +1 to autoformatters and yapf. It appears to be a well maintained
>> project. I do support making a one time pass to apply formatting the whole
>> code base.
>> >>>>>>>
>> >>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <ch...@gmail.com>
>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> It'd be good if there was a way to only apply to violating (or
>> at
>> >>>>>>>>> least changed) lines.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> I assumed the first thing we’d do is convert all of the code in
>> one go, since it’s a very safe operation. Did you have something else in
>> mind?
>> >>>>>>>>
>> >>>>>>>> -chad
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <
>> chadrik@gmail.com> wrote:
>> >>>>>>>>> >
>> >>>>>>>>> > +1 to autoformatting
>> >>>>>>>>> >
>> >>>>>>>>> > Let me add some nuance to that.
>> >>>>>>>>> >
>> >>>>>>>>> > The way I see it there are 2 varieties of formatters:  those
>> which take the original formatting into consideration (autopep8) and those
>> which disregard it (yapf, black).
>> >>>>>>>>> >
>> >>>>>>>>> > I much prefer yapf to black, because you have plenty of
>> options to tweak with yapf (enough to make the output a pretty close match
>> to the current Beam style), and you can mark areas to preserve the original
>> formatting, which could be very useful with Pipeline building with pipe
>> operators.  Please don't pick black.
>> >>>>>>>>> >
>> >>>>>>>>> > autopep8 is more along the lines of spotless in Java -- it
>> only corrects code that breaks the project's style rules.  The big problem
>> with Beam's current style is that it is so esoteric that autopep8 can't
>> enforce it -- and I'm not just talking about 2-spaces, which I don't really
>> have a problem with -- the problem is the use of either 2 or 4 spaces
>> depending on context (expression start vs hanging indent, etc).  This is my
>> *biggest* gripe about the current style.  PyCharm doesn't have enough
>> control either.  So, if we can choose a style that can be expressed by
>> flake8 or pycodestyle then we can use autopep8 to enforce it.
>> >>>>>>>>> >
>> >>>>>>>>> > I'd prefer autopep8 to yapf because I like having a little
>> wiggle room to influence the style, but on a big project like Beam all that
>> wiggle room ends up to minor but noticeable inconsistencies in style
>> throughout the project.  yapf ensures completely consistent style, but the
>> tradeoff is that it's sometimes ugly, especially in scenarios with similar
>> repeated entries like argparse, where yapf might insert line breaks in
>> visually inconsistent and unappealing ways depending on the lengths of the
>> keywords and expressions involved.
>> >>>>>>>>> >
>> >>>>>>>>> > Either way (but especially if we choose yapf) I think it'd be
>> a nice addition to setup a pre-commit [1] config so that people can opt in
>> to running *lightweight* autofixers prior to commit.  This will not only
>> reduce dev frustration but will also reduce the amount of cpu cycles that
>> Jenkins spends pointing out lint errors.
>> >>>>>>>>> >
>> >>>>>>>>> > [1] https://pre-commit.com/
>> >>>>>>>>> >
>> >>>>>>>>> > -chad
>> >>>>>>>>> >
>> >>>>>>>>> >
>> >>>>>>>>> >
>> >>>>>>>>> >
>> >>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <
>> iemejia@gmail.com> wrote:
>> >>>>>>>>> >>
>> >>>>>>>>> >> Last time we discussed this there seems not to be much
>> progress into autoformatting.
>> >>>>>>>>> >> This tool looks more tweakable, so maybe it could be more
>> appropriate for Beam's use case.
>> >>>>>>>>> >> https://github.com/google/yapf/
>> >>>>>>>>> >> WDYT?
>> >>>>>>>>> >>
>> >>>>>>>>> >>
>> >>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <
>> lgajowy@apache.org> wrote:
>> >>>>>>>>> >>>
>> >>>>>>>>> >>> +1 for any autoformatter for Python SDK that does the job.
>> My experience is that since spotless in Java SDK I would never start a new
>> Java project without it. So many great benefits not only for one person
>> coding but for all community.
>> >>>>>>>>> >>>
>> >>>>>>>>> >>> It is a GitHub UI issue that you cannot easily browse past
>> the reformat. It is not actually that hard, but does take a couple extra
>> clicks to get GitHub to display blame before a reformat. It is easier with
>> the command line. I do a lot of code history digging and the global Java
>> reformat is not really a problem.
>> >>>>>>>>> >>>
>> >>>>>>>>> >>> It's actually one more click on Github but I agree it's not
>> the best way to search the history. The most convenient and clear one I've
>> found so far is in Jetbrains IDEs (Intelij) where you can:
>> >>>>>>>>> >>>
>> >>>>>>>>> >>> right click on line number -> "annotate" -> click again ->
>> "annotate previous revision" -> ...
>> >>>>>>>>> >>>
>> >>>>>>>>> >>> You can also use "compare with" to see the diff between two
>> revisions.
>> >>>>>>>>> >>>
>> >>>>>>>>> >>> Łukasz
>> >>>>>>>>> >>>
>> >>>>>>>>> >>>
>> >>>>>>>>> >>>
>> >>>>>>>>> >>>
>> >>>>>>>>> >>>
>> >>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org>
>> napisał(a):
>> >>>>>>>>> >>>>
>> >>>>>>>>> >>>> +1 pending good enough tooling (I can't quite tell - seems
>> there are some issues?)
>> >>>>>>>>> >>>>
>> >>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
>> ka.kucharczyk@gmail.com> wrote:
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>> What else actually we gain? My guess is faster PR review
>> iteration. We will skip some of conversations about code style.
>> >>>>>>>>> >>>>
>> >>>>>>>>> >>>> ...
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>> Last but not least, new contributor may be less
>> discouraged. When I started contribute I didn’t know how to format my code
>> and I lost a lot of time to add pylint and adjust IntelliJ. I eventually
>> failed. Currently I write code intuitively and when I don’t forget I rerun
>> tox.
>> >>>>>>>>> >>>>
>> >>>>>>>>> >>>>
>> >>>>>>>>> >>>> This is a huge benefit. This is why I supported it so much
>> for Java. It is a community benefit. You do not have to be a contributor to
>> the Python SDK to support this. That is why I am writing here. Just
>> eliminate all discussion of formatting. It doesn't really matter what the
>> resulting format is, if it is not crazy to read. I strongly oppose
>> maintaining a non-default format.
>> >>>>>>>>> >>>>
>> >>>>>>>>> >>>> Reformating 20k lines or 200k is not hard. The Java global
>> reformat touched 50k lines. It does not really matter how big it is.
>> Definitely do it all at once if you think the tool is good enough. And you
>> should pin a version, so churn is not a problem. You can upgrade the
>> version and reformat in a PR later and that is also easy.
>> >>>>>>>>> >>>>
>> >>>>>>>>> >>>> It is a GitHub UI issue that you cannot easily browse past
>> the reformat. It is not actually that hard, but does take a couple extra
>> clicks to get GitHub to display blame before a reformat. It is easier with
>> the command line. I do a lot of code history digging and the global Java
>> reformat is not really a problem.
>> >>>>>>>>> >>>>
>> >>>>>>>>> >>>> Kenn
>> >>>>>>>>> >>>>
>> >>>>>>>>> >>>>
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>> Also everything will be formatted in a same way, so
>> eventually it would be easier to read.
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>> Moreover, as it was mentioned in previous emails - a lot
>> of Jenkins failures won’t take place, so we save time and resources.
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>> One of disadvantages is that our pipelines has custom
>> syntax and after formatting they looks a little bit weird, but maybe
>> extending the only configurable option in Black - lines, from 88 to 110
>> would be solution.
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>> Second one is that Black requires Python 3 to be run. I
>> don’t know how big obstacle it would be.
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>> I believe there are two options how it would be possible
>> to introduce Black. First: just do it, it will hurt but then it would be ok
>> (same as a dentist appointment). Of course it may require some work to
>> adjust linters. On the other hand we can do it gradually and start
>> including sdk parts one by one - maybe it will be less painful?
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>> As an example I can share one of projects [2] I know that
>> uses Black (they use also other cool checkers and pre-commit [3]). This is
>> how looks their build with all checks [4].
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>> To sum up I believe that if we want improve our coding
>> experience, we should improve our toolset. Black seems be recent and quite
>> popular tool what makes think they won’t stop developing it.
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>> [1]
>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>> [2]
>> https://github.com/GoogleCloudPlatform/oozie-to-airflow
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>> [3] https://pre-commit.com
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>> [4]
>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>>
>> >>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <
>> robertwb@google.com> wrote:
>> >>>>>>>>> >>>>>>
>> >>>>>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as
>> it would change nearly every single line in the codebase (and the loss of
>> all context as well as that particular line).
>> >>>>>>>>> >>>>>>
>> >>>>>>>>> >>>>>> This is probably why the 2-space fork exists. However,
>> we don't conform to that either--we use 2 spaces for indentation, but 4 for
>> continuation indentation. (As for the history of this, this goes back to
>> Google's internal style guide, probably motivated by consistency with C++,
>> Java, ... and the fact that with an indent level of 4 one ends up wrapping
>> lines quite frequently (it's telling that black's default line length is
>> 88)). This turns out to be an easy change to the codebase.
>> >>>>>>>>> >>>>>>
>> >>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I
>> found that this tool introduces a huge amount of vertical whitespace (e.g.
>> closing parentheses on their own line), e.g.
>> >>>>>>>>> >>>>>>
>> >>>>>>>>> >>>>>> def foo(
>> >>>>>>>>> >>>>>>     args
>> >>>>>>>>> >>>>>> ):
>> >>>>>>>>> >>>>>>   if (
>> >>>>>>>>> >>>>>>       long expression)
>> >>>>>>>>> >>>>>>   ):
>> >>>>>>>>> >>>>>>     func(
>> >>>>>>>>> >>>>>>         args
>> >>>>>>>>> >>>>>>     )
>> >>>>>>>>> >>>>>>
>> >>>>>>>>> >>>>>> I wrote a simple post-processor to put closing
>> parentheses on the same lines, as well as omit the newline after "if (",
>> and disabling formatting of strings, which reduce the churn in our codebase
>> to 15k lines (adding about 4k) out of 200k total.
>> >>>>>>>>> >>>>>>
>> >>>>>>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
>> >>>>>>>>> >>>>>>
>> >>>>>>>>> >>>>>> It's still very opinionated, often in different ways
>> then me, and doesn't understand the semantics of the code, but possibly
>> something we could live with given the huge advantages of an autoformatter.
>> >>>>>>>>> >>>>>>
>> >>>>>>>>> >>>>>> An intermediate point would be to allow, but not
>> require, autoformatting of changed lines.
>> >>>>>>>>> >>>>>>
>> >>>>>>>>> >>>>>> As for being beta quality, it looks like it's got a
>> decent number of contributors and in my book being in the python github
>> project is a strong positive signal. But, due to the above issues, I think
>> we'd have to maintain a fork. (The code is pretty lightweight, the 2 vs. 4
>> space issue is a 2-line change, and the rest implemented as a
>> post-processing step (for now, incomplete), so it'd be easy to stay in sync
>> with upstream.)
>> >>>>>>>>> >>>>>>
>> >>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <
>> iemejia@gmail.com> wrote:
>> >>>>>>>>> >>>>>> >
>> >>>>>>>>> >>>>>> > > I think the question is if it can be configured in a
>> way to fit our
>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible
>> to reformat the
>> >>>>>>>>> >>>>>> > > entire Python SDK.
>> >>>>>>>>> >>>>>> >
>> >>>>>>>>> >>>>>> > It cannot be configured to do what we actually do
>> because Black is
>> >>>>>>>>> >>>>>> > configurable only to support the standard python
>> codestyle guidelines
>> >>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what most
>> projects in the
>> >>>>>>>>> >>>>>> > python world use.
>> >>>>>>>>> >>>>>> >
>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the
>> Git history. This
>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I
>> have the feeling
>> >>>>>>>>> >>>>>> > > that this might be less of a problem with Python
>> because the linter has
>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>> >>>>>>>>> >>>>>> >
>> >>>>>>>>> >>>>>> > Yes that’s the bad side effect but there are always
>> tradeoffs we have
>> >>>>>>>>> >>>>>> > to deal with.
>> >>>>>>>>> >>>>>> >
>> >>>>>>>>> >>>>>> >
>> >>>>>>>>> >>>>>> >
>> >>>>>>>>> >>>>>> >
>> >>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <
>> mxm@apache.org> wrote:
>> >>>>>>>>> >>>>>> > >
>> >>>>>>>>> >>>>>> > > I think the question is if it can be configured in a
>> way to fit our
>> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible
>> to reformat the
>> >>>>>>>>> >>>>>> > > entire Python SDK.
>> >>>>>>>>> >>>>>> > >
>> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the
>> Git history. This
>> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I
>> have the feeling
>> >>>>>>>>> >>>>>> > > that this might be less of a problem with Python
>> because the linter has
>> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>> >>>>>>>>> >>>>>> > >
>> >>>>>>>>> >>>>>> > > -Max
>> >>>>>>>>> >>>>>> > >
>> >>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>> >>>>>>>>> >>>>>> > > >> My concerns are:
>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a
>> big warning.
>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project.
>> For the same reason I also strongly prefer not using a fork for a specific
>> setting. Fork will only have less people looking at it.
>> >>>>>>>>> >>>>>> > > >
>> >>>>>>>>> >>>>>> > > > I suppose the project is marked as beta because it
>> is recent, it was
>> >>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and because some things
>> can change since
>> >>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky beasts, I think
>> beta in that case is
>> >>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you look at the
>> contribution page [1]
>> >>>>>>>>> >>>>>> > > > you can notice that it is less and less a single
>> person project, there
>> >>>>>>>>> >>>>>> > > > have been 93 independent contributions since the
>> project became
>> >>>>>>>>> >>>>>> > > > public, and the fact that it is hosted in the
>> python organization
>> >>>>>>>>> >>>>>> > > > github [2] gives some confidence on the project
>> continuity.
>> >>>>>>>>> >>>>>> > > >
>> >>>>>>>>> >>>>>> > > > You are right however about the fact that the main
>> author seems to be
>> >>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces
>> issue he can seem
>> >>>>>>>>> >>>>>> > > > arbitrary, but he is just following pep8 style
>> guide recommendations
>> >>>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow
>> the 4 spaces
>> >>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even Google's own
>> Python style guide [4],
>> >>>>>>>>> >>>>>> > > > So, probably it should be to us to reconsider the
>> current policy to
>> >>>>>>>>> >>>>>> > > > adapt to the standards (and the tool).
>> >>>>>>>>> >>>>>> > > >
>> >>>>>>>>> >>>>>> > > > I did a quick run of black with python 2.7
>> compatibility on
>> >>>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing errors which is
>> positive given the
>> >>>>>>>>> >>>>>> > > > size of our code base.
>> >>>>>>>>> >>>>>> > > >
>> >>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4
>> files failed to reformat.
>> >>>>>>>>> >>>>>> > > >
>> >>>>>>>>> >>>>>> > > > error: cannot format
>> >>>>>>>>> >>>>>> > > >
>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>> >>>>>>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>> >>>>>>>>> >>>>>> > > > error: cannot format
>> >>>>>>>>> >>>>>> > > >
>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>> >>>>>>>>> >>>>>> > > > Cannot parse: 151:18:
>>  file=sys.stderr)
>> >>>>>>>>> >>>>>> > > > error: cannot format
>> >>>>>>>>> >>>>>> > > >
>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>> >>>>>>>>> >>>>>> > > > Cannot parse: 160:34:
>>  print(traceback_string, file=sys.stderr)
>> >>>>>>>>> >>>>>> > > > error: cannot format
>> >>>>>>>>> >>>>>> > > >
>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>> >>>>>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc ==
>> last_pc else '    ',
>> >>>>>>>>> >>>>>> > > > end=' ')
>> >>>>>>>>> >>>>>> > > >
>> >>>>>>>>> >>>>>> > > > I still think this can be positive for the project
>> but well I am
>> >>>>>>>>> >>>>>> > > > barely a contributor to the python code base so I
>> let you the python
>> >>>>>>>>> >>>>>> > > > maintainers to reconsider this, in any case it
>> seems like a good
>> >>>>>>>>> >>>>>> > > > improvement for the project.
>> >>>>>>>>> >>>>>> > > >
>> >>>>>>>>> >>>>>> > > > [1]
>> https://github.com/python/black/graphs/contributors
>> >>>>>>>>> >>>>>> > > > [2] https://github.com/python
>> >>>>>>>>> >>>>>> > > > [3]
>> https://www.python.org/dev/peps/pep-0008/#indentation
>> >>>>>>>>> >>>>>> > > > [4]
>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>> >>>>>>>>> >>>>>> > > >
>> >>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <
>> altay@google.com> wrote:
>> >>>>>>>>> >>>>>> > > >>
>> >>>>>>>>> >>>>>> > > >> I am in the same boat with Robert, I am in favor
>> of autoformatters but I am not familiar with this one. My concerns are:
>> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a
>> big warning.
>> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project.
>> For the same reason I also strongly prefer not using a fork for a specific
>> setting. Fork will only have less people looking at it.
>> >>>>>>>>> >>>>>> > > >>
>> >>>>>>>>> >>>>>> > > >> IMO, this is in an early stage for us. That said
>> lint issues are real as pointed in the thread. If someone would like to
>> give it a try and see how it would look like for us that would be
>> interesting.
>> >>>>>>>>> >>>>>> > > >>
>> >>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna
>> Kucharczyk <ka...@gmail.com> wrote:
>> >>>>>>>>> >>>>>> > > >>>
>> >>>>>>>>> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs
>> failures are caused by lint problems.
>> >>>>>>>>> >>>>>> > > >>> I think it would be great to have something
>> similar to Spotless in Java SDK (I heard there is problem with configuring
>> Black with IntelliJ).
>> >>>>>>>>> >>>>>> > > >>>
>> >>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw
>> <ro...@google.com> wrote:
>> >>>>>>>>> >>>>>> > > >>>>
>> >>>>>>>>> >>>>>> > > >>>> I'm generally in favor of autoformatters,
>> though I haven't looked at
>> >>>>>>>>> >>>>>> > > >>>> how well this particular one works. We might
>> have to go with
>> >>>>>>>>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>> >>>>>>>>> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
>> >>>>>>>>> >>>>>> > > >>>>
>> >>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <
>> pabloem@google.com> wrote:
>> >>>>>>>>> >>>>>> > > >>>>>
>> >>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at least a
>> couple people (myself included) who've been annoyed by having to take care
>> of lint issues that maybe a code formatter could save us.
>> >>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>> >>>>>>>>> >>>>>> > > >>>>> -P.
>> >>>>>>>>> >>>>>> > > >>>>>
>> >>>>>>>>> >>>>>> > > >>>>>
>> >>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <
>> iemejia@gmail.com> wrote:
>> >>>>>>>>> >>>>>> > > >>>>>>
>> >>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python
>> code auto formatter that
>> >>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for
>> python, and wanted to
>> >>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest from the
>> python people to get this
>> >>>>>>>>> >>>>>> > > >>>>>> into the build?
>> >>>>>>>>> >>>>>> > > >>>>>>
>> >>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless for Java has
>> been a good improvement and
>> >>>>>>>>> >>>>>> > > >>>>>> maybe the python code base may benefit of
>> this too.
>> >>>>>>>>> >>>>>> > > >>>>>>
>> >>>>>>>>> >>>>>> > > >>>>>> WDYT?
>> >>>>>>>>> >>>>>> > > >>>>>>
>> >>>>>>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Michał Walenia
>> >> Polidea | Software Engineer
>> >>
>> >> M: +48 791 432 002
>> >> E: michal.walenia@polidea.com
>> >>
>> >> Unique Tech
>> >> Check out our projects!
>>
>

Re: [DISCUSS] Autoformat python code with Black

Posted by Kamil Wasilewski <ka...@polidea.com>.
PR is ready: https://github.com/apache/beam/pull/10684. Please share your
comments ;-) I've managed to reduce the impact a bit:
501 files changed, 18245 insertions(+), 19495 deletions(-)

We still need to consider how to enforce the usage of autoformatter.
Pre-commit sounds like a nice addition, but it still needs to be installed
manually by a developer. On the other hand, Jenkins precommit job that
fails if any unformatted code is detected looks like too strict. What do
you think?

On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw <ro...@google.com> wrote:

> Thanks! Now we get to debate what knobs to twiddle :-P
>
> FYI, I did a simple run (just pushed to
> https://github.com/apache/beam/compare/master...robertwb:yapf) to see
> the impact. The diff is
>
>     $ git diff --stat master
>     ...
>      547 files changed, 22118 insertions(+), 21129 deletions(-)
>
> For reference
>
>     $ find sdks/python/apache_beam -name '*.py' | xargs wc
>     ...
>     200424  612002 7431637 total
>
> which means a little over 10% of lines get touched. I think there are
> some options, such as SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES and
> COALESCE_BRACKETS, that will conform more to the style we are already
> (mostly) following.
>
>
> On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
> <ka...@polidea.com> wrote:
> >
> > Thank you Michał for creating the ticket. I have some free time and I'd
> like to volunteer myself for this task.
> > Indeed, it looks like there's consensus for `yapf`, so I'll try `yapf`
> first.
> >
> > Best,
> > Kamil
> >
> >
> > On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <
> michal.walenia@polidea.com> wrote:
> >>
> >> Hi all,
> >> I created a JIRA issue for this and summarized the available tools
> >>
> >> https://issues.apache.org/jira/browse/BEAM-9175
> >>
> >> Cheers,
> >> Michal
> >>
> >> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <eh...@google.com> wrote:
> >>>
> >>> Sorry, backing off on this due to time constraints.
> >>>
> >>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <eh...@google.com> wrote:
> >>>>
> >>>> It sounds like there's a consensus for yapf. I volunteer to take this
> on
> >>>>
> >>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com> wrote:
> >>>>>
> >>>>> +1 to autoformatting
> >>>>>
> >>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <lc...@google.com> wrote:
> >>>>>>
> >>>>>> +1 to autoformatters. Also the Beam Java SDK went through a one
> time pass to apply the spotless formatting.
> >>>>>>
> >>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <al...@google.com>
> wrote:
> >>>>>>>
> >>>>>>> +1 to autoformatters and yapf. It appears to be a well maintained
> project. I do support making a one time pass to apply formatting the whole
> code base.
> >>>>>>>
> >>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <ch...@gmail.com>
> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> It'd be good if there was a way to only apply to violating (or at
> >>>>>>>>> least changed) lines.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I assumed the first thing we’d do is convert all of the code in
> one go, since it’s a very safe operation. Did you have something else in
> mind?
> >>>>>>>>
> >>>>>>>> -chad
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <ch...@gmail.com>
> wrote:
> >>>>>>>>> >
> >>>>>>>>> > +1 to autoformatting
> >>>>>>>>> >
> >>>>>>>>> > Let me add some nuance to that.
> >>>>>>>>> >
> >>>>>>>>> > The way I see it there are 2 varieties of formatters:  those
> which take the original formatting into consideration (autopep8) and those
> which disregard it (yapf, black).
> >>>>>>>>> >
> >>>>>>>>> > I much prefer yapf to black, because you have plenty of
> options to tweak with yapf (enough to make the output a pretty close match
> to the current Beam style), and you can mark areas to preserve the original
> formatting, which could be very useful with Pipeline building with pipe
> operators.  Please don't pick black.
> >>>>>>>>> >
> >>>>>>>>> > autopep8 is more along the lines of spotless in Java -- it
> only corrects code that breaks the project's style rules.  The big problem
> with Beam's current style is that it is so esoteric that autopep8 can't
> enforce it -- and I'm not just talking about 2-spaces, which I don't really
> have a problem with -- the problem is the use of either 2 or 4 spaces
> depending on context (expression start vs hanging indent, etc).  This is my
> *biggest* gripe about the current style.  PyCharm doesn't have enough
> control either.  So, if we can choose a style that can be expressed by
> flake8 or pycodestyle then we can use autopep8 to enforce it.
> >>>>>>>>> >
> >>>>>>>>> > I'd prefer autopep8 to yapf because I like having a little
> wiggle room to influence the style, but on a big project like Beam all that
> wiggle room ends up to minor but noticeable inconsistencies in style
> throughout the project.  yapf ensures completely consistent style, but the
> tradeoff is that it's sometimes ugly, especially in scenarios with similar
> repeated entries like argparse, where yapf might insert line breaks in
> visually inconsistent and unappealing ways depending on the lengths of the
> keywords and expressions involved.
> >>>>>>>>> >
> >>>>>>>>> > Either way (but especially if we choose yapf) I think it'd be
> a nice addition to setup a pre-commit [1] config so that people can opt in
> to running *lightweight* autofixers prior to commit.  This will not only
> reduce dev frustration but will also reduce the amount of cpu cycles that
> Jenkins spends pointing out lint errors.
> >>>>>>>>> >
> >>>>>>>>> > [1] https://pre-commit.com/
> >>>>>>>>> >
> >>>>>>>>> > -chad
> >>>>>>>>> >
> >>>>>>>>> >
> >>>>>>>>> >
> >>>>>>>>> >
> >>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <
> iemejia@gmail.com> wrote:
> >>>>>>>>> >>
> >>>>>>>>> >> Last time we discussed this there seems not to be much
> progress into autoformatting.
> >>>>>>>>> >> This tool looks more tweakable, so maybe it could be more
> appropriate for Beam's use case.
> >>>>>>>>> >> https://github.com/google/yapf/
> >>>>>>>>> >> WDYT?
> >>>>>>>>> >>
> >>>>>>>>> >>
> >>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <
> lgajowy@apache.org> wrote:
> >>>>>>>>> >>>
> >>>>>>>>> >>> +1 for any autoformatter for Python SDK that does the job.
> My experience is that since spotless in Java SDK I would never start a new
> Java project without it. So many great benefits not only for one person
> coding but for all community.
> >>>>>>>>> >>>
> >>>>>>>>> >>> It is a GitHub UI issue that you cannot easily browse past
> the reformat. It is not actually that hard, but does take a couple extra
> clicks to get GitHub to display blame before a reformat. It is easier with
> the command line. I do a lot of code history digging and the global Java
> reformat is not really a problem.
> >>>>>>>>> >>>
> >>>>>>>>> >>> It's actually one more click on Github but I agree it's not
> the best way to search the history. The most convenient and clear one I've
> found so far is in Jetbrains IDEs (Intelij) where you can:
> >>>>>>>>> >>>
> >>>>>>>>> >>> right click on line number -> "annotate" -> click again ->
> "annotate previous revision" -> ...
> >>>>>>>>> >>>
> >>>>>>>>> >>> You can also use "compare with" to see the diff between two
> revisions.
> >>>>>>>>> >>>
> >>>>>>>>> >>> Łukasz
> >>>>>>>>> >>>
> >>>>>>>>> >>>
> >>>>>>>>> >>>
> >>>>>>>>> >>>
> >>>>>>>>> >>>
> >>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org>
> napisał(a):
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> +1 pending good enough tooling (I can't quite tell - seems
> there are some issues?)
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
> ka.kucharczyk@gmail.com> wrote:
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> What else actually we gain? My guess is faster PR review
> iteration. We will skip some of conversations about code style.
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> ...
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> Last but not least, new contributor may be less
> discouraged. When I started contribute I didn’t know how to format my code
> and I lost a lot of time to add pylint and adjust IntelliJ. I eventually
> failed. Currently I write code intuitively and when I don’t forget I rerun
> tox.
> >>>>>>>>> >>>>
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> This is a huge benefit. This is why I supported it so much
> for Java. It is a community benefit. You do not have to be a contributor to
> the Python SDK to support this. That is why I am writing here. Just
> eliminate all discussion of formatting. It doesn't really matter what the
> resulting format is, if it is not crazy to read. I strongly oppose
> maintaining a non-default format.
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> Reformating 20k lines or 200k is not hard. The Java global
> reformat touched 50k lines. It does not really matter how big it is.
> Definitely do it all at once if you think the tool is good enough. And you
> should pin a version, so churn is not a problem. You can upgrade the
> version and reformat in a PR later and that is also easy.
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> It is a GitHub UI issue that you cannot easily browse past
> the reformat. It is not actually that hard, but does take a couple extra
> clicks to get GitHub to display blame before a reformat. It is easier with
> the command line. I do a lot of code history digging and the global Java
> reformat is not really a problem.
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> Kenn
> >>>>>>>>> >>>>
> >>>>>>>>> >>>>
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> Also everything will be formatted in a same way, so
> eventually it would be easier to read.
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> Moreover, as it was mentioned in previous emails - a lot
> of Jenkins failures won’t take place, so we save time and resources.
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> One of disadvantages is that our pipelines has custom
> syntax and after formatting they looks a little bit weird, but maybe
> extending the only configurable option in Black - lines, from 88 to 110
> would be solution.
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> Second one is that Black requires Python 3 to be run. I
> don’t know how big obstacle it would be.
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> I believe there are two options how it would be possible
> to introduce Black. First: just do it, it will hurt but then it would be ok
> (same as a dentist appointment). Of course it may require some work to
> adjust linters. On the other hand we can do it gradually and start
> including sdk parts one by one - maybe it will be less painful?
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> As an example I can share one of projects [2] I know that
> uses Black (they use also other cool checkers and pre-commit [3]). This is
> how looks their build with all checks [4].
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> To sum up I believe that if we want improve our coding
> experience, we should improve our toolset. Black seems be recent and quite
> popular tool what makes think they won’t stop developing it.
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> [1]
> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> [2]
> https://github.com/GoogleCloudPlatform/oozie-to-airflow
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> [3] https://pre-commit.com
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> [4]
> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>>
> >>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <
> robertwb@google.com> wrote:
> >>>>>>>>> >>>>>>
> >>>>>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as it
> would change nearly every single line in the codebase (and the loss of all
> context as well as that particular line).
> >>>>>>>>> >>>>>>
> >>>>>>>>> >>>>>> This is probably why the 2-space fork exists. However, we
> don't conform to that either--we use 2 spaces for indentation, but 4 for
> continuation indentation. (As for the history of this, this goes back to
> Google's internal style guide, probably motivated by consistency with C++,
> Java, ... and the fact that with an indent level of 4 one ends up wrapping
> lines quite frequently (it's telling that black's default line length is
> 88)). This turns out to be an easy change to the codebase.
> >>>>>>>>> >>>>>>
> >>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found
> that this tool introduces a huge amount of vertical whitespace (e.g.
> closing parentheses on their own line), e.g.
> >>>>>>>>> >>>>>>
> >>>>>>>>> >>>>>> def foo(
> >>>>>>>>> >>>>>>     args
> >>>>>>>>> >>>>>> ):
> >>>>>>>>> >>>>>>   if (
> >>>>>>>>> >>>>>>       long expression)
> >>>>>>>>> >>>>>>   ):
> >>>>>>>>> >>>>>>     func(
> >>>>>>>>> >>>>>>         args
> >>>>>>>>> >>>>>>     )
> >>>>>>>>> >>>>>>
> >>>>>>>>> >>>>>> I wrote a simple post-processor to put closing
> parentheses on the same lines, as well as omit the newline after "if (",
> and disabling formatting of strings, which reduce the churn in our codebase
> to 15k lines (adding about 4k) out of 200k total.
> >>>>>>>>> >>>>>>
> >>>>>>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
> >>>>>>>>> >>>>>>
> >>>>>>>>> >>>>>> It's still very opinionated, often in different ways then
> me, and doesn't understand the semantics of the code, but possibly
> something we could live with given the huge advantages of an autoformatter.
> >>>>>>>>> >>>>>>
> >>>>>>>>> >>>>>> An intermediate point would be to allow, but not require,
> autoformatting of changed lines.
> >>>>>>>>> >>>>>>
> >>>>>>>>> >>>>>> As for being beta quality, it looks like it's got a
> decent number of contributors and in my book being in the python github
> project is a strong positive signal. But, due to the above issues, I think
> we'd have to maintain a fork. (The code is pretty lightweight, the 2 vs. 4
> space issue is a 2-line change, and the rest implemented as a
> post-processing step (for now, incomplete), so it'd be easy to stay in sync
> with upstream.)
> >>>>>>>>> >>>>>>
> >>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <
> iemejia@gmail.com> wrote:
> >>>>>>>>> >>>>>> >
> >>>>>>>>> >>>>>> > > I think the question is if it can be configured in a
> way to fit our
> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible
> to reformat the
> >>>>>>>>> >>>>>> > > entire Python SDK.
> >>>>>>>>> >>>>>> >
> >>>>>>>>> >>>>>> > It cannot be configured to do what we actually do
> because Black is
> >>>>>>>>> >>>>>> > configurable only to support the standard python
> codestyle guidelines
> >>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what most
> projects in the
> >>>>>>>>> >>>>>> > python world use.
> >>>>>>>>> >>>>>> >
> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
> history. This
> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I
> have the feeling
> >>>>>>>>> >>>>>> > > that this might be less of a problem with Python
> because the linter has
> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
> >>>>>>>>> >>>>>> >
> >>>>>>>>> >>>>>> > Yes that’s the bad side effect but there are always
> tradeoffs we have
> >>>>>>>>> >>>>>> > to deal with.
> >>>>>>>>> >>>>>> >
> >>>>>>>>> >>>>>> >
> >>>>>>>>> >>>>>> >
> >>>>>>>>> >>>>>> >
> >>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <
> mxm@apache.org> wrote:
> >>>>>>>>> >>>>>> > >
> >>>>>>>>> >>>>>> > > I think the question is if it can be configured in a
> way to fit our
> >>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible
> to reformat the
> >>>>>>>>> >>>>>> > > entire Python SDK.
> >>>>>>>>> >>>>>> > >
> >>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
> history. This
> >>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I
> have the feeling
> >>>>>>>>> >>>>>> > > that this might be less of a problem with Python
> because the linter has
> >>>>>>>>> >>>>>> > > more rules than Checkstyle had.
> >>>>>>>>> >>>>>> > >
> >>>>>>>>> >>>>>> > > -Max
> >>>>>>>>> >>>>>> > >
> >>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
> >>>>>>>>> >>>>>> > > >> My concerns are:
> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big
> warning.
> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project.
> For the same reason I also strongly prefer not using a fork for a specific
> setting. Fork will only have less people looking at it.
> >>>>>>>>> >>>>>> > > >
> >>>>>>>>> >>>>>> > > > I suppose the project is marked as beta because it
> is recent, it was
> >>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and because some things
> can change since
> >>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky beasts, I think
> beta in that case is
> >>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you look at the
> contribution page [1]
> >>>>>>>>> >>>>>> > > > you can notice that it is less and less a single
> person project, there
> >>>>>>>>> >>>>>> > > > have been 93 independent contributions since the
> project became
> >>>>>>>>> >>>>>> > > > public, and the fact that it is hosted in the
> python organization
> >>>>>>>>> >>>>>> > > > github [2] gives some confidence on the project
> continuity.
> >>>>>>>>> >>>>>> > > >
> >>>>>>>>> >>>>>> > > > You are right however about the fact that the main
> author seems to be
> >>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces
> issue he can seem
> >>>>>>>>> >>>>>> > > > arbitrary, but he is just following pep8 style
> guide recommendations
> >>>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow
> the 4 spaces
> >>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even Google's own Python
> style guide [4],
> >>>>>>>>> >>>>>> > > > So, probably it should be to us to reconsider the
> current policy to
> >>>>>>>>> >>>>>> > > > adapt to the standards (and the tool).
> >>>>>>>>> >>>>>> > > >
> >>>>>>>>> >>>>>> > > > I did a quick run of black with python 2.7
> compatibility on
> >>>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing errors which is
> positive given the
> >>>>>>>>> >>>>>> > > > size of our code base.
> >>>>>>>>> >>>>>> > > >
> >>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4
> files failed to reformat.
> >>>>>>>>> >>>>>> > > >
> >>>>>>>>> >>>>>> > > > error: cannot format
> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
> >>>>>>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
> >>>>>>>>> >>>>>> > > > error: cannot format
> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
> >>>>>>>>> >>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
> >>>>>>>>> >>>>>> > > > error: cannot format
> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
> >>>>>>>>> >>>>>> > > > Cannot parse: 160:34:       print(traceback_string,
> file=sys.stderr)
> >>>>>>>>> >>>>>> > > > error: cannot format
> >>>>>>>>> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
> >>>>>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc ==
> last_pc else '    ',
> >>>>>>>>> >>>>>> > > > end=' ')
> >>>>>>>>> >>>>>> > > >
> >>>>>>>>> >>>>>> > > > I still think this can be positive for the project
> but well I am
> >>>>>>>>> >>>>>> > > > barely a contributor to the python code base so I
> let you the python
> >>>>>>>>> >>>>>> > > > maintainers to reconsider this, in any case it
> seems like a good
> >>>>>>>>> >>>>>> > > > improvement for the project.
> >>>>>>>>> >>>>>> > > >
> >>>>>>>>> >>>>>> > > > [1]
> https://github.com/python/black/graphs/contributors
> >>>>>>>>> >>>>>> > > > [2] https://github.com/python
> >>>>>>>>> >>>>>> > > > [3]
> https://www.python.org/dev/peps/pep-0008/#indentation
> >>>>>>>>> >>>>>> > > > [4]
> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
> >>>>>>>>> >>>>>> > > >
> >>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <
> altay@google.com> wrote:
> >>>>>>>>> >>>>>> > > >>
> >>>>>>>>> >>>>>> > > >> I am in the same boat with Robert, I am in favor
> of autoformatters but I am not familiar with this one. My concerns are:
> >>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big
> warning.
> >>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project.
> For the same reason I also strongly prefer not using a fork for a specific
> setting. Fork will only have less people looking at it.
> >>>>>>>>> >>>>>> > > >>
> >>>>>>>>> >>>>>> > > >> IMO, this is in an early stage for us. That said
> lint issues are real as pointed in the thread. If someone would like to
> give it a try and see how it would look like for us that would be
> interesting.
> >>>>>>>>> >>>>>> > > >>
> >>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna
> Kucharczyk <ka...@gmail.com> wrote:
> >>>>>>>>> >>>>>> > > >>>
> >>>>>>>>> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs
> failures are caused by lint problems.
> >>>>>>>>> >>>>>> > > >>> I think it would be great to have something
> similar to Spotless in Java SDK (I heard there is problem with configuring
> Black with IntelliJ).
> >>>>>>>>> >>>>>> > > >>>
> >>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <
> robertwb@google.com> wrote:
> >>>>>>>>> >>>>>> > > >>>>
> >>>>>>>>> >>>>>> > > >>>> I'm generally in favor of autoformatters, though
> I haven't looked at
> >>>>>>>>> >>>>>> > > >>>> how well this particular one works. We might
> have to go with
> >>>>>>>>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
> >>>>>>>>> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
> >>>>>>>>> >>>>>> > > >>>>
> >>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <
> pabloem@google.com> wrote:
> >>>>>>>>> >>>>>> > > >>>>>
> >>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at least a
> couple people (myself included) who've been annoyed by having to take care
> of lint issues that maybe a code formatter could save us.
> >>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
> >>>>>>>>> >>>>>> > > >>>>> -P.
> >>>>>>>>> >>>>>> > > >>>>>
> >>>>>>>>> >>>>>> > > >>>>>
> >>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <
> iemejia@gmail.com> wrote:
> >>>>>>>>> >>>>>> > > >>>>>>
> >>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python
> code auto formatter that
> >>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for
> python, and wanted to
> >>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest from the
> python people to get this
> >>>>>>>>> >>>>>> > > >>>>>> into the build?
> >>>>>>>>> >>>>>> > > >>>>>>
> >>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless for Java has been
> a good improvement and
> >>>>>>>>> >>>>>> > > >>>>>> maybe the python code base may benefit of this
> too.
> >>>>>>>>> >>>>>> > > >>>>>>
> >>>>>>>>> >>>>>> > > >>>>>> WDYT?
> >>>>>>>>> >>>>>> > > >>>>>>
> >>>>>>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
> >>
> >>
> >>
> >> --
> >>
> >> Michał Walenia
> >> Polidea | Software Engineer
> >>
> >> M: +48 791 432 002
> >> E: michal.walenia@polidea.com
> >>
> >> Unique Tech
> >> Check out our projects!
>

Re: [DISCUSS] Autoformat python code with Black

Posted by Robert Bradshaw <ro...@google.com>.
Thanks! Now we get to debate what knobs to twiddle :-P

FYI, I did a simple run (just pushed to
https://github.com/apache/beam/compare/master...robertwb:yapf) to see
the impact. The diff is

    $ git diff --stat master
    ...
     547 files changed, 22118 insertions(+), 21129 deletions(-)

For reference

    $ find sdks/python/apache_beam -name '*.py' | xargs wc
    ...
    200424  612002 7431637 total

which means a little over 10% of lines get touched. I think there are
some options, such as SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES and
COALESCE_BRACKETS, that will conform more to the style we are already
(mostly) following.


On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski
<ka...@polidea.com> wrote:
>
> Thank you Michał for creating the ticket. I have some free time and I'd like to volunteer myself for this task.
> Indeed, it looks like there's consensus for `yapf`, so I'll try `yapf` first.
>
> Best,
> Kamil
>
>
> On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <mi...@polidea.com> wrote:
>>
>> Hi all,
>> I created a JIRA issue for this and summarized the available tools
>>
>> https://issues.apache.org/jira/browse/BEAM-9175
>>
>> Cheers,
>> Michal
>>
>> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <eh...@google.com> wrote:
>>>
>>> Sorry, backing off on this due to time constraints.
>>>
>>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>> It sounds like there's a consensus for yapf. I volunteer to take this on
>>>>
>>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com> wrote:
>>>>>
>>>>> +1 to autoformatting
>>>>>
>>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <lc...@google.com> wrote:
>>>>>>
>>>>>> +1 to autoformatters. Also the Beam Java SDK went through a one time pass to apply the spotless formatting.
>>>>>>
>>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>>
>>>>>>> +1 to autoformatters and yapf. It appears to be a well maintained project. I do support making a one time pass to apply formatting the whole code base.
>>>>>>>
>>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <ch...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It'd be good if there was a way to only apply to violating (or at
>>>>>>>>> least changed) lines.
>>>>>>>>
>>>>>>>>
>>>>>>>> I assumed the first thing we’d do is convert all of the code in one go, since it’s a very safe operation. Did you have something else in mind?
>>>>>>>>
>>>>>>>> -chad
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <ch...@gmail.com> wrote:
>>>>>>>>> >
>>>>>>>>> > +1 to autoformatting
>>>>>>>>> >
>>>>>>>>> > Let me add some nuance to that.
>>>>>>>>> >
>>>>>>>>> > The way I see it there are 2 varieties of formatters:  those which take the original formatting into consideration (autopep8) and those which disregard it (yapf, black).
>>>>>>>>> >
>>>>>>>>> > I much prefer yapf to black, because you have plenty of options to tweak with yapf (enough to make the output a pretty close match to the current Beam style), and you can mark areas to preserve the original formatting, which could be very useful with Pipeline building with pipe operators.  Please don't pick black.
>>>>>>>>> >
>>>>>>>>> > autopep8 is more along the lines of spotless in Java -- it only corrects code that breaks the project's style rules.  The big problem with Beam's current style is that it is so esoteric that autopep8 can't enforce it -- and I'm not just talking about 2-spaces, which I don't really have a problem with -- the problem is the use of either 2 or 4 spaces depending on context (expression start vs hanging indent, etc).  This is my *biggest* gripe about the current style.  PyCharm doesn't have enough control either.  So, if we can choose a style that can be expressed by flake8 or pycodestyle then we can use autopep8 to enforce it.
>>>>>>>>> >
>>>>>>>>> > I'd prefer autopep8 to yapf because I like having a little wiggle room to influence the style, but on a big project like Beam all that wiggle room ends up to minor but noticeable inconsistencies in style throughout the project.  yapf ensures completely consistent style, but the tradeoff is that it's sometimes ugly, especially in scenarios with similar repeated entries like argparse, where yapf might insert line breaks in visually inconsistent and unappealing ways depending on the lengths of the keywords and expressions involved.
>>>>>>>>> >
>>>>>>>>> > Either way (but especially if we choose yapf) I think it'd be a nice addition to setup a pre-commit [1] config so that people can opt in to running *lightweight* autofixers prior to commit.  This will not only reduce dev frustration but will also reduce the amount of cpu cycles that Jenkins spends pointing out lint errors.
>>>>>>>>> >
>>>>>>>>> > [1] https://pre-commit.com/
>>>>>>>>> >
>>>>>>>>> > -chad
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <ie...@gmail.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Last time we discussed this there seems not to be much progress into autoformatting.
>>>>>>>>> >> This tool looks more tweakable, so maybe it could be more appropriate for Beam's use case.
>>>>>>>>> >> https://github.com/google/yapf/
>>>>>>>>> >> WDYT?
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <lg...@apache.org> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> +1 for any autoformatter for Python SDK that does the job. My experience is that since spotless in Java SDK I would never start a new Java project without it. So many great benefits not only for one person coding but for all community.
>>>>>>>>> >>>
>>>>>>>>> >>> It is a GitHub UI issue that you cannot easily browse past the reformat. It is not actually that hard, but does take a couple extra clicks to get GitHub to display blame before a reformat. It is easier with the command line. I do a lot of code history digging and the global Java reformat is not really a problem.
>>>>>>>>> >>>
>>>>>>>>> >>> It's actually one more click on Github but I agree it's not the best way to search the history. The most convenient and clear one I've found so far is in Jetbrains IDEs (Intelij) where you can:
>>>>>>>>> >>>
>>>>>>>>> >>> right click on line number -> "annotate" -> click again -> "annotate previous revision" -> ...
>>>>>>>>> >>>
>>>>>>>>> >>> You can also use "compare with" to see the diff between two revisions.
>>>>>>>>> >>>
>>>>>>>>> >>> Łukasz
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org> napisał(a):
>>>>>>>>> >>>>
>>>>>>>>> >>>> +1 pending good enough tooling (I can't quite tell - seems there are some issues?)
>>>>>>>>> >>>>
>>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <ka...@gmail.com> wrote:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> What else actually we gain? My guess is faster PR review iteration. We will skip some of conversations about code style.
>>>>>>>>> >>>>
>>>>>>>>> >>>> ...
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Last but not least, new contributor may be less discouraged. When I started contribute I didn’t know how to format my code and I lost a lot of time to add pylint and adjust IntelliJ. I eventually failed. Currently I write code intuitively and when I don’t forget I rerun tox.
>>>>>>>>> >>>>
>>>>>>>>> >>>>
>>>>>>>>> >>>> This is a huge benefit. This is why I supported it so much for Java. It is a community benefit. You do not have to be a contributor to the Python SDK to support this. That is why I am writing here. Just eliminate all discussion of formatting. It doesn't really matter what the resulting format is, if it is not crazy to read. I strongly oppose maintaining a non-default format.
>>>>>>>>> >>>>
>>>>>>>>> >>>> Reformating 20k lines or 200k is not hard. The Java global reformat touched 50k lines. It does not really matter how big it is. Definitely do it all at once if you think the tool is good enough. And you should pin a version, so churn is not a problem. You can upgrade the version and reformat in a PR later and that is also easy.
>>>>>>>>> >>>>
>>>>>>>>> >>>> It is a GitHub UI issue that you cannot easily browse past the reformat. It is not actually that hard, but does take a couple extra clicks to get GitHub to display blame before a reformat. It is easier with the command line. I do a lot of code history digging and the global Java reformat is not really a problem.
>>>>>>>>> >>>>
>>>>>>>>> >>>> Kenn
>>>>>>>>> >>>>
>>>>>>>>> >>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Also everything will be formatted in a same way, so eventually it would be easier to read.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Moreover, as it was mentioned in previous emails - a lot of Jenkins failures won’t take place, so we save time and resources.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> One of disadvantages is that our pipelines has custom syntax and after formatting they looks a little bit weird, but maybe extending the only configurable option in Black - lines, from 88 to 110 would be solution.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Second one is that Black requires Python 3 to be run. I don’t know how big obstacle it would be.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> I believe there are two options how it would be possible to introduce Black. First: just do it, it will hurt but then it would be ok (same as a dentist appointment). Of course it may require some work to adjust linters. On the other hand we can do it gradually and start including sdk parts one by one - maybe it will be less painful?
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> As an example I can share one of projects [2] I know that uses Black (they use also other cool checkers and pre-commit [3]). This is how looks their build with all checks [4].
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> To sum up I believe that if we want improve our coding experience, we should improve our toolset. Black seems be recent and quite popular tool what makes think they won’t stop developing it.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> [1] https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> [3] https://pre-commit.com
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> [4] https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <ro...@google.com> wrote:
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as it would change nearly every single line in the codebase (and the loss of all context as well as that particular line).
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> This is probably why the 2-space fork exists. However, we don't conform to that either--we use 2 spaces for indentation, but 4 for continuation indentation. (As for the history of this, this goes back to Google's internal style guide, probably motivated by consistency with C++, Java, ... and the fact that with an indent level of 4 one ends up wrapping lines quite frequently (it's telling that black's default line length is 88)). This turns out to be an easy change to the codebase.
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that this tool introduces a huge amount of vertical whitespace (e.g. closing parentheses on their own line), e.g.
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> def foo(
>>>>>>>>> >>>>>>     args
>>>>>>>>> >>>>>> ):
>>>>>>>>> >>>>>>   if (
>>>>>>>>> >>>>>>       long expression)
>>>>>>>>> >>>>>>   ):
>>>>>>>>> >>>>>>     func(
>>>>>>>>> >>>>>>         args
>>>>>>>>> >>>>>>     )
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> I wrote a simple post-processor to put closing parentheses on the same lines, as well as omit the newline after "if (", and disabling formatting of strings, which reduce the churn in our codebase to 15k lines (adding about 4k) out of 200k total.
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> It's still very opinionated, often in different ways then me, and doesn't understand the semantics of the code, but possibly something we could live with given the huge advantages of an autoformatter.
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> An intermediate point would be to allow, but not require, autoformatting of changed lines.
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> As for being beta quality, it looks like it's got a decent number of contributors and in my book being in the python github project is a strong positive signal. But, due to the above issues, I think we'd have to maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space issue is a 2-line change, and the rest implemented as a post-processing step (for now, incomplete), so it'd be easy to stay in sync with upstream.)
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>>>>>>>> >>>>>> >
>>>>>>>>> >>>>>> > > I think the question is if it can be configured in a way to fit our
>>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to reformat the
>>>>>>>>> >>>>>> > > entire Python SDK.
>>>>>>>>> >>>>>> >
>>>>>>>>> >>>>>> > It cannot be configured to do what we actually do because Black is
>>>>>>>>> >>>>>> > configurable only to support the standard python codestyle guidelines
>>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what most projects in the
>>>>>>>>> >>>>>> > python world use.
>>>>>>>>> >>>>>> >
>>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have the feeling
>>>>>>>>> >>>>>> > > that this might be less of a problem with Python because the linter has
>>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>>>>>> >>>>>> >
>>>>>>>>> >>>>>> > Yes that’s the bad side effect but there are always tradeoffs we have
>>>>>>>>> >>>>>> > to deal with.
>>>>>>>>> >>>>>> >
>>>>>>>>> >>>>>> >
>>>>>>>>> >>>>>> >
>>>>>>>>> >>>>>> >
>>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <mx...@apache.org> wrote:
>>>>>>>>> >>>>>> > >
>>>>>>>>> >>>>>> > > I think the question is if it can be configured in a way to fit our
>>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to reformat the
>>>>>>>>> >>>>>> > > entire Python SDK.
>>>>>>>>> >>>>>> > >
>>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have the feeling
>>>>>>>>> >>>>>> > > that this might be less of a problem with Python because the linter has
>>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>>>>>> >>>>>> > >
>>>>>>>>> >>>>>> > > -Max
>>>>>>>>> >>>>>> > >
>>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>>>>>>> >>>>>> > > >> My concerns are:
>>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For the same reason I also strongly prefer not using a fork for a specific setting. Fork will only have less people looking at it.
>>>>>>>>> >>>>>> > > >
>>>>>>>>> >>>>>> > > > I suppose the project is marked as beta because it is recent, it was
>>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and because some things can change since
>>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky beasts, I think beta in that case is
>>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you look at the contribution page [1]
>>>>>>>>> >>>>>> > > > you can notice that it is less and less a single person project, there
>>>>>>>>> >>>>>> > > > have been 93 independent contributions since the project became
>>>>>>>>> >>>>>> > > > public, and the fact that it is hosted in the python organization
>>>>>>>>> >>>>>> > > > github [2] gives some confidence on the project continuity.
>>>>>>>>> >>>>>> > > >
>>>>>>>>> >>>>>> > > > You are right however about the fact that the main author seems to be
>>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he can seem
>>>>>>>>> >>>>>> > > > arbitrary, but he is just following pep8 style guide recommendations
>>>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4 spaces
>>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even Google's own Python style guide [4],
>>>>>>>>> >>>>>> > > > So, probably it should be to us to reconsider the current policy to
>>>>>>>>> >>>>>> > > > adapt to the standards (and the tool).
>>>>>>>>> >>>>>> > > >
>>>>>>>>> >>>>>> > > > I did a quick run of black with python 2.7 compatibility on
>>>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing errors which is positive given the
>>>>>>>>> >>>>>> > > > size of our code base.
>>>>>>>>> >>>>>> > > >
>>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files failed to reformat.
>>>>>>>>> >>>>>> > > >
>>>>>>>>> >>>>>> > > > error: cannot format
>>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>>>>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>>>>>>>> >>>>>> > > > error: cannot format
>>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>>>>>>> >>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>>>>>>>> >>>>>> > > > error: cannot format
>>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>>>>>>> >>>>>> > > > Cannot parse: 160:34:       print(traceback_string, file=sys.stderr)
>>>>>>>>> >>>>>> > > > error: cannot format
>>>>>>>>> >>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>>>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc else '    ',
>>>>>>>>> >>>>>> > > > end=' ')
>>>>>>>>> >>>>>> > > >
>>>>>>>>> >>>>>> > > > I still think this can be positive for the project but well I am
>>>>>>>>> >>>>>> > > > barely a contributor to the python code base so I let you the python
>>>>>>>>> >>>>>> > > > maintainers to reconsider this, in any case it seems like a good
>>>>>>>>> >>>>>> > > > improvement for the project.
>>>>>>>>> >>>>>> > > >
>>>>>>>>> >>>>>> > > > [1] https://github.com/python/black/graphs/contributors
>>>>>>>>> >>>>>> > > > [2] https://github.com/python
>>>>>>>>> >>>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
>>>>>>>>> >>>>>> > > > [4] https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>>>>>>> >>>>>> > > >
>>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>>>> >>>>>> > > >>
>>>>>>>>> >>>>>> > > >> I am in the same boat with Robert, I am in favor of autoformatters but I am not familiar with this one. My concerns are:
>>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For the same reason I also strongly prefer not using a fork for a specific setting. Fork will only have less people looking at it.
>>>>>>>>> >>>>>> > > >>
>>>>>>>>> >>>>>> > > >> IMO, this is in an early stage for us. That said lint issues are real as pointed in the thread. If someone would like to give it a try and see how it would look like for us that would be interesting.
>>>>>>>>> >>>>>> > > >>
>>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <ka...@gmail.com> wrote:
>>>>>>>>> >>>>>> > > >>>
>>>>>>>>> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs failures are caused by lint problems.
>>>>>>>>> >>>>>> > > >>> I think it would be great to have something similar to Spotless in Java SDK (I heard there is problem with configuring Black with IntelliJ).
>>>>>>>>> >>>>>> > > >>>
>>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <ro...@google.com> wrote:
>>>>>>>>> >>>>>> > > >>>>
>>>>>>>>> >>>>>> > > >>>> I'm generally in favor of autoformatters, though I haven't looked at
>>>>>>>>> >>>>>> > > >>>> how well this particular one works. We might have to go with
>>>>>>>>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>>>>>>>> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>>>>>>>> >>>>>> > > >>>>
>>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <pa...@google.com> wrote:
>>>>>>>>> >>>>>> > > >>>>>
>>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at least a couple people (myself included) who've been annoyed by having to take care of lint issues that maybe a code formatter could save us.
>>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>>>>>>>>> >>>>>> > > >>>>> -P.
>>>>>>>>> >>>>>> > > >>>>>
>>>>>>>>> >>>>>> > > >>>>>
>>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <ie...@gmail.com> wrote:
>>>>>>>>> >>>>>> > > >>>>>>
>>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code auto formatter that
>>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for python, and wanted to
>>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest from the python people to get this
>>>>>>>>> >>>>>> > > >>>>>> into the build?
>>>>>>>>> >>>>>> > > >>>>>>
>>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless for Java has been a good improvement and
>>>>>>>>> >>>>>> > > >>>>>> maybe the python code base may benefit of this too.
>>>>>>>>> >>>>>> > > >>>>>>
>>>>>>>>> >>>>>> > > >>>>>> WDYT?
>>>>>>>>> >>>>>> > > >>>>>>
>>>>>>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
>>
>>
>>
>> --
>>
>> Michał Walenia
>> Polidea | Software Engineer
>>
>> M: +48 791 432 002
>> E: michal.walenia@polidea.com
>>
>> Unique Tech
>> Check out our projects!

Re: [DISCUSS] Autoformat python code with Black

Posted by Kamil Wasilewski <ka...@polidea.com>.
Thank you Michał for creating the ticket. I have some free time and I'd
like to volunteer myself for this task.
Indeed, it looks like there's consensus for `yapf`, so I'll try `yapf`
first.

Best,
Kamil


On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <mi...@polidea.com>
wrote:

> Hi all,
> I created a JIRA issue for this and summarized the available tools
>
> https://issues.apache.org/jira/browse/BEAM-9175
>
> Cheers,
> Michal
>
> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <eh...@google.com> wrote:
>
>> Sorry, backing off on this due to time constraints.
>>
>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> It sounds like there's a consensus for yapf. I volunteer to take this on
>>>
>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com> wrote:
>>>
>>>> +1 to autoformatting
>>>>
>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <lc...@google.com> wrote:
>>>>
>>>>> +1 to autoformatters. Also the Beam Java SDK went through a one time
>>>>> pass to apply the spotless formatting.
>>>>>
>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <al...@google.com> wrote:
>>>>>
>>>>>> +1 to autoformatters and yapf. It appears to be a well maintained
>>>>>> project. I do support making a one time pass to apply formatting the whole
>>>>>> code base.
>>>>>>
>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <ch...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>> It'd be good if there was a way to only apply to violating (or at
>>>>>>>> least changed) lines.
>>>>>>>
>>>>>>>
>>>>>>> I assumed the first thing we’d do is convert all of the code in one
>>>>>>> go, since it’s a very safe operation. Did you have something else in mind?
>>>>>>>
>>>>>>> -chad
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <ch...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > +1 to autoformatting
>>>>>>>> >
>>>>>>>> > Let me add some nuance to that.
>>>>>>>> >
>>>>>>>> > The way I see it there are 2 varieties of formatters:  those
>>>>>>>> which take the original formatting into consideration (autopep8) and those
>>>>>>>> which disregard it (yapf, black).
>>>>>>>> >
>>>>>>>> > I much prefer yapf to black, because you have plenty of options
>>>>>>>> to tweak with yapf (enough to make the output a pretty close match to the
>>>>>>>> current Beam style), and you can mark areas to preserve the original
>>>>>>>> formatting, which could be very useful with Pipeline building with pipe
>>>>>>>> operators.  Please don't pick black.
>>>>>>>> >
>>>>>>>> > autopep8 is more along the lines of spotless in Java -- it only
>>>>>>>> corrects code that breaks the project's style rules.  The big problem with
>>>>>>>> Beam's current style is that it is so esoteric that autopep8 can't enforce
>>>>>>>> it -- and I'm not just talking about 2-spaces, which I don't really have a
>>>>>>>> problem with -- the problem is the use of either 2 or 4 spaces depending on
>>>>>>>> context (expression start vs hanging indent, etc).  This is my *biggest*
>>>>>>>> gripe about the current style.  PyCharm doesn't have enough control
>>>>>>>> either.  So, if we can choose a style that can be expressed by flake8 or
>>>>>>>> pycodestyle then we can use autopep8 to enforce it.
>>>>>>>> >
>>>>>>>> > I'd prefer autopep8 to yapf because I like having a little wiggle
>>>>>>>> room to influence the style, but on a big project like Beam all that wiggle
>>>>>>>> room ends up to minor but noticeable inconsistencies in style throughout
>>>>>>>> the project.  yapf ensures completely consistent style, but the tradeoff is
>>>>>>>> that it's sometimes ugly, especially in scenarios with similar repeated
>>>>>>>> entries like argparse, where yapf might insert line breaks in visually
>>>>>>>> inconsistent and unappealing ways depending on the lengths of the keywords
>>>>>>>> and expressions involved.
>>>>>>>> >
>>>>>>>> > Either way (but especially if we choose yapf) I think it'd be a
>>>>>>>> nice addition to setup a pre-commit [1] config so that people can opt in to
>>>>>>>> running *lightweight* autofixers prior to commit.  This will not only
>>>>>>>> reduce dev frustration but will also reduce the amount of cpu cycles that
>>>>>>>> Jenkins spends pointing out lint errors.
>>>>>>>> >
>>>>>>>> > [1] https://pre-commit.com/
>>>>>>>> >
>>>>>>>> > -chad
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <ie...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> >>
>>>>>>>> >> Last time we discussed this there seems not to be much progress
>>>>>>>> into autoformatting.
>>>>>>>> >> This tool looks more tweakable, so maybe it could be more
>>>>>>>> appropriate for Beam's use case.
>>>>>>>> >> https://github.com/google/yapf/
>>>>>>>> >> WDYT?
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <
>>>>>>>> lgajowy@apache.org> wrote:
>>>>>>>> >>>
>>>>>>>> >>> +1 for any autoformatter for Python SDK that does the job. My
>>>>>>>> experience is that since spotless in Java SDK I would never start a new
>>>>>>>> Java project without it. So many great benefits not only for one person
>>>>>>>> coding but for all community.
>>>>>>>> >>>
>>>>>>>> >>> It is a GitHub UI issue that you cannot easily browse past the
>>>>>>>> reformat. It is not actually that hard, but does take a couple extra clicks
>>>>>>>> to get GitHub to display blame before a reformat. It is easier with the
>>>>>>>> command line. I do a lot of code history digging and the global Java
>>>>>>>> reformat is not really a problem.
>>>>>>>> >>>
>>>>>>>> >>> It's actually one more click on Github but I agree it's not the
>>>>>>>> best way to search the history. The most convenient and clear one I've
>>>>>>>> found so far is in Jetbrains IDEs (Intelij) where you can:
>>>>>>>> >>>
>>>>>>>> >>> right click on line number -> "annotate" -> click again ->
>>>>>>>> "annotate previous revision" -> ...
>>>>>>>> >>>
>>>>>>>> >>> You can also use "compare with" to see the diff between two
>>>>>>>> revisions.
>>>>>>>> >>>
>>>>>>>> >>> Łukasz
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org>
>>>>>>>> napisał(a):
>>>>>>>> >>>>
>>>>>>>> >>>> +1 pending good enough tooling (I can't quite tell - seems
>>>>>>>> there are some issues?)
>>>>>>>> >>>>
>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
>>>>>>>> ka.kucharczyk@gmail.com> wrote:
>>>>>>>> >>>>>
>>>>>>>> >>>>> What else actually we gain? My guess is faster PR review
>>>>>>>> iteration. We will skip some of conversations about code style.
>>>>>>>> >>>>
>>>>>>>> >>>> ...
>>>>>>>> >>>>>
>>>>>>>> >>>>> Last but not least, new contributor may be less discouraged.
>>>>>>>> When I started contribute I didn’t know how to format my code and I lost a
>>>>>>>> lot of time to add pylint and adjust IntelliJ. I eventually failed.
>>>>>>>> Currently I write code intuitively and when I don’t forget I rerun tox.
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>> This is a huge benefit. This is why I supported it so much for
>>>>>>>> Java. It is a community benefit. You do not have to be a contributor to the
>>>>>>>> Python SDK to support this. That is why I am writing here. Just eliminate
>>>>>>>> all discussion of formatting. It doesn't really matter what the resulting
>>>>>>>> format is, if it is not crazy to read. I strongly oppose maintaining a
>>>>>>>> non-default format.
>>>>>>>> >>>>
>>>>>>>> >>>> Reformating 20k lines or 200k is not hard. The Java global
>>>>>>>> reformat touched 50k lines. It does not really matter how big it is.
>>>>>>>> Definitely do it all at once if you think the tool is good enough. And you
>>>>>>>> should pin a version, so churn is not a problem. You can upgrade the
>>>>>>>> version and reformat in a PR later and that is also easy.
>>>>>>>> >>>>
>>>>>>>> >>>> It is a GitHub UI issue that you cannot easily browse past the
>>>>>>>> reformat. It is not actually that hard, but does take a couple extra clicks
>>>>>>>> to get GitHub to display blame before a reformat. It is easier with the
>>>>>>>> command line. I do a lot of code history digging and the global Java
>>>>>>>> reformat is not really a problem.
>>>>>>>> >>>>
>>>>>>>> >>>> Kenn
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> Also everything will be formatted in a same way, so
>>>>>>>> eventually it would be easier to read.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Moreover, as it was mentioned in previous emails - a lot of
>>>>>>>> Jenkins failures won’t take place, so we save time and resources.
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> One of disadvantages is that our pipelines has custom syntax
>>>>>>>> and after formatting they looks a little bit weird, but maybe extending the
>>>>>>>> only configurable option in Black - lines, from 88 to 110 would be solution.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Second one is that Black requires Python 3 to be run. I don’t
>>>>>>>> know how big obstacle it would be.
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> I believe there are two options how it would be possible to
>>>>>>>> introduce Black. First: just do it, it will hurt but then it would be ok
>>>>>>>> (same as a dentist appointment). Of course it may require some work to
>>>>>>>> adjust linters. On the other hand we can do it gradually and start
>>>>>>>> including sdk parts one by one - maybe it will be less painful?
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> As an example I can share one of projects [2] I know that
>>>>>>>> uses Black (they use also other cool checkers and pre-commit [3]). This is
>>>>>>>> how looks their build with all checks [4].
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> To sum up I believe that if we want improve our coding
>>>>>>>> experience, we should improve our toolset. Black seems be recent and quite
>>>>>>>> popular tool what makes think they won’t stop developing it.
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> [1]
>>>>>>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> [3] https://pre-commit.com
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> [4]
>>>>>>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <
>>>>>>>> robertwb@google.com> wrote:
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as it
>>>>>>>> would change nearly every single line in the codebase (and the loss of all
>>>>>>>> context as well as that particular line).
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> This is probably why the 2-space fork exists. However, we
>>>>>>>> don't conform to that either--we use 2 spaces for indentation, but 4 for
>>>>>>>> continuation indentation. (As for the history of this, this goes back to
>>>>>>>> Google's internal style guide, probably motivated by consistency with C++,
>>>>>>>> Java, ... and the fact that with an indent level of 4 one ends up wrapping
>>>>>>>> lines quite frequently (it's telling that black's default line length is
>>>>>>>> 88)). This turns out to be an easy change to the codebase.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found
>>>>>>>> that this tool introduces a huge amount of vertical whitespace (e.g.
>>>>>>>> closing parentheses on their own line), e.g.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> def foo(
>>>>>>>> >>>>>>     args
>>>>>>>> >>>>>> ):
>>>>>>>> >>>>>>   if (
>>>>>>>> >>>>>>       long expression)
>>>>>>>> >>>>>>   ):
>>>>>>>> >>>>>>     func(
>>>>>>>> >>>>>>         args
>>>>>>>> >>>>>>     )
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> I wrote a simple post-processor to put closing parentheses
>>>>>>>> on the same lines, as well as omit the newline after "if (", and disabling
>>>>>>>> formatting of strings, which reduce the churn in our codebase to 15k lines
>>>>>>>> (adding about 4k) out of 200k total.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> It's still very opinionated, often in different ways then
>>>>>>>> me, and doesn't understand the semantics of the code, but possibly
>>>>>>>> something we could live with given the huge advantages of an autoformatter.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> An intermediate point would be to allow, but not require,
>>>>>>>> autoformatting of changed lines.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> As for being beta quality, it looks like it's got a decent
>>>>>>>> number of contributors and in my book being in the python github project is
>>>>>>>> a strong positive signal. But, due to the above issues, I think we'd have
>>>>>>>> to maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space
>>>>>>>> issue is a 2-line change, and the rest implemented as a post-processing
>>>>>>>> step (for now, incomplete), so it'd be easy to stay in sync with upstream.)
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <
>>>>>>>> iemejia@gmail.com> wrote:
>>>>>>>> >>>>>> >
>>>>>>>> >>>>>> > > I think the question is if it can be configured in a way
>>>>>>>> to fit our
>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to
>>>>>>>> reformat the
>>>>>>>> >>>>>> > > entire Python SDK.
>>>>>>>> >>>>>> >
>>>>>>>> >>>>>> > It cannot be configured to do what we actually do because
>>>>>>>> Black is
>>>>>>>> >>>>>> > configurable only to support the standard python codestyle
>>>>>>>> guidelines
>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what most
>>>>>>>> projects in the
>>>>>>>> >>>>>> > python world use.
>>>>>>>> >>>>>> >
>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
>>>>>>>> history. This
>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have
>>>>>>>> the feeling
>>>>>>>> >>>>>> > > that this might be less of a problem with Python because
>>>>>>>> the linter has
>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>>>>> >>>>>> >
>>>>>>>> >>>>>> > Yes that’s the bad side effect but there are always
>>>>>>>> tradeoffs we have
>>>>>>>> >>>>>> > to deal with.
>>>>>>>> >>>>>> >
>>>>>>>> >>>>>> >
>>>>>>>> >>>>>> >
>>>>>>>> >>>>>> >
>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <
>>>>>>>> mxm@apache.org> wrote:
>>>>>>>> >>>>>> > >
>>>>>>>> >>>>>> > > I think the question is if it can be configured in a way
>>>>>>>> to fit our
>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to
>>>>>>>> reformat the
>>>>>>>> >>>>>> > > entire Python SDK.
>>>>>>>> >>>>>> > >
>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
>>>>>>>> history. This
>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have
>>>>>>>> the feeling
>>>>>>>> >>>>>> > > that this might be less of a problem with Python because
>>>>>>>> the linter has
>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>>>>> >>>>>> > >
>>>>>>>> >>>>>> > > -Max
>>>>>>>> >>>>>> > >
>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>>>>>> >>>>>> > > >> My concerns are:
>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big
>>>>>>>> warning.
>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For
>>>>>>>> the same reason I also strongly prefer not using a fork for a specific
>>>>>>>> setting. Fork will only have less people looking at it.
>>>>>>>> >>>>>> > > >
>>>>>>>> >>>>>> > > > I suppose the project is marked as beta because it is
>>>>>>>> recent, it was
>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and because some things can
>>>>>>>> change since
>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky beasts, I think beta
>>>>>>>> in that case is
>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you look at the
>>>>>>>> contribution page [1]
>>>>>>>> >>>>>> > > > you can notice that it is less and less a single
>>>>>>>> person project, there
>>>>>>>> >>>>>> > > > have been 93 independent contributions since the
>>>>>>>> project became
>>>>>>>> >>>>>> > > > public, and the fact that it is hosted in the python
>>>>>>>> organization
>>>>>>>> >>>>>> > > > github [2] gives some confidence on the project
>>>>>>>> continuity.
>>>>>>>> >>>>>> > > >
>>>>>>>> >>>>>> > > > You are right however about the fact that the main
>>>>>>>> author seems to be
>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue
>>>>>>>> he can seem
>>>>>>>> >>>>>> > > > arbitrary, but he is just following pep8 style guide
>>>>>>>> recommendations
>>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4
>>>>>>>> spaces
>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even Google's own Python
>>>>>>>> style guide [4],
>>>>>>>> >>>>>> > > > So, probably it should be to us to reconsider the
>>>>>>>> current policy to
>>>>>>>> >>>>>> > > > adapt to the standards (and the tool).
>>>>>>>> >>>>>> > > >
>>>>>>>> >>>>>> > > > I did a quick run of black with python 2.7
>>>>>>>> compatibility on
>>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing errors which is
>>>>>>>> positive given the
>>>>>>>> >>>>>> > > > size of our code base.
>>>>>>>> >>>>>> > > >
>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4
>>>>>>>> files failed to reformat.
>>>>>>>> >>>>>> > > >
>>>>>>>> >>>>>> > > > error: cannot format
>>>>>>>> >>>>>> > > >
>>>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>>>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>>>>>>> >>>>>> > > > error: cannot format
>>>>>>>> >>>>>> > > >
>>>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>>>>>> >>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>>>>>>> >>>>>> > > > error: cannot format
>>>>>>>> >>>>>> > > >
>>>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>>>>>> >>>>>> > > > Cannot parse: 160:34:       print(traceback_string,
>>>>>>>> file=sys.stderr)
>>>>>>>> >>>>>> > > > error: cannot format
>>>>>>>> >>>>>> > > >
>>>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc ==
>>>>>>>> last_pc else '    ',
>>>>>>>> >>>>>> > > > end=' ')
>>>>>>>> >>>>>> > > >
>>>>>>>> >>>>>> > > > I still think this can be positive for the project but
>>>>>>>> well I am
>>>>>>>> >>>>>> > > > barely a contributor to the python code base so I let
>>>>>>>> you the python
>>>>>>>> >>>>>> > > > maintainers to reconsider this, in any case it seems
>>>>>>>> like a good
>>>>>>>> >>>>>> > > > improvement for the project.
>>>>>>>> >>>>>> > > >
>>>>>>>> >>>>>> > > > [1]
>>>>>>>> https://github.com/python/black/graphs/contributors
>>>>>>>> >>>>>> > > > [2] https://github.com/python
>>>>>>>> >>>>>> > > > [3]
>>>>>>>> https://www.python.org/dev/peps/pep-0008/#indentation
>>>>>>>> >>>>>> > > > [4]
>>>>>>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>>>>>> >>>>>> > > >
>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <
>>>>>>>> altay@google.com> wrote:
>>>>>>>> >>>>>> > > >>
>>>>>>>> >>>>>> > > >> I am in the same boat with Robert, I am in favor of
>>>>>>>> autoformatters but I am not familiar with this one. My concerns are:
>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big
>>>>>>>> warning.
>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For
>>>>>>>> the same reason I also strongly prefer not using a fork for a specific
>>>>>>>> setting. Fork will only have less people looking at it.
>>>>>>>> >>>>>> > > >>
>>>>>>>> >>>>>> > > >> IMO, this is in an early stage for us. That said lint
>>>>>>>> issues are real as pointed in the thread. If someone would like to give it
>>>>>>>> a try and see how it would look like for us that would be interesting.
>>>>>>>> >>>>>> > > >>
>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <
>>>>>>>> ka.kucharczyk@gmail.com> wrote:
>>>>>>>> >>>>>> > > >>>
>>>>>>>> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs
>>>>>>>> failures are caused by lint problems.
>>>>>>>> >>>>>> > > >>> I think it would be great to have something similar
>>>>>>>> to Spotless in Java SDK (I heard there is problem with configuring Black
>>>>>>>> with IntelliJ).
>>>>>>>> >>>>>> > > >>>
>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <
>>>>>>>> robertwb@google.com> wrote:
>>>>>>>> >>>>>> > > >>>>
>>>>>>>> >>>>>> > > >>>> I'm generally in favor of autoformatters, though I
>>>>>>>> haven't looked at
>>>>>>>> >>>>>> > > >>>> how well this particular one works. We might have
>>>>>>>> to go with
>>>>>>>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>>>>>>> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>>>>>>> >>>>>> > > >>>>
>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <
>>>>>>>> pabloem@google.com> wrote:
>>>>>>>> >>>>>> > > >>>>>
>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at least a couple
>>>>>>>> people (myself included) who've been annoyed by having to take care of lint
>>>>>>>> issues that maybe a code formatter could save us.
>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>>>>>>>> >>>>>> > > >>>>> -P.
>>>>>>>> >>>>>> > > >>>>>
>>>>>>>> >>>>>> > > >>>>>
>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <
>>>>>>>> iemejia@gmail.com> wrote:
>>>>>>>> >>>>>> > > >>>>>>
>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code
>>>>>>>> auto formatter that
>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for
>>>>>>>> python, and wanted to
>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest from the python
>>>>>>>> people to get this
>>>>>>>> >>>>>> > > >>>>>> into the build?
>>>>>>>> >>>>>> > > >>>>>>
>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless for Java has been a
>>>>>>>> good improvement and
>>>>>>>> >>>>>> > > >>>>>> maybe the python code base may benefit of this
>>>>>>>> too.
>>>>>>>> >>>>>> > > >>>>>>
>>>>>>>> >>>>>> > > >>>>>> WDYT?
>>>>>>>> >>>>>> > > >>>>>>
>>>>>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
>>>>>>>>
>>>>>>>
>
> --
>
> Michał Walenia
> Polidea <https://www.polidea.com/> | Software Engineer
>
> M: +48 791 432 002 <+48791432002>
> E: michal.walenia@polidea.com
>
> Unique Tech
> Check out our projects! <https://www.polidea.com/our-work>
>

Re: [DISCUSS] Autoformat python code with Black

Posted by Ismaël Mejía <ie...@gmail.com>.
Great to see general consensus and this finally happening!
I think there is less arguments now about the tool with a consensus towards
`yapf`.
So better just to do a PoC with `yapf` tuned to the Beam's configuration to
see how impacting it would be.
Thanks Michal and Kamil for taking a look into this.

On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia <mi...@polidea.com>
wrote:

> Hi all,
> I created a JIRA issue for this and summarized the available tools
>
> https://issues.apache.org/jira/browse/BEAM-9175
>
> Cheers,
> Michal
>
> On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <eh...@google.com> wrote:
>
>> Sorry, backing off on this due to time constraints.
>>
>> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> It sounds like there's a consensus for yapf. I volunteer to take this on
>>>
>>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com> wrote:
>>>
>>>> +1 to autoformatting
>>>>
>>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <lc...@google.com> wrote:
>>>>
>>>>> +1 to autoformatters. Also the Beam Java SDK went through a one time
>>>>> pass to apply the spotless formatting.
>>>>>
>>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <al...@google.com> wrote:
>>>>>
>>>>>> +1 to autoformatters and yapf. It appears to be a well maintained
>>>>>> project. I do support making a one time pass to apply formatting the whole
>>>>>> code base.
>>>>>>
>>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <ch...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>> It'd be good if there was a way to only apply to violating (or at
>>>>>>>> least changed) lines.
>>>>>>>
>>>>>>>
>>>>>>> I assumed the first thing we’d do is convert all of the code in one
>>>>>>> go, since it’s a very safe operation. Did you have something else in mind?
>>>>>>>
>>>>>>> -chad
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <ch...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > +1 to autoformatting
>>>>>>>> >
>>>>>>>> > Let me add some nuance to that.
>>>>>>>> >
>>>>>>>> > The way I see it there are 2 varieties of formatters:  those
>>>>>>>> which take the original formatting into consideration (autopep8) and those
>>>>>>>> which disregard it (yapf, black).
>>>>>>>> >
>>>>>>>> > I much prefer yapf to black, because you have plenty of options
>>>>>>>> to tweak with yapf (enough to make the output a pretty close match to the
>>>>>>>> current Beam style), and you can mark areas to preserve the original
>>>>>>>> formatting, which could be very useful with Pipeline building with pipe
>>>>>>>> operators.  Please don't pick black.
>>>>>>>> >
>>>>>>>> > autopep8 is more along the lines of spotless in Java -- it only
>>>>>>>> corrects code that breaks the project's style rules.  The big problem with
>>>>>>>> Beam's current style is that it is so esoteric that autopep8 can't enforce
>>>>>>>> it -- and I'm not just talking about 2-spaces, which I don't really have a
>>>>>>>> problem with -- the problem is the use of either 2 or 4 spaces depending on
>>>>>>>> context (expression start vs hanging indent, etc).  This is my *biggest*
>>>>>>>> gripe about the current style.  PyCharm doesn't have enough control
>>>>>>>> either.  So, if we can choose a style that can be expressed by flake8 or
>>>>>>>> pycodestyle then we can use autopep8 to enforce it.
>>>>>>>> >
>>>>>>>> > I'd prefer autopep8 to yapf because I like having a little wiggle
>>>>>>>> room to influence the style, but on a big project like Beam all that wiggle
>>>>>>>> room ends up to minor but noticeable inconsistencies in style throughout
>>>>>>>> the project.  yapf ensures completely consistent style, but the tradeoff is
>>>>>>>> that it's sometimes ugly, especially in scenarios with similar repeated
>>>>>>>> entries like argparse, where yapf might insert line breaks in visually
>>>>>>>> inconsistent and unappealing ways depending on the lengths of the keywords
>>>>>>>> and expressions involved.
>>>>>>>> >
>>>>>>>> > Either way (but especially if we choose yapf) I think it'd be a
>>>>>>>> nice addition to setup a pre-commit [1] config so that people can opt in to
>>>>>>>> running *lightweight* autofixers prior to commit.  This will not only
>>>>>>>> reduce dev frustration but will also reduce the amount of cpu cycles that
>>>>>>>> Jenkins spends pointing out lint errors.
>>>>>>>> >
>>>>>>>> > [1] https://pre-commit.com/
>>>>>>>> >
>>>>>>>> > -chad
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <ie...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> >>
>>>>>>>> >> Last time we discussed this there seems not to be much progress
>>>>>>>> into autoformatting.
>>>>>>>> >> This tool looks more tweakable, so maybe it could be more
>>>>>>>> appropriate for Beam's use case.
>>>>>>>> >> https://github.com/google/yapf/
>>>>>>>> >> WDYT?
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <
>>>>>>>> lgajowy@apache.org> wrote:
>>>>>>>> >>>
>>>>>>>> >>> +1 for any autoformatter for Python SDK that does the job. My
>>>>>>>> experience is that since spotless in Java SDK I would never start a new
>>>>>>>> Java project without it. So many great benefits not only for one person
>>>>>>>> coding but for all community.
>>>>>>>> >>>
>>>>>>>> >>> It is a GitHub UI issue that you cannot easily browse past the
>>>>>>>> reformat. It is not actually that hard, but does take a couple extra clicks
>>>>>>>> to get GitHub to display blame before a reformat. It is easier with the
>>>>>>>> command line. I do a lot of code history digging and the global Java
>>>>>>>> reformat is not really a problem.
>>>>>>>> >>>
>>>>>>>> >>> It's actually one more click on Github but I agree it's not the
>>>>>>>> best way to search the history. The most convenient and clear one I've
>>>>>>>> found so far is in Jetbrains IDEs (Intelij) where you can:
>>>>>>>> >>>
>>>>>>>> >>> right click on line number -> "annotate" -> click again ->
>>>>>>>> "annotate previous revision" -> ...
>>>>>>>> >>>
>>>>>>>> >>> You can also use "compare with" to see the diff between two
>>>>>>>> revisions.
>>>>>>>> >>>
>>>>>>>> >>> Łukasz
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org>
>>>>>>>> napisał(a):
>>>>>>>> >>>>
>>>>>>>> >>>> +1 pending good enough tooling (I can't quite tell - seems
>>>>>>>> there are some issues?)
>>>>>>>> >>>>
>>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
>>>>>>>> ka.kucharczyk@gmail.com> wrote:
>>>>>>>> >>>>>
>>>>>>>> >>>>> What else actually we gain? My guess is faster PR review
>>>>>>>> iteration. We will skip some of conversations about code style.
>>>>>>>> >>>>
>>>>>>>> >>>> ...
>>>>>>>> >>>>>
>>>>>>>> >>>>> Last but not least, new contributor may be less discouraged.
>>>>>>>> When I started contribute I didn’t know how to format my code and I lost a
>>>>>>>> lot of time to add pylint and adjust IntelliJ. I eventually failed.
>>>>>>>> Currently I write code intuitively and when I don’t forget I rerun tox.
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>> This is a huge benefit. This is why I supported it so much for
>>>>>>>> Java. It is a community benefit. You do not have to be a contributor to the
>>>>>>>> Python SDK to support this. That is why I am writing here. Just eliminate
>>>>>>>> all discussion of formatting. It doesn't really matter what the resulting
>>>>>>>> format is, if it is not crazy to read. I strongly oppose maintaining a
>>>>>>>> non-default format.
>>>>>>>> >>>>
>>>>>>>> >>>> Reformating 20k lines or 200k is not hard. The Java global
>>>>>>>> reformat touched 50k lines. It does not really matter how big it is.
>>>>>>>> Definitely do it all at once if you think the tool is good enough. And you
>>>>>>>> should pin a version, so churn is not a problem. You can upgrade the
>>>>>>>> version and reformat in a PR later and that is also easy.
>>>>>>>> >>>>
>>>>>>>> >>>> It is a GitHub UI issue that you cannot easily browse past the
>>>>>>>> reformat. It is not actually that hard, but does take a couple extra clicks
>>>>>>>> to get GitHub to display blame before a reformat. It is easier with the
>>>>>>>> command line. I do a lot of code history digging and the global Java
>>>>>>>> reformat is not really a problem.
>>>>>>>> >>>>
>>>>>>>> >>>> Kenn
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> Also everything will be formatted in a same way, so
>>>>>>>> eventually it would be easier to read.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Moreover, as it was mentioned in previous emails - a lot of
>>>>>>>> Jenkins failures won’t take place, so we save time and resources.
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> One of disadvantages is that our pipelines has custom syntax
>>>>>>>> and after formatting they looks a little bit weird, but maybe extending the
>>>>>>>> only configurable option in Black - lines, from 88 to 110 would be solution.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Second one is that Black requires Python 3 to be run. I don’t
>>>>>>>> know how big obstacle it would be.
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> I believe there are two options how it would be possible to
>>>>>>>> introduce Black. First: just do it, it will hurt but then it would be ok
>>>>>>>> (same as a dentist appointment). Of course it may require some work to
>>>>>>>> adjust linters. On the other hand we can do it gradually and start
>>>>>>>> including sdk parts one by one - maybe it will be less painful?
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> As an example I can share one of projects [2] I know that
>>>>>>>> uses Black (they use also other cool checkers and pre-commit [3]). This is
>>>>>>>> how looks their build with all checks [4].
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> To sum up I believe that if we want improve our coding
>>>>>>>> experience, we should improve our toolset. Black seems be recent and quite
>>>>>>>> popular tool what makes think they won’t stop developing it.
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> [1]
>>>>>>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> [3] https://pre-commit.com
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> [4]
>>>>>>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <
>>>>>>>> robertwb@google.com> wrote:
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as it
>>>>>>>> would change nearly every single line in the codebase (and the loss of all
>>>>>>>> context as well as that particular line).
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> This is probably why the 2-space fork exists. However, we
>>>>>>>> don't conform to that either--we use 2 spaces for indentation, but 4 for
>>>>>>>> continuation indentation. (As for the history of this, this goes back to
>>>>>>>> Google's internal style guide, probably motivated by consistency with C++,
>>>>>>>> Java, ... and the fact that with an indent level of 4 one ends up wrapping
>>>>>>>> lines quite frequently (it's telling that black's default line length is
>>>>>>>> 88)). This turns out to be an easy change to the codebase.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found
>>>>>>>> that this tool introduces a huge amount of vertical whitespace (e.g.
>>>>>>>> closing parentheses on their own line), e.g.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> def foo(
>>>>>>>> >>>>>>     args
>>>>>>>> >>>>>> ):
>>>>>>>> >>>>>>   if (
>>>>>>>> >>>>>>       long expression)
>>>>>>>> >>>>>>   ):
>>>>>>>> >>>>>>     func(
>>>>>>>> >>>>>>         args
>>>>>>>> >>>>>>     )
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> I wrote a simple post-processor to put closing parentheses
>>>>>>>> on the same lines, as well as omit the newline after "if (", and disabling
>>>>>>>> formatting of strings, which reduce the churn in our codebase to 15k lines
>>>>>>>> (adding about 4k) out of 200k total.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> It's still very opinionated, often in different ways then
>>>>>>>> me, and doesn't understand the semantics of the code, but possibly
>>>>>>>> something we could live with given the huge advantages of an autoformatter.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> An intermediate point would be to allow, but not require,
>>>>>>>> autoformatting of changed lines.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> As for being beta quality, it looks like it's got a decent
>>>>>>>> number of contributors and in my book being in the python github project is
>>>>>>>> a strong positive signal. But, due to the above issues, I think we'd have
>>>>>>>> to maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space
>>>>>>>> issue is a 2-line change, and the rest implemented as a post-processing
>>>>>>>> step (for now, incomplete), so it'd be easy to stay in sync with upstream.)
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <
>>>>>>>> iemejia@gmail.com> wrote:
>>>>>>>> >>>>>> >
>>>>>>>> >>>>>> > > I think the question is if it can be configured in a way
>>>>>>>> to fit our
>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to
>>>>>>>> reformat the
>>>>>>>> >>>>>> > > entire Python SDK.
>>>>>>>> >>>>>> >
>>>>>>>> >>>>>> > It cannot be configured to do what we actually do because
>>>>>>>> Black is
>>>>>>>> >>>>>> > configurable only to support the standard python codestyle
>>>>>>>> guidelines
>>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what most
>>>>>>>> projects in the
>>>>>>>> >>>>>> > python world use.
>>>>>>>> >>>>>> >
>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
>>>>>>>> history. This
>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have
>>>>>>>> the feeling
>>>>>>>> >>>>>> > > that this might be less of a problem with Python because
>>>>>>>> the linter has
>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>>>>> >>>>>> >
>>>>>>>> >>>>>> > Yes that’s the bad side effect but there are always
>>>>>>>> tradeoffs we have
>>>>>>>> >>>>>> > to deal with.
>>>>>>>> >>>>>> >
>>>>>>>> >>>>>> >
>>>>>>>> >>>>>> >
>>>>>>>> >>>>>> >
>>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <
>>>>>>>> mxm@apache.org> wrote:
>>>>>>>> >>>>>> > >
>>>>>>>> >>>>>> > > I think the question is if it can be configured in a way
>>>>>>>> to fit our
>>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to
>>>>>>>> reformat the
>>>>>>>> >>>>>> > > entire Python SDK.
>>>>>>>> >>>>>> > >
>>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
>>>>>>>> history. This
>>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have
>>>>>>>> the feeling
>>>>>>>> >>>>>> > > that this might be less of a problem with Python because
>>>>>>>> the linter has
>>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>>>>> >>>>>> > >
>>>>>>>> >>>>>> > > -Max
>>>>>>>> >>>>>> > >
>>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>>>>>> >>>>>> > > >> My concerns are:
>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big
>>>>>>>> warning.
>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For
>>>>>>>> the same reason I also strongly prefer not using a fork for a specific
>>>>>>>> setting. Fork will only have less people looking at it.
>>>>>>>> >>>>>> > > >
>>>>>>>> >>>>>> > > > I suppose the project is marked as beta because it is
>>>>>>>> recent, it was
>>>>>>>> >>>>>> > > > presented in 2018’s pycon, and because some things can
>>>>>>>> change since
>>>>>>>> >>>>>> > > > auto-formatters are pretty tricky beasts, I think beta
>>>>>>>> in that case is
>>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you look at the
>>>>>>>> contribution page [1]
>>>>>>>> >>>>>> > > > you can notice that it is less and less a single
>>>>>>>> person project, there
>>>>>>>> >>>>>> > > > have been 93 independent contributions since the
>>>>>>>> project became
>>>>>>>> >>>>>> > > > public, and the fact that it is hosted in the python
>>>>>>>> organization
>>>>>>>> >>>>>> > > > github [2] gives some confidence on the project
>>>>>>>> continuity.
>>>>>>>> >>>>>> > > >
>>>>>>>> >>>>>> > > > You are right however about the fact that the main
>>>>>>>> author seems to be
>>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue
>>>>>>>> he can seem
>>>>>>>> >>>>>> > > > arbitrary, but he is just following pep8 style guide
>>>>>>>> recommendations
>>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4
>>>>>>>> spaces
>>>>>>>> >>>>>> > > > recommendation of PEP-8 or even Google's own Python
>>>>>>>> style guide [4],
>>>>>>>> >>>>>> > > > So, probably it should be to us to reconsider the
>>>>>>>> current policy to
>>>>>>>> >>>>>> > > > adapt to the standards (and the tool).
>>>>>>>> >>>>>> > > >
>>>>>>>> >>>>>> > > > I did a quick run of black with python 2.7
>>>>>>>> compatibility on
>>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing errors which is
>>>>>>>> positive given the
>>>>>>>> >>>>>> > > > size of our code base.
>>>>>>>> >>>>>> > > >
>>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4
>>>>>>>> files failed to reformat.
>>>>>>>> >>>>>> > > >
>>>>>>>> >>>>>> > > > error: cannot format
>>>>>>>> >>>>>> > > >
>>>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>>>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>>>>>>> >>>>>> > > > error: cannot format
>>>>>>>> >>>>>> > > >
>>>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>>>>>> >>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>>>>>>> >>>>>> > > > error: cannot format
>>>>>>>> >>>>>> > > >
>>>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>>>>>> >>>>>> > > > Cannot parse: 160:34:       print(traceback_string,
>>>>>>>> file=sys.stderr)
>>>>>>>> >>>>>> > > > error: cannot format
>>>>>>>> >>>>>> > > >
>>>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc ==
>>>>>>>> last_pc else '    ',
>>>>>>>> >>>>>> > > > end=' ')
>>>>>>>> >>>>>> > > >
>>>>>>>> >>>>>> > > > I still think this can be positive for the project but
>>>>>>>> well I am
>>>>>>>> >>>>>> > > > barely a contributor to the python code base so I let
>>>>>>>> you the python
>>>>>>>> >>>>>> > > > maintainers to reconsider this, in any case it seems
>>>>>>>> like a good
>>>>>>>> >>>>>> > > > improvement for the project.
>>>>>>>> >>>>>> > > >
>>>>>>>> >>>>>> > > > [1]
>>>>>>>> https://github.com/python/black/graphs/contributors
>>>>>>>> >>>>>> > > > [2] https://github.com/python
>>>>>>>> >>>>>> > > > [3]
>>>>>>>> https://www.python.org/dev/peps/pep-0008/#indentation
>>>>>>>> >>>>>> > > > [4]
>>>>>>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>>>>>> >>>>>> > > >
>>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <
>>>>>>>> altay@google.com> wrote:
>>>>>>>> >>>>>> > > >>
>>>>>>>> >>>>>> > > >> I am in the same boat with Robert, I am in favor of
>>>>>>>> autoformatters but I am not familiar with this one. My concerns are:
>>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big
>>>>>>>> warning.
>>>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For
>>>>>>>> the same reason I also strongly prefer not using a fork for a specific
>>>>>>>> setting. Fork will only have less people looking at it.
>>>>>>>> >>>>>> > > >>
>>>>>>>> >>>>>> > > >> IMO, this is in an early stage for us. That said lint
>>>>>>>> issues are real as pointed in the thread. If someone would like to give it
>>>>>>>> a try and see how it would look like for us that would be interesting.
>>>>>>>> >>>>>> > > >>
>>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <
>>>>>>>> ka.kucharczyk@gmail.com> wrote:
>>>>>>>> >>>>>> > > >>>
>>>>>>>> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs
>>>>>>>> failures are caused by lint problems.
>>>>>>>> >>>>>> > > >>> I think it would be great to have something similar
>>>>>>>> to Spotless in Java SDK (I heard there is problem with configuring Black
>>>>>>>> with IntelliJ).
>>>>>>>> >>>>>> > > >>>
>>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <
>>>>>>>> robertwb@google.com> wrote:
>>>>>>>> >>>>>> > > >>>>
>>>>>>>> >>>>>> > > >>>> I'm generally in favor of autoformatters, though I
>>>>>>>> haven't looked at
>>>>>>>> >>>>>> > > >>>> how well this particular one works. We might have
>>>>>>>> to go with
>>>>>>>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>>>>>>> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>>>>>>> >>>>>> > > >>>>
>>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <
>>>>>>>> pabloem@google.com> wrote:
>>>>>>>> >>>>>> > > >>>>>
>>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at least a couple
>>>>>>>> people (myself included) who've been annoyed by having to take care of lint
>>>>>>>> issues that maybe a code formatter could save us.
>>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>>>>>>>> >>>>>> > > >>>>> -P.
>>>>>>>> >>>>>> > > >>>>>
>>>>>>>> >>>>>> > > >>>>>
>>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <
>>>>>>>> iemejia@gmail.com> wrote:
>>>>>>>> >>>>>> > > >>>>>>
>>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code
>>>>>>>> auto formatter that
>>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for
>>>>>>>> python, and wanted to
>>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest from the python
>>>>>>>> people to get this
>>>>>>>> >>>>>> > > >>>>>> into the build?
>>>>>>>> >>>>>> > > >>>>>>
>>>>>>>> >>>>>> > > >>>>>> The introduction of spotless for Java has been a
>>>>>>>> good improvement and
>>>>>>>> >>>>>> > > >>>>>> maybe the python code base may benefit of this
>>>>>>>> too.
>>>>>>>> >>>>>> > > >>>>>>
>>>>>>>> >>>>>> > > >>>>>> WDYT?
>>>>>>>> >>>>>> > > >>>>>>
>>>>>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
>>>>>>>>
>>>>>>>
>
> --
>
> Michał Walenia
> Polidea <https://www.polidea.com/> | Software Engineer
>
> M: +48 791 432 002 <+48791432002>
> E: michal.walenia@polidea.com
>
> Unique Tech
> Check out our projects! <https://www.polidea.com/our-work>
>

Re: [DISCUSS] Autoformat python code with Black

Posted by Michał Walenia <mi...@polidea.com>.
Hi all,
I created a JIRA issue for this and summarized the available tools

https://issues.apache.org/jira/browse/BEAM-9175

Cheers,
Michal

On Thu, Jan 23, 2020 at 1:49 AM Udi Meiri <eh...@google.com> wrote:

> Sorry, backing off on this due to time constraints.
>
> On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <eh...@google.com> wrote:
>
>> It sounds like there's a consensus for yapf. I volunteer to take this on
>>
>> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com> wrote:
>>
>>> +1 to autoformatting
>>>
>>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <lc...@google.com> wrote:
>>>
>>>> +1 to autoformatters. Also the Beam Java SDK went through a one time
>>>> pass to apply the spotless formatting.
>>>>
>>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>> +1 to autoformatters and yapf. It appears to be a well maintained
>>>>> project. I do support making a one time pass to apply formatting the whole
>>>>> code base.
>>>>>
>>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <ch...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>> It'd be good if there was a way to only apply to violating (or at
>>>>>>> least changed) lines.
>>>>>>
>>>>>>
>>>>>> I assumed the first thing we’d do is convert all of the code in one
>>>>>> go, since it’s a very safe operation. Did you have something else in mind?
>>>>>>
>>>>>> -chad
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <ch...@gmail.com>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > +1 to autoformatting
>>>>>>> >
>>>>>>> > Let me add some nuance to that.
>>>>>>> >
>>>>>>> > The way I see it there are 2 varieties of formatters:  those which
>>>>>>> take the original formatting into consideration (autopep8) and those which
>>>>>>> disregard it (yapf, black).
>>>>>>> >
>>>>>>> > I much prefer yapf to black, because you have plenty of options to
>>>>>>> tweak with yapf (enough to make the output a pretty close match to the
>>>>>>> current Beam style), and you can mark areas to preserve the original
>>>>>>> formatting, which could be very useful with Pipeline building with pipe
>>>>>>> operators.  Please don't pick black.
>>>>>>> >
>>>>>>> > autopep8 is more along the lines of spotless in Java -- it only
>>>>>>> corrects code that breaks the project's style rules.  The big problem with
>>>>>>> Beam's current style is that it is so esoteric that autopep8 can't enforce
>>>>>>> it -- and I'm not just talking about 2-spaces, which I don't really have a
>>>>>>> problem with -- the problem is the use of either 2 or 4 spaces depending on
>>>>>>> context (expression start vs hanging indent, etc).  This is my *biggest*
>>>>>>> gripe about the current style.  PyCharm doesn't have enough control
>>>>>>> either.  So, if we can choose a style that can be expressed by flake8 or
>>>>>>> pycodestyle then we can use autopep8 to enforce it.
>>>>>>> >
>>>>>>> > I'd prefer autopep8 to yapf because I like having a little wiggle
>>>>>>> room to influence the style, but on a big project like Beam all that wiggle
>>>>>>> room ends up to minor but noticeable inconsistencies in style throughout
>>>>>>> the project.  yapf ensures completely consistent style, but the tradeoff is
>>>>>>> that it's sometimes ugly, especially in scenarios with similar repeated
>>>>>>> entries like argparse, where yapf might insert line breaks in visually
>>>>>>> inconsistent and unappealing ways depending on the lengths of the keywords
>>>>>>> and expressions involved.
>>>>>>> >
>>>>>>> > Either way (but especially if we choose yapf) I think it'd be a
>>>>>>> nice addition to setup a pre-commit [1] config so that people can opt in to
>>>>>>> running *lightweight* autofixers prior to commit.  This will not only
>>>>>>> reduce dev frustration but will also reduce the amount of cpu cycles that
>>>>>>> Jenkins spends pointing out lint errors.
>>>>>>> >
>>>>>>> > [1] https://pre-commit.com/
>>>>>>> >
>>>>>>> > -chad
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <ie...@gmail.com>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >> Last time we discussed this there seems not to be much progress
>>>>>>> into autoformatting.
>>>>>>> >> This tool looks more tweakable, so maybe it could be more
>>>>>>> appropriate for Beam's use case.
>>>>>>> >> https://github.com/google/yapf/
>>>>>>> >> WDYT?
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <
>>>>>>> lgajowy@apache.org> wrote:
>>>>>>> >>>
>>>>>>> >>> +1 for any autoformatter for Python SDK that does the job. My
>>>>>>> experience is that since spotless in Java SDK I would never start a new
>>>>>>> Java project without it. So many great benefits not only for one person
>>>>>>> coding but for all community.
>>>>>>> >>>
>>>>>>> >>> It is a GitHub UI issue that you cannot easily browse past the
>>>>>>> reformat. It is not actually that hard, but does take a couple extra clicks
>>>>>>> to get GitHub to display blame before a reformat. It is easier with the
>>>>>>> command line. I do a lot of code history digging and the global Java
>>>>>>> reformat is not really a problem.
>>>>>>> >>>
>>>>>>> >>> It's actually one more click on Github but I agree it's not the
>>>>>>> best way to search the history. The most convenient and clear one I've
>>>>>>> found so far is in Jetbrains IDEs (Intelij) where you can:
>>>>>>> >>>
>>>>>>> >>> right click on line number -> "annotate" -> click again ->
>>>>>>> "annotate previous revision" -> ...
>>>>>>> >>>
>>>>>>> >>> You can also use "compare with" to see the diff between two
>>>>>>> revisions.
>>>>>>> >>>
>>>>>>> >>> Łukasz
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org>
>>>>>>> napisał(a):
>>>>>>> >>>>
>>>>>>> >>>> +1 pending good enough tooling (I can't quite tell - seems
>>>>>>> there are some issues?)
>>>>>>> >>>>
>>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
>>>>>>> ka.kucharczyk@gmail.com> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> What else actually we gain? My guess is faster PR review
>>>>>>> iteration. We will skip some of conversations about code style.
>>>>>>> >>>>
>>>>>>> >>>> ...
>>>>>>> >>>>>
>>>>>>> >>>>> Last but not least, new contributor may be less discouraged.
>>>>>>> When I started contribute I didn’t know how to format my code and I lost a
>>>>>>> lot of time to add pylint and adjust IntelliJ. I eventually failed.
>>>>>>> Currently I write code intuitively and when I don’t forget I rerun tox.
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> This is a huge benefit. This is why I supported it so much for
>>>>>>> Java. It is a community benefit. You do not have to be a contributor to the
>>>>>>> Python SDK to support this. That is why I am writing here. Just eliminate
>>>>>>> all discussion of formatting. It doesn't really matter what the resulting
>>>>>>> format is, if it is not crazy to read. I strongly oppose maintaining a
>>>>>>> non-default format.
>>>>>>> >>>>
>>>>>>> >>>> Reformating 20k lines or 200k is not hard. The Java global
>>>>>>> reformat touched 50k lines. It does not really matter how big it is.
>>>>>>> Definitely do it all at once if you think the tool is good enough. And you
>>>>>>> should pin a version, so churn is not a problem. You can upgrade the
>>>>>>> version and reformat in a PR later and that is also easy.
>>>>>>> >>>>
>>>>>>> >>>> It is a GitHub UI issue that you cannot easily browse past the
>>>>>>> reformat. It is not actually that hard, but does take a couple extra clicks
>>>>>>> to get GitHub to display blame before a reformat. It is easier with the
>>>>>>> command line. I do a lot of code history digging and the global Java
>>>>>>> reformat is not really a problem.
>>>>>>> >>>>
>>>>>>> >>>> Kenn
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>>>
>>>>>>> >>>>> Also everything will be formatted in a same way, so eventually
>>>>>>> it would be easier to read.
>>>>>>> >>>>>
>>>>>>> >>>>> Moreover, as it was mentioned in previous emails - a lot of
>>>>>>> Jenkins failures won’t take place, so we save time and resources.
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> One of disadvantages is that our pipelines has custom syntax
>>>>>>> and after formatting they looks a little bit weird, but maybe extending the
>>>>>>> only configurable option in Black - lines, from 88 to 110 would be solution.
>>>>>>> >>>>>
>>>>>>> >>>>> Second one is that Black requires Python 3 to be run. I don’t
>>>>>>> know how big obstacle it would be.
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> I believe there are two options how it would be possible to
>>>>>>> introduce Black. First: just do it, it will hurt but then it would be ok
>>>>>>> (same as a dentist appointment). Of course it may require some work to
>>>>>>> adjust linters. On the other hand we can do it gradually and start
>>>>>>> including sdk parts one by one - maybe it will be less painful?
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> As an example I can share one of projects [2] I know that uses
>>>>>>> Black (they use also other cool checkers and pre-commit [3]). This is how
>>>>>>> looks their build with all checks [4].
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> To sum up I believe that if we want improve our coding
>>>>>>> experience, we should improve our toolset. Black seems be recent and quite
>>>>>>> popular tool what makes think they won’t stop developing it.
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> [1]
>>>>>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> [3] https://pre-commit.com
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> [4]
>>>>>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <
>>>>>>> robertwb@google.com> wrote:
>>>>>>> >>>>>>
>>>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as it
>>>>>>> would change nearly every single line in the codebase (and the loss of all
>>>>>>> context as well as that particular line).
>>>>>>> >>>>>>
>>>>>>> >>>>>> This is probably why the 2-space fork exists. However, we
>>>>>>> don't conform to that either--we use 2 spaces for indentation, but 4 for
>>>>>>> continuation indentation. (As for the history of this, this goes back to
>>>>>>> Google's internal style guide, probably motivated by consistency with C++,
>>>>>>> Java, ... and the fact that with an indent level of 4 one ends up wrapping
>>>>>>> lines quite frequently (it's telling that black's default line length is
>>>>>>> 88)). This turns out to be an easy change to the codebase.
>>>>>>> >>>>>>
>>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found
>>>>>>> that this tool introduces a huge amount of vertical whitespace (e.g.
>>>>>>> closing parentheses on their own line), e.g.
>>>>>>> >>>>>>
>>>>>>> >>>>>> def foo(
>>>>>>> >>>>>>     args
>>>>>>> >>>>>> ):
>>>>>>> >>>>>>   if (
>>>>>>> >>>>>>       long expression)
>>>>>>> >>>>>>   ):
>>>>>>> >>>>>>     func(
>>>>>>> >>>>>>         args
>>>>>>> >>>>>>     )
>>>>>>> >>>>>>
>>>>>>> >>>>>> I wrote a simple post-processor to put closing parentheses on
>>>>>>> the same lines, as well as omit the newline after "if (", and disabling
>>>>>>> formatting of strings, which reduce the churn in our codebase to 15k lines
>>>>>>> (adding about 4k) out of 200k total.
>>>>>>> >>>>>>
>>>>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
>>>>>>> >>>>>>
>>>>>>> >>>>>> It's still very opinionated, often in different ways then me,
>>>>>>> and doesn't understand the semantics of the code, but possibly something we
>>>>>>> could live with given the huge advantages of an autoformatter.
>>>>>>> >>>>>>
>>>>>>> >>>>>> An intermediate point would be to allow, but not require,
>>>>>>> autoformatting of changed lines.
>>>>>>> >>>>>>
>>>>>>> >>>>>> As for being beta quality, it looks like it's got a decent
>>>>>>> number of contributors and in my book being in the python github project is
>>>>>>> a strong positive signal. But, due to the above issues, I think we'd have
>>>>>>> to maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space
>>>>>>> issue is a 2-line change, and the rest implemented as a post-processing
>>>>>>> step (for now, incomplete), so it'd be easy to stay in sync with upstream.)
>>>>>>> >>>>>>
>>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <
>>>>>>> iemejia@gmail.com> wrote:
>>>>>>> >>>>>> >
>>>>>>> >>>>>> > > I think the question is if it can be configured in a way
>>>>>>> to fit our
>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to
>>>>>>> reformat the
>>>>>>> >>>>>> > > entire Python SDK.
>>>>>>> >>>>>> >
>>>>>>> >>>>>> > It cannot be configured to do what we actually do because
>>>>>>> Black is
>>>>>>> >>>>>> > configurable only to support the standard python codestyle
>>>>>>> guidelines
>>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what most projects
>>>>>>> in the
>>>>>>> >>>>>> > python world use.
>>>>>>> >>>>>> >
>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
>>>>>>> history. This
>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have
>>>>>>> the feeling
>>>>>>> >>>>>> > > that this might be less of a problem with Python because
>>>>>>> the linter has
>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>>>> >>>>>> >
>>>>>>> >>>>>> > Yes that’s the bad side effect but there are always
>>>>>>> tradeoffs we have
>>>>>>> >>>>>> > to deal with.
>>>>>>> >>>>>> >
>>>>>>> >>>>>> >
>>>>>>> >>>>>> >
>>>>>>> >>>>>> >
>>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <
>>>>>>> mxm@apache.org> wrote:
>>>>>>> >>>>>> > >
>>>>>>> >>>>>> > > I think the question is if it can be configured in a way
>>>>>>> to fit our
>>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to
>>>>>>> reformat the
>>>>>>> >>>>>> > > entire Python SDK.
>>>>>>> >>>>>> > >
>>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
>>>>>>> history. This
>>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have
>>>>>>> the feeling
>>>>>>> >>>>>> > > that this might be less of a problem with Python because
>>>>>>> the linter has
>>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>>>> >>>>>> > >
>>>>>>> >>>>>> > > -Max
>>>>>>> >>>>>> > >
>>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>>>>> >>>>>> > > >> My concerns are:
>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big
>>>>>>> warning.
>>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For
>>>>>>> the same reason I also strongly prefer not using a fork for a specific
>>>>>>> setting. Fork will only have less people looking at it.
>>>>>>> >>>>>> > > >
>>>>>>> >>>>>> > > > I suppose the project is marked as beta because it is
>>>>>>> recent, it was
>>>>>>> >>>>>> > > > presented in 2018’s pycon, and because some things can
>>>>>>> change since
>>>>>>> >>>>>> > > > auto-formatters are pretty tricky beasts, I think beta
>>>>>>> in that case is
>>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you look at the
>>>>>>> contribution page [1]
>>>>>>> >>>>>> > > > you can notice that it is less and less a single person
>>>>>>> project, there
>>>>>>> >>>>>> > > > have been 93 independent contributions since the
>>>>>>> project became
>>>>>>> >>>>>> > > > public, and the fact that it is hosted in the python
>>>>>>> organization
>>>>>>> >>>>>> > > > github [2] gives some confidence on the project
>>>>>>> continuity.
>>>>>>> >>>>>> > > >
>>>>>>> >>>>>> > > > You are right however about the fact that the main
>>>>>>> author seems to be
>>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he
>>>>>>> can seem
>>>>>>> >>>>>> > > > arbitrary, but he is just following pep8 style guide
>>>>>>> recommendations
>>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4
>>>>>>> spaces
>>>>>>> >>>>>> > > > recommendation of PEP-8 or even Google's own Python
>>>>>>> style guide [4],
>>>>>>> >>>>>> > > > So, probably it should be to us to reconsider the
>>>>>>> current policy to
>>>>>>> >>>>>> > > > adapt to the standards (and the tool).
>>>>>>> >>>>>> > > >
>>>>>>> >>>>>> > > > I did a quick run of black with python 2.7
>>>>>>> compatibility on
>>>>>>> >>>>>> > > > sdks/python and got only 4 parsing errors which is
>>>>>>> positive given the
>>>>>>> >>>>>> > > > size of our code base.
>>>>>>> >>>>>> > > >
>>>>>>> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files
>>>>>>> failed to reformat.
>>>>>>> >>>>>> > > >
>>>>>>> >>>>>> > > > error: cannot format
>>>>>>> >>>>>> > > >
>>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>>>>>> >>>>>> > > > error: cannot format
>>>>>>> >>>>>> > > >
>>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>>>>> >>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>>>>>> >>>>>> > > > error: cannot format
>>>>>>> >>>>>> > > >
>>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>>>>> >>>>>> > > > Cannot parse: 160:34:       print(traceback_string,
>>>>>>> file=sys.stderr)
>>>>>>> >>>>>> > > > error: cannot format
>>>>>>> >>>>>> > > >
>>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc ==
>>>>>>> last_pc else '    ',
>>>>>>> >>>>>> > > > end=' ')
>>>>>>> >>>>>> > > >
>>>>>>> >>>>>> > > > I still think this can be positive for the project but
>>>>>>> well I am
>>>>>>> >>>>>> > > > barely a contributor to the python code base so I let
>>>>>>> you the python
>>>>>>> >>>>>> > > > maintainers to reconsider this, in any case it seems
>>>>>>> like a good
>>>>>>> >>>>>> > > > improvement for the project.
>>>>>>> >>>>>> > > >
>>>>>>> >>>>>> > > > [1] https://github.com/python/black/graphs/contributors
>>>>>>> >>>>>> > > > [2] https://github.com/python
>>>>>>> >>>>>> > > > [3]
>>>>>>> https://www.python.org/dev/peps/pep-0008/#indentation
>>>>>>> >>>>>> > > > [4]
>>>>>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>>>>> >>>>>> > > >
>>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <
>>>>>>> altay@google.com> wrote:
>>>>>>> >>>>>> > > >>
>>>>>>> >>>>>> > > >> I am in the same boat with Robert, I am in favor of
>>>>>>> autoformatters but I am not familiar with this one. My concerns are:
>>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big
>>>>>>> warning.
>>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For
>>>>>>> the same reason I also strongly prefer not using a fork for a specific
>>>>>>> setting. Fork will only have less people looking at it.
>>>>>>> >>>>>> > > >>
>>>>>>> >>>>>> > > >> IMO, this is in an early stage for us. That said lint
>>>>>>> issues are real as pointed in the thread. If someone would like to give it
>>>>>>> a try and see how it would look like for us that would be interesting.
>>>>>>> >>>>>> > > >>
>>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <
>>>>>>> ka.kucharczyk@gmail.com> wrote:
>>>>>>> >>>>>> > > >>>
>>>>>>> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs
>>>>>>> failures are caused by lint problems.
>>>>>>> >>>>>> > > >>> I think it would be great to have something similar
>>>>>>> to Spotless in Java SDK (I heard there is problem with configuring Black
>>>>>>> with IntelliJ).
>>>>>>> >>>>>> > > >>>
>>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <
>>>>>>> robertwb@google.com> wrote:
>>>>>>> >>>>>> > > >>>>
>>>>>>> >>>>>> > > >>>> I'm generally in favor of autoformatters, though I
>>>>>>> haven't looked at
>>>>>>> >>>>>> > > >>>> how well this particular one works. We might have to
>>>>>>> go with
>>>>>>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>>>>>> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>>>>>> >>>>>> > > >>>>
>>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <
>>>>>>> pabloem@google.com> wrote:
>>>>>>> >>>>>> > > >>>>>
>>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at least a couple
>>>>>>> people (myself included) who've been annoyed by having to take care of lint
>>>>>>> issues that maybe a code formatter could save us.
>>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>>>>>>> >>>>>> > > >>>>> -P.
>>>>>>> >>>>>> > > >>>>>
>>>>>>> >>>>>> > > >>>>>
>>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <
>>>>>>> iemejia@gmail.com> wrote:
>>>>>>> >>>>>> > > >>>>>>
>>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code
>>>>>>> auto formatter that
>>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for
>>>>>>> python, and wanted to
>>>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest from the python
>>>>>>> people to get this
>>>>>>> >>>>>> > > >>>>>> into the build?
>>>>>>> >>>>>> > > >>>>>>
>>>>>>> >>>>>> > > >>>>>> The introduction of spotless for Java has been a
>>>>>>> good improvement and
>>>>>>> >>>>>> > > >>>>>> maybe the python code base may benefit of this too.
>>>>>>> >>>>>> > > >>>>>>
>>>>>>> >>>>>> > > >>>>>> WDYT?
>>>>>>> >>>>>> > > >>>>>>
>>>>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
>>>>>>>
>>>>>>

-- 

Michał Walenia
Polidea <https://www.polidea.com/> | Software Engineer

M: +48 791 432 002 <+48791432002>
E: michal.walenia@polidea.com

Unique Tech
Check out our projects! <https://www.polidea.com/our-work>

Re: [DISCUSS] Autoformat python code with Black

Posted by Udi Meiri <eh...@google.com>.
Sorry, backing off on this due to time constraints.

On Wed, Jan 22, 2020 at 3:39 PM Udi Meiri <eh...@google.com> wrote:

> It sounds like there's a consensus for yapf. I volunteer to take this on
>
> On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com> wrote:
>
>> +1 to autoformatting
>>
>> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <lc...@google.com> wrote:
>>
>>> +1 to autoformatters. Also the Beam Java SDK went through a one time
>>> pass to apply the spotless formatting.
>>>
>>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <al...@google.com> wrote:
>>>
>>>> +1 to autoformatters and yapf. It appears to be a well maintained
>>>> project. I do support making a one time pass to apply formatting the whole
>>>> code base.
>>>>
>>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <ch...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>> It'd be good if there was a way to only apply to violating (or at
>>>>>> least changed) lines.
>>>>>
>>>>>
>>>>> I assumed the first thing we’d do is convert all of the code in one
>>>>> go, since it’s a very safe operation. Did you have something else in mind?
>>>>>
>>>>> -chad
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <ch...@gmail.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > +1 to autoformatting
>>>>>> >
>>>>>> > Let me add some nuance to that.
>>>>>> >
>>>>>> > The way I see it there are 2 varieties of formatters:  those which
>>>>>> take the original formatting into consideration (autopep8) and those which
>>>>>> disregard it (yapf, black).
>>>>>> >
>>>>>> > I much prefer yapf to black, because you have plenty of options to
>>>>>> tweak with yapf (enough to make the output a pretty close match to the
>>>>>> current Beam style), and you can mark areas to preserve the original
>>>>>> formatting, which could be very useful with Pipeline building with pipe
>>>>>> operators.  Please don't pick black.
>>>>>> >
>>>>>> > autopep8 is more along the lines of spotless in Java -- it only
>>>>>> corrects code that breaks the project's style rules.  The big problem with
>>>>>> Beam's current style is that it is so esoteric that autopep8 can't enforce
>>>>>> it -- and I'm not just talking about 2-spaces, which I don't really have a
>>>>>> problem with -- the problem is the use of either 2 or 4 spaces depending on
>>>>>> context (expression start vs hanging indent, etc).  This is my *biggest*
>>>>>> gripe about the current style.  PyCharm doesn't have enough control
>>>>>> either.  So, if we can choose a style that can be expressed by flake8 or
>>>>>> pycodestyle then we can use autopep8 to enforce it.
>>>>>> >
>>>>>> > I'd prefer autopep8 to yapf because I like having a little wiggle
>>>>>> room to influence the style, but on a big project like Beam all that wiggle
>>>>>> room ends up to minor but noticeable inconsistencies in style throughout
>>>>>> the project.  yapf ensures completely consistent style, but the tradeoff is
>>>>>> that it's sometimes ugly, especially in scenarios with similar repeated
>>>>>> entries like argparse, where yapf might insert line breaks in visually
>>>>>> inconsistent and unappealing ways depending on the lengths of the keywords
>>>>>> and expressions involved.
>>>>>> >
>>>>>> > Either way (but especially if we choose yapf) I think it'd be a
>>>>>> nice addition to setup a pre-commit [1] config so that people can opt in to
>>>>>> running *lightweight* autofixers prior to commit.  This will not only
>>>>>> reduce dev frustration but will also reduce the amount of cpu cycles that
>>>>>> Jenkins spends pointing out lint errors.
>>>>>> >
>>>>>> > [1] https://pre-commit.com/
>>>>>> >
>>>>>> > -chad
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <ie...@gmail.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> Last time we discussed this there seems not to be much progress
>>>>>> into autoformatting.
>>>>>> >> This tool looks more tweakable, so maybe it could be more
>>>>>> appropriate for Beam's use case.
>>>>>> >> https://github.com/google/yapf/
>>>>>> >> WDYT?
>>>>>> >>
>>>>>> >>
>>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <lg...@apache.org>
>>>>>> wrote:
>>>>>> >>>
>>>>>> >>> +1 for any autoformatter for Python SDK that does the job. My
>>>>>> experience is that since spotless in Java SDK I would never start a new
>>>>>> Java project without it. So many great benefits not only for one person
>>>>>> coding but for all community.
>>>>>> >>>
>>>>>> >>> It is a GitHub UI issue that you cannot easily browse past the
>>>>>> reformat. It is not actually that hard, but does take a couple extra clicks
>>>>>> to get GitHub to display blame before a reformat. It is easier with the
>>>>>> command line. I do a lot of code history digging and the global Java
>>>>>> reformat is not really a problem.
>>>>>> >>>
>>>>>> >>> It's actually one more click on Github but I agree it's not the
>>>>>> best way to search the history. The most convenient and clear one I've
>>>>>> found so far is in Jetbrains IDEs (Intelij) where you can:
>>>>>> >>>
>>>>>> >>> right click on line number -> "annotate" -> click again ->
>>>>>> "annotate previous revision" -> ...
>>>>>> >>>
>>>>>> >>> You can also use "compare with" to see the diff between two
>>>>>> revisions.
>>>>>> >>>
>>>>>> >>> Łukasz
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org>
>>>>>> napisał(a):
>>>>>> >>>>
>>>>>> >>>> +1 pending good enough tooling (I can't quite tell - seems there
>>>>>> are some issues?)
>>>>>> >>>>
>>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
>>>>>> ka.kucharczyk@gmail.com> wrote:
>>>>>> >>>>>
>>>>>> >>>>> What else actually we gain? My guess is faster PR review
>>>>>> iteration. We will skip some of conversations about code style.
>>>>>> >>>>
>>>>>> >>>> ...
>>>>>> >>>>>
>>>>>> >>>>> Last but not least, new contributor may be less discouraged.
>>>>>> When I started contribute I didn’t know how to format my code and I lost a
>>>>>> lot of time to add pylint and adjust IntelliJ. I eventually failed.
>>>>>> Currently I write code intuitively and when I don’t forget I rerun tox.
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> This is a huge benefit. This is why I supported it so much for
>>>>>> Java. It is a community benefit. You do not have to be a contributor to the
>>>>>> Python SDK to support this. That is why I am writing here. Just eliminate
>>>>>> all discussion of formatting. It doesn't really matter what the resulting
>>>>>> format is, if it is not crazy to read. I strongly oppose maintaining a
>>>>>> non-default format.
>>>>>> >>>>
>>>>>> >>>> Reformating 20k lines or 200k is not hard. The Java global
>>>>>> reformat touched 50k lines. It does not really matter how big it is.
>>>>>> Definitely do it all at once if you think the tool is good enough. And you
>>>>>> should pin a version, so churn is not a problem. You can upgrade the
>>>>>> version and reformat in a PR later and that is also easy.
>>>>>> >>>>
>>>>>> >>>> It is a GitHub UI issue that you cannot easily browse past the
>>>>>> reformat. It is not actually that hard, but does take a couple extra clicks
>>>>>> to get GitHub to display blame before a reformat. It is easier with the
>>>>>> command line. I do a lot of code history digging and the global Java
>>>>>> reformat is not really a problem.
>>>>>> >>>>
>>>>>> >>>> Kenn
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>>>
>>>>>> >>>>> Also everything will be formatted in a same way, so eventually
>>>>>> it would be easier to read.
>>>>>> >>>>>
>>>>>> >>>>> Moreover, as it was mentioned in previous emails - a lot of
>>>>>> Jenkins failures won’t take place, so we save time and resources.
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> One of disadvantages is that our pipelines has custom syntax
>>>>>> and after formatting they looks a little bit weird, but maybe extending the
>>>>>> only configurable option in Black - lines, from 88 to 110 would be solution.
>>>>>> >>>>>
>>>>>> >>>>> Second one is that Black requires Python 3 to be run. I don’t
>>>>>> know how big obstacle it would be.
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> I believe there are two options how it would be possible to
>>>>>> introduce Black. First: just do it, it will hurt but then it would be ok
>>>>>> (same as a dentist appointment). Of course it may require some work to
>>>>>> adjust linters. On the other hand we can do it gradually and start
>>>>>> including sdk parts one by one - maybe it will be less painful?
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> As an example I can share one of projects [2] I know that uses
>>>>>> Black (they use also other cool checkers and pre-commit [3]). This is how
>>>>>> looks their build with all checks [4].
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> To sum up I believe that if we want improve our coding
>>>>>> experience, we should improve our toolset. Black seems be recent and quite
>>>>>> popular tool what makes think they won’t stop developing it.
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> [1]
>>>>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> [3] https://pre-commit.com
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> [4]
>>>>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>>
>>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <
>>>>>> robertwb@google.com> wrote:
>>>>>> >>>>>>
>>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as it
>>>>>> would change nearly every single line in the codebase (and the loss of all
>>>>>> context as well as that particular line).
>>>>>> >>>>>>
>>>>>> >>>>>> This is probably why the 2-space fork exists. However, we
>>>>>> don't conform to that either--we use 2 spaces for indentation, but 4 for
>>>>>> continuation indentation. (As for the history of this, this goes back to
>>>>>> Google's internal style guide, probably motivated by consistency with C++,
>>>>>> Java, ... and the fact that with an indent level of 4 one ends up wrapping
>>>>>> lines quite frequently (it's telling that black's default line length is
>>>>>> 88)). This turns out to be an easy change to the codebase.
>>>>>> >>>>>>
>>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that
>>>>>> this tool introduces a huge amount of vertical whitespace (e.g. closing
>>>>>> parentheses on their own line), e.g.
>>>>>> >>>>>>
>>>>>> >>>>>> def foo(
>>>>>> >>>>>>     args
>>>>>> >>>>>> ):
>>>>>> >>>>>>   if (
>>>>>> >>>>>>       long expression)
>>>>>> >>>>>>   ):
>>>>>> >>>>>>     func(
>>>>>> >>>>>>         args
>>>>>> >>>>>>     )
>>>>>> >>>>>>
>>>>>> >>>>>> I wrote a simple post-processor to put closing parentheses on
>>>>>> the same lines, as well as omit the newline after "if (", and disabling
>>>>>> formatting of strings, which reduce the churn in our codebase to 15k lines
>>>>>> (adding about 4k) out of 200k total.
>>>>>> >>>>>>
>>>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
>>>>>> >>>>>>
>>>>>> >>>>>> It's still very opinionated, often in different ways then me,
>>>>>> and doesn't understand the semantics of the code, but possibly something we
>>>>>> could live with given the huge advantages of an autoformatter.
>>>>>> >>>>>>
>>>>>> >>>>>> An intermediate point would be to allow, but not require,
>>>>>> autoformatting of changed lines.
>>>>>> >>>>>>
>>>>>> >>>>>> As for being beta quality, it looks like it's got a decent
>>>>>> number of contributors and in my book being in the python github project is
>>>>>> a strong positive signal. But, due to the above issues, I think we'd have
>>>>>> to maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space
>>>>>> issue is a 2-line change, and the rest implemented as a post-processing
>>>>>> step (for now, incomplete), so it'd be easy to stay in sync with upstream.)
>>>>>> >>>>>>
>>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <
>>>>>> iemejia@gmail.com> wrote:
>>>>>> >>>>>> >
>>>>>> >>>>>> > > I think the question is if it can be configured in a way
>>>>>> to fit our
>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to
>>>>>> reformat the
>>>>>> >>>>>> > > entire Python SDK.
>>>>>> >>>>>> >
>>>>>> >>>>>> > It cannot be configured to do what we actually do because
>>>>>> Black is
>>>>>> >>>>>> > configurable only to support the standard python codestyle
>>>>>> guidelines
>>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what most projects
>>>>>> in the
>>>>>> >>>>>> > python world use.
>>>>>> >>>>>> >
>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
>>>>>> history. This
>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have
>>>>>> the feeling
>>>>>> >>>>>> > > that this might be less of a problem with Python because
>>>>>> the linter has
>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>>> >>>>>> >
>>>>>> >>>>>> > Yes that’s the bad side effect but there are always
>>>>>> tradeoffs we have
>>>>>> >>>>>> > to deal with.
>>>>>> >>>>>> >
>>>>>> >>>>>> >
>>>>>> >>>>>> >
>>>>>> >>>>>> >
>>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <
>>>>>> mxm@apache.org> wrote:
>>>>>> >>>>>> > >
>>>>>> >>>>>> > > I think the question is if it can be configured in a way
>>>>>> to fit our
>>>>>> >>>>>> > > current linter's style. I don't think it is feasible to
>>>>>> reformat the
>>>>>> >>>>>> > > entire Python SDK.
>>>>>> >>>>>> > >
>>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
>>>>>> history. This
>>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have
>>>>>> the feeling
>>>>>> >>>>>> > > that this might be less of a problem with Python because
>>>>>> the linter has
>>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>>> >>>>>> > >
>>>>>> >>>>>> > > -Max
>>>>>> >>>>>> > >
>>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>>>> >>>>>> > > >> My concerns are:
>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big
>>>>>> warning.
>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For the
>>>>>> same reason I also strongly prefer not using a fork for a specific setting.
>>>>>> Fork will only have less people looking at it.
>>>>>> >>>>>> > > >
>>>>>> >>>>>> > > > I suppose the project is marked as beta because it is
>>>>>> recent, it was
>>>>>> >>>>>> > > > presented in 2018’s pycon, and because some things can
>>>>>> change since
>>>>>> >>>>>> > > > auto-formatters are pretty tricky beasts, I think beta
>>>>>> in that case is
>>>>>> >>>>>> > > > like our own ‘@Experimental’. If you look at the
>>>>>> contribution page [1]
>>>>>> >>>>>> > > > you can notice that it is less and less a single person
>>>>>> project, there
>>>>>> >>>>>> > > > have been 93 independent contributions since the project
>>>>>> became
>>>>>> >>>>>> > > > public, and the fact that it is hosted in the python
>>>>>> organization
>>>>>> >>>>>> > > > github [2] gives some confidence on the project
>>>>>> continuity.
>>>>>> >>>>>> > > >
>>>>>> >>>>>> > > > You are right however about the fact that the main
>>>>>> author seems to be
>>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he
>>>>>> can seem
>>>>>> >>>>>> > > > arbitrary, but he is just following pep8 style guide
>>>>>> recommendations
>>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4
>>>>>> spaces
>>>>>> >>>>>> > > > recommendation of PEP-8 or even Google's own Python
>>>>>> style guide [4],
>>>>>> >>>>>> > > > So, probably it should be to us to reconsider the
>>>>>> current policy to
>>>>>> >>>>>> > > > adapt to the standards (and the tool).
>>>>>> >>>>>> > > >
>>>>>> >>>>>> > > > I did a quick run of black with python 2.7 compatibility
>>>>>> on
>>>>>> >>>>>> > > > sdks/python and got only 4 parsing errors which is
>>>>>> positive given the
>>>>>> >>>>>> > > > size of our code base.
>>>>>> >>>>>> > > >
>>>>>> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files
>>>>>> failed to reformat.
>>>>>> >>>>>> > > >
>>>>>> >>>>>> > > > error: cannot format
>>>>>> >>>>>> > > >
>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>>>>> >>>>>> > > > error: cannot format
>>>>>> >>>>>> > > >
>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>>>> >>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>>>>> >>>>>> > > > error: cannot format
>>>>>> >>>>>> > > >
>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>>>> >>>>>> > > > Cannot parse: 160:34:       print(traceback_string,
>>>>>> file=sys.stderr)
>>>>>> >>>>>> > > > error: cannot format
>>>>>> >>>>>> > > >
>>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc
>>>>>> else '    ',
>>>>>> >>>>>> > > > end=' ')
>>>>>> >>>>>> > > >
>>>>>> >>>>>> > > > I still think this can be positive for the project but
>>>>>> well I am
>>>>>> >>>>>> > > > barely a contributor to the python code base so I let
>>>>>> you the python
>>>>>> >>>>>> > > > maintainers to reconsider this, in any case it seems
>>>>>> like a good
>>>>>> >>>>>> > > > improvement for the project.
>>>>>> >>>>>> > > >
>>>>>> >>>>>> > > > [1] https://github.com/python/black/graphs/contributors
>>>>>> >>>>>> > > > [2] https://github.com/python
>>>>>> >>>>>> > > > [3]
>>>>>> https://www.python.org/dev/peps/pep-0008/#indentation
>>>>>> >>>>>> > > > [4]
>>>>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>>>> >>>>>> > > >
>>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <
>>>>>> altay@google.com> wrote:
>>>>>> >>>>>> > > >>
>>>>>> >>>>>> > > >> I am in the same boat with Robert, I am in favor of
>>>>>> autoformatters but I am not familiar with this one. My concerns are:
>>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big
>>>>>> warning.
>>>>>> >>>>>> > > >> - It looks like mostly a single person project. For the
>>>>>> same reason I also strongly prefer not using a fork for a specific setting.
>>>>>> Fork will only have less people looking at it.
>>>>>> >>>>>> > > >>
>>>>>> >>>>>> > > >> IMO, this is in an early stage for us. That said lint
>>>>>> issues are real as pointed in the thread. If someone would like to give it
>>>>>> a try and see how it would look like for us that would be interesting.
>>>>>> >>>>>> > > >>
>>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <
>>>>>> ka.kucharczyk@gmail.com> wrote:
>>>>>> >>>>>> > > >>>
>>>>>> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs
>>>>>> failures are caused by lint problems.
>>>>>> >>>>>> > > >>> I think it would be great to have something similar to
>>>>>> Spotless in Java SDK (I heard there is problem with configuring Black with
>>>>>> IntelliJ).
>>>>>> >>>>>> > > >>>
>>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <
>>>>>> robertwb@google.com> wrote:
>>>>>> >>>>>> > > >>>>
>>>>>> >>>>>> > > >>>> I'm generally in favor of autoformatters, though I
>>>>>> haven't looked at
>>>>>> >>>>>> > > >>>> how well this particular one works. We might have to
>>>>>> go with
>>>>>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>>>>> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>>>>> >>>>>> > > >>>>
>>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <
>>>>>> pabloem@google.com> wrote:
>>>>>> >>>>>> > > >>>>>
>>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at least a couple
>>>>>> people (myself included) who've been annoyed by having to take care of lint
>>>>>> issues that maybe a code formatter could save us.
>>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>>>>>> >>>>>> > > >>>>> -P.
>>>>>> >>>>>> > > >>>>>
>>>>>> >>>>>> > > >>>>>
>>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <
>>>>>> iemejia@gmail.com> wrote:
>>>>>> >>>>>> > > >>>>>>
>>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code
>>>>>> auto formatter that
>>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for
>>>>>> python, and wanted to
>>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest from the python
>>>>>> people to get this
>>>>>> >>>>>> > > >>>>>> into the build?
>>>>>> >>>>>> > > >>>>>>
>>>>>> >>>>>> > > >>>>>> The introduction of spotless for Java has been a
>>>>>> good improvement and
>>>>>> >>>>>> > > >>>>>> maybe the python code base may benefit of this too.
>>>>>> >>>>>> > > >>>>>>
>>>>>> >>>>>> > > >>>>>> WDYT?
>>>>>> >>>>>> > > >>>>>>
>>>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
>>>>>>
>>>>>

Re: [DISCUSS] Autoformat python code with Black

Posted by Udi Meiri <eh...@google.com>.
It sounds like there's a consensus for yapf. I volunteer to take this on

On Wed, Jan 22, 2020, 10:31 Udi Meiri <eh...@google.com> wrote:

> +1 to autoformatting
>
> On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <lc...@google.com> wrote:
>
>> +1 to autoformatters. Also the Beam Java SDK went through a one time pass
>> to apply the spotless formatting.
>>
>> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <al...@google.com> wrote:
>>
>>> +1 to autoformatters and yapf. It appears to be a well maintained
>>> project. I do support making a one time pass to apply formatting the whole
>>> code base.
>>>
>>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <ch...@gmail.com> wrote:
>>>
>>>>
>>>>> It'd be good if there was a way to only apply to violating (or at
>>>>> least changed) lines.
>>>>
>>>>
>>>> I assumed the first thing we’d do is convert all of the code in one go,
>>>> since it’s a very safe operation. Did you have something else in mind?
>>>>
>>>> -chad
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <ch...@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > +1 to autoformatting
>>>>> >
>>>>> > Let me add some nuance to that.
>>>>> >
>>>>> > The way I see it there are 2 varieties of formatters:  those which
>>>>> take the original formatting into consideration (autopep8) and those which
>>>>> disregard it (yapf, black).
>>>>> >
>>>>> > I much prefer yapf to black, because you have plenty of options to
>>>>> tweak with yapf (enough to make the output a pretty close match to the
>>>>> current Beam style), and you can mark areas to preserve the original
>>>>> formatting, which could be very useful with Pipeline building with pipe
>>>>> operators.  Please don't pick black.
>>>>> >
>>>>> > autopep8 is more along the lines of spotless in Java -- it only
>>>>> corrects code that breaks the project's style rules.  The big problem with
>>>>> Beam's current style is that it is so esoteric that autopep8 can't enforce
>>>>> it -- and I'm not just talking about 2-spaces, which I don't really have a
>>>>> problem with -- the problem is the use of either 2 or 4 spaces depending on
>>>>> context (expression start vs hanging indent, etc).  This is my *biggest*
>>>>> gripe about the current style.  PyCharm doesn't have enough control
>>>>> either.  So, if we can choose a style that can be expressed by flake8 or
>>>>> pycodestyle then we can use autopep8 to enforce it.
>>>>> >
>>>>> > I'd prefer autopep8 to yapf because I like having a little wiggle
>>>>> room to influence the style, but on a big project like Beam all that wiggle
>>>>> room ends up to minor but noticeable inconsistencies in style throughout
>>>>> the project.  yapf ensures completely consistent style, but the tradeoff is
>>>>> that it's sometimes ugly, especially in scenarios with similar repeated
>>>>> entries like argparse, where yapf might insert line breaks in visually
>>>>> inconsistent and unappealing ways depending on the lengths of the keywords
>>>>> and expressions involved.
>>>>> >
>>>>> > Either way (but especially if we choose yapf) I think it'd be a nice
>>>>> addition to setup a pre-commit [1] config so that people can opt in to
>>>>> running *lightweight* autofixers prior to commit.  This will not only
>>>>> reduce dev frustration but will also reduce the amount of cpu cycles that
>>>>> Jenkins spends pointing out lint errors.
>>>>> >
>>>>> > [1] https://pre-commit.com/
>>>>> >
>>>>> > -chad
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <ie...@gmail.com>
>>>>> wrote:
>>>>> >>
>>>>> >> Last time we discussed this there seems not to be much progress
>>>>> into autoformatting.
>>>>> >> This tool looks more tweakable, so maybe it could be more
>>>>> appropriate for Beam's use case.
>>>>> >> https://github.com/google/yapf/
>>>>> >> WDYT?
>>>>> >>
>>>>> >>
>>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <lg...@apache.org>
>>>>> wrote:
>>>>> >>>
>>>>> >>> +1 for any autoformatter for Python SDK that does the job. My
>>>>> experience is that since spotless in Java SDK I would never start a new
>>>>> Java project without it. So many great benefits not only for one person
>>>>> coding but for all community.
>>>>> >>>
>>>>> >>> It is a GitHub UI issue that you cannot easily browse past the
>>>>> reformat. It is not actually that hard, but does take a couple extra clicks
>>>>> to get GitHub to display blame before a reformat. It is easier with the
>>>>> command line. I do a lot of code history digging and the global Java
>>>>> reformat is not really a problem.
>>>>> >>>
>>>>> >>> It's actually one more click on Github but I agree it's not the
>>>>> best way to search the history. The most convenient and clear one I've
>>>>> found so far is in Jetbrains IDEs (Intelij) where you can:
>>>>> >>>
>>>>> >>> right click on line number -> "annotate" -> click again ->
>>>>> "annotate previous revision" -> ...
>>>>> >>>
>>>>> >>> You can also use "compare with" to see the diff between two
>>>>> revisions.
>>>>> >>>
>>>>> >>> Łukasz
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org>
>>>>> napisał(a):
>>>>> >>>>
>>>>> >>>> +1 pending good enough tooling (I can't quite tell - seems there
>>>>> are some issues?)
>>>>> >>>>
>>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
>>>>> ka.kucharczyk@gmail.com> wrote:
>>>>> >>>>>
>>>>> >>>>> What else actually we gain? My guess is faster PR review
>>>>> iteration. We will skip some of conversations about code style.
>>>>> >>>>
>>>>> >>>> ...
>>>>> >>>>>
>>>>> >>>>> Last but not least, new contributor may be less discouraged.
>>>>> When I started contribute I didn’t know how to format my code and I lost a
>>>>> lot of time to add pylint and adjust IntelliJ. I eventually failed.
>>>>> Currently I write code intuitively and when I don’t forget I rerun tox.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> This is a huge benefit. This is why I supported it so much for
>>>>> Java. It is a community benefit. You do not have to be a contributor to the
>>>>> Python SDK to support this. That is why I am writing here. Just eliminate
>>>>> all discussion of formatting. It doesn't really matter what the resulting
>>>>> format is, if it is not crazy to read. I strongly oppose maintaining a
>>>>> non-default format.
>>>>> >>>>
>>>>> >>>> Reformating 20k lines or 200k is not hard. The Java global
>>>>> reformat touched 50k lines. It does not really matter how big it is.
>>>>> Definitely do it all at once if you think the tool is good enough. And you
>>>>> should pin a version, so churn is not a problem. You can upgrade the
>>>>> version and reformat in a PR later and that is also easy.
>>>>> >>>>
>>>>> >>>> It is a GitHub UI issue that you cannot easily browse past the
>>>>> reformat. It is not actually that hard, but does take a couple extra clicks
>>>>> to get GitHub to display blame before a reformat. It is easier with the
>>>>> command line. I do a lot of code history digging and the global Java
>>>>> reformat is not really a problem.
>>>>> >>>>
>>>>> >>>> Kenn
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>>
>>>>> >>>>> Also everything will be formatted in a same way, so eventually
>>>>> it would be easier to read.
>>>>> >>>>>
>>>>> >>>>> Moreover, as it was mentioned in previous emails - a lot of
>>>>> Jenkins failures won’t take place, so we save time and resources.
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> One of disadvantages is that our pipelines has custom syntax and
>>>>> after formatting they looks a little bit weird, but maybe extending the
>>>>> only configurable option in Black - lines, from 88 to 110 would be solution.
>>>>> >>>>>
>>>>> >>>>> Second one is that Black requires Python 3 to be run. I don’t
>>>>> know how big obstacle it would be.
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> I believe there are two options how it would be possible to
>>>>> introduce Black. First: just do it, it will hurt but then it would be ok
>>>>> (same as a dentist appointment). Of course it may require some work to
>>>>> adjust linters. On the other hand we can do it gradually and start
>>>>> including sdk parts one by one - maybe it will be less painful?
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> As an example I can share one of projects [2] I know that uses
>>>>> Black (they use also other cool checkers and pre-commit [3]). This is how
>>>>> looks their build with all checks [4].
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> To sum up I believe that if we want improve our coding
>>>>> experience, we should improve our toolset. Black seems be recent and quite
>>>>> popular tool what makes think they won’t stop developing it.
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> [1]
>>>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> [3] https://pre-commit.com
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> [4]
>>>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <
>>>>> robertwb@google.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as it would
>>>>> change nearly every single line in the codebase (and the loss of all
>>>>> context as well as that particular line).
>>>>> >>>>>>
>>>>> >>>>>> This is probably why the 2-space fork exists. However, we don't
>>>>> conform to that either--we use 2 spaces for indentation, but 4 for
>>>>> continuation indentation. (As for the history of this, this goes back to
>>>>> Google's internal style guide, probably motivated by consistency with C++,
>>>>> Java, ... and the fact that with an indent level of 4 one ends up wrapping
>>>>> lines quite frequently (it's telling that black's default line length is
>>>>> 88)). This turns out to be an easy change to the codebase.
>>>>> >>>>>>
>>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that
>>>>> this tool introduces a huge amount of vertical whitespace (e.g. closing
>>>>> parentheses on their own line), e.g.
>>>>> >>>>>>
>>>>> >>>>>> def foo(
>>>>> >>>>>>     args
>>>>> >>>>>> ):
>>>>> >>>>>>   if (
>>>>> >>>>>>       long expression)
>>>>> >>>>>>   ):
>>>>> >>>>>>     func(
>>>>> >>>>>>         args
>>>>> >>>>>>     )
>>>>> >>>>>>
>>>>> >>>>>> I wrote a simple post-processor to put closing parentheses on
>>>>> the same lines, as well as omit the newline after "if (", and disabling
>>>>> formatting of strings, which reduce the churn in our codebase to 15k lines
>>>>> (adding about 4k) out of 200k total.
>>>>> >>>>>>
>>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
>>>>> >>>>>>
>>>>> >>>>>> It's still very opinionated, often in different ways then me,
>>>>> and doesn't understand the semantics of the code, but possibly something we
>>>>> could live with given the huge advantages of an autoformatter.
>>>>> >>>>>>
>>>>> >>>>>> An intermediate point would be to allow, but not require,
>>>>> autoformatting of changed lines.
>>>>> >>>>>>
>>>>> >>>>>> As for being beta quality, it looks like it's got a decent
>>>>> number of contributors and in my book being in the python github project is
>>>>> a strong positive signal. But, due to the above issues, I think we'd have
>>>>> to maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space
>>>>> issue is a 2-line change, and the rest implemented as a post-processing
>>>>> step (for now, incomplete), so it'd be easy to stay in sync with upstream.)
>>>>> >>>>>>
>>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <
>>>>> iemejia@gmail.com> wrote:
>>>>> >>>>>> >
>>>>> >>>>>> > > I think the question is if it can be configured in a way to
>>>>> fit our
>>>>> >>>>>> > > current linter's style. I don't think it is feasible to
>>>>> reformat the
>>>>> >>>>>> > > entire Python SDK.
>>>>> >>>>>> >
>>>>> >>>>>> > It cannot be configured to do what we actually do because
>>>>> Black is
>>>>> >>>>>> > configurable only to support the standard python codestyle
>>>>> guidelines
>>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what most projects
>>>>> in the
>>>>> >>>>>> > python world use.
>>>>> >>>>>> >
>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
>>>>> history. This
>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have
>>>>> the feeling
>>>>> >>>>>> > > that this might be less of a problem with Python because
>>>>> the linter has
>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>> >>>>>> >
>>>>> >>>>>> > Yes that’s the bad side effect but there are always tradeoffs
>>>>> we have
>>>>> >>>>>> > to deal with.
>>>>> >>>>>> >
>>>>> >>>>>> >
>>>>> >>>>>> >
>>>>> >>>>>> >
>>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <
>>>>> mxm@apache.org> wrote:
>>>>> >>>>>> > >
>>>>> >>>>>> > > I think the question is if it can be configured in a way to
>>>>> fit our
>>>>> >>>>>> > > current linter's style. I don't think it is feasible to
>>>>> reformat the
>>>>> >>>>>> > > entire Python SDK.
>>>>> >>>>>> > >
>>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
>>>>> history. This
>>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have
>>>>> the feeling
>>>>> >>>>>> > > that this might be less of a problem with Python because
>>>>> the linter has
>>>>> >>>>>> > > more rules than Checkstyle had.
>>>>> >>>>>> > >
>>>>> >>>>>> > > -Max
>>>>> >>>>>> > >
>>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>>> >>>>>> > > >> My concerns are:
>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big
>>>>> warning.
>>>>> >>>>>> > > >> - It looks like mostly a single person project. For the
>>>>> same reason I also strongly prefer not using a fork for a specific setting.
>>>>> Fork will only have less people looking at it.
>>>>> >>>>>> > > >
>>>>> >>>>>> > > > I suppose the project is marked as beta because it is
>>>>> recent, it was
>>>>> >>>>>> > > > presented in 2018’s pycon, and because some things can
>>>>> change since
>>>>> >>>>>> > > > auto-formatters are pretty tricky beasts, I think beta in
>>>>> that case is
>>>>> >>>>>> > > > like our own ‘@Experimental’. If you look at the
>>>>> contribution page [1]
>>>>> >>>>>> > > > you can notice that it is less and less a single person
>>>>> project, there
>>>>> >>>>>> > > > have been 93 independent contributions since the project
>>>>> became
>>>>> >>>>>> > > > public, and the fact that it is hosted in the python
>>>>> organization
>>>>> >>>>>> > > > github [2] gives some confidence on the project
>>>>> continuity.
>>>>> >>>>>> > > >
>>>>> >>>>>> > > > You are right however about the fact that the main author
>>>>> seems to be
>>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he
>>>>> can seem
>>>>> >>>>>> > > > arbitrary, but he is just following pep8 style guide
>>>>> recommendations
>>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4
>>>>> spaces
>>>>> >>>>>> > > > recommendation of PEP-8 or even Google's own Python style
>>>>> guide [4],
>>>>> >>>>>> > > > So, probably it should be to us to reconsider the current
>>>>> policy to
>>>>> >>>>>> > > > adapt to the standards (and the tool).
>>>>> >>>>>> > > >
>>>>> >>>>>> > > > I did a quick run of black with python 2.7 compatibility
>>>>> on
>>>>> >>>>>> > > > sdks/python and got only 4 parsing errors which is
>>>>> positive given the
>>>>> >>>>>> > > > size of our code base.
>>>>> >>>>>> > > >
>>>>> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files
>>>>> failed to reformat.
>>>>> >>>>>> > > >
>>>>> >>>>>> > > > error: cannot format
>>>>> >>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>>>> >>>>>> > > > error: cannot format
>>>>> >>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>>> >>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>>>> >>>>>> > > > error: cannot format
>>>>> >>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>>> >>>>>> > > > Cannot parse: 160:34:       print(traceback_string,
>>>>> file=sys.stderr)
>>>>> >>>>>> > > > error: cannot format
>>>>> >>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc
>>>>> else '    ',
>>>>> >>>>>> > > > end=' ')
>>>>> >>>>>> > > >
>>>>> >>>>>> > > > I still think this can be positive for the project but
>>>>> well I am
>>>>> >>>>>> > > > barely a contributor to the python code base so I let you
>>>>> the python
>>>>> >>>>>> > > > maintainers to reconsider this, in any case it seems like
>>>>> a good
>>>>> >>>>>> > > > improvement for the project.
>>>>> >>>>>> > > >
>>>>> >>>>>> > > > [1] https://github.com/python/black/graphs/contributors
>>>>> >>>>>> > > > [2] https://github.com/python
>>>>> >>>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
>>>>> >>>>>> > > > [4]
>>>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>>> >>>>>> > > >
>>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <
>>>>> altay@google.com> wrote:
>>>>> >>>>>> > > >>
>>>>> >>>>>> > > >> I am in the same boat with Robert, I am in favor of
>>>>> autoformatters but I am not familiar with this one. My concerns are:
>>>>> >>>>>> > > >> - The product is clearly marked as beta with a big
>>>>> warning.
>>>>> >>>>>> > > >> - It looks like mostly a single person project. For the
>>>>> same reason I also strongly prefer not using a fork for a specific setting.
>>>>> Fork will only have less people looking at it.
>>>>> >>>>>> > > >>
>>>>> >>>>>> > > >> IMO, this is in an early stage for us. That said lint
>>>>> issues are real as pointed in the thread. If someone would like to give it
>>>>> a try and see how it would look like for us that would be interesting.
>>>>> >>>>>> > > >>
>>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <
>>>>> ka.kucharczyk@gmail.com> wrote:
>>>>> >>>>>> > > >>>
>>>>> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs failures
>>>>> are caused by lint problems.
>>>>> >>>>>> > > >>> I think it would be great to have something similar to
>>>>> Spotless in Java SDK (I heard there is problem with configuring Black with
>>>>> IntelliJ).
>>>>> >>>>>> > > >>>
>>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <
>>>>> robertwb@google.com> wrote:
>>>>> >>>>>> > > >>>>
>>>>> >>>>>> > > >>>> I'm generally in favor of autoformatters, though I
>>>>> haven't looked at
>>>>> >>>>>> > > >>>> how well this particular one works. We might have to
>>>>> go with
>>>>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>>>> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>>>> >>>>>> > > >>>>
>>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <
>>>>> pabloem@google.com> wrote:
>>>>> >>>>>> > > >>>>>
>>>>> >>>>>> > > >>>>> This looks pretty good:) I know at least a couple
>>>>> people (myself included) who've been annoyed by having to take care of lint
>>>>> issues that maybe a code formatter could save us.
>>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>>>>> >>>>>> > > >>>>> -P.
>>>>> >>>>>> > > >>>>>
>>>>> >>>>>> > > >>>>>
>>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <
>>>>> iemejia@gmail.com> wrote:
>>>>> >>>>>> > > >>>>>>
>>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code
>>>>> auto formatter that
>>>>> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for
>>>>> python, and wanted to
>>>>> >>>>>> > > >>>>>> bring to the ML Is there interest from the python
>>>>> people to get this
>>>>> >>>>>> > > >>>>>> into the build?
>>>>> >>>>>> > > >>>>>>
>>>>> >>>>>> > > >>>>>> The introduction of spotless for Java has been a
>>>>> good improvement and
>>>>> >>>>>> > > >>>>>> maybe the python code base may benefit of this too.
>>>>> >>>>>> > > >>>>>>
>>>>> >>>>>> > > >>>>>> WDYT?
>>>>> >>>>>> > > >>>>>>
>>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
>>>>>
>>>>

Re: [DISCUSS] Autoformat python code with Black

Posted by Udi Meiri <eh...@google.com>.
+1 to autoformatting

On Wed, Jan 22, 2020 at 9:57 AM Luke Cwik <lc...@google.com> wrote:

> +1 to autoformatters. Also the Beam Java SDK went through a one time pass
> to apply the spotless formatting.
>
> On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <al...@google.com> wrote:
>
>> +1 to autoformatters and yapf. It appears to be a well maintained
>> project. I do support making a one time pass to apply formatting the whole
>> code base.
>>
>> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <ch...@gmail.com> wrote:
>>
>>>
>>>> It'd be good if there was a way to only apply to violating (or at
>>>> least changed) lines.
>>>
>>>
>>> I assumed the first thing we’d do is convert all of the code in one go,
>>> since it’s a very safe operation. Did you have something else in mind?
>>>
>>> -chad
>>>
>>>
>>>
>>>
>>>
>>>>
>>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <ch...@gmail.com>
>>>> wrote:
>>>> >
>>>> > +1 to autoformatting
>>>> >
>>>> > Let me add some nuance to that.
>>>> >
>>>> > The way I see it there are 2 varieties of formatters:  those which
>>>> take the original formatting into consideration (autopep8) and those which
>>>> disregard it (yapf, black).
>>>> >
>>>> > I much prefer yapf to black, because you have plenty of options to
>>>> tweak with yapf (enough to make the output a pretty close match to the
>>>> current Beam style), and you can mark areas to preserve the original
>>>> formatting, which could be very useful with Pipeline building with pipe
>>>> operators.  Please don't pick black.
>>>> >
>>>> > autopep8 is more along the lines of spotless in Java -- it only
>>>> corrects code that breaks the project's style rules.  The big problem with
>>>> Beam's current style is that it is so esoteric that autopep8 can't enforce
>>>> it -- and I'm not just talking about 2-spaces, which I don't really have a
>>>> problem with -- the problem is the use of either 2 or 4 spaces depending on
>>>> context (expression start vs hanging indent, etc).  This is my *biggest*
>>>> gripe about the current style.  PyCharm doesn't have enough control
>>>> either.  So, if we can choose a style that can be expressed by flake8 or
>>>> pycodestyle then we can use autopep8 to enforce it.
>>>> >
>>>> > I'd prefer autopep8 to yapf because I like having a little wiggle
>>>> room to influence the style, but on a big project like Beam all that wiggle
>>>> room ends up to minor but noticeable inconsistencies in style throughout
>>>> the project.  yapf ensures completely consistent style, but the tradeoff is
>>>> that it's sometimes ugly, especially in scenarios with similar repeated
>>>> entries like argparse, where yapf might insert line breaks in visually
>>>> inconsistent and unappealing ways depending on the lengths of the keywords
>>>> and expressions involved.
>>>> >
>>>> > Either way (but especially if we choose yapf) I think it'd be a nice
>>>> addition to setup a pre-commit [1] config so that people can opt in to
>>>> running *lightweight* autofixers prior to commit.  This will not only
>>>> reduce dev frustration but will also reduce the amount of cpu cycles that
>>>> Jenkins spends pointing out lint errors.
>>>> >
>>>> > [1] https://pre-commit.com/
>>>> >
>>>> > -chad
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <ie...@gmail.com>
>>>> wrote:
>>>> >>
>>>> >> Last time we discussed this there seems not to be much progress into
>>>> autoformatting.
>>>> >> This tool looks more tweakable, so maybe it could be more
>>>> appropriate for Beam's use case.
>>>> >> https://github.com/google/yapf/
>>>> >> WDYT?
>>>> >>
>>>> >>
>>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <lg...@apache.org>
>>>> wrote:
>>>> >>>
>>>> >>> +1 for any autoformatter for Python SDK that does the job. My
>>>> experience is that since spotless in Java SDK I would never start a new
>>>> Java project without it. So many great benefits not only for one person
>>>> coding but for all community.
>>>> >>>
>>>> >>> It is a GitHub UI issue that you cannot easily browse past the
>>>> reformat. It is not actually that hard, but does take a couple extra clicks
>>>> to get GitHub to display blame before a reformat. It is easier with the
>>>> command line. I do a lot of code history digging and the global Java
>>>> reformat is not really a problem.
>>>> >>>
>>>> >>> It's actually one more click on Github but I agree it's not the
>>>> best way to search the history. The most convenient and clear one I've
>>>> found so far is in Jetbrains IDEs (Intelij) where you can:
>>>> >>>
>>>> >>> right click on line number -> "annotate" -> click again ->
>>>> "annotate previous revision" -> ...
>>>> >>>
>>>> >>> You can also use "compare with" to see the diff between two
>>>> revisions.
>>>> >>>
>>>> >>> Łukasz
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org>
>>>> napisał(a):
>>>> >>>>
>>>> >>>> +1 pending good enough tooling (I can't quite tell - seems there
>>>> are some issues?)
>>>> >>>>
>>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
>>>> ka.kucharczyk@gmail.com> wrote:
>>>> >>>>>
>>>> >>>>> What else actually we gain? My guess is faster PR review
>>>> iteration. We will skip some of conversations about code style.
>>>> >>>>
>>>> >>>> ...
>>>> >>>>>
>>>> >>>>> Last but not least, new contributor may be less discouraged. When
>>>> I started contribute I didn’t know how to format my code and I lost a lot
>>>> of time to add pylint and adjust IntelliJ. I eventually failed. Currently I
>>>> write code intuitively and when I don’t forget I rerun tox.
>>>> >>>>
>>>> >>>>
>>>> >>>> This is a huge benefit. This is why I supported it so much for
>>>> Java. It is a community benefit. You do not have to be a contributor to the
>>>> Python SDK to support this. That is why I am writing here. Just eliminate
>>>> all discussion of formatting. It doesn't really matter what the resulting
>>>> format is, if it is not crazy to read. I strongly oppose maintaining a
>>>> non-default format.
>>>> >>>>
>>>> >>>> Reformating 20k lines or 200k is not hard. The Java global
>>>> reformat touched 50k lines. It does not really matter how big it is.
>>>> Definitely do it all at once if you think the tool is good enough. And you
>>>> should pin a version, so churn is not a problem. You can upgrade the
>>>> version and reformat in a PR later and that is also easy.
>>>> >>>>
>>>> >>>> It is a GitHub UI issue that you cannot easily browse past the
>>>> reformat. It is not actually that hard, but does take a couple extra clicks
>>>> to get GitHub to display blame before a reformat. It is easier with the
>>>> command line. I do a lot of code history digging and the global Java
>>>> reformat is not really a problem.
>>>> >>>>
>>>> >>>> Kenn
>>>> >>>>
>>>> >>>>
>>>> >>>>>
>>>> >>>>> Also everything will be formatted in a same way, so eventually it
>>>> would be easier to read.
>>>> >>>>>
>>>> >>>>> Moreover, as it was mentioned in previous emails - a lot of
>>>> Jenkins failures won’t take place, so we save time and resources.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> One of disadvantages is that our pipelines has custom syntax and
>>>> after formatting they looks a little bit weird, but maybe extending the
>>>> only configurable option in Black - lines, from 88 to 110 would be solution.
>>>> >>>>>
>>>> >>>>> Second one is that Black requires Python 3 to be run. I don’t
>>>> know how big obstacle it would be.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> I believe there are two options how it would be possible to
>>>> introduce Black. First: just do it, it will hurt but then it would be ok
>>>> (same as a dentist appointment). Of course it may require some work to
>>>> adjust linters. On the other hand we can do it gradually and start
>>>> including sdk parts one by one - maybe it will be less painful?
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> As an example I can share one of projects [2] I know that uses
>>>> Black (they use also other cool checkers and pre-commit [3]). This is how
>>>> looks their build with all checks [4].
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> To sum up I believe that if we want improve our coding
>>>> experience, we should improve our toolset. Black seems be recent and quite
>>>> popular tool what makes think they won’t stop developing it.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> [1]
>>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> [3] https://pre-commit.com
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> [4]
>>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <
>>>> robertwb@google.com> wrote:
>>>> >>>>>>
>>>> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as it would
>>>> change nearly every single line in the codebase (and the loss of all
>>>> context as well as that particular line).
>>>> >>>>>>
>>>> >>>>>> This is probably why the 2-space fork exists. However, we don't
>>>> conform to that either--we use 2 spaces for indentation, but 4 for
>>>> continuation indentation. (As for the history of this, this goes back to
>>>> Google's internal style guide, probably motivated by consistency with C++,
>>>> Java, ... and the fact that with an indent level of 4 one ends up wrapping
>>>> lines quite frequently (it's telling that black's default line length is
>>>> 88)). This turns out to be an easy change to the codebase.
>>>> >>>>>>
>>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that
>>>> this tool introduces a huge amount of vertical whitespace (e.g. closing
>>>> parentheses on their own line), e.g.
>>>> >>>>>>
>>>> >>>>>> def foo(
>>>> >>>>>>     args
>>>> >>>>>> ):
>>>> >>>>>>   if (
>>>> >>>>>>       long expression)
>>>> >>>>>>   ):
>>>> >>>>>>     func(
>>>> >>>>>>         args
>>>> >>>>>>     )
>>>> >>>>>>
>>>> >>>>>> I wrote a simple post-processor to put closing parentheses on
>>>> the same lines, as well as omit the newline after "if (", and disabling
>>>> formatting of strings, which reduce the churn in our codebase to 15k lines
>>>> (adding about 4k) out of 200k total.
>>>> >>>>>>
>>>> >>>>>> https://github.com/apache/beam/pull/8712/files
>>>> >>>>>>
>>>> >>>>>> It's still very opinionated, often in different ways then me,
>>>> and doesn't understand the semantics of the code, but possibly something we
>>>> could live with given the huge advantages of an autoformatter.
>>>> >>>>>>
>>>> >>>>>> An intermediate point would be to allow, but not require,
>>>> autoformatting of changed lines.
>>>> >>>>>>
>>>> >>>>>> As for being beta quality, it looks like it's got a decent
>>>> number of contributors and in my book being in the python github project is
>>>> a strong positive signal. But, due to the above issues, I think we'd have
>>>> to maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space
>>>> issue is a 2-line change, and the rest implemented as a post-processing
>>>> step (for now, incomplete), so it'd be easy to stay in sync with upstream.)
>>>> >>>>>>
>>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <ie...@gmail.com>
>>>> wrote:
>>>> >>>>>> >
>>>> >>>>>> > > I think the question is if it can be configured in a way to
>>>> fit our
>>>> >>>>>> > > current linter's style. I don't think it is feasible to
>>>> reformat the
>>>> >>>>>> > > entire Python SDK.
>>>> >>>>>> >
>>>> >>>>>> > It cannot be configured to do what we actually do because
>>>> Black is
>>>> >>>>>> > configurable only to support the standard python codestyle
>>>> guidelines
>>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what most projects in
>>>> the
>>>> >>>>>> > python world use.
>>>> >>>>>> >
>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
>>>> history. This
>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have the
>>>> feeling
>>>> >>>>>> > > that this might be less of a problem with Python because the
>>>> linter has
>>>> >>>>>> > > more rules than Checkstyle had.
>>>> >>>>>> >
>>>> >>>>>> > Yes that’s the bad side effect but there are always tradeoffs
>>>> we have
>>>> >>>>>> > to deal with.
>>>> >>>>>> >
>>>> >>>>>> >
>>>> >>>>>> >
>>>> >>>>>> >
>>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <
>>>> mxm@apache.org> wrote:
>>>> >>>>>> > >
>>>> >>>>>> > > I think the question is if it can be configured in a way to
>>>> fit our
>>>> >>>>>> > > current linter's style. I don't think it is feasible to
>>>> reformat the
>>>> >>>>>> > > entire Python SDK.
>>>> >>>>>> > >
>>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
>>>> history. This
>>>> >>>>>> > > effect is still visible in the Java SDK. However, I have the
>>>> feeling
>>>> >>>>>> > > that this might be less of a problem with Python because the
>>>> linter has
>>>> >>>>>> > > more rules than Checkstyle had.
>>>> >>>>>> > >
>>>> >>>>>> > > -Max
>>>> >>>>>> > >
>>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>> >>>>>> > > >> My concerns are:
>>>> >>>>>> > > >> - The product is clearly marked as beta with a big
>>>> warning.
>>>> >>>>>> > > >> - It looks like mostly a single person project. For the
>>>> same reason I also strongly prefer not using a fork for a specific setting.
>>>> Fork will only have less people looking at it.
>>>> >>>>>> > > >
>>>> >>>>>> > > > I suppose the project is marked as beta because it is
>>>> recent, it was
>>>> >>>>>> > > > presented in 2018’s pycon, and because some things can
>>>> change since
>>>> >>>>>> > > > auto-formatters are pretty tricky beasts, I think beta in
>>>> that case is
>>>> >>>>>> > > > like our own ‘@Experimental’. If you look at the
>>>> contribution page [1]
>>>> >>>>>> > > > you can notice that it is less and less a single person
>>>> project, there
>>>> >>>>>> > > > have been 93 independent contributions since the project
>>>> became
>>>> >>>>>> > > > public, and the fact that it is hosted in the python
>>>> organization
>>>> >>>>>> > > > github [2] gives some confidence on the project continuity.
>>>> >>>>>> > > >
>>>> >>>>>> > > > You are right however about the fact that the main author
>>>> seems to be
>>>> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he
>>>> can seem
>>>> >>>>>> > > > arbitrary, but he is just following pep8 style guide
>>>> recommendations
>>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4
>>>> spaces
>>>> >>>>>> > > > recommendation of PEP-8 or even Google's own Python style
>>>> guide [4],
>>>> >>>>>> > > > So, probably it should be to us to reconsider the current
>>>> policy to
>>>> >>>>>> > > > adapt to the standards (and the tool).
>>>> >>>>>> > > >
>>>> >>>>>> > > > I did a quick run of black with python 2.7 compatibility on
>>>> >>>>>> > > > sdks/python and got only 4 parsing errors which is
>>>> positive given the
>>>> >>>>>> > > > size of our code base.
>>>> >>>>>> > > >
>>>> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files
>>>> failed to reformat.
>>>> >>>>>> > > >
>>>> >>>>>> > > > error: cannot format
>>>> >>>>>> > > >
>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>>> >>>>>> > > > error: cannot format
>>>> >>>>>> > > >
>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>> >>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>>> >>>>>> > > > error: cannot format
>>>> >>>>>> > > >
>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>> >>>>>> > > > Cannot parse: 160:34:       print(traceback_string,
>>>> file=sys.stderr)
>>>> >>>>>> > > > error: cannot format
>>>> >>>>>> > > >
>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc
>>>> else '    ',
>>>> >>>>>> > > > end=' ')
>>>> >>>>>> > > >
>>>> >>>>>> > > > I still think this can be positive for the project but
>>>> well I am
>>>> >>>>>> > > > barely a contributor to the python code base so I let you
>>>> the python
>>>> >>>>>> > > > maintainers to reconsider this, in any case it seems like
>>>> a good
>>>> >>>>>> > > > improvement for the project.
>>>> >>>>>> > > >
>>>> >>>>>> > > > [1] https://github.com/python/black/graphs/contributors
>>>> >>>>>> > > > [2] https://github.com/python
>>>> >>>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
>>>> >>>>>> > > > [4]
>>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>> >>>>>> > > >
>>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <
>>>> altay@google.com> wrote:
>>>> >>>>>> > > >>
>>>> >>>>>> > > >> I am in the same boat with Robert, I am in favor of
>>>> autoformatters but I am not familiar with this one. My concerns are:
>>>> >>>>>> > > >> - The product is clearly marked as beta with a big
>>>> warning.
>>>> >>>>>> > > >> - It looks like mostly a single person project. For the
>>>> same reason I also strongly prefer not using a fork for a specific setting.
>>>> Fork will only have less people looking at it.
>>>> >>>>>> > > >>
>>>> >>>>>> > > >> IMO, this is in an early stage for us. That said lint
>>>> issues are real as pointed in the thread. If someone would like to give it
>>>> a try and see how it would look like for us that would be interesting.
>>>> >>>>>> > > >>
>>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <
>>>> ka.kucharczyk@gmail.com> wrote:
>>>> >>>>>> > > >>>
>>>> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs failures
>>>> are caused by lint problems.
>>>> >>>>>> > > >>> I think it would be great to have something similar to
>>>> Spotless in Java SDK (I heard there is problem with configuring Black with
>>>> IntelliJ).
>>>> >>>>>> > > >>>
>>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <
>>>> robertwb@google.com> wrote:
>>>> >>>>>> > > >>>>
>>>> >>>>>> > > >>>> I'm generally in favor of autoformatters, though I
>>>> haven't looked at
>>>> >>>>>> > > >>>> how well this particular one works. We might have to go
>>>> with
>>>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>>> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>>> >>>>>> > > >>>>
>>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <
>>>> pabloem@google.com> wrote:
>>>> >>>>>> > > >>>>>
>>>> >>>>>> > > >>>>> This looks pretty good:) I know at least a couple
>>>> people (myself included) who've been annoyed by having to take care of lint
>>>> issues that maybe a code formatter could save us.
>>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>>>> >>>>>> > > >>>>> -P.
>>>> >>>>>> > > >>>>>
>>>> >>>>>> > > >>>>>
>>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <
>>>> iemejia@gmail.com> wrote:
>>>> >>>>>> > > >>>>>>
>>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code
>>>> auto formatter that
>>>> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for python,
>>>> and wanted to
>>>> >>>>>> > > >>>>>> bring to the ML Is there interest from the python
>>>> people to get this
>>>> >>>>>> > > >>>>>> into the build?
>>>> >>>>>> > > >>>>>>
>>>> >>>>>> > > >>>>>> The introduction of spotless for Java has been a good
>>>> improvement and
>>>> >>>>>> > > >>>>>> maybe the python code base may benefit of this too.
>>>> >>>>>> > > >>>>>>
>>>> >>>>>> > > >>>>>> WDYT?
>>>> >>>>>> > > >>>>>>
>>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
>>>>
>>>

Re: [DISCUSS] Autoformat python code with Black

Posted by Luke Cwik <lc...@google.com>.
+1 to autoformatters. Also the Beam Java SDK went through a one time pass
to apply the spotless formatting.

On Tue, Jan 21, 2020 at 9:52 PM Ahmet Altay <al...@google.com> wrote:

> +1 to autoformatters and yapf. It appears to be a well maintained project.
> I do support making a one time pass to apply formatting the whole code base.
>
> On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <ch...@gmail.com> wrote:
>
>>
>>> It'd be good if there was a way to only apply to violating (or at
>>> least changed) lines.
>>
>>
>> I assumed the first thing we’d do is convert all of the code in one go,
>> since it’s a very safe operation. Did you have something else in mind?
>>
>> -chad
>>
>>
>>
>>
>>
>>>
>>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <ch...@gmail.com> wrote:
>>> >
>>> > +1 to autoformatting
>>> >
>>> > Let me add some nuance to that.
>>> >
>>> > The way I see it there are 2 varieties of formatters:  those which
>>> take the original formatting into consideration (autopep8) and those which
>>> disregard it (yapf, black).
>>> >
>>> > I much prefer yapf to black, because you have plenty of options to
>>> tweak with yapf (enough to make the output a pretty close match to the
>>> current Beam style), and you can mark areas to preserve the original
>>> formatting, which could be very useful with Pipeline building with pipe
>>> operators.  Please don't pick black.
>>> >
>>> > autopep8 is more along the lines of spotless in Java -- it only
>>> corrects code that breaks the project's style rules.  The big problem with
>>> Beam's current style is that it is so esoteric that autopep8 can't enforce
>>> it -- and I'm not just talking about 2-spaces, which I don't really have a
>>> problem with -- the problem is the use of either 2 or 4 spaces depending on
>>> context (expression start vs hanging indent, etc).  This is my *biggest*
>>> gripe about the current style.  PyCharm doesn't have enough control
>>> either.  So, if we can choose a style that can be expressed by flake8 or
>>> pycodestyle then we can use autopep8 to enforce it.
>>> >
>>> > I'd prefer autopep8 to yapf because I like having a little wiggle room
>>> to influence the style, but on a big project like Beam all that wiggle room
>>> ends up to minor but noticeable inconsistencies in style throughout the
>>> project.  yapf ensures completely consistent style, but the tradeoff is
>>> that it's sometimes ugly, especially in scenarios with similar repeated
>>> entries like argparse, where yapf might insert line breaks in visually
>>> inconsistent and unappealing ways depending on the lengths of the keywords
>>> and expressions involved.
>>> >
>>> > Either way (but especially if we choose yapf) I think it'd be a nice
>>> addition to setup a pre-commit [1] config so that people can opt in to
>>> running *lightweight* autofixers prior to commit.  This will not only
>>> reduce dev frustration but will also reduce the amount of cpu cycles that
>>> Jenkins spends pointing out lint errors.
>>> >
>>> > [1] https://pre-commit.com/
>>> >
>>> > -chad
>>> >
>>> >
>>> >
>>> >
>>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <ie...@gmail.com>
>>> wrote:
>>> >>
>>> >> Last time we discussed this there seems not to be much progress into
>>> autoformatting.
>>> >> This tool looks more tweakable, so maybe it could be more appropriate
>>> for Beam's use case.
>>> >> https://github.com/google/yapf/
>>> >> WDYT?
>>> >>
>>> >>
>>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <lg...@apache.org>
>>> wrote:
>>> >>>
>>> >>> +1 for any autoformatter for Python SDK that does the job. My
>>> experience is that since spotless in Java SDK I would never start a new
>>> Java project without it. So many great benefits not only for one person
>>> coding but for all community.
>>> >>>
>>> >>> It is a GitHub UI issue that you cannot easily browse past the
>>> reformat. It is not actually that hard, but does take a couple extra clicks
>>> to get GitHub to display blame before a reformat. It is easier with the
>>> command line. I do a lot of code history digging and the global Java
>>> reformat is not really a problem.
>>> >>>
>>> >>> It's actually one more click on Github but I agree it's not the best
>>> way to search the history. The most convenient and clear one I've found so
>>> far is in Jetbrains IDEs (Intelij) where you can:
>>> >>>
>>> >>> right click on line number -> "annotate" -> click again -> "annotate
>>> previous revision" -> ...
>>> >>>
>>> >>> You can also use "compare with" to see the diff between two
>>> revisions.
>>> >>>
>>> >>> Łukasz
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org>
>>> napisał(a):
>>> >>>>
>>> >>>> +1 pending good enough tooling (I can't quite tell - seems there
>>> are some issues?)
>>> >>>>
>>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
>>> ka.kucharczyk@gmail.com> wrote:
>>> >>>>>
>>> >>>>> What else actually we gain? My guess is faster PR review
>>> iteration. We will skip some of conversations about code style.
>>> >>>>
>>> >>>> ...
>>> >>>>>
>>> >>>>> Last but not least, new contributor may be less discouraged. When
>>> I started contribute I didn’t know how to format my code and I lost a lot
>>> of time to add pylint and adjust IntelliJ. I eventually failed. Currently I
>>> write code intuitively and when I don’t forget I rerun tox.
>>> >>>>
>>> >>>>
>>> >>>> This is a huge benefit. This is why I supported it so much for
>>> Java. It is a community benefit. You do not have to be a contributor to the
>>> Python SDK to support this. That is why I am writing here. Just eliminate
>>> all discussion of formatting. It doesn't really matter what the resulting
>>> format is, if it is not crazy to read. I strongly oppose maintaining a
>>> non-default format.
>>> >>>>
>>> >>>> Reformating 20k lines or 200k is not hard. The Java global reformat
>>> touched 50k lines. It does not really matter how big it is. Definitely do
>>> it all at once if you think the tool is good enough. And you should pin a
>>> version, so churn is not a problem. You can upgrade the version and
>>> reformat in a PR later and that is also easy.
>>> >>>>
>>> >>>> It is a GitHub UI issue that you cannot easily browse past the
>>> reformat. It is not actually that hard, but does take a couple extra clicks
>>> to get GitHub to display blame before a reformat. It is easier with the
>>> command line. I do a lot of code history digging and the global Java
>>> reformat is not really a problem.
>>> >>>>
>>> >>>> Kenn
>>> >>>>
>>> >>>>
>>> >>>>>
>>> >>>>> Also everything will be formatted in a same way, so eventually it
>>> would be easier to read.
>>> >>>>>
>>> >>>>> Moreover, as it was mentioned in previous emails - a lot of
>>> Jenkins failures won’t take place, so we save time and resources.
>>> >>>>>
>>> >>>>>
>>> >>>>> One of disadvantages is that our pipelines has custom syntax and
>>> after formatting they looks a little bit weird, but maybe extending the
>>> only configurable option in Black - lines, from 88 to 110 would be solution.
>>> >>>>>
>>> >>>>> Second one is that Black requires Python 3 to be run. I don’t know
>>> how big obstacle it would be.
>>> >>>>>
>>> >>>>>
>>> >>>>> I believe there are two options how it would be possible to
>>> introduce Black. First: just do it, it will hurt but then it would be ok
>>> (same as a dentist appointment). Of course it may require some work to
>>> adjust linters. On the other hand we can do it gradually and start
>>> including sdk parts one by one - maybe it will be less painful?
>>> >>>>>
>>> >>>>>
>>> >>>>> As an example I can share one of projects [2] I know that uses
>>> Black (they use also other cool checkers and pre-commit [3]). This is how
>>> looks their build with all checks [4].
>>> >>>>>
>>> >>>>>
>>> >>>>> To sum up I believe that if we want improve our coding experience,
>>> we should improve our toolset. Black seems be recent and quite popular tool
>>> what makes think they won’t stop developing it.
>>> >>>>>
>>> >>>>>
>>> >>>>> [1]
>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>> >>>>>
>>> >>>>>
>>> >>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>> >>>>>
>>> >>>>>
>>> >>>>> [3] https://pre-commit.com
>>> >>>>>
>>> >>>>>
>>> >>>>> [4]
>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <
>>> robertwb@google.com> wrote:
>>> >>>>>>
>>> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as it would
>>> change nearly every single line in the codebase (and the loss of all
>>> context as well as that particular line).
>>> >>>>>>
>>> >>>>>> This is probably why the 2-space fork exists. However, we don't
>>> conform to that either--we use 2 spaces for indentation, but 4 for
>>> continuation indentation. (As for the history of this, this goes back to
>>> Google's internal style guide, probably motivated by consistency with C++,
>>> Java, ... and the fact that with an indent level of 4 one ends up wrapping
>>> lines quite frequently (it's telling that black's default line length is
>>> 88)). This turns out to be an easy change to the codebase.
>>> >>>>>>
>>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that
>>> this tool introduces a huge amount of vertical whitespace (e.g. closing
>>> parentheses on their own line), e.g.
>>> >>>>>>
>>> >>>>>> def foo(
>>> >>>>>>     args
>>> >>>>>> ):
>>> >>>>>>   if (
>>> >>>>>>       long expression)
>>> >>>>>>   ):
>>> >>>>>>     func(
>>> >>>>>>         args
>>> >>>>>>     )
>>> >>>>>>
>>> >>>>>> I wrote a simple post-processor to put closing parentheses on the
>>> same lines, as well as omit the newline after "if (", and disabling
>>> formatting of strings, which reduce the churn in our codebase to 15k lines
>>> (adding about 4k) out of 200k total.
>>> >>>>>>
>>> >>>>>> https://github.com/apache/beam/pull/8712/files
>>> >>>>>>
>>> >>>>>> It's still very opinionated, often in different ways then me, and
>>> doesn't understand the semantics of the code, but possibly something we
>>> could live with given the huge advantages of an autoformatter.
>>> >>>>>>
>>> >>>>>> An intermediate point would be to allow, but not require,
>>> autoformatting of changed lines.
>>> >>>>>>
>>> >>>>>> As for being beta quality, it looks like it's got a decent number
>>> of contributors and in my book being in the python github project is a
>>> strong positive signal. But, due to the above issues, I think we'd have to
>>> maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space issue
>>> is a 2-line change, and the rest implemented as a post-processing step (for
>>> now, incomplete), so it'd be easy to stay in sync with upstream.)
>>> >>>>>>
>>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <ie...@gmail.com>
>>> wrote:
>>> >>>>>> >
>>> >>>>>> > > I think the question is if it can be configured in a way to
>>> fit our
>>> >>>>>> > > current linter's style. I don't think it is feasible to
>>> reformat the
>>> >>>>>> > > entire Python SDK.
>>> >>>>>> >
>>> >>>>>> > It cannot be configured to do what we actually do because Black
>>> is
>>> >>>>>> > configurable only to support the standard python codestyle
>>> guidelines
>>> >>>>>> > (PEP-8) which recommends 4 spaces and is what most projects in
>>> the
>>> >>>>>> > python world use.
>>> >>>>>> >
>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
>>> history. This
>>> >>>>>> > > effect is still visible in the Java SDK. However, I have the
>>> feeling
>>> >>>>>> > > that this might be less of a problem with Python because the
>>> linter has
>>> >>>>>> > > more rules than Checkstyle had.
>>> >>>>>> >
>>> >>>>>> > Yes that’s the bad side effect but there are always tradeoffs
>>> we have
>>> >>>>>> > to deal with.
>>> >>>>>> >
>>> >>>>>> >
>>> >>>>>> >
>>> >>>>>> >
>>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <
>>> mxm@apache.org> wrote:
>>> >>>>>> > >
>>> >>>>>> > > I think the question is if it can be configured in a way to
>>> fit our
>>> >>>>>> > > current linter's style. I don't think it is feasible to
>>> reformat the
>>> >>>>>> > > entire Python SDK.
>>> >>>>>> > >
>>> >>>>>> > > Reformatted lines don't allow quick access to the Git
>>> history. This
>>> >>>>>> > > effect is still visible in the Java SDK. However, I have the
>>> feeling
>>> >>>>>> > > that this might be less of a problem with Python because the
>>> linter has
>>> >>>>>> > > more rules than Checkstyle had.
>>> >>>>>> > >
>>> >>>>>> > > -Max
>>> >>>>>> > >
>>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>> >>>>>> > > >> My concerns are:
>>> >>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>> >>>>>> > > >> - It looks like mostly a single person project. For the
>>> same reason I also strongly prefer not using a fork for a specific setting.
>>> Fork will only have less people looking at it.
>>> >>>>>> > > >
>>> >>>>>> > > > I suppose the project is marked as beta because it is
>>> recent, it was
>>> >>>>>> > > > presented in 2018’s pycon, and because some things can
>>> change since
>>> >>>>>> > > > auto-formatters are pretty tricky beasts, I think beta in
>>> that case is
>>> >>>>>> > > > like our own ‘@Experimental’. If you look at the
>>> contribution page [1]
>>> >>>>>> > > > you can notice that it is less and less a single person
>>> project, there
>>> >>>>>> > > > have been 93 independent contributions since the project
>>> became
>>> >>>>>> > > > public, and the fact that it is hosted in the python
>>> organization
>>> >>>>>> > > > github [2] gives some confidence on the project continuity.
>>> >>>>>> > > >
>>> >>>>>> > > > You are right however about the fact that the main author
>>> seems to be
>>> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he can
>>> seem
>>> >>>>>> > > > arbitrary, but he is just following pep8 style guide
>>> recommendations
>>> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4
>>> spaces
>>> >>>>>> > > > recommendation of PEP-8 or even Google's own Python style
>>> guide [4],
>>> >>>>>> > > > So, probably it should be to us to reconsider the current
>>> policy to
>>> >>>>>> > > > adapt to the standards (and the tool).
>>> >>>>>> > > >
>>> >>>>>> > > > I did a quick run of black with python 2.7 compatibility on
>>> >>>>>> > > > sdks/python and got only 4 parsing errors which is positive
>>> given the
>>> >>>>>> > > > size of our code base.
>>> >>>>>> > > >
>>> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files
>>> failed to reformat.
>>> >>>>>> > > >
>>> >>>>>> > > > error: cannot format
>>> >>>>>> > > >
>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>> >>>>>> > > > error: cannot format
>>> >>>>>> > > >
>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>> >>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>> >>>>>> > > > error: cannot format
>>> >>>>>> > > >
>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>> >>>>>> > > > Cannot parse: 160:34:       print(traceback_string,
>>> file=sys.stderr)
>>> >>>>>> > > > error: cannot format
>>> >>>>>> > > >
>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc
>>> else '    ',
>>> >>>>>> > > > end=' ')
>>> >>>>>> > > >
>>> >>>>>> > > > I still think this can be positive for the project but well
>>> I am
>>> >>>>>> > > > barely a contributor to the python code base so I let you
>>> the python
>>> >>>>>> > > > maintainers to reconsider this, in any case it seems like a
>>> good
>>> >>>>>> > > > improvement for the project.
>>> >>>>>> > > >
>>> >>>>>> > > > [1] https://github.com/python/black/graphs/contributors
>>> >>>>>> > > > [2] https://github.com/python
>>> >>>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
>>> >>>>>> > > > [4]
>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>> >>>>>> > > >
>>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <
>>> altay@google.com> wrote:
>>> >>>>>> > > >>
>>> >>>>>> > > >> I am in the same boat with Robert, I am in favor of
>>> autoformatters but I am not familiar with this one. My concerns are:
>>> >>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>> >>>>>> > > >> - It looks like mostly a single person project. For the
>>> same reason I also strongly prefer not using a fork for a specific setting.
>>> Fork will only have less people looking at it.
>>> >>>>>> > > >>
>>> >>>>>> > > >> IMO, this is in an early stage for us. That said lint
>>> issues are real as pointed in the thread. If someone would like to give it
>>> a try and see how it would look like for us that would be interesting.
>>> >>>>>> > > >>
>>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <
>>> ka.kucharczyk@gmail.com> wrote:
>>> >>>>>> > > >>>
>>> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs failures
>>> are caused by lint problems.
>>> >>>>>> > > >>> I think it would be great to have something similar to
>>> Spotless in Java SDK (I heard there is problem with configuring Black with
>>> IntelliJ).
>>> >>>>>> > > >>>
>>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <
>>> robertwb@google.com> wrote:
>>> >>>>>> > > >>>>
>>> >>>>>> > > >>>> I'm generally in favor of autoformatters, though I
>>> haven't looked at
>>> >>>>>> > > >>>> how well this particular one works. We might have to go
>>> with
>>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>> >>>>>> > > >>>>
>>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <
>>> pabloem@google.com> wrote:
>>> >>>>>> > > >>>>>
>>> >>>>>> > > >>>>> This looks pretty good:) I know at least a couple
>>> people (myself included) who've been annoyed by having to take care of lint
>>> issues that maybe a code formatter could save us.
>>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>>> >>>>>> > > >>>>> -P.
>>> >>>>>> > > >>>>>
>>> >>>>>> > > >>>>>
>>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <
>>> iemejia@gmail.com> wrote:
>>> >>>>>> > > >>>>>>
>>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code auto
>>> formatter that
>>> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for python,
>>> and wanted to
>>> >>>>>> > > >>>>>> bring to the ML Is there interest from the python
>>> people to get this
>>> >>>>>> > > >>>>>> into the build?
>>> >>>>>> > > >>>>>>
>>> >>>>>> > > >>>>>> The introduction of spotless for Java has been a good
>>> improvement and
>>> >>>>>> > > >>>>>> maybe the python code base may benefit of this too.
>>> >>>>>> > > >>>>>>
>>> >>>>>> > > >>>>>> WDYT?
>>> >>>>>> > > >>>>>>
>>> >>>>>> > > >>>>>> [1] https://github.com/python/black
>>>
>>

Re: [DISCUSS] Autoformat python code with Black

Posted by Ahmet Altay <al...@google.com>.
+1 to autoformatters and yapf. It appears to be a well maintained project.
I do support making a one time pass to apply formatting the whole code base.

On Tue, Jan 21, 2020 at 5:38 PM Chad Dombrova <ch...@gmail.com> wrote:

>
>> It'd be good if there was a way to only apply to violating (or at
>> least changed) lines.
>
>
> I assumed the first thing we’d do is convert all of the code in one go,
> since it’s a very safe operation. Did you have something else in mind?
>
> -chad
>
>
>
>
>
>>
>> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <ch...@gmail.com> wrote:
>> >
>> > +1 to autoformatting
>> >
>> > Let me add some nuance to that.
>> >
>> > The way I see it there are 2 varieties of formatters:  those which take
>> the original formatting into consideration (autopep8) and those which
>> disregard it (yapf, black).
>> >
>> > I much prefer yapf to black, because you have plenty of options to
>> tweak with yapf (enough to make the output a pretty close match to the
>> current Beam style), and you can mark areas to preserve the original
>> formatting, which could be very useful with Pipeline building with pipe
>> operators.  Please don't pick black.
>> >
>> > autopep8 is more along the lines of spotless in Java -- it only
>> corrects code that breaks the project's style rules.  The big problem with
>> Beam's current style is that it is so esoteric that autopep8 can't enforce
>> it -- and I'm not just talking about 2-spaces, which I don't really have a
>> problem with -- the problem is the use of either 2 or 4 spaces depending on
>> context (expression start vs hanging indent, etc).  This is my *biggest*
>> gripe about the current style.  PyCharm doesn't have enough control
>> either.  So, if we can choose a style that can be expressed by flake8 or
>> pycodestyle then we can use autopep8 to enforce it.
>> >
>> > I'd prefer autopep8 to yapf because I like having a little wiggle room
>> to influence the style, but on a big project like Beam all that wiggle room
>> ends up to minor but noticeable inconsistencies in style throughout the
>> project.  yapf ensures completely consistent style, but the tradeoff is
>> that it's sometimes ugly, especially in scenarios with similar repeated
>> entries like argparse, where yapf might insert line breaks in visually
>> inconsistent and unappealing ways depending on the lengths of the keywords
>> and expressions involved.
>> >
>> > Either way (but especially if we choose yapf) I think it'd be a nice
>> addition to setup a pre-commit [1] config so that people can opt in to
>> running *lightweight* autofixers prior to commit.  This will not only
>> reduce dev frustration but will also reduce the amount of cpu cycles that
>> Jenkins spends pointing out lint errors.
>> >
>> > [1] https://pre-commit.com/
>> >
>> > -chad
>> >
>> >
>> >
>> >
>> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <ie...@gmail.com>
>> wrote:
>> >>
>> >> Last time we discussed this there seems not to be much progress into
>> autoformatting.
>> >> This tool looks more tweakable, so maybe it could be more appropriate
>> for Beam's use case.
>> >> https://github.com/google/yapf/
>> >> WDYT?
>> >>
>> >>
>> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <lg...@apache.org>
>> wrote:
>> >>>
>> >>> +1 for any autoformatter for Python SDK that does the job. My
>> experience is that since spotless in Java SDK I would never start a new
>> Java project without it. So many great benefits not only for one person
>> coding but for all community.
>> >>>
>> >>> It is a GitHub UI issue that you cannot easily browse past the
>> reformat. It is not actually that hard, but does take a couple extra clicks
>> to get GitHub to display blame before a reformat. It is easier with the
>> command line. I do a lot of code history digging and the global Java
>> reformat is not really a problem.
>> >>>
>> >>> It's actually one more click on Github but I agree it's not the best
>> way to search the history. The most convenient and clear one I've found so
>> far is in Jetbrains IDEs (Intelij) where you can:
>> >>>
>> >>> right click on line number -> "annotate" -> click again -> "annotate
>> previous revision" -> ...
>> >>>
>> >>> You can also use "compare with" to see the diff between two revisions.
>> >>>
>> >>> Łukasz
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org>
>> napisał(a):
>> >>>>
>> >>>> +1 pending good enough tooling (I can't quite tell - seems there are
>> some issues?)
>> >>>>
>> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
>> ka.kucharczyk@gmail.com> wrote:
>> >>>>>
>> >>>>> What else actually we gain? My guess is faster PR review iteration.
>> We will skip some of conversations about code style.
>> >>>>
>> >>>> ...
>> >>>>>
>> >>>>> Last but not least, new contributor may be less discouraged. When I
>> started contribute I didn’t know how to format my code and I lost a lot of
>> time to add pylint and adjust IntelliJ. I eventually failed. Currently I
>> write code intuitively and when I don’t forget I rerun tox.
>> >>>>
>> >>>>
>> >>>> This is a huge benefit. This is why I supported it so much for Java.
>> It is a community benefit. You do not have to be a contributor to the
>> Python SDK to support this. That is why I am writing here. Just eliminate
>> all discussion of formatting. It doesn't really matter what the resulting
>> format is, if it is not crazy to read. I strongly oppose maintaining a
>> non-default format.
>> >>>>
>> >>>> Reformating 20k lines or 200k is not hard. The Java global reformat
>> touched 50k lines. It does not really matter how big it is. Definitely do
>> it all at once if you think the tool is good enough. And you should pin a
>> version, so churn is not a problem. You can upgrade the version and
>> reformat in a PR later and that is also easy.
>> >>>>
>> >>>> It is a GitHub UI issue that you cannot easily browse past the
>> reformat. It is not actually that hard, but does take a couple extra clicks
>> to get GitHub to display blame before a reformat. It is easier with the
>> command line. I do a lot of code history digging and the global Java
>> reformat is not really a problem.
>> >>>>
>> >>>> Kenn
>> >>>>
>> >>>>
>> >>>>>
>> >>>>> Also everything will be formatted in a same way, so eventually it
>> would be easier to read.
>> >>>>>
>> >>>>> Moreover, as it was mentioned in previous emails - a lot of Jenkins
>> failures won’t take place, so we save time and resources.
>> >>>>>
>> >>>>>
>> >>>>> One of disadvantages is that our pipelines has custom syntax and
>> after formatting they looks a little bit weird, but maybe extending the
>> only configurable option in Black - lines, from 88 to 110 would be solution.
>> >>>>>
>> >>>>> Second one is that Black requires Python 3 to be run. I don’t know
>> how big obstacle it would be.
>> >>>>>
>> >>>>>
>> >>>>> I believe there are two options how it would be possible to
>> introduce Black. First: just do it, it will hurt but then it would be ok
>> (same as a dentist appointment). Of course it may require some work to
>> adjust linters. On the other hand we can do it gradually and start
>> including sdk parts one by one - maybe it will be less painful?
>> >>>>>
>> >>>>>
>> >>>>> As an example I can share one of projects [2] I know that uses
>> Black (they use also other cool checkers and pre-commit [3]). This is how
>> looks their build with all checks [4].
>> >>>>>
>> >>>>>
>> >>>>> To sum up I believe that if we want improve our coding experience,
>> we should improve our toolset. Black seems be recent and quite popular tool
>> what makes think they won’t stop developing it.
>> >>>>>
>> >>>>>
>> >>>>> [1]
>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>> >>>>>
>> >>>>>
>> >>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>> >>>>>
>> >>>>>
>> >>>>> [3] https://pre-commit.com
>> >>>>>
>> >>>>>
>> >>>>> [4]
>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <
>> robertwb@google.com> wrote:
>> >>>>>>
>> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as it would
>> change nearly every single line in the codebase (and the loss of all
>> context as well as that particular line).
>> >>>>>>
>> >>>>>> This is probably why the 2-space fork exists. However, we don't
>> conform to that either--we use 2 spaces for indentation, but 4 for
>> continuation indentation. (As for the history of this, this goes back to
>> Google's internal style guide, probably motivated by consistency with C++,
>> Java, ... and the fact that with an indent level of 4 one ends up wrapping
>> lines quite frequently (it's telling that black's default line length is
>> 88)). This turns out to be an easy change to the codebase.
>> >>>>>>
>> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that
>> this tool introduces a huge amount of vertical whitespace (e.g. closing
>> parentheses on their own line), e.g.
>> >>>>>>
>> >>>>>> def foo(
>> >>>>>>     args
>> >>>>>> ):
>> >>>>>>   if (
>> >>>>>>       long expression)
>> >>>>>>   ):
>> >>>>>>     func(
>> >>>>>>         args
>> >>>>>>     )
>> >>>>>>
>> >>>>>> I wrote a simple post-processor to put closing parentheses on the
>> same lines, as well as omit the newline after "if (", and disabling
>> formatting of strings, which reduce the churn in our codebase to 15k lines
>> (adding about 4k) out of 200k total.
>> >>>>>>
>> >>>>>> https://github.com/apache/beam/pull/8712/files
>> >>>>>>
>> >>>>>> It's still very opinionated, often in different ways then me, and
>> doesn't understand the semantics of the code, but possibly something we
>> could live with given the huge advantages of an autoformatter.
>> >>>>>>
>> >>>>>> An intermediate point would be to allow, but not require,
>> autoformatting of changed lines.
>> >>>>>>
>> >>>>>> As for being beta quality, it looks like it's got a decent number
>> of contributors and in my book being in the python github project is a
>> strong positive signal. But, due to the above issues, I think we'd have to
>> maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space issue
>> is a 2-line change, and the rest implemented as a post-processing step (for
>> now, incomplete), so it'd be easy to stay in sync with upstream.)
>> >>>>>>
>> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <ie...@gmail.com>
>> wrote:
>> >>>>>> >
>> >>>>>> > > I think the question is if it can be configured in a way to
>> fit our
>> >>>>>> > > current linter's style. I don't think it is feasible to
>> reformat the
>> >>>>>> > > entire Python SDK.
>> >>>>>> >
>> >>>>>> > It cannot be configured to do what we actually do because Black
>> is
>> >>>>>> > configurable only to support the standard python codestyle
>> guidelines
>> >>>>>> > (PEP-8) which recommends 4 spaces and is what most projects in
>> the
>> >>>>>> > python world use.
>> >>>>>> >
>> >>>>>> > > Reformatted lines don't allow quick access to the Git history.
>> This
>> >>>>>> > > effect is still visible in the Java SDK. However, I have the
>> feeling
>> >>>>>> > > that this might be less of a problem with Python because the
>> linter has
>> >>>>>> > > more rules than Checkstyle had.
>> >>>>>> >
>> >>>>>> > Yes that’s the bad side effect but there are always tradeoffs we
>> have
>> >>>>>> > to deal with.
>> >>>>>> >
>> >>>>>> >
>> >>>>>> >
>> >>>>>> >
>> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <
>> mxm@apache.org> wrote:
>> >>>>>> > >
>> >>>>>> > > I think the question is if it can be configured in a way to
>> fit our
>> >>>>>> > > current linter's style. I don't think it is feasible to
>> reformat the
>> >>>>>> > > entire Python SDK.
>> >>>>>> > >
>> >>>>>> > > Reformatted lines don't allow quick access to the Git history.
>> This
>> >>>>>> > > effect is still visible in the Java SDK. However, I have the
>> feeling
>> >>>>>> > > that this might be less of a problem with Python because the
>> linter has
>> >>>>>> > > more rules than Checkstyle had.
>> >>>>>> > >
>> >>>>>> > > -Max
>> >>>>>> > >
>> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>> >>>>>> > > >> My concerns are:
>> >>>>>> > > >> - The product is clearly marked as beta with a big warning.
>> >>>>>> > > >> - It looks like mostly a single person project. For the
>> same reason I also strongly prefer not using a fork for a specific setting.
>> Fork will only have less people looking at it.
>> >>>>>> > > >
>> >>>>>> > > > I suppose the project is marked as beta because it is
>> recent, it was
>> >>>>>> > > > presented in 2018’s pycon, and because some things can
>> change since
>> >>>>>> > > > auto-formatters are pretty tricky beasts, I think beta in
>> that case is
>> >>>>>> > > > like our own ‘@Experimental’. If you look at the
>> contribution page [1]
>> >>>>>> > > > you can notice that it is less and less a single person
>> project, there
>> >>>>>> > > > have been 93 independent contributions since the project
>> became
>> >>>>>> > > > public, and the fact that it is hosted in the python
>> organization
>> >>>>>> > > > github [2] gives some confidence on the project continuity.
>> >>>>>> > > >
>> >>>>>> > > > You are right however about the fact that the main author
>> seems to be
>> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he can
>> seem
>> >>>>>> > > > arbitrary, but he is just following pep8 style guide
>> recommendations
>> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4 spaces
>> >>>>>> > > > recommendation of PEP-8 or even Google's own Python style
>> guide [4],
>> >>>>>> > > > So, probably it should be to us to reconsider the current
>> policy to
>> >>>>>> > > > adapt to the standards (and the tool).
>> >>>>>> > > >
>> >>>>>> > > > I did a quick run of black with python 2.7 compatibility on
>> >>>>>> > > > sdks/python and got only 4 parsing errors which is positive
>> given the
>> >>>>>> > > > size of our code base.
>> >>>>>> > > >
>> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files
>> failed to reformat.
>> >>>>>> > > >
>> >>>>>> > > > error: cannot format
>> >>>>>> > > >
>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>> >>>>>> > > > error: cannot format
>> >>>>>> > > >
>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>> >>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>> >>>>>> > > > error: cannot format
>> >>>>>> > > >
>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>> >>>>>> > > > Cannot parse: 160:34:       print(traceback_string,
>> file=sys.stderr)
>> >>>>>> > > > error: cannot format
>> >>>>>> > > >
>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc
>> else '    ',
>> >>>>>> > > > end=' ')
>> >>>>>> > > >
>> >>>>>> > > > I still think this can be positive for the project but well
>> I am
>> >>>>>> > > > barely a contributor to the python code base so I let you
>> the python
>> >>>>>> > > > maintainers to reconsider this, in any case it seems like a
>> good
>> >>>>>> > > > improvement for the project.
>> >>>>>> > > >
>> >>>>>> > > > [1] https://github.com/python/black/graphs/contributors
>> >>>>>> > > > [2] https://github.com/python
>> >>>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
>> >>>>>> > > > [4]
>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>> >>>>>> > > >
>> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <
>> altay@google.com> wrote:
>> >>>>>> > > >>
>> >>>>>> > > >> I am in the same boat with Robert, I am in favor of
>> autoformatters but I am not familiar with this one. My concerns are:
>> >>>>>> > > >> - The product is clearly marked as beta with a big warning.
>> >>>>>> > > >> - It looks like mostly a single person project. For the
>> same reason I also strongly prefer not using a fork for a specific setting.
>> Fork will only have less people looking at it.
>> >>>>>> > > >>
>> >>>>>> > > >> IMO, this is in an early stage for us. That said lint
>> issues are real as pointed in the thread. If someone would like to give it
>> a try and see how it would look like for us that would be interesting.
>> >>>>>> > > >>
>> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <
>> ka.kucharczyk@gmail.com> wrote:
>> >>>>>> > > >>>
>> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs failures
>> are caused by lint problems.
>> >>>>>> > > >>> I think it would be great to have something similar to
>> Spotless in Java SDK (I heard there is problem with configuring Black with
>> IntelliJ).
>> >>>>>> > > >>>
>> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <
>> robertwb@google.com> wrote:
>> >>>>>> > > >>>>
>> >>>>>> > > >>>> I'm generally in favor of autoformatters, though I
>> haven't looked at
>> >>>>>> > > >>>> how well this particular one works. We might have to go
>> with
>> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
>> >>>>>> > > >>>>
>> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <
>> pabloem@google.com> wrote:
>> >>>>>> > > >>>>>
>> >>>>>> > > >>>>> This looks pretty good:) I know at least a couple people
>> (myself included) who've been annoyed by having to take care of lint issues
>> that maybe a code formatter could save us.
>> >>>>>> > > >>>>> Thanks for sharing Ismael.
>> >>>>>> > > >>>>> -P.
>> >>>>>> > > >>>>>
>> >>>>>> > > >>>>>
>> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <
>> iemejia@gmail.com> wrote:
>> >>>>>> > > >>>>>>
>> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code auto
>> formatter that
>> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for python,
>> and wanted to
>> >>>>>> > > >>>>>> bring to the ML Is there interest from the python
>> people to get this
>> >>>>>> > > >>>>>> into the build?
>> >>>>>> > > >>>>>>
>> >>>>>> > > >>>>>> The introduction of spotless for Java has been a good
>> improvement and
>> >>>>>> > > >>>>>> maybe the python code base may benefit of this too.
>> >>>>>> > > >>>>>>
>> >>>>>> > > >>>>>> WDYT?
>> >>>>>> > > >>>>>>
>> >>>>>> > > >>>>>> [1] https://github.com/python/black
>>
>

Re: [DISCUSS] Autoformat python code with Black

Posted by Chad Dombrova <ch...@gmail.com>.
>
>
> It'd be good if there was a way to only apply to violating (or at
> least changed) lines.


I assumed the first thing we’d do is convert all of the code in one go,
since it’s a very safe operation. Did you have something else in mind?

-chad





>
> On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <ch...@gmail.com> wrote:
> >
> > +1 to autoformatting
> >
> > Let me add some nuance to that.
> >
> > The way I see it there are 2 varieties of formatters:  those which take
> the original formatting into consideration (autopep8) and those which
> disregard it (yapf, black).
> >
> > I much prefer yapf to black, because you have plenty of options to tweak
> with yapf (enough to make the output a pretty close match to the current
> Beam style), and you can mark areas to preserve the original formatting,
> which could be very useful with Pipeline building with pipe operators.
> Please don't pick black.
> >
> > autopep8 is more along the lines of spotless in Java -- it only corrects
> code that breaks the project's style rules.  The big problem with Beam's
> current style is that it is so esoteric that autopep8 can't enforce it --
> and I'm not just talking about 2-spaces, which I don't really have a
> problem with -- the problem is the use of either 2 or 4 spaces depending on
> context (expression start vs hanging indent, etc).  This is my *biggest*
> gripe about the current style.  PyCharm doesn't have enough control
> either.  So, if we can choose a style that can be expressed by flake8 or
> pycodestyle then we can use autopep8 to enforce it.
> >
> > I'd prefer autopep8 to yapf because I like having a little wiggle room
> to influence the style, but on a big project like Beam all that wiggle room
> ends up to minor but noticeable inconsistencies in style throughout the
> project.  yapf ensures completely consistent style, but the tradeoff is
> that it's sometimes ugly, especially in scenarios with similar repeated
> entries like argparse, where yapf might insert line breaks in visually
> inconsistent and unappealing ways depending on the lengths of the keywords
> and expressions involved.
> >
> > Either way (but especially if we choose yapf) I think it'd be a nice
> addition to setup a pre-commit [1] config so that people can opt in to
> running *lightweight* autofixers prior to commit.  This will not only
> reduce dev frustration but will also reduce the amount of cpu cycles that
> Jenkins spends pointing out lint errors.
> >
> > [1] https://pre-commit.com/
> >
> > -chad
> >
> >
> >
> >
> > On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <ie...@gmail.com> wrote:
> >>
> >> Last time we discussed this there seems not to be much progress into
> autoformatting.
> >> This tool looks more tweakable, so maybe it could be more appropriate
> for Beam's use case.
> >> https://github.com/google/yapf/
> >> WDYT?
> >>
> >>
> >> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <lg...@apache.org>
> wrote:
> >>>
> >>> +1 for any autoformatter for Python SDK that does the job. My
> experience is that since spotless in Java SDK I would never start a new
> Java project without it. So many great benefits not only for one person
> coding but for all community.
> >>>
> >>> It is a GitHub UI issue that you cannot easily browse past the
> reformat. It is not actually that hard, but does take a couple extra clicks
> to get GitHub to display blame before a reformat. It is easier with the
> command line. I do a lot of code history digging and the global Java
> reformat is not really a problem.
> >>>
> >>> It's actually one more click on Github but I agree it's not the best
> way to search the history. The most convenient and clear one I've found so
> far is in Jetbrains IDEs (Intelij) where you can:
> >>>
> >>> right click on line number -> "annotate" -> click again -> "annotate
> previous revision" -> ...
> >>>
> >>> You can also use "compare with" to see the diff between two revisions.
> >>>
> >>> Łukasz
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org>
> napisał(a):
> >>>>
> >>>> +1 pending good enough tooling (I can't quite tell - seems there are
> some issues?)
> >>>>
> >>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
> ka.kucharczyk@gmail.com> wrote:
> >>>>>
> >>>>> What else actually we gain? My guess is faster PR review iteration.
> We will skip some of conversations about code style.
> >>>>
> >>>> ...
> >>>>>
> >>>>> Last but not least, new contributor may be less discouraged. When I
> started contribute I didn’t know how to format my code and I lost a lot of
> time to add pylint and adjust IntelliJ. I eventually failed. Currently I
> write code intuitively and when I don’t forget I rerun tox.
> >>>>
> >>>>
> >>>> This is a huge benefit. This is why I supported it so much for Java.
> It is a community benefit. You do not have to be a contributor to the
> Python SDK to support this. That is why I am writing here. Just eliminate
> all discussion of formatting. It doesn't really matter what the resulting
> format is, if it is not crazy to read. I strongly oppose maintaining a
> non-default format.
> >>>>
> >>>> Reformating 20k lines or 200k is not hard. The Java global reformat
> touched 50k lines. It does not really matter how big it is. Definitely do
> it all at once if you think the tool is good enough. And you should pin a
> version, so churn is not a problem. You can upgrade the version and
> reformat in a PR later and that is also easy.
> >>>>
> >>>> It is a GitHub UI issue that you cannot easily browse past the
> reformat. It is not actually that hard, but does take a couple extra clicks
> to get GitHub to display blame before a reformat. It is easier with the
> command line. I do a lot of code history digging and the global Java
> reformat is not really a problem.
> >>>>
> >>>> Kenn
> >>>>
> >>>>
> >>>>>
> >>>>> Also everything will be formatted in a same way, so eventually it
> would be easier to read.
> >>>>>
> >>>>> Moreover, as it was mentioned in previous emails - a lot of Jenkins
> failures won’t take place, so we save time and resources.
> >>>>>
> >>>>>
> >>>>> One of disadvantages is that our pipelines has custom syntax and
> after formatting they looks a little bit weird, but maybe extending the
> only configurable option in Black - lines, from 88 to 110 would be solution.
> >>>>>
> >>>>> Second one is that Black requires Python 3 to be run. I don’t know
> how big obstacle it would be.
> >>>>>
> >>>>>
> >>>>> I believe there are two options how it would be possible to
> introduce Black. First: just do it, it will hurt but then it would be ok
> (same as a dentist appointment). Of course it may require some work to
> adjust linters. On the other hand we can do it gradually and start
> including sdk parts one by one - maybe it will be less painful?
> >>>>>
> >>>>>
> >>>>> As an example I can share one of projects [2] I know that uses Black
> (they use also other cool checkers and pre-commit [3]). This is how looks
> their build with all checks [4].
> >>>>>
> >>>>>
> >>>>> To sum up I believe that if we want improve our coding experience,
> we should improve our toolset. Black seems be recent and quite popular tool
> what makes think they won’t stop developing it.
> >>>>>
> >>>>>
> >>>>> [1]
> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
> >>>>>
> >>>>>
> >>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
> >>>>>
> >>>>>
> >>>>> [3] https://pre-commit.com
> >>>>>
> >>>>>
> >>>>> [4]
> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <ro...@google.com>
> wrote:
> >>>>>>
> >>>>>> Reformatting to 4 spaces seems a non-starter to me, as it would
> change nearly every single line in the codebase (and the loss of all
> context as well as that particular line).
> >>>>>>
> >>>>>> This is probably why the 2-space fork exists. However, we don't
> conform to that either--we use 2 spaces for indentation, but 4 for
> continuation indentation. (As for the history of this, this goes back to
> Google's internal style guide, probably motivated by consistency with C++,
> Java, ... and the fact that with an indent level of 4 one ends up wrapping
> lines quite frequently (it's telling that black's default line length is
> 88)). This turns out to be an easy change to the codebase.
> >>>>>>
> >>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that this
> tool introduces a huge amount of vertical whitespace (e.g. closing
> parentheses on their own line), e.g.
> >>>>>>
> >>>>>> def foo(
> >>>>>>     args
> >>>>>> ):
> >>>>>>   if (
> >>>>>>       long expression)
> >>>>>>   ):
> >>>>>>     func(
> >>>>>>         args
> >>>>>>     )
> >>>>>>
> >>>>>> I wrote a simple post-processor to put closing parentheses on the
> same lines, as well as omit the newline after "if (", and disabling
> formatting of strings, which reduce the churn in our codebase to 15k lines
> (adding about 4k) out of 200k total.
> >>>>>>
> >>>>>> https://github.com/apache/beam/pull/8712/files
> >>>>>>
> >>>>>> It's still very opinionated, often in different ways then me, and
> doesn't understand the semantics of the code, but possibly something we
> could live with given the huge advantages of an autoformatter.
> >>>>>>
> >>>>>> An intermediate point would be to allow, but not require,
> autoformatting of changed lines.
> >>>>>>
> >>>>>> As for being beta quality, it looks like it's got a decent number
> of contributors and in my book being in the python github project is a
> strong positive signal. But, due to the above issues, I think we'd have to
> maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space issue
> is a 2-line change, and the rest implemented as a post-processing step (for
> now, incomplete), so it'd be easy to stay in sync with upstream.)
> >>>>>>
> >>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <ie...@gmail.com>
> wrote:
> >>>>>> >
> >>>>>> > > I think the question is if it can be configured in a way to fit
> our
> >>>>>> > > current linter's style. I don't think it is feasible to
> reformat the
> >>>>>> > > entire Python SDK.
> >>>>>> >
> >>>>>> > It cannot be configured to do what we actually do because Black is
> >>>>>> > configurable only to support the standard python codestyle
> guidelines
> >>>>>> > (PEP-8) which recommends 4 spaces and is what most projects in the
> >>>>>> > python world use.
> >>>>>> >
> >>>>>> > > Reformatted lines don't allow quick access to the Git history.
> This
> >>>>>> > > effect is still visible in the Java SDK. However, I have the
> feeling
> >>>>>> > > that this might be less of a problem with Python because the
> linter has
> >>>>>> > > more rules than Checkstyle had.
> >>>>>> >
> >>>>>> > Yes that’s the bad side effect but there are always tradeoffs we
> have
> >>>>>> > to deal with.
> >>>>>> >
> >>>>>> >
> >>>>>> >
> >>>>>> >
> >>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <
> mxm@apache.org> wrote:
> >>>>>> > >
> >>>>>> > > I think the question is if it can be configured in a way to fit
> our
> >>>>>> > > current linter's style. I don't think it is feasible to
> reformat the
> >>>>>> > > entire Python SDK.
> >>>>>> > >
> >>>>>> > > Reformatted lines don't allow quick access to the Git history.
> This
> >>>>>> > > effect is still visible in the Java SDK. However, I have the
> feeling
> >>>>>> > > that this might be less of a problem with Python because the
> linter has
> >>>>>> > > more rules than Checkstyle had.
> >>>>>> > >
> >>>>>> > > -Max
> >>>>>> > >
> >>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
> >>>>>> > > >> My concerns are:
> >>>>>> > > >> - The product is clearly marked as beta with a big warning.
> >>>>>> > > >> - It looks like mostly a single person project. For the same
> reason I also strongly prefer not using a fork for a specific setting. Fork
> will only have less people looking at it.
> >>>>>> > > >
> >>>>>> > > > I suppose the project is marked as beta because it is recent,
> it was
> >>>>>> > > > presented in 2018’s pycon, and because some things can change
> since
> >>>>>> > > > auto-formatters are pretty tricky beasts, I think beta in
> that case is
> >>>>>> > > > like our own ‘@Experimental’. If you look at the contribution
> page [1]
> >>>>>> > > > you can notice that it is less and less a single person
> project, there
> >>>>>> > > > have been 93 independent contributions since the project
> became
> >>>>>> > > > public, and the fact that it is hosted in the python
> organization
> >>>>>> > > > github [2] gives some confidence on the project continuity.
> >>>>>> > > >
> >>>>>> > > > You are right however about the fact that the main author
> seems to be
> >>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he can
> seem
> >>>>>> > > > arbitrary, but he is just following pep8 style guide
> recommendations
> >>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4 spaces
> >>>>>> > > > recommendation of PEP-8 or even Google's own Python style
> guide [4],
> >>>>>> > > > So, probably it should be to us to reconsider the current
> policy to
> >>>>>> > > > adapt to the standards (and the tool).
> >>>>>> > > >
> >>>>>> > > > I did a quick run of black with python 2.7 compatibility on
> >>>>>> > > > sdks/python and got only 4 parsing errors which is positive
> given the
> >>>>>> > > > size of our code base.
> >>>>>> > > >
> >>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files
> failed to reformat.
> >>>>>> > > >
> >>>>>> > > > error: cannot format
> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
> >>>>>> > > > Cannot parse: 47:22:   _display_progress = print
> >>>>>> > > > error: cannot format
> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
> >>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
> >>>>>> > > > error: cannot format
> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
> >>>>>> > > > Cannot parse: 160:34:       print(traceback_string,
> file=sys.stderr)
> >>>>>> > > > error: cannot format
> >>>>>> > > >
> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
> >>>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc else
> '    ',
> >>>>>> > > > end=' ')
> >>>>>> > > >
> >>>>>> > > > I still think this can be positive for the project but well I
> am
> >>>>>> > > > barely a contributor to the python code base so I let you the
> python
> >>>>>> > > > maintainers to reconsider this, in any case it seems like a
> good
> >>>>>> > > > improvement for the project.
> >>>>>> > > >
> >>>>>> > > > [1] https://github.com/python/black/graphs/contributors
> >>>>>> > > > [2] https://github.com/python
> >>>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
> >>>>>> > > > [4]
> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
> >>>>>> > > >
> >>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <
> altay@google.com> wrote:
> >>>>>> > > >>
> >>>>>> > > >> I am in the same boat with Robert, I am in favor of
> autoformatters but I am not familiar with this one. My concerns are:
> >>>>>> > > >> - The product is clearly marked as beta with a big warning.
> >>>>>> > > >> - It looks like mostly a single person project. For the same
> reason I also strongly prefer not using a fork for a specific setting. Fork
> will only have less people looking at it.
> >>>>>> > > >>
> >>>>>> > > >> IMO, this is in an early stage for us. That said lint issues
> are real as pointed in the thread. If someone would like to give it a try
> and see how it would look like for us that would be interesting.
> >>>>>> > > >>
> >>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <
> ka.kucharczyk@gmail.com> wrote:
> >>>>>> > > >>>
> >>>>>> > > >>> This sounds really good. A lot of Jenkins jobs failures are
> caused by lint problems.
> >>>>>> > > >>> I think it would be great to have something similar to
> Spotless in Java SDK (I heard there is problem with configuring Black with
> IntelliJ).
> >>>>>> > > >>>
> >>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <
> robertwb@google.com> wrote:
> >>>>>> > > >>>>
> >>>>>> > > >>>> I'm generally in favor of autoformatters, though I haven't
> looked at
> >>>>>> > > >>>> how well this particular one works. We might have to go
> with
> >>>>>> > > >>>> https://github.com/desbma/black-2spaces given
> >>>>>> > > >>>> https://github.com/python/black/issues/378 .
> >>>>>> > > >>>>
> >>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <
> pabloem@google.com> wrote:
> >>>>>> > > >>>>>
> >>>>>> > > >>>>> This looks pretty good:) I know at least a couple people
> (myself included) who've been annoyed by having to take care of lint issues
> that maybe a code formatter could save us.
> >>>>>> > > >>>>> Thanks for sharing Ismael.
> >>>>>> > > >>>>> -P.
> >>>>>> > > >>>>>
> >>>>>> > > >>>>>
> >>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <
> iemejia@gmail.com> wrote:
> >>>>>> > > >>>>>>
> >>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code auto
> formatter that
> >>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for python,
> and wanted to
> >>>>>> > > >>>>>> bring to the ML Is there interest from the python people
> to get this
> >>>>>> > > >>>>>> into the build?
> >>>>>> > > >>>>>>
> >>>>>> > > >>>>>> The introduction of spotless for Java has been a good
> improvement and
> >>>>>> > > >>>>>> maybe the python code base may benefit of this too.
> >>>>>> > > >>>>>>
> >>>>>> > > >>>>>> WDYT?
> >>>>>> > > >>>>>>
> >>>>>> > > >>>>>> [1] https://github.com/python/black
>

Re: [DISCUSS] Autoformat python code with Black

Posted by Robert Bradshaw <ro...@google.com>.
Huge +1 to autoformatters. Yapf solves my biggest complaint about
black, that it can't be customized (and I didn't want to take on
maintaining our own fork of black).

It'd be good if there was a way to only apply to violating (or at
least changed) lines.

On Tue, Jan 21, 2020 at 1:56 PM Chad Dombrova <ch...@gmail.com> wrote:
>
> +1 to autoformatting
>
> Let me add some nuance to that.
>
> The way I see it there are 2 varieties of formatters:  those which take the original formatting into consideration (autopep8) and those which disregard it (yapf, black).
>
> I much prefer yapf to black, because you have plenty of options to tweak with yapf (enough to make the output a pretty close match to the current Beam style), and you can mark areas to preserve the original formatting, which could be very useful with Pipeline building with pipe operators.  Please don't pick black.
>
> autopep8 is more along the lines of spotless in Java -- it only corrects code that breaks the project's style rules.  The big problem with Beam's current style is that it is so esoteric that autopep8 can't enforce it -- and I'm not just talking about 2-spaces, which I don't really have a problem with -- the problem is the use of either 2 or 4 spaces depending on context (expression start vs hanging indent, etc).  This is my *biggest* gripe about the current style.  PyCharm doesn't have enough control either.  So, if we can choose a style that can be expressed by flake8 or pycodestyle then we can use autopep8 to enforce it.
>
> I'd prefer autopep8 to yapf because I like having a little wiggle room to influence the style, but on a big project like Beam all that wiggle room ends up to minor but noticeable inconsistencies in style throughout the project.  yapf ensures completely consistent style, but the tradeoff is that it's sometimes ugly, especially in scenarios with similar repeated entries like argparse, where yapf might insert line breaks in visually inconsistent and unappealing ways depending on the lengths of the keywords and expressions involved.
>
> Either way (but especially if we choose yapf) I think it'd be a nice addition to setup a pre-commit [1] config so that people can opt in to running *lightweight* autofixers prior to commit.  This will not only reduce dev frustration but will also reduce the amount of cpu cycles that Jenkins spends pointing out lint errors.
>
> [1] https://pre-commit.com/
>
> -chad
>
>
>
>
> On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <ie...@gmail.com> wrote:
>>
>> Last time we discussed this there seems not to be much progress into autoformatting.
>> This tool looks more tweakable, so maybe it could be more appropriate for Beam's use case.
>> https://github.com/google/yapf/
>> WDYT?
>>
>>
>> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <lg...@apache.org> wrote:
>>>
>>> +1 for any autoformatter for Python SDK that does the job. My experience is that since spotless in Java SDK I would never start a new Java project without it. So many great benefits not only for one person coding but for all community.
>>>
>>> It is a GitHub UI issue that you cannot easily browse past the reformat. It is not actually that hard, but does take a couple extra clicks to get GitHub to display blame before a reformat. It is easier with the command line. I do a lot of code history digging and the global Java reformat is not really a problem.
>>>
>>> It's actually one more click on Github but I agree it's not the best way to search the history. The most convenient and clear one I've found so far is in Jetbrains IDEs (Intelij) where you can:
>>>
>>> right click on line number -> "annotate" -> click again -> "annotate previous revision" -> ...
>>>
>>> You can also use "compare with" to see the diff between two revisions.
>>>
>>> Łukasz
>>>
>>>
>>>
>>>
>>>
>>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org> napisał(a):
>>>>
>>>> +1 pending good enough tooling (I can't quite tell - seems there are some issues?)
>>>>
>>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <ka...@gmail.com> wrote:
>>>>>
>>>>> What else actually we gain? My guess is faster PR review iteration. We will skip some of conversations about code style.
>>>>
>>>> ...
>>>>>
>>>>> Last but not least, new contributor may be less discouraged. When I started contribute I didn’t know how to format my code and I lost a lot of time to add pylint and adjust IntelliJ. I eventually failed. Currently I write code intuitively and when I don’t forget I rerun tox.
>>>>
>>>>
>>>> This is a huge benefit. This is why I supported it so much for Java. It is a community benefit. You do not have to be a contributor to the Python SDK to support this. That is why I am writing here. Just eliminate all discussion of formatting. It doesn't really matter what the resulting format is, if it is not crazy to read. I strongly oppose maintaining a non-default format.
>>>>
>>>> Reformating 20k lines or 200k is not hard. The Java global reformat touched 50k lines. It does not really matter how big it is. Definitely do it all at once if you think the tool is good enough. And you should pin a version, so churn is not a problem. You can upgrade the version and reformat in a PR later and that is also easy.
>>>>
>>>> It is a GitHub UI issue that you cannot easily browse past the reformat. It is not actually that hard, but does take a couple extra clicks to get GitHub to display blame before a reformat. It is easier with the command line. I do a lot of code history digging and the global Java reformat is not really a problem.
>>>>
>>>> Kenn
>>>>
>>>>
>>>>>
>>>>> Also everything will be formatted in a same way, so eventually it would be easier to read.
>>>>>
>>>>> Moreover, as it was mentioned in previous emails - a lot of Jenkins failures won’t take place, so we save time and resources.
>>>>>
>>>>>
>>>>> One of disadvantages is that our pipelines has custom syntax and after formatting they looks a little bit weird, but maybe extending the only configurable option in Black - lines, from 88 to 110 would be solution.
>>>>>
>>>>> Second one is that Black requires Python 3 to be run. I don’t know how big obstacle it would be.
>>>>>
>>>>>
>>>>> I believe there are two options how it would be possible to introduce Black. First: just do it, it will hurt but then it would be ok (same as a dentist appointment). Of course it may require some work to adjust linters. On the other hand we can do it gradually and start including sdk parts one by one - maybe it will be less painful?
>>>>>
>>>>>
>>>>> As an example I can share one of projects [2] I know that uses Black (they use also other cool checkers and pre-commit [3]). This is how looks their build with all checks [4].
>>>>>
>>>>>
>>>>> To sum up I believe that if we want improve our coding experience, we should improve our toolset. Black seems be recent and quite popular tool what makes think they won’t stop developing it.
>>>>>
>>>>>
>>>>> [1] https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>>>
>>>>>
>>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>>>
>>>>>
>>>>> [3] https://pre-commit.com
>>>>>
>>>>>
>>>>> [4] https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <ro...@google.com> wrote:
>>>>>>
>>>>>> Reformatting to 4 spaces seems a non-starter to me, as it would change nearly every single line in the codebase (and the loss of all context as well as that particular line).
>>>>>>
>>>>>> This is probably why the 2-space fork exists. However, we don't conform to that either--we use 2 spaces for indentation, but 4 for continuation indentation. (As for the history of this, this goes back to Google's internal style guide, probably motivated by consistency with C++, Java, ... and the fact that with an indent level of 4 one ends up wrapping lines quite frequently (it's telling that black's default line length is 88)). This turns out to be an easy change to the codebase.
>>>>>>
>>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that this tool introduces a huge amount of vertical whitespace (e.g. closing parentheses on their own line), e.g.
>>>>>>
>>>>>> def foo(
>>>>>>     args
>>>>>> ):
>>>>>>   if (
>>>>>>       long expression)
>>>>>>   ):
>>>>>>     func(
>>>>>>         args
>>>>>>     )
>>>>>>
>>>>>> I wrote a simple post-processor to put closing parentheses on the same lines, as well as omit the newline after "if (", and disabling formatting of strings, which reduce the churn in our codebase to 15k lines (adding about 4k) out of 200k total.
>>>>>>
>>>>>> https://github.com/apache/beam/pull/8712/files
>>>>>>
>>>>>> It's still very opinionated, often in different ways then me, and doesn't understand the semantics of the code, but possibly something we could live with given the huge advantages of an autoformatter.
>>>>>>
>>>>>> An intermediate point would be to allow, but not require, autoformatting of changed lines.
>>>>>>
>>>>>> As for being beta quality, it looks like it's got a decent number of contributors and in my book being in the python github project is a strong positive signal. But, due to the above issues, I think we'd have to maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space issue is a 2-line change, and the rest implemented as a post-processing step (for now, incomplete), so it'd be easy to stay in sync with upstream.)
>>>>>>
>>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>>>>> >
>>>>>> > > I think the question is if it can be configured in a way to fit our
>>>>>> > > current linter's style. I don't think it is feasible to reformat the
>>>>>> > > entire Python SDK.
>>>>>> >
>>>>>> > It cannot be configured to do what we actually do because Black is
>>>>>> > configurable only to support the standard python codestyle guidelines
>>>>>> > (PEP-8) which recommends 4 spaces and is what most projects in the
>>>>>> > python world use.
>>>>>> >
>>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>>>> > > effect is still visible in the Java SDK. However, I have the feeling
>>>>>> > > that this might be less of a problem with Python because the linter has
>>>>>> > > more rules than Checkstyle had.
>>>>>> >
>>>>>> > Yes that’s the bad side effect but there are always tradeoffs we have
>>>>>> > to deal with.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <mx...@apache.org> wrote:
>>>>>> > >
>>>>>> > > I think the question is if it can be configured in a way to fit our
>>>>>> > > current linter's style. I don't think it is feasible to reformat the
>>>>>> > > entire Python SDK.
>>>>>> > >
>>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>>>> > > effect is still visible in the Java SDK. However, I have the feeling
>>>>>> > > that this might be less of a problem with Python because the linter has
>>>>>> > > more rules than Checkstyle had.
>>>>>> > >
>>>>>> > > -Max
>>>>>> > >
>>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>>>> > > >> My concerns are:
>>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>>>> > > >> - It looks like mostly a single person project. For the same reason I also strongly prefer not using a fork for a specific setting. Fork will only have less people looking at it.
>>>>>> > > >
>>>>>> > > > I suppose the project is marked as beta because it is recent, it was
>>>>>> > > > presented in 2018’s pycon, and because some things can change since
>>>>>> > > > auto-formatters are pretty tricky beasts, I think beta in that case is
>>>>>> > > > like our own ‘@Experimental’. If you look at the contribution page [1]
>>>>>> > > > you can notice that it is less and less a single person project, there
>>>>>> > > > have been 93 independent contributions since the project became
>>>>>> > > > public, and the fact that it is hosted in the python organization
>>>>>> > > > github [2] gives some confidence on the project continuity.
>>>>>> > > >
>>>>>> > > > You are right however about the fact that the main author seems to be
>>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he can seem
>>>>>> > > > arbitrary, but he is just following pep8 style guide recommendations
>>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4 spaces
>>>>>> > > > recommendation of PEP-8 or even Google's own Python style guide [4],
>>>>>> > > > So, probably it should be to us to reconsider the current policy to
>>>>>> > > > adapt to the standards (and the tool).
>>>>>> > > >
>>>>>> > > > I did a quick run of black with python 2.7 compatibility on
>>>>>> > > > sdks/python and got only 4 parsing errors which is positive given the
>>>>>> > > > size of our code base.
>>>>>> > > >
>>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files failed to reformat.
>>>>>> > > >
>>>>>> > > > error: cannot format
>>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>>>>> > > > error: cannot format
>>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>>>>> > > > error: cannot format
>>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>>>> > > > Cannot parse: 160:34:       print(traceback_string, file=sys.stderr)
>>>>>> > > > error: cannot format
>>>>>> > > > /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc else '    ',
>>>>>> > > > end=' ')
>>>>>> > > >
>>>>>> > > > I still think this can be positive for the project but well I am
>>>>>> > > > barely a contributor to the python code base so I let you the python
>>>>>> > > > maintainers to reconsider this, in any case it seems like a good
>>>>>> > > > improvement for the project.
>>>>>> > > >
>>>>>> > > > [1] https://github.com/python/black/graphs/contributors
>>>>>> > > > [2] https://github.com/python
>>>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
>>>>>> > > > [4] https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>>>> > > >
>>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <al...@google.com> wrote:
>>>>>> > > >>
>>>>>> > > >> I am in the same boat with Robert, I am in favor of autoformatters but I am not familiar with this one. My concerns are:
>>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>>>> > > >> - It looks like mostly a single person project. For the same reason I also strongly prefer not using a fork for a specific setting. Fork will only have less people looking at it.
>>>>>> > > >>
>>>>>> > > >> IMO, this is in an early stage for us. That said lint issues are real as pointed in the thread. If someone would like to give it a try and see how it would look like for us that would be interesting.
>>>>>> > > >>
>>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <ka...@gmail.com> wrote:
>>>>>> > > >>>
>>>>>> > > >>> This sounds really good. A lot of Jenkins jobs failures are caused by lint problems.
>>>>>> > > >>> I think it would be great to have something similar to Spotless in Java SDK (I heard there is problem with configuring Black with IntelliJ).
>>>>>> > > >>>
>>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <ro...@google.com> wrote:
>>>>>> > > >>>>
>>>>>> > > >>>> I'm generally in favor of autoformatters, though I haven't looked at
>>>>>> > > >>>> how well this particular one works. We might have to go with
>>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>>>>> > > >>>>
>>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <pa...@google.com> wrote:
>>>>>> > > >>>>>
>>>>>> > > >>>>> This looks pretty good:) I know at least a couple people (myself included) who've been annoyed by having to take care of lint issues that maybe a code formatter could save us.
>>>>>> > > >>>>> Thanks for sharing Ismael.
>>>>>> > > >>>>> -P.
>>>>>> > > >>>>>
>>>>>> > > >>>>>
>>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <ie...@gmail.com> wrote:
>>>>>> > > >>>>>>
>>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code auto formatter that
>>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for python, and wanted to
>>>>>> > > >>>>>> bring to the ML Is there interest from the python people to get this
>>>>>> > > >>>>>> into the build?
>>>>>> > > >>>>>>
>>>>>> > > >>>>>> The introduction of spotless for Java has been a good improvement and
>>>>>> > > >>>>>> maybe the python code base may benefit of this too.
>>>>>> > > >>>>>>
>>>>>> > > >>>>>> WDYT?
>>>>>> > > >>>>>>
>>>>>> > > >>>>>> [1] https://github.com/python/black

Re: [DISCUSS] Autoformat python code with Black

Posted by Chad Dombrova <ch...@gmail.com>.
+1 to autoformatting

Let me add some nuance to that.

The way I see it there are 2 varieties of formatters:  those which take the
original formatting into consideration (autopep8) and those which disregard
it (yapf, black).

I much prefer yapf to black, because you have plenty of options to tweak
with yapf (enough to make the output a pretty close match to the current
Beam style), and you can mark areas to preserve the original formatting,
which could be very useful with Pipeline building with pipe operators.
Please don't pick black.

autopep8 is more along the lines of spotless in Java -- it only corrects
code that breaks the project's style rules.  The big problem with Beam's
current style is that it is so esoteric that autopep8 can't enforce it --
and I'm not just talking about 2-spaces, which I don't really have a
problem with -- the problem is the use of either 2 or 4 spaces depending on
context (expression start vs hanging indent, etc).  This is my *biggest*
gripe about the current style.  PyCharm doesn't have enough control
either.  So, if we can choose a style that can be expressed by flake8 or
pycodestyle then we can use autopep8 to enforce it.

I'd prefer autopep8 to yapf because I like having a little wiggle room to
influence the style, but on a big project like Beam all that wiggle room
ends up to minor but noticeable inconsistencies in style throughout the
project.  yapf ensures completely consistent style, but the tradeoff is
that it's sometimes ugly, especially in scenarios with similar repeated
entries like argparse, where yapf might insert line breaks in visually
inconsistent and unappealing ways depending on the lengths of the keywords
and expressions involved.

Either way (but especially if we choose yapf) I think it'd be a nice
addition to setup a pre-commit [1] config so that people can opt in to
running *lightweight* autofixers prior to commit.  This will not only
reduce dev frustration but will also reduce the amount of cpu cycles that
Jenkins spends pointing out lint errors.

[1] https://pre-commit.com/

-chad




On Tue, Jan 21, 2020 at 12:52 PM Ismaël Mejía <ie...@gmail.com> wrote:

> Last time we discussed this there seems not to be much progress into
> autoformatting.
> This tool looks more tweakable, so maybe it could be more appropriate for
> Beam's use case.
> https://github.com/google/yapf/
> WDYT?
>
>
> On Thu, May 30, 2019 at 10:50 AM Łukasz Gajowy <lg...@apache.org> wrote:
>
>> +1 for any autoformatter for Python SDK that does the job. My experience
>> is that since spotless in Java SDK I would never start a new Java project
>> without it. So many great benefits not only for one person coding but for
>> all community.
>>
>> It is a GitHub UI issue that you cannot easily browse past the reformat.
>> It is not actually that hard, but does take a couple extra clicks to get
>> GitHub to display blame before a reformat. It is easier with the command
>> line. I do a lot of code history digging and the global Java reformat is
>> not really a problem.
>>
>> It's actually one more click on Github but I agree it's not the best way
>> to search the history. The most convenient and clear one I've found so far
>> is in Jetbrains IDEs (Intelij) where you can:
>>
>> right click on line number -> "annotate" -> click again -> "annotate
>> previous revision" -> ...
>>
>> You can also use "compare with" to see the diff between two revisions.
>>
>> Łukasz
>>
>>
>>
>>
>>
>> czw., 30 maj 2019 o 06:15 Kenneth Knowles <ke...@apache.org> napisał(a):
>>
>>> +1 pending good enough tooling (I can't quite tell - seems there are
>>> some issues?)
>>>
>>> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
>>> ka.kucharczyk@gmail.com> wrote:
>>>
>>>> What else actually we gain? My guess is faster PR review iteration. We
>>>> will skip some of conversations about code style.
>>>>
>>> ...
>>>
>>>> Last but not least, new contributor may be less discouraged. When I
>>>> started contribute I didn’t know how to format my code and I lost a lot of
>>>> time to add pylint and adjust IntelliJ. I eventually failed. Currently I
>>>> write code intuitively and when I don’t forget I rerun tox.
>>>>
>>>
>>> This is a huge benefit. This is why I supported it so much for Java. It
>>> is a community benefit. You do not have to be a contributor to the Python
>>> SDK to support this. That is why I am writing here. Just eliminate all
>>> discussion of formatting. It doesn't really matter what the resulting
>>> format is, if it is not crazy to read. I strongly oppose maintaining a
>>> non-default format.
>>>
>>> Reformating 20k lines or 200k is not hard. The Java global reformat
>>> touched 50k lines. It does not really matter how big it is. Definitely do
>>> it all at once if you think the tool is good enough. And you should pin a
>>> version, so churn is not a problem. You can upgrade the version and
>>> reformat in a PR later and that is also easy.
>>>
>>> It is a GitHub UI issue that you cannot easily browse past the reformat.
>>> It is not actually that hard, but does take a couple extra clicks to get
>>> GitHub to display blame before a reformat. It is easier with the command
>>> line. I do a lot of code history digging and the global Java reformat is
>>> not really a problem.
>>>
>>> Kenn
>>>
>>>
>>>
>>>> Also everything will be formatted in a same way, so eventually it would
>>>> be easier to read.
>>>>
>>>> Moreover, as it was mentioned in previous emails - a lot of Jenkins
>>>> failures won’t take place, so we save time and resources.
>>>>
>>>>
>>>> One of disadvantages is that our pipelines has custom syntax and after
>>>> formatting they looks a little bit weird, but maybe extending the only
>>>> configurable option in Black - lines, from 88 to 110 would be solution.
>>>>
>>>> Second one is that Black requires Python 3 to be run. I don’t know how
>>>> big obstacle it would be.
>>>>
>>>> I believe there are two options how it would be possible to introduce
>>>> Black. First: just do it, it will hurt but then it would be ok (same as a
>>>> dentist appointment). Of course it may require some work to adjust linters.
>>>> On the other hand we can do it gradually and start including sdk parts one
>>>> by one - maybe it will be less painful?
>>>>
>>>> As an example I can share one of projects [2] I know that uses Black
>>>> (they use also other cool checkers and pre-commit [3]). This is how looks
>>>> their build with all checks [4].
>>>>
>>>> To sum up I believe that if we want improve our coding experience, we
>>>> should improve our toolset. Black seems be recent and quite popular tool
>>>> what makes think they won’t stop developing it.
>>>>
>>>> [1]
>>>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>>>
>>>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>>>
>>>> [3] https://pre-commit.com
>>>>
>>>> [4]
>>>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>>>
>>>>
>>>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw <ro...@google.com>
>>>> wrote:
>>>>
>>>>> Reformatting to 4 spaces seems a non-starter to me, as it would change
>>>>> nearly every single line in the codebase (and the loss of all context as
>>>>> well as that particular line).
>>>>>
>>>>> This is probably why the 2-space fork exists. However, we don't
>>>>> conform to that either--we use 2 spaces for indentation, but 4 for
>>>>> continuation indentation. (As for the history of this, this goes back to
>>>>> Google's internal style guide, probably motivated by consistency with C++,
>>>>> Java, ... and the fact that with an indent level of 4 one ends up wrapping
>>>>> lines quite frequently (it's telling that black's default line length is
>>>>> 88)). This turns out to be an easy change to the codebase.
>>>>>
>>>>> Once we move beyond the 2 vs. 4 whitespace thing, I found that this
>>>>> tool introduces a huge amount of vertical whitespace (e.g. closing
>>>>> parentheses on their own line), e.g.
>>>>>
>>>>> def foo(
>>>>>     args
>>>>> ):
>>>>>   if (
>>>>>       long expression)
>>>>>   ):
>>>>>     func(
>>>>>         args
>>>>>     )
>>>>>
>>>>> I wrote a simple post-processor to put closing parentheses on the same
>>>>> lines, as well as omit the newline after "if (", and disabling formatting
>>>>> of strings, which reduce the churn in our codebase to 15k lines (adding
>>>>> about 4k) out of 200k total.
>>>>>
>>>>> https://github.com/apache/beam/pull/8712/files
>>>>>
>>>>> It's still very opinionated, often in different ways then me, and
>>>>> doesn't understand the semantics of the code, but possibly something we
>>>>> could live with given the huge advantages of an autoformatter.
>>>>>
>>>>> An intermediate point would be to allow, but not require,
>>>>> autoformatting of changed lines.
>>>>>
>>>>> As for being beta quality, it looks like it's got a decent number of
>>>>> contributors and in my book being in the python github project is a strong
>>>>> positive signal. But, due to the above issues, I think we'd have to
>>>>> maintain a fork. (The code is pretty lightweight, the 2 vs. 4 space issue
>>>>> is a 2-line change, and the rest implemented as a post-processing step (for
>>>>> now, incomplete), so it'd be easy to stay in sync with upstream.)
>>>>>
>>>>> On Wed, May 29, 2019 at 11:03 AM Ismaël Mejía <ie...@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > > I think the question is if it can be configured in a way to fit our
>>>>> > > current linter's style. I don't think it is feasible to reformat
>>>>> the
>>>>> > > entire Python SDK.
>>>>> >
>>>>> > It cannot be configured to do what we actually do because Black is
>>>>> > configurable only to support the standard python codestyle guidelines
>>>>> > (PEP-8) which recommends 4 spaces and is what most projects in the
>>>>> > python world use.
>>>>> >
>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>>> > > effect is still visible in the Java SDK. However, I have the
>>>>> feeling
>>>>> > > that this might be less of a problem with Python because the
>>>>> linter has
>>>>> > > more rules than Checkstyle had.
>>>>> >
>>>>> > Yes that’s the bad side effect but there are always tradeoffs we have
>>>>> > to deal with.
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Wed, May 29, 2019 at 10:52 AM Maximilian Michels <mx...@apache.org>
>>>>> wrote:
>>>>> > >
>>>>> > > I think the question is if it can be configured in a way to fit our
>>>>> > > current linter's style. I don't think it is feasible to reformat
>>>>> the
>>>>> > > entire Python SDK.
>>>>> > >
>>>>> > > Reformatted lines don't allow quick access to the Git history. This
>>>>> > > effect is still visible in the Java SDK. However, I have the
>>>>> feeling
>>>>> > > that this might be less of a problem with Python because the
>>>>> linter has
>>>>> > > more rules than Checkstyle had.
>>>>> > >
>>>>> > > -Max
>>>>> > >
>>>>> > > On 29.05.19 10:16, Ismaël Mejía wrote:
>>>>> > > >> My concerns are:
>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>>> > > >> - It looks like mostly a single person project. For the same
>>>>> reason I also strongly prefer not using a fork for a specific setting. Fork
>>>>> will only have less people looking at it.
>>>>> > > >
>>>>> > > > I suppose the project is marked as beta because it is recent, it
>>>>> was
>>>>> > > > presented in 2018’s pycon, and because some things can change
>>>>> since
>>>>> > > > auto-formatters are pretty tricky beasts, I think beta in that
>>>>> case is
>>>>> > > > like our own ‘@Experimental’. If you look at the contribution
>>>>> page [1]
>>>>> > > > you can notice that it is less and less a single person project,
>>>>> there
>>>>> > > > have been 93 independent contributions since the project became
>>>>> > > > public, and the fact that it is hosted in the python organization
>>>>> > > > github [2] gives some confidence on the project continuity.
>>>>> > > >
>>>>> > > > You are right however about the fact that the main author seems
>>>>> to be
>>>>> > > > the ‘benevolent’ dictator, and in the 2-spaces issue he can seem
>>>>> > > > arbitrary, but he is just following pep8 style guide
>>>>> recommendations
>>>>> > > > [3]. I am curious of why we (Beam) do not follow the 4 spaces
>>>>> > > > recommendation of PEP-8 or even Google's own Python style guide
>>>>> [4],
>>>>> > > > So, probably it should be to us to reconsider the current policy
>>>>> to
>>>>> > > > adapt to the standards (and the tool).
>>>>> > > >
>>>>> > > > I did a quick run of black with python 2.7 compatibility on
>>>>> > > > sdks/python and got only 4 parsing errors which is positive
>>>>> given the
>>>>> > > > size of our code base.
>>>>> > > >
>>>>> > > > 415 files reformatted, 45 files left unchanged, 4 files failed
>>>>> to reformat.
>>>>> > > >
>>>>> > > > error: cannot format
>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py:
>>>>> > > > Cannot parse: 47:22:   _display_progress = print
>>>>> > > > error: cannot format
>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/log_handler.py:
>>>>> > > > Cannot parse: 151:18:               file=sys.stderr)
>>>>> > > > error: cannot format
>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py:
>>>>> > > > Cannot parse: 160:34:       print(traceback_string,
>>>>> file=sys.stderr)
>>>>> > > > error: cannot format
>>>>> > > >
>>>>> /home/ismael/upstream/beam/sdks/python/apache_beam/typehints/trivial_inference.py:
>>>>> > > > Cannot parse: 335:51:       print('-->' if pc == last_pc else '
>>>>>    ',
>>>>> > > > end=' ')
>>>>> > > >
>>>>> > > > I still think this can be positive for the project but well I am
>>>>> > > > barely a contributor to the python code base so I let you the
>>>>> python
>>>>> > > > maintainers to reconsider this, in any case it seems like a good
>>>>> > > > improvement for the project.
>>>>> > > >
>>>>> > > > [1] https://github.com/python/black/graphs/contributors
>>>>> > > > [2] https://github.com/python
>>>>> > > > [3] https://www.python.org/dev/peps/pep-0008/#indentation
>>>>> > > > [4]
>>>>> https://github.com/google/styleguide/blob/gh-pages/pyguide.md#34-indentation
>>>>> > > >
>>>>> > > > On Tue, May 28, 2019 at 11:15 PM Ahmet Altay <al...@google.com>
>>>>> wrote:
>>>>> > > >>
>>>>> > > >> I am in the same boat with Robert, I am in favor of
>>>>> autoformatters but I am not familiar with this one. My concerns are:
>>>>> > > >> - The product is clearly marked as beta with a big warning.
>>>>> > > >> - It looks like mostly a single person project. For the same
>>>>> reason I also strongly prefer not using a fork for a specific setting. Fork
>>>>> will only have less people looking at it.
>>>>> > > >>
>>>>> > > >> IMO, this is in an early stage for us. That said lint issues
>>>>> are real as pointed in the thread. If someone would like to give it a try
>>>>> and see how it would look like for us that would be interesting.
>>>>> > > >>
>>>>> > > >> On Tue, May 28, 2019 at 4:44 AM Katarzyna Kucharczyk <
>>>>> ka.kucharczyk@gmail.com> wrote:
>>>>> > > >>>
>>>>> > > >>> This sounds really good. A lot of Jenkins jobs failures are
>>>>> caused by lint problems.
>>>>> > > >>> I think it would be great to have something similar to
>>>>> Spotless in Java SDK (I heard there is problem with configuring Black with
>>>>> IntelliJ).
>>>>> > > >>>
>>>>> > > >>> On Mon, May 27, 2019 at 10:52 PM Robert Bradshaw <
>>>>> robertwb@google.com> wrote:
>>>>> > > >>>>
>>>>> > > >>>> I'm generally in favor of autoformatters, though I haven't
>>>>> looked at
>>>>> > > >>>> how well this particular one works. We might have to go with
>>>>> > > >>>> https://github.com/desbma/black-2spaces given
>>>>> > > >>>> https://github.com/python/black/issues/378 .
>>>>> > > >>>>
>>>>> > > >>>> On Mon, May 27, 2019 at 10:43 PM Pablo Estrada <
>>>>> pabloem@google.com> wrote:
>>>>> > > >>>>>
>>>>> > > >>>>> This looks pretty good:) I know at least a couple people
>>>>> (myself included) who've been annoyed by having to take care of lint issues
>>>>> that maybe a code formatter could save us.
>>>>> > > >>>>> Thanks for sharing Ismael.
>>>>> > > >>>>> -P.
>>>>> > > >>>>>
>>>>> > > >>>>>
>>>>> > > >>>>> On Mon, May 27, 2019, 12:24 PM Ismaël Mejía <
>>>>> iemejia@gmail.com> wrote:
>>>>> > > >>>>>>
>>>>> > > >>>>>> I stumbled by chance into Black [1] a python code auto
>>>>> formatter that
>>>>> > > >>>>>> is becoming the 'de-facto' auto-formatter for python, and
>>>>> wanted to
>>>>> > > >>>>>> bring to the ML Is there interest from the python people to
>>>>> get this
>>>>> > > >>>>>> into the build?
>>>>> > > >>>>>>
>>>>> > > >>>>>> The introduction of spotless for Java has been a good
>>>>> improvement and
>>>>> > > >>>>>> maybe the python code base may benefit of this too.
>>>>> > > >>>>>>
>>>>> > > >>>>>> WDYT?
>>>>> > > >>>>>>
>>>>> > > >>>>>> [1] https://github.com/python/black
>>>>>
>>>>