You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@kafka.apache.org by Ray Chiang <rc...@apache.org> on 2018/07/30 20:20:45 UTC

[DISCUSS] Applying scalafmt to core code

I had started on KAFKA-2423 (was Scalastyle, now Expand scalafmt to 
core).  As part of the cleanup, applying the "gradlew spotlessApply" 
command ended up affecting too many (435 out of 439) files.  Since this 
will affect every file, this sort of change does risk polluting the git 
logs.

So, I'd like to get a discussion going to find some agreement on an 
approach.  Right now, I see two categories of options:

A) Getting scalafmt working on the existing code
B) Getting all the code conforming to scalafmt requirements

For the first, I see a couple of approaches:

A1) Do the minimum change that allows scalafmt to run on all the .scala 
files
A2) Make the change so that scalafmt runs as-is (only on the streams 
code) and add a different task/options that allow running scalafmt on a 
subset of code.  (Reasons explained below)

For the second, I can think of the following options:

B1) Do one giant git commit of all cleaned code (no one seemed to like this)
B2) Do git commits one file at a time (trunk or as a branch)
B3) Do git commits one leaf subdirectory at a time (trunk or as a branch)
B4) With each pull request on all patches, run option A2) on the 
affected files

 From what I can envision, options B2 and B3 require quite a bit of 
manual work if we want to cover multiple releases.  The "cleanest" 
option I can think of looks something like:

C1) Contributor makes code modifications for their JIRA
C2) Contributor runs option A2 to also apply scalafmt to their existing code
C3) Committer does the regular review process

At some point in the future, enough cleanup could be done that the final 
cleanup can be done as a much smaller set of MINOR commits.

-Ray

Re: [DISCUSS] Applying scalafmt to core code

Posted by Colin McCabe <cm...@apache.org>.

On Wed, Aug 8, 2018, at 10:19, Ray Chiang wrote:
> By doing piecemeal formatting, I don't think we can do a "hard" 
> enforcement on using scalafmt with every PR, but by allowing the tool to 
> run on already modified files in a patch, we can slowly migrate towards 
> getting the entire code base clean.  The trade offs are pretty standard 
> (giant patch polluting "git blame" vs. slower cleanup).  This came out 
> of the discussion in KAFKA-2423, where most seemed against one giant patch.

Right, there is definitely a tradeoff there.

> 
> The benefits of pretty-printing tends to be limited, but it does open 
> the door for other linting/static analysis tools without the need to 
> turn off their particular pretty-printing features (which is in some, 
> but not all tools).

It seems like either way, we will have to customize the tools to whatever style we decide on.  In the case of findbugs, at least, turning off or modifying the style settings is extremely easy to do.  I have observed the same with the other static analysis tools I have used.  So I don't think this particular question should make a big difference to the final decision.

best,
Colin


> 
> -Ray
> 
> 
> On 20180807 11:41 AM, Colin McCabe wrote:
> > Hmm.  It would be unfortunate to make contributors include unrelated style changes in their PRs.  This would be especially hard on new contributors who might not want to make a large change.
> >
> > If we really want to do something like this, I would vote for A1.  Just do the change all at once and get it over with.
> >
> > I'm also curious what benefit we get out of making these changes.  If the code style was acceptable to the reviewers who committed the code, maybe we should leave it alone?
> >
> > best,
> > Colin
> >
> >
> > On Tue, Aug 7, 2018, at 09:41, Guozhang Wang wrote:
> >> Hello Ray,
> >>
> >> I saw on the original PR Jason (cc'ed) expressed a concern comparing
> >> scalafmt with scalastyle: the latter will throw exceptions in the build
> >> process to notify developers while the former will automatically reformat
> >> the code that developers may not be aware of. So I think maybe Jason can
> >> elaborate a bit more of your thoughts on this regard.
> >>
> >> Personally I like this idea (scalafmt). As for cherry-picking burdens,
> >> there may be always some unavoidable, and I think B4 seems less invasive
> >> and hence preferable.
> >>
> >>
> >> Guozhang
> >>
> >>
> >>
> >>
> >> On Mon, Jul 30, 2018 at 1:20 PM, Ray Chiang <rc...@apache.org> wrote:
> >>
> >>> I had started on KAFKA-2423 (was Scalastyle, now Expand scalafmt to
> >>> core).  As part of the cleanup, applying the "gradlew spotlessApply"
> >>> command ended up affecting too many (435 out of 439) files.  Since this
> >>> will affect every file, this sort of change does risk polluting the git
> >>> logs.
> >>>
> >>> So, I'd like to get a discussion going to find some agreement on an
> >>> approach.  Right now, I see two categories of options:
> >>>
> >>> A) Getting scalafmt working on the existing code
> >>> B) Getting all the code conforming to scalafmt requirements
> >>>
> >>> For the first, I see a couple of approaches:
> >>>
> >>> A1) Do the minimum change that allows scalafmt to run on all the .scala
> >>> files
> >>> A2) Make the change so that scalafmt runs as-is (only on the streams code)
> >>> and add a different task/options that allow running scalafmt on a subset of
> >>> code.  (Reasons explained below)
> >>>
> >>> For the second, I can think of the following options:
> >>>
> >>> B1) Do one giant git commit of all cleaned code (no one seemed to like
> >>> this)
> >>> B2) Do git commits one file at a time (trunk or as a branch)
> >>> B3) Do git commits one leaf subdirectory at a time (trunk or as a branch)
> >>> B4) With each pull request on all patches, run option A2) on the affected
> >>> files
> >>>
> >>>  From what I can envision, options B2 and B3 require quite a bit of manual
> >>> work if we want to cover multiple releases.  The "cleanest" option I can
> >>> think of looks something like:
> >>>
> >>> C1) Contributor makes code modifications for their JIRA
> >>> C2) Contributor runs option A2 to also apply scalafmt to their existing
> >>> code
> >>> C3) Committer does the regular review process
> >>>
> >>> At some point in the future, enough cleanup could be done that the final
> >>> cleanup can be done as a much smaller set of MINOR commits.
> >>>
> >>> -Ray
> >>>
> >>>
> >>
> >> -- 
> >> -- Guozhang
>

Re: [DISCUSS] Applying scalafmt to core code

Posted by Ray Chiang <rc...@apache.org>.

By doing piecemeal formatting, I don't think we can do a "hard" 
enforcement on using scalafmt with every PR, but by allowing the tool to 
run on already modified files in a patch, we can slowly migrate towards 
getting the entire code base clean.  The trade offs are pretty standard 
(giant patch polluting "git blame" vs. slower cleanup).  This came out 
of the discussion in KAFKA-2423, where most seemed against one giant patch.

The benefits of pretty-printing tends to be limited, but it does open 
the door for other linting/static analysis tools without the need to 
turn off their particular pretty-printing features (which is in some, 
but not all tools).

-Ray


On 20180807 11:41 AM, Colin McCabe wrote:
> Hmm.  It would be unfortunate to make contributors include unrelated style changes in their PRs.  This would be especially hard on new contributors who might not want to make a large change.
>
> If we really want to do something like this, I would vote for A1.  Just do the change all at once and get it over with.
>
> I'm also curious what benefit we get out of making these changes.  If the code style was acceptable to the reviewers who committed the code, maybe we should leave it alone?
>
> best,
> Colin
>
>
> On Tue, Aug 7, 2018, at 09:41, Guozhang Wang wrote:
>> Hello Ray,
>>
>> I saw on the original PR Jason (cc'ed) expressed a concern comparing
>> scalafmt with scalastyle: the latter will throw exceptions in the build
>> process to notify developers while the former will automatically reformat
>> the code that developers may not be aware of. So I think maybe Jason can
>> elaborate a bit more of your thoughts on this regard.
>>
>> Personally I like this idea (scalafmt). As for cherry-picking burdens,
>> there may be always some unavoidable, and I think B4 seems less invasive
>> and hence preferable.
>>
>>
>> Guozhang
>>
>>
>>
>>
>> On Mon, Jul 30, 2018 at 1:20 PM, Ray Chiang <rc...@apache.org> wrote:
>>
>>> I had started on KAFKA-2423 (was Scalastyle, now Expand scalafmt to
>>> core).  As part of the cleanup, applying the "gradlew spotlessApply"
>>> command ended up affecting too many (435 out of 439) files.  Since this
>>> will affect every file, this sort of change does risk polluting the git
>>> logs.
>>>
>>> So, I'd like to get a discussion going to find some agreement on an
>>> approach.  Right now, I see two categories of options:
>>>
>>> A) Getting scalafmt working on the existing code
>>> B) Getting all the code conforming to scalafmt requirements
>>>
>>> For the first, I see a couple of approaches:
>>>
>>> A1) Do the minimum change that allows scalafmt to run on all the .scala
>>> files
>>> A2) Make the change so that scalafmt runs as-is (only on the streams code)
>>> and add a different task/options that allow running scalafmt on a subset of
>>> code.  (Reasons explained below)
>>>
>>> For the second, I can think of the following options:
>>>
>>> B1) Do one giant git commit of all cleaned code (no one seemed to like
>>> this)
>>> B2) Do git commits one file at a time (trunk or as a branch)
>>> B3) Do git commits one leaf subdirectory at a time (trunk or as a branch)
>>> B4) With each pull request on all patches, run option A2) on the affected
>>> files
>>>
>>>  From what I can envision, options B2 and B3 require quite a bit of manual
>>> work if we want to cover multiple releases.  The "cleanest" option I can
>>> think of looks something like:
>>>
>>> C1) Contributor makes code modifications for their JIRA
>>> C2) Contributor runs option A2 to also apply scalafmt to their existing
>>> code
>>> C3) Committer does the regular review process
>>>
>>> At some point in the future, enough cleanup could be done that the final
>>> cleanup can be done as a much smaller set of MINOR commits.
>>>
>>> -Ray
>>>
>>>
>>
>> -- 
>> -- Guozhang

Re: [DISCUSS] Applying scalafmt to core code

Posted by Colin McCabe <cm...@apache.org>.

Hmm.  It would be unfortunate to make contributors include unrelated style changes in their PRs.  This would be especially hard on new contributors who might not want to make a large change.

If we really want to do something like this, I would vote for A1.  Just do the change all at once and get it over with.

I'm also curious what benefit we get out of making these changes.  If the code style was acceptable to the reviewers who committed the code, maybe we should leave it alone?

best,
Colin


On Tue, Aug 7, 2018, at 09:41, Guozhang Wang wrote:
> Hello Ray,
> 
> I saw on the original PR Jason (cc'ed) expressed a concern comparing
> scalafmt with scalastyle: the latter will throw exceptions in the build
> process to notify developers while the former will automatically reformat
> the code that developers may not be aware of. So I think maybe Jason can
> elaborate a bit more of your thoughts on this regard.
> 
> Personally I like this idea (scalafmt). As for cherry-picking burdens,
> there may be always some unavoidable, and I think B4 seems less invasive
> and hence preferable.
> 
> 
> Guozhang
> 
> 
> 
> 
> On Mon, Jul 30, 2018 at 1:20 PM, Ray Chiang <rc...@apache.org> wrote:
> 
> > I had started on KAFKA-2423 (was Scalastyle, now Expand scalafmt to
> > core).  As part of the cleanup, applying the "gradlew spotlessApply"
> > command ended up affecting too many (435 out of 439) files.  Since this
> > will affect every file, this sort of change does risk polluting the git
> > logs.
> >
> > So, I'd like to get a discussion going to find some agreement on an
> > approach.  Right now, I see two categories of options:
> >
> > A) Getting scalafmt working on the existing code
> > B) Getting all the code conforming to scalafmt requirements
> >
> > For the first, I see a couple of approaches:
> >
> > A1) Do the minimum change that allows scalafmt to run on all the .scala
> > files
> > A2) Make the change so that scalafmt runs as-is (only on the streams code)
> > and add a different task/options that allow running scalafmt on a subset of
> > code.  (Reasons explained below)
> >
> > For the second, I can think of the following options:
> >
> > B1) Do one giant git commit of all cleaned code (no one seemed to like
> > this)
> > B2) Do git commits one file at a time (trunk or as a branch)
> > B3) Do git commits one leaf subdirectory at a time (trunk or as a branch)
> > B4) With each pull request on all patches, run option A2) on the affected
> > files
> >
> > From what I can envision, options B2 and B3 require quite a bit of manual
> > work if we want to cover multiple releases.  The "cleanest" option I can
> > think of looks something like:
> >
> > C1) Contributor makes code modifications for their JIRA
> > C2) Contributor runs option A2 to also apply scalafmt to their existing
> > code
> > C3) Committer does the regular review process
> >
> > At some point in the future, enough cleanup could be done that the final
> > cleanup can be done as a much smaller set of MINOR commits.
> >
> > -Ray
> >
> >
> 
> 
> -- 
> -- Guozhang

Re: [DISCUSS] Applying scalafmt to core code

Posted by Guozhang Wang <wa...@gmail.com>.

Hello Ray,

I saw on the original PR Jason (cc'ed) expressed a concern comparing
scalafmt with scalastyle: the latter will throw exceptions in the build
process to notify developers while the former will automatically reformat
the code that developers may not be aware of. So I think maybe Jason can
elaborate a bit more of your thoughts on this regard.

Personally I like this idea (scalafmt). As for cherry-picking burdens,
there may be always some unavoidable, and I think B4 seems less invasive
and hence preferable.


Guozhang




On Mon, Jul 30, 2018 at 1:20 PM, Ray Chiang <rc...@apache.org> wrote:

> I had started on KAFKA-2423 (was Scalastyle, now Expand scalafmt to
> core).  As part of the cleanup, applying the "gradlew spotlessApply"
> command ended up affecting too many (435 out of 439) files.  Since this
> will affect every file, this sort of change does risk polluting the git
> logs.
>
> So, I'd like to get a discussion going to find some agreement on an
> approach.  Right now, I see two categories of options:
>
> A) Getting scalafmt working on the existing code
> B) Getting all the code conforming to scalafmt requirements
>
> For the first, I see a couple of approaches:
>
> A1) Do the minimum change that allows scalafmt to run on all the .scala
> files
> A2) Make the change so that scalafmt runs as-is (only on the streams code)
> and add a different task/options that allow running scalafmt on a subset of
> code.  (Reasons explained below)
>
> For the second, I can think of the following options:
>
> B1) Do one giant git commit of all cleaned code (no one seemed to like
> this)
> B2) Do git commits one file at a time (trunk or as a branch)
> B3) Do git commits one leaf subdirectory at a time (trunk or as a branch)
> B4) With each pull request on all patches, run option A2) on the affected
> files
>
> From what I can envision, options B2 and B3 require quite a bit of manual
> work if we want to cover multiple releases.  The "cleanest" option I can
> think of looks something like:
>
> C1) Contributor makes code modifications for their JIRA
> C2) Contributor runs option A2 to also apply scalafmt to their existing
> code
> C3) Committer does the regular review process
>
> At some point in the future, enough cleanup could be done that the final
> cleanup can be done as a much smaller set of MINOR commits.
>
> -Ray
>
>


-- 
-- Guozhang