You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spark.apache.org by Josh Rosen <ro...@gmail.com> on 2014/10/13 06:37:04 UTC

Scalastyle improvements / large code reformatting

There are a number of open pull requests that aim to extend Spark’s automated style checks (see https://issues.apache.org/jira/browse/SPARK-3849 for an umbrella JIRA).  These fixes are mostly good, but I have some concerns about merging these patches.  Several of these patches make large reformatting changes in nearly every file of Spark, which makes it more difficult to use `git blame` and has the potential to introduce merge conflicts with all open PRs and all backport patches.

I feel that most of the value of automated style-checks comes from allowing reviewers/committers to focus on the technical content of pull requests rather than their formatting.  My concern is that the convenience added by these new style rules will not outweigh the other overheads that these reformatting patches will create for the committers.

If possible, it would be great if we could extend the style checker to enforce the more stringent rules only for new code additions / deletions.  If not, I don’t think that we should proceed with the reformatting.  Others might disagree, though, so I welcome comments / discussion.

- Josh

Re: Scalastyle improvements / large code reformatting

Posted by Erik Erlandson <ej...@redhat.com>.

----- Original Message -----
> I'm also against these huge reformattings. They slow down development and
> backporting for trivial reasons. Let's not do that at this point, the style
> of the current code is quite consistent and we have plenty of other things
> to worry about. Instead, what you can do is as you edit a file when you're
> working on a feature, fix up style issues you see. Or, as Josh suggested,
> some way to make this apply only to new files would help.


+1, the benefit/cost ratio of wide-spread code churn just to alter formatting is close to zero.



> 
> Matei
> 
> On Oct 12, 2014, at 10:16 PM, Patrick Wendell <pw...@gmail.com> wrote:
> 
> > Another big problem with these patches are that they make it almost
> > impossible to backport changes to older branches cleanly (there
> > becomes like 100% chance of a merge conflict).
> > 
> > One proposal is to do this:
> > 1. We only consider new style rules at the end of a release cycle,
> > when there is the smallest chance of wanting to backport stuff.
> > 2. We require that they are submitted in individual patches with a (a)
> > new style rule and (b) the associated changes. Then we can also
> > evaluate on a case-by-case basis how large the change is for each
> > rule. For rules that require sweeping changes across the codebase,
> > personally I'd vote against them. For rules like import ordering that
> > won't cause too much pain on the diff (it's pretty easy to deal with
> > those conflicts) I'd be okay with it.
> > 
> > If we went with this, we'd also have to warn people that we might not
> > accept new style rules if they are too costly to enforce. I'm guessing
> > people will still contribute even with those expectations.
> > 
> > - Patrick
> > 
> > On Sun, Oct 12, 2014 at 9:39 PM, Reynold Xin <rx...@databricks.com> wrote:
> >> I actually think we should just take the bite and follow through with the
> >> reformatting. Many rules are simply not possible to enforce only on deltas
> >> (e.g. import ordering).
> >> 
> >> That said, maybe there are better windows to do this, e.g. during the QA
> >> period.
> >> 
> >> On Sun, Oct 12, 2014 at 9:37 PM, Josh Rosen <ro...@gmail.com> wrote:
> >> 
> >>> There are a number of open pull requests that aim to extend Spark's
> >>> automated style checks (see
> >>> https://issues.apache.org/jira/browse/SPARK-3849 for an umbrella JIRA).
> >>> These fixes are mostly good, but I have some concerns about merging these
> >>> patches.  Several of these patches make large reformatting changes in
> >>> nearly every file of Spark, which makes it more difficult to use `git
> >>> blame` and has the potential to introduce merge conflicts with all open
> >>> PRs
> >>> and all backport patches.
> >>> 
> >>> I feel that most of the value of automated style-checks comes from
> >>> allowing reviewers/committers to focus on the technical content of pull
> >>> requests rather than their formatting.  My concern is that the
> >>> convenience
> >>> added by these new style rules will not outweigh the other overheads that
> >>> these reformatting patches will create for the committers.
> >>> 
> >>> If possible, it would be great if we could extend the style checker to
> >>> enforce the more stringent rules only for new code additions / deletions.
> >>> If not, I don't think that we should proceed with the reformatting.
> >>> Others
> >>> might disagree, though, so I welcome comments / discussion.
> >>> 
> >>> - Josh
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> > For additional commands, e-mail: dev-help@spark.apache.org
> > 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Scalastyle improvements / large code reformatting

Posted by Nicholas Chammas <ni...@gmail.com>.
On Mon, Oct 13, 2014 at 11:57 AM, Patrick Wendell <pw...@gmail.com>
wrote:

> That would even work for imports as well,
> you'd just have a thing where if anyone modified some imports they
> would have to fix all the imports in that file. It's at least worth a
> try.
>

OK, that sounds like a fair compromise. I've updated the description on
SPARK-3849 <https://issues.apache.org/jira/browse/SPARK-3849> accordingly.

Nick

Re: Scalastyle improvements / large code reformatting

Posted by Marcelo Vanzin <va...@cloudera.com>.
Another option is to add new style rules that trigger too many errors
as warnings, and slowly clean them up. This means that reviewers will
be burdened with manually enforcing the rules for a while, and we need
to remember to turn them to errors once some threshold is reached.

(The Hadoop build had this check where it compared the number of
warnings against the last known good build, and yelled at you if you
added any new warnings. That might be something to look at also,
although not sure how easy it would be and I seem to remember it
having false positives.)

On Mon, Oct 13, 2014 at 8:57 AM, Patrick Wendell <pw...@gmail.com> wrote:
> Hey Nick,
>
> I think the best solution is really to find a way to only apply
> certain rules to code modified after a certain date. I also don't
> think it would be that hard to implement because git can output
> per-line information about modification times. So you'd just run the
> scalastyle rules and then if you saw errors from rules with a special
> "if modified since" property, we'd only fail the line has been
> modified after that date. That would even work for imports as well,
> you'd just have a thing where if anyone modified some imports they
> would have to fix all the imports in that file. It's at least worth a
> try.
>
> Overall I think that's the only real solution here. Doing things
> closer to releases reduces the overhead of backporting, but overall
> it's still going to be a very high overhead.
>
> - Patrick
>
> On Mon, Oct 13, 2014 at 5:46 AM, Nicholas Chammas
> <ni...@gmail.com> wrote:
>> The arguments against large scale refactorings make sense. Doing them, if at
>> all, during QA cycles or around releases sounds like a promising idea.
>>
>> Coupled with that, would it be useful to implement new rules outside of
>> these potential windows for refactoring in such a way that they report on
>> style errors without failing the build?
>>
>> That way they work like a nag and encourage developers to fix style problems
>> in the course of working on their original patch. Coupled with a clear
>> policy to fix style only where code is being changed anyway, this could be a
>> helpful way to steadily fix problems related to new rules over time. Then,
>> when we get close to the "finish line", we can make a final patch to fix the
>> remaining issues for a given rule and enforce it as part of the build.
>> Having the style report should also make it easier for committers to review
>> style. We will have to do some work to show reports on new rules in a
>> digestible way, as they will probably be very large at first, but I think
>> that's a tractable problem.
>>
>> What do committers/reviewers think of that?
>>
>> As an aside, enforcing new style rules for new files only is an interesting
>> idea, but you'd have to track that a file was added after the new rules were
>> enforced. Otherwise bad style will be allowed after the initial checkin of
>> that file. Also, enforcing style rules on changes may work in some (e.g.
>> space required  before "{") but is impossible in other (import ordering)
>> cases, as Reynold pointed out.
>>
>> Nick
>>
>>
>> 2014년 10월 13일 월요일, Matei Zaharia<ma...@gmail.com>님이 작성한 메시지:
>>
>>> I'm also against these huge reformattings. They slow down development and
>>> backporting for trivial reasons. Let's not do that at this point, the style
>>> of the current code is quite consistent and we have plenty of other things
>>> to worry about. Instead, what you can do is as you edit a file when you're
>>> working on a feature, fix up style issues you see. Or, as Josh suggested,
>>> some way to make this apply only to new files would help.
>>>
>>> Matei
>>>
>>> On Oct 12, 2014, at 10:16 PM, Patrick Wendell <pw...@gmail.com> wrote:
>>>
>>> > Another big problem with these patches are that they make it almost
>>> > impossible to backport changes to older branches cleanly (there
>>> > becomes like 100% chance of a merge conflict).
>>> >
>>> > One proposal is to do this:
>>> > 1. We only consider new style rules at the end of a release cycle,
>>> > when there is the smallest chance of wanting to backport stuff.
>>> > 2. We require that they are submitted in individual patches with a (a)
>>> > new style rule and (b) the associated changes. Then we can also
>>> > evaluate on a case-by-case basis how large the change is for each
>>> > rule. For rules that require sweeping changes across the codebase,
>>> > personally I'd vote against them. For rules like import ordering that
>>> > won't cause too much pain on the diff (it's pretty easy to deal with
>>> > those conflicts) I'd be okay with it.
>>> >
>>> > If we went with this, we'd also have to warn people that we might not
>>> > accept new style rules if they are too costly to enforce. I'm guessing
>>> > people will still contribute even with those expectations.
>>> >
>>> > - Patrick
>>> >
>>> > On Sun, Oct 12, 2014 at 9:39 PM, Reynold Xin <rx...@databricks.com>
>>> > wrote:
>>> >> I actually think we should just take the bite and follow through with
>>> >> the
>>> >> reformatting. Many rules are simply not possible to enforce only on
>>> >> deltas
>>> >> (e.g. import ordering).
>>> >>
>>> >> That said, maybe there are better windows to do this, e.g. during the
>>> >> QA
>>> >> period.
>>> >>
>>> >> On Sun, Oct 12, 2014 at 9:37 PM, Josh Rosen <ro...@gmail.com>
>>> >> wrote:
>>> >>
>>> >>> There are a number of open pull requests that aim to extend Spark's
>>> >>> automated style checks (see
>>> >>> https://issues.apache.org/jira/browse/SPARK-3849 for an umbrella
>>> >>> JIRA).
>>> >>> These fixes are mostly good, but I have some concerns about merging
>>> >>> these
>>> >>> patches.  Several of these patches make large reformatting changes in
>>> >>> nearly every file of Spark, which makes it more difficult to use `git
>>> >>> blame` and has the potential to introduce merge conflicts with all
>>> >>> open PRs
>>> >>> and all backport patches.
>>> >>>
>>> >>> I feel that most of the value of automated style-checks comes from
>>> >>> allowing reviewers/committers to focus on the technical content of
>>> >>> pull
>>> >>> requests rather than their formatting.  My concern is that the
>>> >>> convenience
>>> >>> added by these new style rules will not outweigh the other overheads
>>> >>> that
>>> >>> these reformatting patches will create for the committers.
>>> >>>
>>> >>> If possible, it would be great if we could extend the style checker to
>>> >>> enforce the more stringent rules only for new code additions /
>>> >>> deletions.
>>> >>> If not, I don't think that we should proceed with the reformatting.
>>> >>> Others
>>> >>> might disagree, though, so I welcome comments / discussion.
>>> >>>
>>> >>> - Josh
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>> > For additional commands, e-mail: dev-help@spark.apache.org
>>> >
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>



-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Scalastyle improvements / large code reformatting

Posted by Patrick Wendell <pw...@gmail.com>.
Hey Nick,

I think the best solution is really to find a way to only apply
certain rules to code modified after a certain date. I also don't
think it would be that hard to implement because git can output
per-line information about modification times. So you'd just run the
scalastyle rules and then if you saw errors from rules with a special
"if modified since" property, we'd only fail the line has been
modified after that date. That would even work for imports as well,
you'd just have a thing where if anyone modified some imports they
would have to fix all the imports in that file. It's at least worth a
try.

Overall I think that's the only real solution here. Doing things
closer to releases reduces the overhead of backporting, but overall
it's still going to be a very high overhead.

- Patrick

On Mon, Oct 13, 2014 at 5:46 AM, Nicholas Chammas
<ni...@gmail.com> wrote:
> The arguments against large scale refactorings make sense. Doing them, if at
> all, during QA cycles or around releases sounds like a promising idea.
>
> Coupled with that, would it be useful to implement new rules outside of
> these potential windows for refactoring in such a way that they report on
> style errors without failing the build?
>
> That way they work like a nag and encourage developers to fix style problems
> in the course of working on their original patch. Coupled with a clear
> policy to fix style only where code is being changed anyway, this could be a
> helpful way to steadily fix problems related to new rules over time. Then,
> when we get close to the "finish line", we can make a final patch to fix the
> remaining issues for a given rule and enforce it as part of the build.
> Having the style report should also make it easier for committers to review
> style. We will have to do some work to show reports on new rules in a
> digestible way, as they will probably be very large at first, but I think
> that's a tractable problem.
>
> What do committers/reviewers think of that?
>
> As an aside, enforcing new style rules for new files only is an interesting
> idea, but you'd have to track that a file was added after the new rules were
> enforced. Otherwise bad style will be allowed after the initial checkin of
> that file. Also, enforcing style rules on changes may work in some (e.g.
> space required  before "{") but is impossible in other (import ordering)
> cases, as Reynold pointed out.
>
> Nick
>
>
> 2014년 10월 13일 월요일, Matei Zaharia<ma...@gmail.com>님이 작성한 메시지:
>
>> I'm also against these huge reformattings. They slow down development and
>> backporting for trivial reasons. Let's not do that at this point, the style
>> of the current code is quite consistent and we have plenty of other things
>> to worry about. Instead, what you can do is as you edit a file when you're
>> working on a feature, fix up style issues you see. Or, as Josh suggested,
>> some way to make this apply only to new files would help.
>>
>> Matei
>>
>> On Oct 12, 2014, at 10:16 PM, Patrick Wendell <pw...@gmail.com> wrote:
>>
>> > Another big problem with these patches are that they make it almost
>> > impossible to backport changes to older branches cleanly (there
>> > becomes like 100% chance of a merge conflict).
>> >
>> > One proposal is to do this:
>> > 1. We only consider new style rules at the end of a release cycle,
>> > when there is the smallest chance of wanting to backport stuff.
>> > 2. We require that they are submitted in individual patches with a (a)
>> > new style rule and (b) the associated changes. Then we can also
>> > evaluate on a case-by-case basis how large the change is for each
>> > rule. For rules that require sweeping changes across the codebase,
>> > personally I'd vote against them. For rules like import ordering that
>> > won't cause too much pain on the diff (it's pretty easy to deal with
>> > those conflicts) I'd be okay with it.
>> >
>> > If we went with this, we'd also have to warn people that we might not
>> > accept new style rules if they are too costly to enforce. I'm guessing
>> > people will still contribute even with those expectations.
>> >
>> > - Patrick
>> >
>> > On Sun, Oct 12, 2014 at 9:39 PM, Reynold Xin <rx...@databricks.com>
>> > wrote:
>> >> I actually think we should just take the bite and follow through with
>> >> the
>> >> reformatting. Many rules are simply not possible to enforce only on
>> >> deltas
>> >> (e.g. import ordering).
>> >>
>> >> That said, maybe there are better windows to do this, e.g. during the
>> >> QA
>> >> period.
>> >>
>> >> On Sun, Oct 12, 2014 at 9:37 PM, Josh Rosen <ro...@gmail.com>
>> >> wrote:
>> >>
>> >>> There are a number of open pull requests that aim to extend Spark's
>> >>> automated style checks (see
>> >>> https://issues.apache.org/jira/browse/SPARK-3849 for an umbrella
>> >>> JIRA).
>> >>> These fixes are mostly good, but I have some concerns about merging
>> >>> these
>> >>> patches.  Several of these patches make large reformatting changes in
>> >>> nearly every file of Spark, which makes it more difficult to use `git
>> >>> blame` and has the potential to introduce merge conflicts with all
>> >>> open PRs
>> >>> and all backport patches.
>> >>>
>> >>> I feel that most of the value of automated style-checks comes from
>> >>> allowing reviewers/committers to focus on the technical content of
>> >>> pull
>> >>> requests rather than their formatting.  My concern is that the
>> >>> convenience
>> >>> added by these new style rules will not outweigh the other overheads
>> >>> that
>> >>> these reformatting patches will create for the committers.
>> >>>
>> >>> If possible, it would be great if we could extend the style checker to
>> >>> enforce the more stringent rules only for new code additions /
>> >>> deletions.
>> >>> If not, I don't think that we should proceed with the reformatting.
>> >>> Others
>> >>> might disagree, though, so I welcome comments / discussion.
>> >>>
>> >>> - Josh
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> > For additional commands, e-mail: dev-help@spark.apache.org
>> >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> For additional commands, e-mail: dev-help@spark.apache.org
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Scalastyle improvements / large code reformatting

Posted by Nicholas Chammas <ni...@gmail.com>.
The arguments against large scale refactorings make sense. Doing them, if
at all, during QA cycles or around releases sounds like a promising idea.

Coupled with that, would it be useful to implement new rules outside of
these potential windows for refactoring in such a way that they report on
style errors without failing the build?

That way they work like a nag and encourage developers to fix style
problems in the course of working on their original patch. Coupled with a
clear policy to fix style only where code is being changed anyway, this
could be a helpful way to steadily fix problems related to new rules over
time. Then, when we get close to the "finish line", we can make a final
patch to fix the remaining issues for a given rule and enforce it as part
of the build. Having the style report should also make it easier for
committers to review style. We will have to do some work to show reports on
new rules in a digestible way, as they will probably be very large at
first, but I think that's a tractable problem.

What do committers/reviewers think of that?

As an aside, enforcing new style rules for new files only is an interesting
idea, but you'd have to track that a file was added after the new rules
were enforced. Otherwise bad style will be allowed after the initial
checkin of that file. Also, enforcing style rules on changes may work in
some (e.g. space required  before "{") but is impossible in other (import
ordering) cases, as Reynold pointed out.

Nick


2014년 10월 13일 월요일, Matei Zaharia<ma...@gmail.com>님이 작성한 메시지:

> I'm also against these huge reformattings. They slow down development and
> backporting for trivial reasons. Let's not do that at this point, the style
> of the current code is quite consistent and we have plenty of other things
> to worry about. Instead, what you can do is as you edit a file when you're
> working on a feature, fix up style issues you see. Or, as Josh suggested,
> some way to make this apply only to new files would help.
>
> Matei
>
> On Oct 12, 2014, at 10:16 PM, Patrick Wendell <pwendell@gmail.com
> <javascript:;>> wrote:
>
> > Another big problem with these patches are that they make it almost
> > impossible to backport changes to older branches cleanly (there
> > becomes like 100% chance of a merge conflict).
> >
> > One proposal is to do this:
> > 1. We only consider new style rules at the end of a release cycle,
> > when there is the smallest chance of wanting to backport stuff.
> > 2. We require that they are submitted in individual patches with a (a)
> > new style rule and (b) the associated changes. Then we can also
> > evaluate on a case-by-case basis how large the change is for each
> > rule. For rules that require sweeping changes across the codebase,
> > personally I'd vote against them. For rules like import ordering that
> > won't cause too much pain on the diff (it's pretty easy to deal with
> > those conflicts) I'd be okay with it.
> >
> > If we went with this, we'd also have to warn people that we might not
> > accept new style rules if they are too costly to enforce. I'm guessing
> > people will still contribute even with those expectations.
> >
> > - Patrick
> >
> > On Sun, Oct 12, 2014 at 9:39 PM, Reynold Xin <rxin@databricks.com
> <javascript:;>> wrote:
> >> I actually think we should just take the bite and follow through with
> the
> >> reformatting. Many rules are simply not possible to enforce only on
> deltas
> >> (e.g. import ordering).
> >>
> >> That said, maybe there are better windows to do this, e.g. during the QA
> >> period.
> >>
> >> On Sun, Oct 12, 2014 at 9:37 PM, Josh Rosen <rosenville@gmail.com
> <javascript:;>> wrote:
> >>
> >>> There are a number of open pull requests that aim to extend Spark's
> >>> automated style checks (see
> >>> https://issues.apache.org/jira/browse/SPARK-3849 for an umbrella
> JIRA).
> >>> These fixes are mostly good, but I have some concerns about merging
> these
> >>> patches.  Several of these patches make large reformatting changes in
> >>> nearly every file of Spark, which makes it more difficult to use `git
> >>> blame` and has the potential to introduce merge conflicts with all
> open PRs
> >>> and all backport patches.
> >>>
> >>> I feel that most of the value of automated style-checks comes from
> >>> allowing reviewers/committers to focus on the technical content of pull
> >>> requests rather than their formatting.  My concern is that the
> convenience
> >>> added by these new style rules will not outweigh the other overheads
> that
> >>> these reformatting patches will create for the committers.
> >>>
> >>> If possible, it would be great if we could extend the style checker to
> >>> enforce the more stringent rules only for new code additions /
> deletions.
> >>> If not, I don't think that we should proceed with the reformatting.
> Others
> >>> might disagree, though, so I welcome comments / discussion.
> >>>
> >>> - Josh
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org <javascript:;>
> > For additional commands, e-mail: dev-help@spark.apache.org
> <javascript:;>
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org <javascript:;>
> For additional commands, e-mail: dev-help@spark.apache.org <javascript:;>
>
>

Re: Scalastyle improvements / large code reformatting

Posted by Matei Zaharia <ma...@gmail.com>.
I'm also against these huge reformattings. They slow down development and backporting for trivial reasons. Let's not do that at this point, the style of the current code is quite consistent and we have plenty of other things to worry about. Instead, what you can do is as you edit a file when you're working on a feature, fix up style issues you see. Or, as Josh suggested, some way to make this apply only to new files would help.

Matei

On Oct 12, 2014, at 10:16 PM, Patrick Wendell <pw...@gmail.com> wrote:

> Another big problem with these patches are that they make it almost
> impossible to backport changes to older branches cleanly (there
> becomes like 100% chance of a merge conflict).
> 
> One proposal is to do this:
> 1. We only consider new style rules at the end of a release cycle,
> when there is the smallest chance of wanting to backport stuff.
> 2. We require that they are submitted in individual patches with a (a)
> new style rule and (b) the associated changes. Then we can also
> evaluate on a case-by-case basis how large the change is for each
> rule. For rules that require sweeping changes across the codebase,
> personally I'd vote against them. For rules like import ordering that
> won't cause too much pain on the diff (it's pretty easy to deal with
> those conflicts) I'd be okay with it.
> 
> If we went with this, we'd also have to warn people that we might not
> accept new style rules if they are too costly to enforce. I'm guessing
> people will still contribute even with those expectations.
> 
> - Patrick
> 
> On Sun, Oct 12, 2014 at 9:39 PM, Reynold Xin <rx...@databricks.com> wrote:
>> I actually think we should just take the bite and follow through with the
>> reformatting. Many rules are simply not possible to enforce only on deltas
>> (e.g. import ordering).
>> 
>> That said, maybe there are better windows to do this, e.g. during the QA
>> period.
>> 
>> On Sun, Oct 12, 2014 at 9:37 PM, Josh Rosen <ro...@gmail.com> wrote:
>> 
>>> There are a number of open pull requests that aim to extend Spark's
>>> automated style checks (see
>>> https://issues.apache.org/jira/browse/SPARK-3849 for an umbrella JIRA).
>>> These fixes are mostly good, but I have some concerns about merging these
>>> patches.  Several of these patches make large reformatting changes in
>>> nearly every file of Spark, which makes it more difficult to use `git
>>> blame` and has the potential to introduce merge conflicts with all open PRs
>>> and all backport patches.
>>> 
>>> I feel that most of the value of automated style-checks comes from
>>> allowing reviewers/committers to focus on the technical content of pull
>>> requests rather than their formatting.  My concern is that the convenience
>>> added by these new style rules will not outweigh the other overheads that
>>> these reformatting patches will create for the committers.
>>> 
>>> If possible, it would be great if we could extend the style checker to
>>> enforce the more stringent rules only for new code additions / deletions.
>>> If not, I don't think that we should proceed with the reformatting.  Others
>>> might disagree, though, so I welcome comments / discussion.
>>> 
>>> - Josh
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Scalastyle improvements / large code reformatting

Posted by Patrick Wendell <pw...@gmail.com>.
Another big problem with these patches are that they make it almost
impossible to backport changes to older branches cleanly (there
becomes like 100% chance of a merge conflict).

One proposal is to do this:
1. We only consider new style rules at the end of a release cycle,
when there is the smallest chance of wanting to backport stuff.
2. We require that they are submitted in individual patches with a (a)
new style rule and (b) the associated changes. Then we can also
evaluate on a case-by-case basis how large the change is for each
rule. For rules that require sweeping changes across the codebase,
personally I'd vote against them. For rules like import ordering that
won't cause too much pain on the diff (it's pretty easy to deal with
those conflicts) I'd be okay with it.

If we went with this, we'd also have to warn people that we might not
accept new style rules if they are too costly to enforce. I'm guessing
people will still contribute even with those expectations.

- Patrick

On Sun, Oct 12, 2014 at 9:39 PM, Reynold Xin <rx...@databricks.com> wrote:
> I actually think we should just take the bite and follow through with the
> reformatting. Many rules are simply not possible to enforce only on deltas
> (e.g. import ordering).
>
> That said, maybe there are better windows to do this, e.g. during the QA
> period.
>
> On Sun, Oct 12, 2014 at 9:37 PM, Josh Rosen <ro...@gmail.com> wrote:
>
>> There are a number of open pull requests that aim to extend Spark's
>> automated style checks (see
>> https://issues.apache.org/jira/browse/SPARK-3849 for an umbrella JIRA).
>> These fixes are mostly good, but I have some concerns about merging these
>> patches.  Several of these patches make large reformatting changes in
>> nearly every file of Spark, which makes it more difficult to use `git
>> blame` and has the potential to introduce merge conflicts with all open PRs
>> and all backport patches.
>>
>> I feel that most of the value of automated style-checks comes from
>> allowing reviewers/committers to focus on the technical content of pull
>> requests rather than their formatting.  My concern is that the convenience
>> added by these new style rules will not outweigh the other overheads that
>> these reformatting patches will create for the committers.
>>
>> If possible, it would be great if we could extend the style checker to
>> enforce the more stringent rules only for new code additions / deletions.
>> If not, I don't think that we should proceed with the reformatting.  Others
>> might disagree, though, so I welcome comments / discussion.
>>
>> - Josh

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Scalastyle improvements / large code reformatting

Posted by Reynold Xin <rx...@databricks.com>.
I actually think we should just take the bite and follow through with the
reformatting. Many rules are simply not possible to enforce only on deltas
(e.g. import ordering).

That said, maybe there are better windows to do this, e.g. during the QA
period.

On Sun, Oct 12, 2014 at 9:37 PM, Josh Rosen <ro...@gmail.com> wrote:

> There are a number of open pull requests that aim to extend Spark’s
> automated style checks (see
> https://issues.apache.org/jira/browse/SPARK-3849 for an umbrella JIRA).
> These fixes are mostly good, but I have some concerns about merging these
> patches.  Several of these patches make large reformatting changes in
> nearly every file of Spark, which makes it more difficult to use `git
> blame` and has the potential to introduce merge conflicts with all open PRs
> and all backport patches.
>
> I feel that most of the value of automated style-checks comes from
> allowing reviewers/committers to focus on the technical content of pull
> requests rather than their formatting.  My concern is that the convenience
> added by these new style rules will not outweigh the other overheads that
> these reformatting patches will create for the committers.
>
> If possible, it would be great if we could extend the style checker to
> enforce the more stringent rules only for new code additions / deletions.
> If not, I don’t think that we should proceed with the reformatting.  Others
> might disagree, though, so I welcome comments / discussion.
>
> - Josh