You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by Lars Francke <la...@gmail.com> on 2017/08/07 12:13:27 UTC

Question about how to best contribute

Hi,

a few words about me: I've contributed to Hadoop (and it's ecosystem[4]) in
the past am a Hive committer and have used Hadoop for 10 years now, so I'm
not totally inexperienced. I'm earning my money as a Hadoop consultant so
I've seen dozens of real-life clusters in my life.

As part of a few recent client projects and now writing about Hadoop in a
new project/book I'm digging into the source code to figure out some of the
things that are not documented.

But as part of this digging I'm seeing lots of warnings in the code,
inconsistencies etc. and I'd like to contribute some fixes to this back to
the community.

I have been a long-time believer in good code quality and consistent code
styles. This might affect people like me especially who do a lot of
"drive-by" contributions as I'm not someone who looks at the code daily but
comes across it reasonably often as part of client engagements. In those
scenarios, it's very unhelpful to have inconsistent code & bad
documentation.

Two simple but concrete examples:
* There's lots of "final" usages on variables and methods but no
consistency. Was this done for particular reasons or personal preference?
* Similarly, there's lots of things that are public or protected while they
could in theory be private. This especially makes it very hard to reason
about code.

Judging from the current code there's lots of "unofficial" code styling
and/or personal preference. The Wiki says[1] to follow the Sun
guidelines[2] which have not been updated in almost 20 years. A new version
is in the works an clarifies a lot of things[3]. I'm trying to get it
published soon. I'd try to format according to the latter (that means among
other things no "final" for local variables).

I realize that I won't be able to single-handedly fix all of this
especially as code gets contributed but if the community thinks it's
worthwhile I'd still love to land a few cleanup patches. My experience in
the past has been that it's hard to get attention to these things (which I
fully understand as they take up someone's time to review & commit).

So, this is my request for comments on these questions:
* Is there any interest in this at all?
** "This" being patches for code style & things like FindBugs & Checkstyle
warnings
* Size of the patches: Rather one big patch or smaller ones (e.g. per file
or package)
* Anyone willing to help me with this? e.g. reviewing and committing? I'd
be more than happy to bribe you with drinks, sweets, food or something else

My plan is not to go through each and every file and fix every issue I see.
But there are some specific areas I'm looking at in detail and there I'd
love to contribute back.

Thank you for reading!

Cheers,
Lars

PS: Posting to common-dev only, not sure if I should cross post to hdfs-dev
and yarn-dev as well?

[1] <https://wiki.apache.org/hadoop/CodeReviewChecklist>
[2] <
http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html
>
[3] <http://cr.openjdk.java.net/~alundblad/styleguide/index-v6.html>
[4] <
https://issues.apache.org/jira/issues/?filter=-1&jql=reporter%20%3D%20lars_francke%20OR%20assignee%20%3D%20lars_francke
>

Re: Question about how to best contribute

Posted by Lars Francke <la...@gmail.com>.
Thanks John, another one for my list of things to look at.

On Mon, Aug 7, 2017 at 11:24 PM, John Zhuge <jo...@gmail.com> wrote:

> And check out HADOOP-12145
> <https://issues.apache.org/jira/browse/HADOOP-12145> Organize and update
> CodeReviewChecklist wiki.
>
> Thanks, your contribution will be greatly appreciated!
>
>
> On Mon, Aug 7, 2017 at 5:53 AM, Steve Loughran <st...@hortonworks.com>
> wrote:
>
>>
>> Hi Lars & Welcome!
>>
>> Maybe the first step here would be look at those style guides and think
>> how to bring them up to date, especially with stuff like lambda-expressions
>> in java 8, and mnodules forthcoming in in java 9, SLF4J logging, Junit 5 ->
>> 5 testing, code instrumentation, diagnostics, log stability, etc.
>>
>> https://issues.apache.org/jira/browse/HADOOP-12143 . ;
>>
>> This is my go at doing this
>>
>> https://github.com/steveloughran/formality/blob/master/
>> styleguide/styleguide.md
>>
>>
>> I've not done any work on trying to get it in, more evolving it as how I
>> code & what I look for, especially in tests.
>>
>> If you want to take this on, it'd be nice. At the same time, I fear
>> there'd be push back if you turned up and started telling people what to
>> do. Collaborating with us all on the test code is a good place to start.
>>
>> We're also more relaxed about contributions to the less-core bits of the
>> system (things like HDFS, IPC, security and Yarn core are trouble). If
>> there's stuff outside that you want to take a go at helping clean up,
>> that'd be lower risk (example: object store connectors)
>>
>> -Steve
>>
>>
>>
>> On 7 Aug 2017, at 13:13, Lars Francke <lars.francke@gmail.com<mailto:
>> lars.francke@gmail.com>> wrote:
>>
>> Hi,
>>
>> a few words about me: I've contributed to Hadoop (and it's ecosystem[4])
>> in
>> the past am a Hive committer and have used Hadoop for 10 years now, so I'm
>> not totally inexperienced. I'm earning my money as a Hadoop consultant so
>> I've seen dozens of real-life clusters in my life.
>>
>> As part of a few recent client projects and now writing about Hadoop in a
>> new project/book I'm digging into the source code to figure out some of
>> the
>> things that are not documented.
>>
>> But as part of this digging I'm seeing lots of warnings in the code,
>> inconsistencies etc. and I'd like to contribute some fixes to this back to
>> the community.
>>
>> I have been a long-time believer in good code quality and consistent code
>> styles. This might affect people like me especially who do a lot of
>> "drive-by" contributions as I'm not someone who looks at the code daily
>> but
>> comes across it reasonably often as part of client engagements. In those
>> scenarios, it's very unhelpful to have inconsistent code & bad
>> documentation.
>>
>> Two simple but concrete examples:
>> * There's lots of "final" usages on variables and methods but no
>> consistency. Was this done for particular reasons or personal preference?
>>
>> personal, though with a move to l-expressions, it matters a lot more. We
>> should really be marking all parameters as final at the very least.
>>
>>
>> * Similarly, there's lots of things that are public or protected while
>> they
>> could in theory be private. This especially makes it very hard to reason
>> about code.
>>
>> there's now a bit of fear of breaking things, but at the very least,
>> things could be protected or package-private more than they are.
>>
>>
>>
>> Judging from the current code there's lots of "unofficial" code styling
>> and/or personal preference. The Wiki says[1] to follow the Sun
>> guidelines[2] which have not been updated in almost 20 years. A new
>> version
>> is in the works an clarifies a lot of things[3]. I'm trying to get it
>> published soon. I'd try to format according to the latter (that means
>> among
>> other things no "final" for local variables).
>>
>> I realize that I won't be able to single-handedly fix all of this
>> especially as code gets contributed but if the community thinks it's
>> worthwhile I'd still love to land a few cleanup patches. My experience in
>> the past has been that it's hard to get attention to these things (which I
>> fully understand as they take up someone's time to review & commit).
>>
>> So, this is my request for comments on these questions:
>> * Is there any interest in this at all?
>> ** "This" being patches for code style & things like FindBugs & Checkstyle
>> warnings
>> * Size of the patches: Rather one big patch or smaller ones (e.g. per file
>> or package)
>> * Anyone willing to help me with this? e.g. reviewing and committing? I'd
>> be more than happy to bribe you with drinks, sweets, food or something
>> else
>>
>> My plan is not to go through each and every file and fix every issue I
>> see.
>> But there are some specific areas I'm looking at in detail and there I'd
>> love to contribute back.
>>
>> Thank you for reading!
>>
>> Cheers,
>> Lars
>>
>> PS: Posting to common-dev only, not sure if I should cross post to
>> hdfs-dev
>> and yarn-dev as well?
>>
>> [1] <https://wiki.apache.org/hadoop/CodeReviewChecklist>
>> [2] <
>> http://www.oracle.com/technetwork/java/javase/documentation/
>> codeconvtoc-136057.html
>>
>> [3] <http://cr.openjdk.java.net/~alundblad/styleguide/index-v6.html>
>> [4] <
>> https://issues.apache.org/jira/issues/?filter=-1&jql=reporte
>> r%20%3D%20lars_francke%20OR%20assignee%20%3D%20lars_francke
>>
>>
>>
>
>
> --
> John
>

Re: Question about how to best contribute

Posted by John Zhuge <jo...@gmail.com>.
And check out HADOOP-12145
<https://issues.apache.org/jira/browse/HADOOP-12145> Organize and update
CodeReviewChecklist wiki.

Thanks, your contribution will be greatly appreciated!


On Mon, Aug 7, 2017 at 5:53 AM, Steve Loughran <st...@hortonworks.com>
wrote:

>
> Hi Lars & Welcome!
>
> Maybe the first step here would be look at those style guides and think
> how to bring them up to date, especially with stuff like lambda-expressions
> in java 8, and mnodules forthcoming in in java 9, SLF4J logging, Junit 5 ->
> 5 testing, code instrumentation, diagnostics, log stability, etc.
>
> https://issues.apache.org/jira/browse/HADOOP-12143 . ;
>
> This is my go at doing this
>
> https://github.com/steveloughran/formality/blob/
> master/styleguide/styleguide.md
>
>
> I've not done any work on trying to get it in, more evolving it as how I
> code & what I look for, especially in tests.
>
> If you want to take this on, it'd be nice. At the same time, I fear
> there'd be push back if you turned up and started telling people what to
> do. Collaborating with us all on the test code is a good place to start.
>
> We're also more relaxed about contributions to the less-core bits of the
> system (things like HDFS, IPC, security and Yarn core are trouble). If
> there's stuff outside that you want to take a go at helping clean up,
> that'd be lower risk (example: object store connectors)
>
> -Steve
>
>
>
> On 7 Aug 2017, at 13:13, Lars Francke <lars.francke@gmail.com<mailto:
> lars.francke@gmail.com>> wrote:
>
> Hi,
>
> a few words about me: I've contributed to Hadoop (and it's ecosystem[4]) in
> the past am a Hive committer and have used Hadoop for 10 years now, so I'm
> not totally inexperienced. I'm earning my money as a Hadoop consultant so
> I've seen dozens of real-life clusters in my life.
>
> As part of a few recent client projects and now writing about Hadoop in a
> new project/book I'm digging into the source code to figure out some of the
> things that are not documented.
>
> But as part of this digging I'm seeing lots of warnings in the code,
> inconsistencies etc. and I'd like to contribute some fixes to this back to
> the community.
>
> I have been a long-time believer in good code quality and consistent code
> styles. This might affect people like me especially who do a lot of
> "drive-by" contributions as I'm not someone who looks at the code daily but
> comes across it reasonably often as part of client engagements. In those
> scenarios, it's very unhelpful to have inconsistent code & bad
> documentation.
>
> Two simple but concrete examples:
> * There's lots of "final" usages on variables and methods but no
> consistency. Was this done for particular reasons or personal preference?
>
> personal, though with a move to l-expressions, it matters a lot more. We
> should really be marking all parameters as final at the very least.
>
>
> * Similarly, there's lots of things that are public or protected while they
> could in theory be private. This especially makes it very hard to reason
> about code.
>
> there's now a bit of fear of breaking things, but at the very least,
> things could be protected or package-private more than they are.
>
>
>
> Judging from the current code there's lots of "unofficial" code styling
> and/or personal preference. The Wiki says[1] to follow the Sun
> guidelines[2] which have not been updated in almost 20 years. A new version
> is in the works an clarifies a lot of things[3]. I'm trying to get it
> published soon. I'd try to format according to the latter (that means among
> other things no "final" for local variables).
>
> I realize that I won't be able to single-handedly fix all of this
> especially as code gets contributed but if the community thinks it's
> worthwhile I'd still love to land a few cleanup patches. My experience in
> the past has been that it's hard to get attention to these things (which I
> fully understand as they take up someone's time to review & commit).
>
> So, this is my request for comments on these questions:
> * Is there any interest in this at all?
> ** "This" being patches for code style & things like FindBugs & Checkstyle
> warnings
> * Size of the patches: Rather one big patch or smaller ones (e.g. per file
> or package)
> * Anyone willing to help me with this? e.g. reviewing and committing? I'd
> be more than happy to bribe you with drinks, sweets, food or something else
>
> My plan is not to go through each and every file and fix every issue I see.
> But there are some specific areas I'm looking at in detail and there I'd
> love to contribute back.
>
> Thank you for reading!
>
> Cheers,
> Lars
>
> PS: Posting to common-dev only, not sure if I should cross post to hdfs-dev
> and yarn-dev as well?
>
> [1] <https://wiki.apache.org/hadoop/CodeReviewChecklist>
> [2] <
> http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-
> 136057.html
>
> [3] <http://cr.openjdk.java.net/~alundblad/styleguide/index-v6.html>
> [4] <
> https://issues.apache.org/jira/issues/?filter=-1&jql=
> reporter%20%3D%20lars_francke%20OR%20assignee%20%3D%20lars_francke
>
>
>


-- 
John

Re: Question about how to best contribute

Posted by Lars Francke <la...@gmail.com>.
Thanks for the ideas Steve.

Hi Lars & Welcome!
>
> Maybe the first step here would be look at those style guides and think
> how to bring them up to date, especially with stuff like lambda-expressions
> in java 8, and mnodules forthcoming in in java 9, SLF4J logging, Junit 5 ->
> 5 testing, code instrumentation, diagnostics, log stability, etc.
>
> https://issues.apache.org/jira/browse/HADOOP-12143 . ;
>

Yeah most style guides are outdated but fortunately at least for the basic
Java stuff there's the updated version and I currently have a thread going
on the OpenJDK mailing list to hopefully get it published (<
http://mail.openjdk.java.net/pipermail/discuss/2017-July/004254.html>). It
already covers Java 8 (but not 9). And it obviously doesn't cover SLF4J
etc. but I've put the issue on my list of things to look at.


> This is my go at doing this
>
> https://github.com/steveloughran/formality/blob/
> master/styleguide/styleguide.md
>
>
> I've not done any work on trying to get it in, more evolving it as how I
> code & what I look for, especially in tests.
>
> If you want to take this on, it'd be nice. At the same time, I fear
> there'd be push back if you turned up and started telling people what to
> do. Collaborating with us all on the test code is a good place to start.
>

Hence me reaching out here to see what would be welcome and how.

Two thing I'm sure of though:

1) We'll never be able to agree on a style that _everyone_ agrees on but in
my opinion that shouldn't stop us from adopting one. And my vote is for the
updated Java one just because of consistency. More projects will use that
one (or the Google one) just because they are popular and easily found on
Google. I

2) There'll be plenty of patches going in ignoring all of the style
guidelines.



>
> We're also more relaxed about contributions to the less-core bits of the
> system (things like HDFS, IPC, security and Yarn core are trouble). If
> there's stuff outside that you want to take a go at helping clean up,
> that'd be lower risk (example: object store connectors)
>
> -Steve
>
>
>
> On 7 Aug 2017, at 13:13, Lars Francke <la...@gmail.com> wrote:
>
> Hi,
>
> a few words about me: I've contributed to Hadoop (and it's ecosystem[4]) in
> the past am a Hive committer and have used Hadoop for 10 years now, so I'm
> not totally inexperienced. I'm earning my money as a Hadoop consultant so
> I've seen dozens of real-life clusters in my life.
>
> As part of a few recent client projects and now writing about Hadoop in a
> new project/book I'm digging into the source code to figure out some of the
> things that are not documented.
>
> But as part of this digging I'm seeing lots of warnings in the code,
> inconsistencies etc. and I'd like to contribute some fixes to this back to
> the community.
>
> I have been a long-time believer in good code quality and consistent code
> styles. This might affect people like me especially who do a lot of
> "drive-by" contributions as I'm not someone who looks at the code daily but
> comes across it reasonably often as part of client engagements. In those
> scenarios, it's very unhelpful to have inconsistent code & bad
> documentation.
>
> Two simple but concrete examples:
> * There's lots of "final" usages on variables and methods but no
> consistency. Was this done for particular reasons or personal preference?
>
>
> personal, though with a move to l-expressions, it matters a lot more. We
> should really be marking all parameters as final at the very least.
>
>
> * Similarly, there's lots of things that are public or protected while they
> could in theory be private. This especially makes it very hard to reason
> about code.
>
>
> there's now a bit of fear of breaking things, but at the very least,
> things could be protected or package-private more than they are.
>

Yeah. The "final" thing is mostly a style thing but this here is actually
costing me (and I assume others) lots of time. And - as you said - it makes
changing/deprecating/removing things hard because you have no idea how and
where things are being used.

Judging from the current code there's lots of "unofficial" code styling
> and/or personal preference. The Wiki says[1] to follow the Sun
> guidelines[2] which have not been updated in almost 20 years. A new version
> is in the works an clarifies a lot of things[3]. I'm trying to get it
> published soon. I'd try to format according to the latter (that means among
> other things no "final" for local variables).
>
> I realize that I won't be able to single-handedly fix all of this
> especially as code gets contributed but if the community thinks it's
> worthwhile I'd still love to land a few cleanup patches. My experience in
> the past has been that it's hard to get attention to these things (which I
> fully understand as they take up someone's time to review & commit).
>
> So, this is my request for comments on these questions:
> * Is there any interest in this at all?
> ** "This" being patches for code style & things like FindBugs & Checkstyle
> warnings
> * Size of the patches: Rather one big patch or smaller ones (e.g. per file
> or package)
> * Anyone willing to help me with this? e.g. reviewing and committing? I'd
> be more than happy to bribe you with drinks, sweets, food or something else
>
> My plan is not to go through each and every file and fix every issue I see.
> But there are some specific areas I'm looking at in detail and there I'd
> love to contribute back.
>
> Thank you for reading!
>
> Cheers,
> Lars
>
> PS: Posting to common-dev only, not sure if I should cross post to hdfs-dev
> and yarn-dev as well?
>
> [1] <https://wiki.apache.org/hadoop/CodeReviewChecklist>
> [2] <
> http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-
> 136057.html
>
>
> [3] <http://cr.openjdk.java.net/~alundblad/styleguide/index-v6.html>
> [4] <
> https://issues.apache.org/jira/issues/?filter=-1&jql=
> reporter%20%3D%20lars_francke%20OR%20assignee%20%3D%20lars_francke
>
>

Re: Question about how to best contribute

Posted by Steve Loughran <st...@hortonworks.com>.
Hi Lars & Welcome!

Maybe the first step here would be look at those style guides and think how to bring them up to date, especially with stuff like lambda-expressions in java 8, and mnodules forthcoming in in java 9, SLF4J logging, Junit 5 -> 5 testing, code instrumentation, diagnostics, log stability, etc.

https://issues.apache.org/jira/browse/HADOOP-12143 . ;

This is my go at doing this

https://github.com/steveloughran/formality/blob/master/styleguide/styleguide.md


I've not done any work on trying to get it in, more evolving it as how I code & what I look for, especially in tests.

If you want to take this on, it'd be nice. At the same time, I fear there'd be push back if you turned up and started telling people what to do. Collaborating with us all on the test code is a good place to start.

We're also more relaxed about contributions to the less-core bits of the system (things like HDFS, IPC, security and Yarn core are trouble). If there's stuff outside that you want to take a go at helping clean up, that'd be lower risk (example: object store connectors)

-Steve



On 7 Aug 2017, at 13:13, Lars Francke <la...@gmail.com>> wrote:

Hi,

a few words about me: I've contributed to Hadoop (and it's ecosystem[4]) in
the past am a Hive committer and have used Hadoop for 10 years now, so I'm
not totally inexperienced. I'm earning my money as a Hadoop consultant so
I've seen dozens of real-life clusters in my life.

As part of a few recent client projects and now writing about Hadoop in a
new project/book I'm digging into the source code to figure out some of the
things that are not documented.

But as part of this digging I'm seeing lots of warnings in the code,
inconsistencies etc. and I'd like to contribute some fixes to this back to
the community.

I have been a long-time believer in good code quality and consistent code
styles. This might affect people like me especially who do a lot of
"drive-by" contributions as I'm not someone who looks at the code daily but
comes across it reasonably often as part of client engagements. In those
scenarios, it's very unhelpful to have inconsistent code & bad
documentation.

Two simple but concrete examples:
* There's lots of "final" usages on variables and methods but no
consistency. Was this done for particular reasons or personal preference?

personal, though with a move to l-expressions, it matters a lot more. We should really be marking all parameters as final at the very least.


* Similarly, there's lots of things that are public or protected while they
could in theory be private. This especially makes it very hard to reason
about code.

there's now a bit of fear of breaking things, but at the very least, things could be protected or package-private more than they are.



Judging from the current code there's lots of "unofficial" code styling
and/or personal preference. The Wiki says[1] to follow the Sun
guidelines[2] which have not been updated in almost 20 years. A new version
is in the works an clarifies a lot of things[3]. I'm trying to get it
published soon. I'd try to format according to the latter (that means among
other things no "final" for local variables).

I realize that I won't be able to single-handedly fix all of this
especially as code gets contributed but if the community thinks it's
worthwhile I'd still love to land a few cleanup patches. My experience in
the past has been that it's hard to get attention to these things (which I
fully understand as they take up someone's time to review & commit).

So, this is my request for comments on these questions:
* Is there any interest in this at all?
** "This" being patches for code style & things like FindBugs & Checkstyle
warnings
* Size of the patches: Rather one big patch or smaller ones (e.g. per file
or package)
* Anyone willing to help me with this? e.g. reviewing and committing? I'd
be more than happy to bribe you with drinks, sweets, food or something else

My plan is not to go through each and every file and fix every issue I see.
But there are some specific areas I'm looking at in detail and there I'd
love to contribute back.

Thank you for reading!

Cheers,
Lars

PS: Posting to common-dev only, not sure if I should cross post to hdfs-dev
and yarn-dev as well?

[1] <https://wiki.apache.org/hadoop/CodeReviewChecklist>
[2] <
http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html

[3] <http://cr.openjdk.java.net/~alundblad/styleguide/index-v6.html>
[4] <
https://issues.apache.org/jira/issues/?filter=-1&jql=reporter%20%3D%20lars_francke%20OR%20assignee%20%3D%20lars_francke



Re: Question about how to best contribute

Posted by Lars Francke <la...@gmail.com>.
Hi Akira,


> > So, this is my request for comments on these questions:
> > * Is there any interest in this at all?
> > ** "This" being patches for code style & things like FindBugs &
> Checkstyle
> > warnings
>
> Yes. I'm interested in this.
>
> > * Size of the patches: Rather one big patch or smaller ones (e.g. per
> file
> > or package)
>
> Par file or package is fine.
> Bigger patch makes review harder,
> and if your patch is big, you will need to rebase it frequently :(


Good point!


>
>
> > * Anyone willing to help me with this? e.g. reviewing and committing?
>
> Yes, please send an e-mail to dev ML or ping me if your patches are not
> reviewed from anyone.


Excellent! Thank you so much, I'll certainly take you up on your offer!

Cheers,
Lars




>
> On 2017/08/07 21:13, Lars Francke wrote:
>
>> Hi,
>>
>> a few words about me: I've contributed to Hadoop (and it's ecosystem[4])
>> in
>> the past am a Hive committer and have used Hadoop for 10 years now, so I'm
>> not totally inexperienced. I'm earning my money as a Hadoop consultant so
>> I've seen dozens of real-life clusters in my life.
>>
>> As part of a few recent client projects and now writing about Hadoop in a
>> new project/book I'm digging into the source code to figure out some of
>> the
>> things that are not documented.
>>
>> But as part of this digging I'm seeing lots of warnings in the code,
>> inconsistencies etc. and I'd like to contribute some fixes to this back to
>> the community.
>>
>> I have been a long-time believer in good code quality and consistent code
>> styles. This might affect people like me especially who do a lot of
>> "drive-by" contributions as I'm not someone who looks at the code daily
>> but
>> comes across it reasonably often as part of client engagements. In those
>> scenarios, it's very unhelpful to have inconsistent code & bad
>> documentation.
>>
>> Two simple but concrete examples:
>> * There's lots of "final" usages on variables and methods but no
>> consistency. Was this done for particular reasons or personal preference?
>> * Similarly, there's lots of things that are public or protected while
>> they
>> could in theory be private. This especially makes it very hard to reason
>> about code.
>>
>> Judging from the current code there's lots of "unofficial" code styling
>> and/or personal preference. The Wiki says[1] to follow the Sun
>> guidelines[2] which have not been updated in almost 20 years. A new
>> version
>> is in the works an clarifies a lot of things[3]. I'm trying to get it
>> published soon. I'd try to format according to the latter (that means
>> among
>> other things no "final" for local variables).
>>
>> I realize that I won't be able to single-handedly fix all of this
>> especially as code gets contributed but if the community thinks it's
>> worthwhile I'd still love to land a few cleanup patches. My experience in
>> the past has been that it's hard to get attention to these things (which I
>> fully understand as they take up someone's time to review & commit).
>>
>> So, this is my request for comments on these questions:
>> * Is there any interest in this at all?
>> ** "This" being patches for code style & things like FindBugs & Checkstyle
>> warnings
>> * Size of the patches: Rather one big patch or smaller ones (e.g. per file
>> or package)
>> * Anyone willing to help me with this? e.g. reviewing and committing? I'd
>> be more than happy to bribe you with drinks, sweets, food or something
>> else
>>
>> My plan is not to go through each and every file and fix every issue I
>> see.
>> But there are some specific areas I'm looking at in detail and there I'd
>> love to contribute back.
>>
>> Thank you for reading!
>>
>> Cheers,
>> Lars
>>
>> PS: Posting to common-dev only, not sure if I should cross post to
>> hdfs-dev
>> and yarn-dev as well?
>>
>> [1] <https://wiki.apache.org/hadoop/CodeReviewChecklist>
>> [2] <
>> http://www.oracle.com/technetwork/java/javase/documentation/
>> codeconvtoc-136057.html
>>
>>>
>>> [3] <http://cr.openjdk.java.net/~alundblad/styleguide/index-v6.html>
>> [4] <
>> https://issues.apache.org/jira/issues/?filter=-1&jql=reporte
>> r%20%3D%20lars_francke%20OR%20assignee%20%3D%20lars_francke
>>
>>>
>>>
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: common-dev-help@hadoop.apache.org
>
>

Re: Question about how to best contribute

Posted by Akira Ajisaka <aa...@apache.org>.
Hi Lars,

Thank you for your questions!

 > So, this is my request for comments on these questions:
 > * Is there any interest in this at all?
 > ** "This" being patches for code style & things like FindBugs & Checkstyle
 > warnings

Yes. I'm interested in this.

 > * Size of the patches: Rather one big patch or smaller ones (e.g. per file
 > or package)

Par file or package is fine.
Bigger patch makes review harder,
and if your patch is big, you will need to rebase it frequently :(

 > * Anyone willing to help me with this? e.g. reviewing and committing?

Yes, please send an e-mail to dev ML or ping me if your patches are not reviewed from anyone.

Thanks,
Akira

On 2017/08/07 21:13, Lars Francke wrote:
> Hi,
>
> a few words about me: I've contributed to Hadoop (and it's ecosystem[4]) in
> the past am a Hive committer and have used Hadoop for 10 years now, so I'm
> not totally inexperienced. I'm earning my money as a Hadoop consultant so
> I've seen dozens of real-life clusters in my life.
>
> As part of a few recent client projects and now writing about Hadoop in a
> new project/book I'm digging into the source code to figure out some of the
> things that are not documented.
>
> But as part of this digging I'm seeing lots of warnings in the code,
> inconsistencies etc. and I'd like to contribute some fixes to this back to
> the community.
>
> I have been a long-time believer in good code quality and consistent code
> styles. This might affect people like me especially who do a lot of
> "drive-by" contributions as I'm not someone who looks at the code daily but
> comes across it reasonably often as part of client engagements. In those
> scenarios, it's very unhelpful to have inconsistent code & bad
> documentation.
>
> Two simple but concrete examples:
> * There's lots of "final" usages on variables and methods but no
> consistency. Was this done for particular reasons or personal preference?
> * Similarly, there's lots of things that are public or protected while they
> could in theory be private. This especially makes it very hard to reason
> about code.
>
> Judging from the current code there's lots of "unofficial" code styling
> and/or personal preference. The Wiki says[1] to follow the Sun
> guidelines[2] which have not been updated in almost 20 years. A new version
> is in the works an clarifies a lot of things[3]. I'm trying to get it
> published soon. I'd try to format according to the latter (that means among
> other things no "final" for local variables).
>
> I realize that I won't be able to single-handedly fix all of this
> especially as code gets contributed but if the community thinks it's
> worthwhile I'd still love to land a few cleanup patches. My experience in
> the past has been that it's hard to get attention to these things (which I
> fully understand as they take up someone's time to review & commit).
>
> So, this is my request for comments on these questions:
> * Is there any interest in this at all?
> ** "This" being patches for code style & things like FindBugs & Checkstyle
> warnings
> * Size of the patches: Rather one big patch or smaller ones (e.g. per file
> or package)
> * Anyone willing to help me with this? e.g. reviewing and committing? I'd
> be more than happy to bribe you with drinks, sweets, food or something else
>
> My plan is not to go through each and every file and fix every issue I see.
> But there are some specific areas I'm looking at in detail and there I'd
> love to contribute back.
>
> Thank you for reading!
>
> Cheers,
> Lars
>
> PS: Posting to common-dev only, not sure if I should cross post to hdfs-dev
> and yarn-dev as well?
>
> [1] <https://wiki.apache.org/hadoop/CodeReviewChecklist>
> [2] <
> http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html
>>
> [3] <http://cr.openjdk.java.net/~alundblad/styleguide/index-v6.html>
> [4] <
> https://issues.apache.org/jira/issues/?filter=-1&jql=reporter%20%3D%20lars_francke%20OR%20assignee%20%3D%20lars_francke
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-dev-help@hadoop.apache.org


Re: Question about how to best contribute

Posted by Jason Lowe <jl...@yahoo-inc.com.INVALID>.
+1 for Steve's and Chris's sentiments.  Mass reformatting of existing code can make maintaining anything released prior to the makeover very difficult.  Almost all of Apache Hadoop's users are not on trunk or branch-2, and I'm not sure we want large refactoring patches going into stability lines like branch-2.8, branch-2.7, and branch-2.6 where most of the users are.  We should definitely consider the maintenance costs of refactoring decisions.

Jason 

    On Wednesday, August 9, 2017 4:55 AM, Steve Loughran <st...@hortonworks.com> wrote:
 

 
> On 8 Aug 2017, at 21:33, Chris Douglas <cd...@apache.org> wrote:
> 
> Lars-
> 
> Welcome!
> 
> As a mild refinement of enthusiasm for this proposal: when you
> approach a "cleanup", please consider the cost to tracing the lineage
> of changes in the codebase. Working on a project as large and
> long-running as Hadoop, we often need to trace what motivated a
> particular change using only the commit log and JIRA. Sifting through
> cosmetic changes that obscure the reasoning behind a module not worth
> the aesthetic benefits of consistently formatted code. As a strawman:
> hitting 100% checkstyle compliance would not improve our users'
> experience, so please use your judgement.
> 
> As you point out, we're not going to maintain perfect discipline going
> forward, either. Nitpicking our contributors beyond what is necessary
> to keep the code legible discourages them from continuing as
> contributors. As a general heuristic: the stricter the rule, the more
> automation is required to enforce it. This prevents everyone from
> burning out on minutiae.
> 
> All that said, if you propose a refactoring that makes it easier to
> maintain code that's developed more vestigial parts that functional
> ones (and we have more than a few of those), that is hugely valuable.
> -C


That reminds me of a few more issues

* Major cleanup patches invariably break handling pending patches from others (which we should review) and also make cherrypicking harder. Which we why we tend to avoid things like a "lets fix the import order" patch for the sake of it.

* We can't treat things tagged @Private as stuff we can break on a whim. I know it'd be nice, but often they get picked up because they're the only way to do things...even the example YARN app does this. So changes there always need to go through a scan of the downstream apps. A few of us have IntelliJ set up to include all the main projects so we can find if/where a class or method gets used...and use that to temper our enthusiasm. Chris himself had to deal with this last week with the proposed removal of FileStatus.isDir in HADOOP-14726 . We never want to break downstream code


FWIW, I'd use any new styleguide to manage future contribs, not reapply to all existing code, except during other work. Even then, with caution


---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-dev-help@hadoop.apache.org


   

Re: Question about how to best contribute

Posted by Steve Loughran <st...@hortonworks.com>.
> On 8 Aug 2017, at 21:33, Chris Douglas <cd...@apache.org> wrote:
> 
> Lars-
> 
> Welcome!
> 
> As a mild refinement of enthusiasm for this proposal: when you
> approach a "cleanup", please consider the cost to tracing the lineage
> of changes in the codebase. Working on a project as large and
> long-running as Hadoop, we often need to trace what motivated a
> particular change using only the commit log and JIRA. Sifting through
> cosmetic changes that obscure the reasoning behind a module not worth
> the aesthetic benefits of consistently formatted code. As a strawman:
> hitting 100% checkstyle compliance would not improve our users'
> experience, so please use your judgement.
> 
> As you point out, we're not going to maintain perfect discipline going
> forward, either. Nitpicking our contributors beyond what is necessary
> to keep the code legible discourages them from continuing as
> contributors. As a general heuristic: the stricter the rule, the more
> automation is required to enforce it. This prevents everyone from
> burning out on minutiae.
> 
> All that said, if you propose a refactoring that makes it easier to
> maintain code that's developed more vestigial parts that functional
> ones (and we have more than a few of those), that is hugely valuable.
> -C


That reminds me of a few more issues

* Major cleanup patches invariably break handling pending patches from others (which we should review) and also make cherrypicking harder. Which we why we tend to avoid things like a "lets fix the import order" patch for the sake of it.

* We can't treat things tagged @Private as stuff we can break on a whim. I know it'd be nice, but often they get picked up because they're the only way to do things...even the example YARN app does this. So changes there always need to go through a scan of the downstream apps. A few of us have IntelliJ set up to include all the main projects so we can find if/where a class or method gets used...and use that to temper our enthusiasm. Chris himself had to deal with this last week with the proposed removal of FileStatus.isDir in HADOOP-14726 . We never want to break downstream code


FWIW, I'd use any new styleguide to manage future contribs, not reapply to all existing code, except during other work. Even then, with caution


---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-dev-help@hadoop.apache.org


Re: Question about how to best contribute

Posted by Chris Douglas <cd...@apache.org>.
Lars-

Welcome!

As a mild refinement of enthusiasm for this proposal: when you
approach a "cleanup", please consider the cost to tracing the lineage
of changes in the codebase. Working on a project as large and
long-running as Hadoop, we often need to trace what motivated a
particular change using only the commit log and JIRA. Sifting through
cosmetic changes that obscure the reasoning behind a module not worth
the aesthetic benefits of consistently formatted code. As a strawman:
hitting 100% checkstyle compliance would not improve our users'
experience, so please use your judgement.

As you point out, we're not going to maintain perfect discipline going
forward, either. Nitpicking our contributors beyond what is necessary
to keep the code legible discourages them from continuing as
contributors. As a general heuristic: the stricter the rule, the more
automation is required to enforce it. This prevents everyone from
burning out on minutiae.

All that said, if you propose a refactoring that makes it easier to
maintain code that's developed more vestigial parts that functional
ones (and we have more than a few of those), that is hugely valuable.
-C

On Mon, Aug 7, 2017 at 5:13 AM, Lars Francke <la...@gmail.com> wrote:
> Hi,
>
> a few words about me: I've contributed to Hadoop (and it's ecosystem[4]) in
> the past am a Hive committer and have used Hadoop for 10 years now, so I'm
> not totally inexperienced. I'm earning my money as a Hadoop consultant so
> I've seen dozens of real-life clusters in my life.
>
> As part of a few recent client projects and now writing about Hadoop in a
> new project/book I'm digging into the source code to figure out some of the
> things that are not documented.
>
> But as part of this digging I'm seeing lots of warnings in the code,
> inconsistencies etc. and I'd like to contribute some fixes to this back to
> the community.
>
> I have been a long-time believer in good code quality and consistent code
> styles. This might affect people like me especially who do a lot of
> "drive-by" contributions as I'm not someone who looks at the code daily but
> comes across it reasonably often as part of client engagements. In those
> scenarios, it's very unhelpful to have inconsistent code & bad
> documentation.
>
> Two simple but concrete examples:
> * There's lots of "final" usages on variables and methods but no
> consistency. Was this done for particular reasons or personal preference?
> * Similarly, there's lots of things that are public or protected while they
> could in theory be private. This especially makes it very hard to reason
> about code.
>
> Judging from the current code there's lots of "unofficial" code styling
> and/or personal preference. The Wiki says[1] to follow the Sun
> guidelines[2] which have not been updated in almost 20 years. A new version
> is in the works an clarifies a lot of things[3]. I'm trying to get it
> published soon. I'd try to format according to the latter (that means among
> other things no "final" for local variables).
>
> I realize that I won't be able to single-handedly fix all of this
> especially as code gets contributed but if the community thinks it's
> worthwhile I'd still love to land a few cleanup patches. My experience in
> the past has been that it's hard to get attention to these things (which I
> fully understand as they take up someone's time to review & commit).
>
> So, this is my request for comments on these questions:
> * Is there any interest in this at all?
> ** "This" being patches for code style & things like FindBugs & Checkstyle
> warnings
> * Size of the patches: Rather one big patch or smaller ones (e.g. per file
> or package)
> * Anyone willing to help me with this? e.g. reviewing and committing? I'd
> be more than happy to bribe you with drinks, sweets, food or something else
>
> My plan is not to go through each and every file and fix every issue I see.
> But there are some specific areas I'm looking at in detail and there I'd
> love to contribute back.
>
> Thank you for reading!
>
> Cheers,
> Lars
>
> PS: Posting to common-dev only, not sure if I should cross post to hdfs-dev
> and yarn-dev as well?
>
> [1] <https://wiki.apache.org/hadoop/CodeReviewChecklist>
> [2] <
> http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html
>>
> [3] <http://cr.openjdk.java.net/~alundblad/styleguide/index-v6.html>
> [4] <
> https://issues.apache.org/jira/issues/?filter=-1&jql=reporter%20%3D%20lars_francke%20OR%20assignee%20%3D%20lars_francke
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-dev-help@hadoop.apache.org