Posted to dev@beam.apache.org by Jean-Baptiste Onofré <jb...@nanthrax.net> on 2016/10/05 11:51:01 UTC

[REMINDER] Technical discussion on the mailing list

Hi team,

I would like to apologize for having forgotten to discuss and share a technical point with you and, more generally, to post a small reminder.

When working with Eugene on JdbcIO, we experimented with AutoValue to deal with IO configuration. AutoValue provides a nice way to reduce the boilerplate code required by the IO configuration.
We used AutoValue in JdbcIO and, given the good improvements we saw, we started to refactor the other IOs.

The use of AutoValue should have been announced and discussed on the mailing list.

"If it doesn't exist on the mailing list, it doesn't exist at all."

So, any comment on a GitHub pull request, or any discussion on Hangouts, that can impact the project (generally speaking) has to happen on the mailing list.

It provides project transparency and facilitates the onboarding of new contributors.

Thanks !

Regards
JB

Re: [REMINDER] Technical discussion on the mailing list

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.

Hi Dan,

Even if enabling the "full" GitHub integration is a step forward, I'm not 
sure it will help in the end.

I'm afraid we (at least most of us ;)) won't read pull request comments 
on the mailing list because it would be way more verbose.

So, I think there's more value in quickly explaining a change, its impacts, 
its origin, ...
It doesn't take more than 5 minutes and is very convenient for the 
contributors.

My $0.01

Regards
JB

On 10/06/2016 02:13 AM, Daniel Kulp wrote:
> I just want to give a little more context to this….  I’ve been lurking on this list for several months now reading everything that’s going on.   From Apache’s standpoint, that should be a “very good start” for getting to know what is happening in a project.
>
> On my last PR, Eugene commented about using the AutoValue pattern for part of it which caught me off guard.   None of the other IO’s in master were using it, there wasn’t any discussion on this list about it, I had no idea what it was about.   So I asked JB to make sure I hadn’t missed anything.
>
> Anyway, this is one of the main concerns I have with Beam’s PR workflow: I feel I’m missing things, as there is a significant amount of things not happening on a list.  The initial pull request is going to the commits list (ok, would prefer the dev list, but at least it’s on a list).  However, none of the comments or discussions or anything that is occurring as part of the review is making it to any list.  The only people that “learn” from the reviews are the reviewers and the person who initiated the PR, unless others go into each and every PR and read the comments (and find the new ones and such).  With my Apache hat on, this bothers me.  As another example, the comments on PR1003 are very applicable to anyone looking into writing IOs and they could learn about some of the “best practices” presented there.  Anyway, I don’t really understand why the full GitHub integration wasn’t set up for the Beam PRs so that the comments would come back to the lists as well (and JIRA, BTW).
>
> That’s basically the background as to why JB sent this.  I was confused and bugged him.   :-)
>
> Dan

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: [REMINDER] Technical discussion on the mailing list

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.

Yes, but as I said in another e-mail, even if it's a step forward, a 
well-argued and valuable proposal message is better IMHO.

Regards
JB

On 10/06/2016 08:16 AM, Daniel Kulp wrote:
>
>> On Oct 6, 2016, at 2:33 AM, Dan Halperin <dh...@google.com.INVALID> wrote:
>>
>> Anyway, I don’t really understand why the full github integration wasn’t
>>> setup for the beam PR’s so that the comments would come back to the lists
>>> as well (and JIRA, BTW).
>>>
>>
>> This part confuses me. We've been told that discussions on JIRA, even
>> though they are emailed to the mailing lists, don't count as happening on
>> the mailing list. So why would github integration be helpful vs just more
>> spam?
>
> Not sure who told you that…
>
> With JIRA, you can actually do a reply-all and respond to comments via email and your comments would make it back to the JIRA.   Thus, the discussion can be done “on the list” yet still be in jira.  That kind of depends on how the JIRA project is setup though.
>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: [REMINDER] Technical discussion on the mailing list

Posted by Daniel Kulp <dk...@apache.org>.

> On Oct 6, 2016, at 2:33 AM, Dan Halperin <dh...@google.com.INVALID> wrote:
> 
> Anyway, I don’t really understand why the full github integration wasn’t
>> setup for the beam PR’s so that the comments would come back to the lists
>> as well (and JIRA, BTW).
>> 
> 
> This part confuses me. We've been told that discussions on JIRA, even
> though they are emailed to the mailing lists, don't count as happening on
> the mailing list. So why would github integration be helpful vs just more
> spam?

Not sure who told you that…

With JIRA, you can actually do a reply-all and respond to comments via email and your comments would make it back to the JIRA.   Thus, the discussion can be done “on the list” yet still be in jira.  That kind of depends on how the JIRA project is setup though.  


-- 
Daniel Kulp
dkulp@apache.org - http://dankulp.com/blog
Talend Community Coder - http://coders.talend.com

Re: [REMINDER] Technical discussion on the mailing list

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.

Great summary Davor,

+1 with what you said.

Regards
JB

On 10/06/2016 08:29 AM, Davor Bonaci wrote:
> Daniel, so glad you are starting to contribute to Beam! It was great
> talking with you in person back in May. Welcome!
>
> --
>
> There are lots of different things mentioned here; I'll try to address them
> separately.
>
> The first use of AutoValue should have been discussed on the dev@ mailing
> list. I think the main reason for the discussion is a bit different --
> AutoValue has a non-trivial tradeoff -- compile complexity vs. boilerplate
> code. For example, AutoValue may degrade IDE experience for some
> contributors. If we'd go in depth on this, I'm sure we'd find opposing
> opinions on the use of AutoValue. This tradeoff should have been discussed
> on the dev@ list, followed by a community decision.
>
> Note that this has happened *before* the JdbcIO work. Since AutoValue has
> been already used elsewhere in the project, there was no real reason not to
> use it in JdbcIO, as appropriate. Therefore, I think JB and Eugene did
> everything right! Second, third or thousandth usage of a concept doesn't
> require any particular discussion. They didn't make anything worse. Their
> discussion is totally appropriate for code review.
>
> Now, as Daniel points out, I think it is not right to ask a contributor to
> change his PR to use AutoValue when none of the existing IO connectors use
> it. This sets too high a standard. In fact, it is desirable for new
> contributions to follow already established patterns, instead of inventing
> something new. If we want to change a pattern, we should do it as a
> separate effort across the board.
>
> On the other hand, dev@ discussion wouldn't have helped to prevent review
> comments/discussions. Let's say we have had the discussion, and a new
> contributor comes a year later. Should we ask her to read all discussions
> that ever happened in the project to learn everything she might need? Of
> course not! She should follow already established patterns and learn any
> specifics during code review. And then, best practices should be documented
> on the website.
>
> To summarize, a few things could have been better:
> * Discussion of the first use of AutoValue on dev@.
> * Avoiding overzealous code review comments.
> * Changing a pattern should have been done by filing several starter tasks
> in JIRA.
>
> --
>
> There are also several different proposals for altering a part of the
> workflow.
>
>> code review comments not making to a list
>
> We have >1000 PRs so far, with at least a dozen comments on average, with
> pace increasing. This is >10,000 emails, most of which are "fix a typo".
> This leads to information overload, with actual information being missed.
>
> If someone wants this extra information -- just clicking the Watch button
> in the GitHub UI will make it happen!
>
>> creating new JIRA and opening PR to dev@
>
> These currently go to commits@. This would have resulted in another 1,700
> email threads compared to <150 now.
>
> Generally speaking, *all* of this is already available to anyone who wants
> to receive it. However, anyone I know that has tried, has given up very
> quickly ;). If anybody is concerned, we can create several new lists for
> this traffic -- but we shouldn't repurpose dev@ for it.
>
>> I feel I’m missing things as there is significant amount of things not
> happening on a list
>
> I think "feeling of missing things" is totally valid. I feel that too, as
> well as almost everybody else.
>
> My best answer is -- we should realize that we are an extremely large and
> complex project, with >100 contributors and >20 people working on it full
> time. Nobody can follow every SDK, every runner, every IO connector, every
> pull request, every comment that all contributors make each and every day.
>
> While nobody can follow everything, everything is being followed by
> multiple people. And, we need to be accountable to each other to surface
> everything relevant to the dev@ list. And I believe that is already
> happening the vast majority of the time. This is just one example where it
> didn't happen.
>
> --
>
> All that said, there are certainly areas for improvement. If anyone has
> specific ideas, please reach out! I'd love to discuss them in detail and
> propose improvements to the wider community.
>
> Thanks!
>
>
> On Wed, Oct 5, 2016 at 6:16 PM, Thomas Weise <th...@apache.org> wrote:
>>
>> How about sending just the notifications for creating new JIRA and opening
>> PR to dev@ so that those that are interested can start watching?
>>
>> Thanks,
>> Thomas
>>
>> On Wed, Oct 5, 2016 at 5:33 PM, Dan Halperin <dh...@google.com.invalid>
>> wrote:
>>
>>> On Wed, Oct 5, 2016 at 5:13 PM, Daniel Kulp <dk...@apache.org> wrote:
>>>
>>>> I just want to give a little more context to this….  I’ve been lurking on
>>>> this list for several months now reading everything that’s going on.  From
>>>> Apache’s standpoint, that should be a “very good start” for getting to know
>>>> what is happening in a project.
>>>>
>>>> On my last PR, Eugene commented about using the AutoValue pattern for part
>>>> of it which caught me off guard.   None of the other IO’s in master were
>>>> using it, there wasn’t any discussion on this list about it, I had no idea
>>>> what it was about.   So I asked JB to make sure I hadn’t missed anything.
>>>>
>>>> Anyway, this is one of the main concerns I have with Beam’s PR work flow,
>>>> I feel I’m missing things as there is significant amount of things not
>>>> happening on a list.   The initial pull request is going to the commits
>>>> list (ok, would prefer the dev list, but at least its on a list).  However,
>>>> none of the comments or discussions or anything that is occurring as part
>>>> of the review is making it to any list.   The only people that “learn” from
>>>> the reviews are the reviewers and the person who initiated the PR unless
>>>> they go into each and every PR and read the comments (and find the news
>>>> ones and such).    With my Apache hat on, this bothers me.
>>>>
>>>> Anyway, I don’t really understand why the full github integration wasn’t
>>>> setup for the beam PR’s so that the comments would come back to the lists
>>>> as well (and JIRA, BTW).
>>>
>>> This part confuses me. We've been told that discussions on JIRA, even
>>> though they are emailed to the mailing lists, don't count as happening on
>>> the mailing list. So why would github integration be helpful vs just more
>>> spam?
>>>
>>>> As another example, the comments on PR1003 are very applicable to anyone
>>>> looking into writing IO’s and they could learn about some of the “best
>>>> practices” presented there.
>>>
>>> As Beam grows during its incubation, we are moving a lot of knowledge to
>>> documentation, but yes -- right now, most of the I/O related practices live
>>> in Eugene's and my head (and now, JB's!). We're working on it, and hope to
>>> dramatically improve documentation for source authors in the next quarter.
>>>
>>> For AutoValue specifically, this is by no means codified and it is
>>> DEFINITELY not mandatory. Eugene and JB just experimented with it in the
>>> last few days and decided it was useful in a few cases. We do (or did,
>>> before this thread) need to have an actual discussion on the mailing list
>>> before moving forward further towards making it policy.
>>>
>>> Right now Ben Chambers is trying to apply AutoValue in places that need
>>> templated types and struggling with multiple ?s, so the discussion may need
>>> to continue! ...
>>>
>>> Thanks,
>>> Dan
>>>
>>>> That’s basically the background as to why JB sent this.  I was confused and
>>>> bugged him.   :-)
>>>>
>>>> Dan
>>>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: [REMINDER] Technical discussion on the mailing list

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.

Hi Dan,

I think for 1, we are not far off and things are improving smoothly. For 
instance, Ben started the metric discussion thread on the mailing list. 
Of course, we have to improve there, but it's just a question of time 
and good habits.

For 2, I'm with you on that. IMHO, we need an agreement followed by 
a statement on the mailing list at the end of the discussion period. 
It's what we use in other Apache projects 
([DISCUSSION]/[VOTE]/[RESULT]). Sometimes, seen from outside the project, 
the discussion can seem a bit "directive" (the same goes for some 
comments in pull requests). So, a clear statement on the mailing list 
(as proposed in my original e-mail) will help there, and it's definitely 
an area where we have to improve.

Thanks,
Regards
JB

On 10/06/2016 12:04 PM, Daniel Kulp wrote:
> At the end of the day, it comes down to two questions:
>
> 1) Are there technical and project direction discussions happening off list and not reflected back to the list?
>
> 2) If yes, are the concrete decisions being made as a result of the off list discussions?
>
>
> The answer to #1 is a definite yes.   From a quick perusal, several of the pull requests that have more than 10-15 comments is a discussion that should be back here.  Look at the Metrics PR, the splittable DoFn pr, etc… There are discussions there that NEED to be coming back to this list one way or another.   For things like the Splittable DoFn that is a long running discussion, it would most likely need periodic summaries/updates.  Trying to follow everything going on in that PR is impossible with the comments on the outdated commits and such.
>
> Because of the scale of the #1 problem, I’m unsure of the answer to #2, but my gut feeling is yes.  If the off list discussions are resulting in technical changes that people don’t know about and cannot object to or comment on, then there is a problem.
>
> From an Apache standpoint, we have to get the answer to BOTH questions to a “no” state.   That’s a requirement.   The question is how do we get there?
>
> Dan

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: [REMINDER] Technical discussion on the mailing list

Posted by Frances Perry <fj...@google.com.INVALID>.

>
> At the end of the day, it comes down to two questions:
>
> 1) Are there technical and project direction discussions happening off
> list and not reflected back to the list?
>
> 2) If yes, are the concrete decisions being made as a result of the off
> list discussions?


> From an Apache standpoint, we have to get the answer to BOTH questions to
> a “no” state.   That’s a requirement.


I don't think these are binary "yes/no" questions. There's a good degree of
subtlety here given the complexity of the project and the scope of each
individual discussion.

On any given code review, there are absolutely technical discussions
happening and concrete decisions being made -- that’s the point of code
review ;-)

However, the vast majority of these are low-level and don't impact the
project direction or reach beyond a single component. We should absolutely
surface things of interest to the community on the dev@ list, but not every
tiny comment, given the scale and scope of the project. It’s just a matter
of putting the visibility bar in the right place.

All discussions are happening publicly in standard Apache tooling, so the
question in my mind is really how do we ensure that each person in the
community can sift through all this information to follow the relevant set
of things? For example, folks on the Python SDK will quickly be drowned out
by details on Java-specific implementation questions, but should absolutely
be discussing changes to core concepts.

Part of the reason there are so many detailed discussions in Beam is that we
choose (for community growth and project quality) to have a
review-then-commit workflow. I would imagine this causes additional chatter
compared to projects that do the opposite, where many of these small suggestions
might never be surfaced. Said differently, while I totally agree that all
significant technical and project discussions should be on the dev@ list,
we should not blindly apply a requirement (discussion on dev@) to a level
of detail that is not required in the first place (code reviews).

My preference would be to keep dev@ for the human-initiated, larger scoped
discussions and as a community be accountable to each other for identifying
and communicating those. There are definitely things in those >10,000
individual PR comments that could have been handled better -- so let’s
proactively surface them, like this thread does. And if there are other
ways to help add structure to the visibility in the community, let’s
discuss. (I for one would love better ways to keep up with everything!)

Frances

Re: [REMINDER] Technical discussion on the mailing list

Posted by Eugene Kirpichov <ki...@google.com.INVALID>.

Hi Daniel,

Thanks for raising this. I think I was a major contributor to your
frustration with the process by suggesting big changes to your IO PR.

As others have said, ideally that process should have gone differently: 1)
we should have had documentation on the best practices in developing IOs,
and 2) I should not have requested you to do the AutoValue conversion in
your PR because that (using AutoValue for PTransform builders) was a new
idea at the time and it was unreasonable to put the burden of implementing
it on you.

I'm very much in favor of improving that process, e.g. by looking at
recently submitted IOs and putting the essence of the discussion on those PRs
together into a best-practices document (this document, when it appears,
should *absolutely* be discussed on the mailing list!). I should also have
at least mentioned the idea "consider using it for PTransform builders too"
on the mailing list: even though we already use AutoValue in many
places, we didn't use it for PTransforms, so that is a new technical idea,
even if not a new concept. This was an example of a technical discussion
that happened off list, and thanks to JB for reflecting it back onto the
list. Indeed, it takes some habit-building time to learn to do this
consistently.

***

Now, for SplittableDoFn in particular - I just looked through all the 171
comments on the PR [https://github.com/apache/incubator-beam/pull/896] and
I honestly don't think there's anything there that merited a broad
discussion on the list - the API is nearly exactly the same as was proposed
on the mailing list, and the comments are about style nitpicks or
implementation details relevant only to people actively working on the
involved classes. Weekly updates, if I provided them, would have been of
the form "Still addressing comments / refactoring $x / extracting part $y
into a separate PR / adding tests". The large number of comments is because
this is a large PR and there are a lot of gritty details there to get right,
but none of them are high-level - the high-level design has been finalized
already.

The PR is also taking so long because I've spun off several smaller PRs
making focused changes to individual subsystems (e.g. DoFnReflector), which
I think also do not merit being discussed on the mailing list: they
don't change the technical direction of the project, these are deep
implementation-detail classes, and there are only 2-3 people involved in
developing or using them (including myself), all of whom are
listed as reviewers.

Please let me know if you feel differently - as the person driving SDF, I
really would like to be sure that the community is satisfied with the
communication around it.

On Thu, Oct 6, 2016 at 3:05 AM Daniel Kulp <dk...@apache.org> wrote:

> At the end of the day, it comes down to two questions:
>
> 1) Are there technical and project direction discussions happening off
> list and not reflected back to the list?
>
> 2) If yes, are concrete decisions being made as a result of the off
> list discussions?
>
>
> The answer to #1 is a definite yes.   From a quick perusal, several of the
> pull requests that have more than 10-15 comments contain discussions that
> should be back here.  Look at the Metrics PR, the Splittable DoFn PR, etc…
> There are discussions there that NEED to be coming back to this list one
> way or another.   A long-running discussion like the Splittable DoFn one
> would most likely need periodic summaries/updates.
> Trying to follow everything going on in that PR is impossible with the
> comments on the outdated commits and such.
>
> Because of the scale of the #1 problem, I’m unsure of the answer to #2,
> but my gut feeling is yes.  If the off list discussions are resulting in
> technical changes that people don’t know about and cannot object to or
> comment on, then there is a problem.
>
> From an Apache standpoint, we have to get the answer to BOTH questions to
> a “no” state.   That’s a requirement.   The question is how do we get there?
>
> Dan

Re: [REMINDER] Technical discussion on the mailing list

Posted by Daniel Kulp <dk...@apache.org>.

At the end of the day, it comes down to two questions:

1) Are there technical and project direction discussions happening off list and not reflected back to the list?

2) If yes, are concrete decisions being made as a result of the off list discussions?


The answer to #1 is a definite yes. From a quick perusal, several of the pull requests that have more than 10-15 comments contain discussions that should be back here. Look at the Metrics PR, the Splittable DoFn PR, etc… There are discussions there that NEED to be coming back to this list one way or another. A long-running discussion like the Splittable DoFn one would most likely need periodic summaries/updates. Trying to follow everything going on in that PR is impossible with the comments on the outdated commits and such.

Because of the scale of the #1 problem, I’m unsure of the answer to #2, but my gut feeling is yes.  If the off list discussions are resulting in technical changes that people don’t know about and cannot object to or comment on, then there is a problem.  

From an Apache standpoint, we have to get the answer to BOTH questions to a “no” state.   That’s a requirement.   The question is how do we get there?

Dan



> On Oct 6, 2016, at 8:29 AM, Davor Bonaci <da...@google.com.INVALID> wrote:
> 
> Daniel, so glad you are starting to contribute to Beam! It was great
> talking with you in person back in May. Welcome!
> 
> --
> 
> There are lots of different things mentioned here; I'll try to address them
> separately.
> 
> The first use of AutoValue should have been discussed on the dev@ mailing
> list. I think the main reason for the discussion is a bit different --
> AutoValue has a non-trivial tradeoff -- compile complexity vs. boilerplate
> code. For example, AutoValue may degrade IDE experience for some
> contributors. If we'd go in depth on this, I'm sure we'd find opposing
> opinions on the use of AutoValue. This tradeoff should have been discussed
> on the dev@ list, followed by a community decision.
> 
> Note that this happened *before* the JdbcIO work. Since AutoValue had
> already been used elsewhere in the project, there was no real reason not to
> use it in JdbcIO, as appropriate. Therefore, I think JB and Eugene did
> everything right! Second, third or thousandth usage of a concept doesn't
> require any particular discussion. They didn't make anything worse. Their
> discussion is totally appropriate for code review.
> 
> Now, as Daniel points out, I think it is not right to ask a contributor to
> change his PR to use AutoValue when none of the existing IO connectors use
> it. This sets too high a standard. In fact, it is desirable for new
> contributions to follow already established patterns, instead of inventing
> something new. If we want to change a pattern, we should do it as a
> separate effort across the board.
> 
> On the other hand, dev@ discussion wouldn't have helped to prevent review
> comments/discussions. Let's say we have had the discussion, and a new
> contributor comes a year later. Should we ask her to read all discussions
> that ever happened in the project to learn everything she might need? Of
> course not! She should follow already established patterns and learn any
> specifics during code review. And then, best practices should be documented
> on the website.
> 
> To summarize, a few things could have been better:
> * Discussion of the first use of AutoValue on dev@.
> * Avoiding overzealous code review comments.
> * Changing a pattern should have been done by filing several starter tasks
> in JIRA.
> 
> --
> 
> There are also several different proposals for altering a part of the
> workflow.
> 
>> code review comments not making it to a list
>
> We have >1000 PRs so far, with at least a dozen comments on average, with
> the pace increasing. This is >10,000 emails, most of which are "fix a typo".
> This leads to information overload, with actual information being missed.
> 
> If someone wants this extra information -- just clicking the Watch button
> in the GitHub UI will make it happen!
> 
>> creating new JIRA and opening PR to dev@
> 
> These currently go to commits@. This would have resulted in another 1,700
> email threads compared to <150 now.
> 
> Generally speaking, *all* of this is already available to anyone who wants
> to receive it. However, everyone I know who has tried has given up very
> quickly ;). If anybody is concerned, we can create several new lists for
> this traffic -- but we shouldn't repurpose dev@ for it.
>
>> I feel I’m missing things as there is significant amount of things not
>> happening on a list
> 
> I think "feeling of missing things" is totally valid. I feel that too, as
> well as almost everybody else.
> 
> My best answer is -- we should realize that we are an extremely large and
> complex project, with >100 contributors and >20 people working on it full
> time. Nobody can follow every SDK, every runner, every IO connector, every
> pull request, every comment that all contributors make each and every day.
> 
> While nobody can follow everything, everything is being followed by
> multiple people. And, we need to be accountable to each other to surface
> everything relevant to the dev@ list. And I believe that is already
> happening the vast majority of the time. This is just one example where it
> didn't happen.
> 
> --
> 
> All that said, there are certainly areas for improvement. If anyone has
> specific ideas, please reach out! I'd love to discuss them in detail and
> propose improvements to the wider community.
> 
> Thanks!

-- 
Daniel Kulp
dkulp@apache.org <ma...@apache.org> - http://dankulp.com/blog <http://dankulp.com/blog>
Talend Community Coder - http://coders.talend.com <http://coders.talend.com/>

Re: [REMINDER] Technical discussion on the mailing list

Posted by Davor Bonaci <da...@google.com.INVALID>.

Daniel, so glad you are starting to contribute to Beam! It was great
talking with you in person back in May. Welcome!

--

There are lots of different things mentioned here; I'll try to address them
separately.

The first use of AutoValue should have been discussed on the dev@ mailing
list. I think the main reason for the discussion is a bit different --
AutoValue has a non-trivial tradeoff -- compile complexity vs. boilerplate
code. For example, AutoValue may degrade IDE experience for some
contributors. If we'd go in depth on this, I'm sure we'd find opposing
opinions on the use of AutoValue. This tradeoff should have been discussed
on the dev@ list, followed by a community decision.
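
To make the boilerplate side of that tradeoff concrete, here is a minimal
hand-written sketch. It is not Beam code: ExampleIoConfig and its two fields
are hypothetical, chosen only for illustration. With AutoValue, one would
instead declare an abstract class annotated with @AutoValue (plus a nested
@AutoValue.Builder), and an annotation processor would generate the
constructor, equals, hashCode, toString and builder plumbing at compile time.
That generation step is the "compile complexity" side of the tradeoff.

```java
import java.util.Objects;

// Hypothetical configuration for an imaginary IO connector (illustration only).
final class ExampleIoConfig {
    private final String url;
    private final int batchSize;

    private ExampleIoConfig(String url, int batchSize) {
        this.url = url;
        this.batchSize = batchSize;
    }

    String url() { return url; }
    int batchSize() { return batchSize; }

    static Builder builder() { return new Builder(); }

    // Hand-written value semantics: this is the part a generator would emit.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ExampleIoConfig)) return false;
        ExampleIoConfig that = (ExampleIoConfig) o;
        return url.equals(that.url) && batchSize == that.batchSize;
    }

    @Override
    public int hashCode() { return Objects.hash(url, batchSize); }

    @Override
    public String toString() {
        return "ExampleIoConfig{url=" + url + ", batchSize=" + batchSize + "}";
    }

    // Hand-written builder with a default value and basic validation.
    static final class Builder {
        private String url;
        private int batchSize = 100; // arbitrary default for the sketch

        Builder setUrl(String url) {
            this.url = Objects.requireNonNull(url, "url");
            return this;
        }

        Builder setBatchSize(int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.batchSize = batchSize;
            return this;
        }

        ExampleIoConfig build() {
            if (url == null) {
                throw new IllegalStateException("url is required");
            }
            return new ExampleIoConfig(url, batchSize);
        }
    }
}
```

Every new field in such a class has to be added by hand to the constructor,
accessor, equals, hashCode, toString and builder, which is where mistakes
tend to creep in. Generating that code removes the repetition, at the cost of
requiring annotation processing in the build and in each contributor's IDE.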

Note that this happened *before* the JdbcIO work. Since AutoValue had
already been used elsewhere in the project, there was no real reason not to
use it in JdbcIO, as appropriate. Therefore, I think JB and Eugene did
everything right! Second, third or thousandth usage of a concept doesn't
require any particular discussion. They didn't make anything worse. Their
discussion is totally appropriate for code review.

Now, as Daniel points out, I think it is not right to ask a contributor to
change his PR to use AutoValue when none of the existing IO connectors use
it. This sets too high a standard. In fact, it is desirable for new
contributions to follow already established patterns, instead of inventing
something new. If we want to change a pattern, we should do it as a
separate effort across the board.

On the other hand, dev@ discussion wouldn't have helped to prevent review
comments/discussions. Let's say we have had the discussion, and a new
contributor comes a year later. Should we ask her to read all discussions
that ever happened in the project to learn everything she might need? Of
course not! She should follow already established patterns and learn any
specifics during code review. And then, best practices should be documented
on the website.

To summarize, a few things could have been better:
* Discussion of the first use of AutoValue on dev@.
* Avoiding overzealous code review comments.
* Changing a pattern should have been done by filing several starter tasks
in JIRA.

--

There are also several different proposals for altering a part of the
workflow.

> code review comments not making it to a list

We have >1000 PRs so far, with at least a dozen comments on average, with
the pace increasing. This is >10,000 emails, most of which are "fix a typo".
This leads to information overload, with actual information being missed.

If someone wants this extra information -- just clicking the Watch button
in the GitHub UI will make it happen!

> creating new JIRA and opening PR to dev@

These currently go to commits@. This would have resulted in another 1,700
email threads compared to <150 now.

Generally speaking, *all* of this is already available to anyone who wants
to receive it. However, everyone I know who has tried has given up very
quickly ;). If anybody is concerned, we can create several new lists for
this traffic -- but we shouldn't repurpose dev@ for it.

> I feel I’m missing things as there is significant amount of things not
> happening on a list

I think "feeling of missing things" is totally valid. I feel that too, as
well as almost everybody else.

My best answer is -- we should realize that we are an extremely large and
complex project, with >100 contributors and >20 people working on it full
time. Nobody can follow every SDK, every runner, every IO connector, every
pull request, every comment that all contributors make each and every day.

While nobody can follow everything, everything is being followed by
multiple people. And, we need to be accountable to each other to surface
everything relevant to the dev@ list. And I believe that is already
happening the vast majority of the time. This is just one example where it
didn't happen.

--

All that said, there are certainly areas for improvement. If anyone has
specific ideas, please reach out! I'd love to discuss them in detail and
propose improvements to the wider community.

Thanks!

On Wed, Oct 5, 2016 at 6:16 PM, Thomas Weise <th...@apache.org> wrote:
>
> How about sending just the notifications for creating new JIRA and opening
> PR to dev@ so that those that are interested can start watching?
>
> Thanks,
> Thomas
>
> On Wed, Oct 5, 2016 at 5:33 PM, Dan Halperin <dh...@google.com.invalid>
> wrote:
>
> > On Wed, Oct 5, 2016 at 5:13 PM, Daniel Kulp <dk...@apache.org> wrote:
> >
> > > I just want to give a little more context to this….  I’ve been lurking on
> > > this list for several months now reading everything that’s going on.  From
> > > Apache’s standpoint, that should be a “very good start” for getting to know
> > > what is happening in a project.
> > >
> > > On my last PR, Eugene commented about using the AutoValue pattern for part
> > > of it which caught me off guard.   None of the other IO’s in master were
> > > using it, there wasn’t any discussion on this list about it, I had no idea
> > > what it was about.   So I asked JB to make sure I hadn’t missed anything.
> > >
> >
> > > Anyway, this is one of the main concerns I have with Beam’s PR work flow,
> > > I feel I’m missing things as there is significant amount of things not
> > > happening on a list.   The initial pull request is going to the commits
> > > list (ok, would prefer the dev list, but at least its on a list).  However,
> > > none of the comments or discussions or anything that is occurring as part
> > > of the review is making it to any list.   The only people that “learn” from
> > > the reviews are the reviewers and the person who initiated the PR unless
> > > they go into each and every PR and read the comments (and find the news
> > > ones and such).    With my Apache hat on, this bothers me.
> > >
> > > Anyway, I don’t really understand why the full github integration wasn’t
> > > setup for the beam PR’s so that the comments would come back to the lists
> > > as well (and JIRA, BTW).
> > >
> >
> > This part confuses me. We've been told that discussions on JIRA, even
> > though they are emailed to the mailing lists, don't count as happening on
> > the mailing list. So why would github integration be helpful vs just more
> > spam?
> >
> > > As another example, the comments on PR1003 are very applicable to anyone
> > > looking into writing IO’s and they could learn about some of the “best
> > > practices” presented there.
> >
> > As Beam grows during its incubation, we are moving a lot of knowledge to
> > documentation, but yes -- right now, most of the I/O related practices live
> > in Eugene's and my head (and now, JB's!). We're working on it, and hope to
> > dramatically improve documentation for source authors in the next quarter.
> >
> > For AutoValue specifically, this is by no means codified and it is
> > DEFINITELY not mandatory. Eugene and JB just experimented with it in the
> > last few days and decided it was useful in a few cases. We do (or did,
> > before this thread) need to have an actual discussion on the mailing list
> > before moving forward further towards making it policy.
> >
> > Right now Ben Chambers is trying to apply AutoValue in places that need
> > templated types and struggling with multiple ?s, so the discussion may need
> > to continue! ...
> >
> > Thanks,
> > Dan
> >
> > > That’s basically the background as to why JB sent this.  I was confused and
> > > bugged him.   :-)
> > >
> > > Dan
> > >
> > >
> > >
> > > > On Oct 5, 2016, at 1:51 PM, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> > > >
> > > > Hi team,
> > > >
> > > > I would like to excuse myself to have forgotten to discuss and share
> > > > with you a technical point and generally speaking do a small reminder.
> > > >
> > > > When we work with Eugene on the JdbcIO, we experimented AutoValue to
> > > > deal with IO configuration. AutoValue provides a nice way to reduce and
> > > > limit the boilerplate code required by the IO configuration.
> > > > We used AutoValue in JdbcIO and, regarding the good improvements we saw,
> > > > we started to refactor the other IOs.
> > > >
> > > > The use of AutoValue should have been notice and discussed on the
> > > > mailing list.
> > > >
> > > > "If it doesn't exist on the mailing list, it doesn't exist at all."
> > > >
> > > > So, any comment happening on a GitHub pull request, or discussion on
> > > > hangouts which can impact the project (generally speaking) has to happen
> > > > on the mailing list.
> > > >
> > > > It provides project transparency and facilitates the new contribution
> > > > onboarding.
> > > >
> > > > Thanks !
> > > >
> > > > Regards
> > > > JB
> > > >
> > >
> > > --
> > > Daniel Kulp
> > > dkulp@apache.org <ma...@apache.org> - http://dankulp.com/blog <http://dankulp.com/blog>
> > > Talend Community Coder - http://coders.talend.com <http://coders.talend.com/>
> > >
> >

Re: [REMINDER] Technical discussion on the mailing list

Posted by Thomas Weise <th...@apache.org>.

How about sending just the notifications for new JIRA issues and newly opened
PRs to dev@, so that those who are interested can start watching?

Thanks,
Thomas

On Wed, Oct 5, 2016 at 5:33 PM, Dan Halperin <dh...@google.com.invalid>
wrote:

> On Wed, Oct 5, 2016 at 5:13 PM, Daniel Kulp <dk...@apache.org> wrote:
>
> > I just want to give a little more context to this….  I’ve been lurking on
> > this list for several months now reading everything that’s going on.   From
> > Apache’s standpoint, that should be a “very good start” for getting to know
> > what is happening in a project.
> >
> > On my last PR, Eugene commented about using the AutoValue pattern for part
> > of it which caught me off guard.   None of the other IO’s in master were
> > using it, there wasn’t any discussion on this list about it, I had no idea
> > what it was about.   So I asked JB to make sure I hadn’t missed anything.
> >
>
> > Anyway, this is one of the main concerns I have with Beam’s PR work flow,
> > I feel I’m missing things as there is significant amount of things not
> > happening on a list.   The initial pull request is going to the commits
> > list (ok, would prefer the dev list, but at least its on a list).  However,
> > none of the comments or discussions or anything that is occurring as part
> > of the review is making it to any list.   The only people that “learn” from
> > the reviews are the reviewers and the person who initiated the PR unless
> > they go into each and every PR and read the comments (and find the news
> > ones and such).    With my Apache hat on, this bothers me.
>
>
> > Anyway, I don’t really understand why the full github integration wasn’t
> > setup for the beam PR’s so that the comments would come back to the lists
> > as well (and JIRA, BTW).
> >
>
> This part confuses me. We've been told that discussions on JIRA, even
> though they are emailed to the mailing lists, don't count as happening on
> the mailing list. So why would github integration be helpful vs just more
> spam?
>
> > As another example, the comments on PR1003 are very applicable to anyone
> > looking into writing IO’s and they could learn about some of the “best
> > practices” presented there.
>
>
> As Beam grows during its incubation, we are moving a lot of knowledge to
> documentation, but yes -- right now, most of the I/O related practices live
> in Eugene's and my head (and now, JB's!). We're working on it, and hope to
> dramatically improve documentation for source authors in the next quarter.
>
> For AutoValue specifically, this is by no means codified and it is
> DEFINITELY not mandatory. Eugene and JB just experimented with it in the
> last few days and decided it was useful in a few cases. We do (or did,
> before this thread) need to have an actual discussion on the mailing list
> before moving forward further towards making it policy.
>
> Right now Ben Chambers is trying to apply AutoValue in places that need
> templated types and struggling with multiple ?s, so the discussion may need
> to continue! ...
>
> Thanks,
> Dan
>
> > That’s basically the background as to why JB sent this.  I was confused and
> > bugged him.   :-)
> >
> > Dan
> >
> >
> >
> > > On Oct 5, 2016, at 1:51 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> > > wrote:
> > >
> > > Hi team,
> > >
> > > I would like to excuse myself to have forgotten to discuss and share
> > > with you a technical point and generally speaking do a small reminder.
> > >
> > > When we work with Eugene on the JdbcIO, we experimented AutoValue to
> > > deal with IO configuration. AutoValue provides a nice way to reduce and
> > > limit the boilerplate code required by the IO configuration.
> > > We used AutoValue in JdbcIO and, regarding the good improvements we saw,
> > > we started to refactor the other IOs.
> > >
> > > The use of AutoValue should have been notice and discussed on the
> > > mailing list.
> > >
> > > "If it doesn't exist on the mailing list, it doesn't exist at all."
> > >
> > > So, any comment happening on a GitHub pull request, or discussion on
> > > hangouts which can impact the project (generally speaking) has to happen on
> > > the mailing list.
> > >
> > > It provides project transparency and facilitates the new contribution
> > > onboarding.
> > >
> > > Thanks !
> > >
> > > Regards
> > > JB
> > >
> >
> > --
> > Daniel Kulp
> > dkulp@apache.org <ma...@apache.org> - http://dankulp.com/blog <http://dankulp.com/blog>
> > Talend Community Coder - http://coders.talend.com <http://coders.talend.com/>
> >
>

Re: [REMINDER] Technical discussion on the mailing list

Posted by Dan Halperin <dh...@google.com.INVALID>.

On Wed, Oct 5, 2016 at 5:13 PM, Daniel Kulp <dk...@apache.org> wrote:

> I just want to give a little more context to this….  I’ve been lurking on
> this list for several months now reading everything that’s going on.   From
> Apache’s standpoint, that should be a “very good start” for getting to know
> what is happening in a project.
>
> On my last PR, Eugene commented about using the AutoValue pattern for part
> of it which caught me off guard.   None of the other IO’s in master were
> using it, there wasn’t any discussion on this list about it, I had no idea
> what it was about.   So I asked JB to make sure I hadn’t missed anything.
>

> Anyway, this is one of the main concerns I have with Beam’s PR work flow,
> I feel I’m missing things as there is significant amount of things not
> happening on a list.   The initial pull request is going to the commits
> list (ok, would prefer the dev list, but at least its on a list).  However,
> none of the comments or discussions or anything that is occurring as part
> of the review is making it to any list.   The only people that “learn” from
> the reviews are the reviewers and the person who initiated the PR unless
> they go into each and every PR and read the comments (and find the news
> ones and such).    With my Apache hat on, this bothers me.


> Anyway, I don’t really understand why the full github integration wasn’t
> setup for the beam PR’s so that the comments would come back to the lists
> as well (and JIRA, BTW).
>

This part confuses me. We've been told that discussions on JIRA, even
though they are emailed to the mailing lists, don't count as happening on
the mailing list. So why would github integration be helpful vs just more
spam?

> As another example, the comments on PR1003 are very applicable to anyone
> looking into writing IO’s and they could learn about some of the “best
> practices” presented there.


As Beam grows during its incubation, we are moving a lot of knowledge to
documentation, but yes -- right now, most of the I/O related practices live
in Eugene's and my head (and now, JB's!). We're working on it, and hope to
dramatically improve documentation for source authors in the next quarter.

For AutoValue specifically, this is by no means codified and it is
DEFINITELY not mandatory. Eugene and JB just experimented with it in the
last few days and decided it was useful in a few cases. We do (or did,
before this thread) need to have an actual discussion on the mailing list
before moving forward further towards making it policy.

Right now Ben Chambers is trying to apply AutoValue in places that need
templated types and struggling with multiple ?s, so the discussion may need
to continue! ...
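
For readers unfamiliar with the pattern under discussion, here is a minimal
sketch of the kind of hand-written configuration boilerplate that AutoValue is
meant to generate. It is not taken from JdbcIO or any Beam code: the class name
ExampleReadConfig, its fields, and the default fetch size are all hypothetical,
and plain Java is used so the sketch runs without the AutoValue annotation
processor.

```java
import java.util.Objects;

// Hypothetical IO configuration written by hand. The class and field names
// are illustrative assumptions, not Beam or JdbcIO code.
final class ExampleReadConfig {
  private final String url;
  private final String query;   // stays null until withQuery is called
  private final int fetchSize;

  private ExampleReadConfig(String url, String query, int fetchSize) {
    this.url = url;
    this.query = query;
    this.fetchSize = fetchSize;
  }

  // Entry point; the default fetch size of 100 is an arbitrary choice.
  static ExampleReadConfig create(String url) {
    return new ExampleReadConfig(Objects.requireNonNull(url, "url"), null, 100);
  }

  // Each "with" method returns a modified copy, so instances stay immutable.
  ExampleReadConfig withQuery(String query) {
    return new ExampleReadConfig(url, Objects.requireNonNull(query, "query"), fetchSize);
  }

  ExampleReadConfig withFetchSize(int fetchSize) {
    if (fetchSize <= 0) {
      throw new IllegalArgumentException("fetchSize must be positive");
    }
    return new ExampleReadConfig(url, query, fetchSize);
  }

  String url() { return url; }
  String query() { return query; }
  int fetchSize() { return fetchSize; }

  // equals, hashCode and toString are the repetitive part that a generator
  // such as AutoValue produces from the accessor declarations alone.
  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof ExampleReadConfig)) return false;
    ExampleReadConfig other = (ExampleReadConfig) o;
    return url.equals(other.url)
        && Objects.equals(query, other.query)
        && fetchSize == other.fetchSize;
  }

  @Override
  public int hashCode() {
    return Objects.hash(url, query, fetchSize);
  }

  @Override
  public String toString() {
    return "ExampleReadConfig{url=" + url + ", query=" + query
        + ", fetchSize=" + fetchSize + "}";
  }

  public static void main(String[] args) {
    ExampleReadConfig config =
        ExampleReadConfig.create("jdbc:example://host/db")
            .withQuery("SELECT 1")
            .withFetchSize(500);
    System.out.println(config);
  }
}
```

With AutoValue, a class like this would instead be an abstract class annotated
with @AutoValue that declares only the abstract accessors; the annotation
processor generates the implementation, including equals, hashCode and
toString.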

Thanks,
Dan

> That’s basically the background as to why JB sent this.  I was confused and
> bugged him.   :-)
>
> Dan
>
>
>
> > On Oct 5, 2016, at 1:51 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> > wrote:
> >
> > Hi team,
> >
> > I would like to excuse myself to have forgotten to discuss and share
> > with you a technical point and generally speaking do a small reminder.
> >
> > When we work with Eugene on the JdbcIO, we experimented AutoValue to
> > deal with IO configuration. AutoValue provides a nice way to reduce and
> > limit the boilerplate code required by the IO configuration.
> > We used AutoValue in JdbcIO and, regarding the good improvements we saw,
> > we started to refactor the other IOs.
> >
> > The use of AutoValue should have been notice and discussed on the
> > mailing list.
> >
> > "If it doesn't exist on the mailing list, it doesn't exist at all."
> >
> > So, any comment happening on a GitHub pull request, or discussion on
> > hangouts which can impact the project (generally speaking) has to happen on
> > the mailing list.
> >
> > It provides project transparency and facilitates the new contribution
> > onboarding.
> >
> > Thanks !
> >
> > Regards
> > JB
> >
>
> --
> Daniel Kulp
> dkulp@apache.org <ma...@apache.org> - http://dankulp.com/blog <http://dankulp.com/blog>
> Talend Community Coder - http://coders.talend.com <http://coders.talend.com/>
>

Re: [REMINDER] Technical discussion on the mailing list

Posted by Daniel Kulp <dk...@apache.org>.

I just want to give a little more context to this….  I’ve been lurking on this list for several months now reading everything that’s going on.   From Apache’s standpoint, that should be a “very good start” for getting to know what is happening in a project.   

On my last PR, Eugene commented about using the AutoValue pattern for part of it, which caught me off guard. None of the other IOs in master were using it, there wasn’t any discussion on this list about it, and I had no idea what it was about. So I asked JB to make sure I hadn’t missed anything.

Anyway, this is one of the main concerns I have with Beam’s PR workflow: I feel I’m missing things because a significant amount of discussion is not happening on a list. The initial pull request goes to the commits list (OK, I would prefer the dev list, but at least it’s on a list). However, none of the comments or discussions that occur as part of the review make it to any list. The only people who “learn” from the reviews are the reviewers and the person who opened the PR, unless others go into each and every PR and read the comments (and find the new ones and such). With my Apache hat on, this bothers me.

As another example, the comments on PR1003 are very applicable to anyone looking into writing IOs, and they could learn about some of the “best practices” presented there.

Anyway, I don’t really understand why the full GitHub integration wasn’t set up for the Beam PRs so that the comments would come back to the lists as well (and JIRA, BTW).

That’s basically the background as to why JB sent this.  I was confused and bugged him.   :-)

Dan

> On Oct 5, 2016, at 1:51 PM, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> 
> Hi team,
> 
> I would like to excuse myself to have forgotten to discuss and share with you a technical point and generally speaking do a small reminder.
> 
> When we work with Eugene on the JdbcIO, we experimented AutoValue to deal with IO configuration. AutoValue provides a nice way to reduce and limit the boilerplate code required by the IO configuration.
> We used AutoValue in JdbcIO and, regarding the good improvements we saw, we started to refactor the other IOs.
> 
> The use of AutoValue should have been notice and discussed on the mailing list.
> 
> "If it doesn't exist on the mailing list, it doesn't exist at all."
> 
> So, any comment happening on a GitHub pull request, or discussion on hangouts which can impact the project (generally speaking) has to happen on the mailing list.
> 
> It provides project transparency and facilitates the new contribution onboarding.
> 
> Thanks !
> 
> Regards
> JB
> 

-- 
Daniel Kulp
dkulp@apache.org <ma...@apache.org> - http://dankulp.com/blog <http://dankulp.com/blog>
Talend Community Coder - http://coders.talend.com <http://coders.talend.com/>

Re: [REMINDER] Technical discussion on the mailing list

Posted by Lukasz Cwik <lc...@google.com.INVALID>.

+1 for the notice
+1 for the usage of AutoValue

On Wed, Oct 5, 2016 at 4:51 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi team,
>
> I would like to excuse myself to have forgotten to discuss and share with
> you a technical point and generally speaking do a small reminder.
>
> When we work with Eugene on the JdbcIO, we experimented AutoValue to deal
> with IO configuration. AutoValue provides a nice way to reduce and limit
> the boilerplate code required by the IO configuration.
> We used AutoValue in JdbcIO and, regarding the good improvements we saw,
> we started to refactor the other IOs.
>
> The use of AutoValue should have been notice and discussed on the mailing
> list.
>
> "If it doesn't exist on the mailing list, it doesn't exist at all."
>
> So, any comment happening on a GitHub pull request, or discussion on
> hangouts which can impact the project (generally speaking) has to happen on
> the mailing list.
>
> It provides project transparency and facilitates the new contribution
> onboarding.
>
> Thanks !
>
> Regards
> JB
>
>

Re: [REMINDER] Technical discussion on the mailing list

Posted by Kenneth Knowles <kl...@google.com.INVALID>.

This is a great idea. And it produces many easy starter tickets! :-)

On Wed, Oct 5, 2016 at 4:51 AM Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:

> Hi team,
>
> I would like to excuse myself to have forgotten to discuss and share with
> you a technical point and generally speaking do a small reminder.
>
> When we work with Eugene on the JdbcIO, we experimented AutoValue to deal
> with IO configuration. AutoValue provides a nice way to reduce and limit
> the boilerplate code required by the IO configuration.
> We used AutoValue in JdbcIO and, regarding the good improvements we saw,
> we started to refactor the other IOs.
>
> The use of AutoValue should have been notice and discussed on the mailing
> list.
>
> "If it doesn't exist on the mailing list, it doesn't exist at all."
>
> So, any comment happening on a GitHub pull request, or discussion on
> hangouts which can impact the project (generally speaking) has to happen on
> the mailing list.
>
> It provides project transparency and facilitates the new contribution
> onboarding.
>
> Thanks !
>
> Regards
> JB
>
>

Re: [REMINDER] Technical discussion on the mailing list

Posted by Raghu Angadi <ra...@google.com.INVALID>.

+1 for AutoValue.

On Wed, Oct 5, 2016 at 4:51 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> So, any comment happening on a GitHub pull request, or discussion on
> hangouts which can impact the project (generally speaking) has to happen on
> the mailing list.
>
> It provides project transparency and facilitates the new contribution
> onboarding.
>

Would JIRA discussions count towards this?