You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Sachin Goel <sa...@gmail.com> on 2015/12/05 17:00:36 UTC

Lack of review on PRs

Hi all
Sorry about a weekend email.

This email is to express my displeasure over the lack of any review on my
PRs on extending the ML library. Five of my PRs have been without any
review for times varying from 3-5 months now.
When I took up the task of extending the ML library by implementing core
algorithms such as Decision Tree and k-means clustering [with several
initialization schemes], I had hoped that the community will be actively
involved in it since ML is a very important component of any big data
system these days. However, it appears I have been wrong.
Surely, the initial reviews required a lot of changes from my side over
coding style mistakes [first time programmer in Scala], and
less-than-optimal implementations. I like to think that I have learned a
lot about maintaining better coding style compatible with Flink code base,
and spending time to optimize my work due to this.
However, if a PR requires work, that doesn't automatically disqualify it
from being reviewed actively, since the author has spent a lot of time on
it and has voluntarily taken up the task of contributing.

Machine learning is my core area of interest and I am able to contribute
much more to the library; however, a lack of review after repeated
reminders automatically discourages me from picking up more issues.

However minor some of my commits maybe, I have been actively involved in
the development work [with a total of 29 commits.]. I have also spent a lot
of time in release testing and diagnosing-slash-fixing lots of issues with
Web Dashboard. However, as with any contributor, my main goal is to
contribute to my area of interest, while also diversifying my work by
fixing other issues.

The PRs are 710, 757, 861, 918 and 1032. I propose the following order for
anyone who wants to review my work:
1032 [very simple feature.]
918 [very short PR]
861 [followed by 710 after a complete rebase] [major work for Histograms
and Decision Trees]
757 [major work for K-Means clustering and initialization schemes]

If I have come across as rude, I apologize.

Happy reviewing and thanks for bearing with me. :)

Cheers!
Sachin

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

Re: Lack of review on PRs

Posted by Till Rohrmann <tr...@apache.org>.
Hi Sachin,

I can understand your dissatisfaction with the review process of your ML
PRs. They are open for a long time without much activity from a committer
even though you've spent a lot of effort writing them. I'm sorry for this,
since it's probably mainly because of me lacking the time resources to
review them.

I agree with Stephan that reviewing ML algorithms is particularly time
consuming because you not only have to verify the parallel implementation
but first of all you have to understand the algorithm. This usually
requires reading the respective literature. Furthermore, a close
examination is often necessary because a single wrong sign renders the math
and also the algorithm directly wrong. Figuring these things out when it's
still fresh is usually easier.

I personally prefer not to rush things and rather merge only things which
are working correctly, ideally supported by exhaustive tests, and
efficiently implemented.

At the moment I think the biggest problem is that there are too few
committers working on ML. Currently, it's hard for me to allocate time for
it and I guess so for others too. It would be great if the review process
wouldn't solely depend on the committers, though. However, I don't see an
immediate solution for this problem. Therefore, I would ask you to not lose
patience with the community, even though this might not be a satisfying
answer for you.

Cheers,
Till

On Sun, Dec 6, 2015 at 8:59 PM, Stephan Ewen <se...@apache.org> wrote:

> Hi Sachin!
>
> Thank you for honestly speaking your mind on this issue.
> I agree that handling ML pull requests is particularly challenging right
> now for various reasons.
>
> There are currently very few active committers that have the background to
> actually review these pull requests. The whole work basically hinges on
> Till and Chiwan. As far as I know, both of them work on many things, and
> have only a certain time they can spend on ML. With regard to ML, Flink is
> simply a bit understaffed.
>
> The ML parts hard hard to develop. They require people who understand math,
> know how to write good code, and can think deeply into the parallel systems
> aspects of Flink. A lot of ML pull requests had to be review multiple times
> because they did not bring these aspects together.
>
> Finally, the ML pull requests are very time consuming to review, because
> the reviewer usually has to learn the implemented algorithm (or the
> parallel adaption) before being able to properly review it. As a result,
> while most other pull requests can be reviewed with a bit of time next to
> another weekly agenda, ML pull requests need a lot of dedicated time.
>
>
> The things I see that can really improve the situation are these:
>
> 1) Try to get more people in the community to help bring these pull
> requests into shape. While the group of ML-savvy committers is still small,
> it would be good if contributors could help each other out there are well,
> by reviewing the pull requests, and helping to get them into shape. Pull
> requests that are high quality by the time one of the committers looks over
> them can be merged relatively quickly.
>
> 2) Honestly reject some pull requests. Focusing the time on the promising
> ML pull requests and earlier rejecting pull requests that are far away from
> good shape. Giving the rejected pull requests some coarser feedback about
> what categories to improve, rather than detailed comments on each part.
> While it is nice to try and shepherd as many pull requests in as possible
> (and the community really tries to do that), it may not be possible in that
> area, as long as there is still not enough ML people.
>
> Maybe Till and Chiwan can share their thoughts on this.
>
>
> Greetings,
> Stephan
>
>
> On Sun, Dec 6, 2015 at 5:44 PM, Chiwan Park <ch...@apache.org> wrote:
>
> > Hi Sachin,
> >
> > I’m sorry for your unsatisfied experience about lack of review.
> >
> > As you know, because there are few committers (including me) whose
> > interest is ML, the review of PRs could be completed slowly. But I admit
> my
> > fault about leaving the PRs unreviewed for 3-5 months.
> >
> > I’ll try to review your PRs as soon as possible.
> >
> > > On Dec 6, 2015, at 1:00 AM, Sachin Goel <sa...@gmail.com>
> > wrote:
> > >
> > > Hi all
> > > Sorry about a weekend email.
> > >
> > > This email is to express my displeasure over the lack of any review on
> my
> > > PRs on extending the ML library. Five of my PRs have been without any
> > > review for times varying from 3-5 months now.
> > > When I took up the task of extending the ML library by implementing
> core
> > > algorithms such as Decision Tree and k-means clustering [with several
> > > initialization schemes], I had hoped that the community will be
> actively
> > > involved in it since ML is a very important component of any big data
> > > system these days. However, it appears I have been wrong.
> > > Surely, the initial reviews required a lot of changes from my side over
> > > coding style mistakes [first time programmer in Scala], and
> > > less-than-optimal implementations. I like to think that I have learned
> a
> > > lot about maintaining better coding style compatible with Flink code
> > base,
> > > and spending time to optimize my work due to this.
> > > However, if a PR requires work, that doesn't automatically disqualify
> it
> > > from being reviewed actively, since the author has spent a lot of time
> on
> > > it and has voluntarily taken up the task of contributing.
> > >
> > > Machine learning is my core area of interest and I am able to
> contribute
> > > much more to the library; however, a lack of review after repeated
> > > reminders automatically discourages me from picking up more issues.
> > >
> > > However minor some of my commits maybe, I have been actively involved
> in
> > > the development work [with a total of 29 commits.]. I have also spent a
> > lot
> > > of time in release testing and diagnosing-slash-fixing lots of issues
> > with
> > > Web Dashboard. However, as with any contributor, my main goal is to
> > > contribute to my area of interest, while also diversifying my work by
> > > fixing other issues.
> > >
> > > The PRs are 710, 757, 861, 918 and 1032. I propose the following order
> > for
> > > anyone who wants to review my work:
> > > 1032 [very simple feature.]
> > > 918 [very short PR]
> > > 861 [followed by 710 after a complete rebase] [major work for
> Histograms
> > > and Decision Trees]
> > > 757 [major work for K-Means clustering and initialization schemes]
> > >
> > > If I have come across as rude, I apologize.
> > >
> > > Happy reviewing and thanks for bearing with me. :)
> > >
> > > Cheers!
> > > Sachin
> > >
> > > -- Sachin Goel
> > > Computer Science, IIT Delhi
> > > m. +91-9871457685
> >
> > Regards,
> > Chiwan Park
> >
> >
> >
> >
>

Re: Lack of review on PRs

Posted by Stephan Ewen <se...@apache.org>.
Hi Sachin!

Thank you for honestly speaking your mind on this issue.
I agree that handling ML pull requests is particularly challenging right
now for various reasons.

There are currently very few active committers that have the background to
actually review these pull requests. The whole work basically hinges on
Till and Chiwan. As far as I know, both of them work on many things, and
have only a certain time they can spend on ML. With regard to ML, Flink is
simply a bit understaffed.

The ML parts hard hard to develop. They require people who understand math,
know how to write good code, and can think deeply into the parallel systems
aspects of Flink. A lot of ML pull requests had to be review multiple times
because they did not bring these aspects together.

Finally, the ML pull requests are very time consuming to review, because
the reviewer usually has to learn the implemented algorithm (or the
parallel adaption) before being able to properly review it. As a result,
while most other pull requests can be reviewed with a bit of time next to
another weekly agenda, ML pull requests need a lot of dedicated time.


The things I see that can really improve the situation are these:

1) Try to get more people in the community to help bring these pull
requests into shape. While the group of ML-savvy committers is still small,
it would be good if contributors could help each other out there are well,
by reviewing the pull requests, and helping to get them into shape. Pull
requests that are high quality by the time one of the committers looks over
them can be merged relatively quickly.

2) Honestly reject some pull requests. Focusing the time on the promising
ML pull requests and earlier rejecting pull requests that are far away from
good shape. Giving the rejected pull requests some coarser feedback about
what categories to improve, rather than detailed comments on each part.
While it is nice to try and shepherd as many pull requests in as possible
(and the community really tries to do that), it may not be possible in that
area, as long as there is still not enough ML people.

Maybe Till and Chiwan can share their thoughts on this.


Greetings,
Stephan


On Sun, Dec 6, 2015 at 5:44 PM, Chiwan Park <ch...@apache.org> wrote:

> Hi Sachin,
>
> I’m sorry for your unsatisfied experience about lack of review.
>
> As you know, because there are few committers (including me) whose
> interest is ML, the review of PRs could be completed slowly. But I admit my
> fault about leaving the PRs unreviewed for 3-5 months.
>
> I’ll try to review your PRs as soon as possible.
>
> > On Dec 6, 2015, at 1:00 AM, Sachin Goel <sa...@gmail.com>
> wrote:
> >
> > Hi all
> > Sorry about a weekend email.
> >
> > This email is to express my displeasure over the lack of any review on my
> > PRs on extending the ML library. Five of my PRs have been without any
> > review for times varying from 3-5 months now.
> > When I took up the task of extending the ML library by implementing core
> > algorithms such as Decision Tree and k-means clustering [with several
> > initialization schemes], I had hoped that the community will be actively
> > involved in it since ML is a very important component of any big data
> > system these days. However, it appears I have been wrong.
> > Surely, the initial reviews required a lot of changes from my side over
> > coding style mistakes [first time programmer in Scala], and
> > less-than-optimal implementations. I like to think that I have learned a
> > lot about maintaining better coding style compatible with Flink code
> base,
> > and spending time to optimize my work due to this.
> > However, if a PR requires work, that doesn't automatically disqualify it
> > from being reviewed actively, since the author has spent a lot of time on
> > it and has voluntarily taken up the task of contributing.
> >
> > Machine learning is my core area of interest and I am able to contribute
> > much more to the library; however, a lack of review after repeated
> > reminders automatically discourages me from picking up more issues.
> >
> > However minor some of my commits maybe, I have been actively involved in
> > the development work [with a total of 29 commits.]. I have also spent a
> lot
> > of time in release testing and diagnosing-slash-fixing lots of issues
> with
> > Web Dashboard. However, as with any contributor, my main goal is to
> > contribute to my area of interest, while also diversifying my work by
> > fixing other issues.
> >
> > The PRs are 710, 757, 861, 918 and 1032. I propose the following order
> for
> > anyone who wants to review my work:
> > 1032 [very simple feature.]
> > 918 [very short PR]
> > 861 [followed by 710 after a complete rebase] [major work for Histograms
> > and Decision Trees]
> > 757 [major work for K-Means clustering and initialization schemes]
> >
> > If I have come across as rude, I apologize.
> >
> > Happy reviewing and thanks for bearing with me. :)
> >
> > Cheers!
> > Sachin
> >
> > -- Sachin Goel
> > Computer Science, IIT Delhi
> > m. +91-9871457685
>
> Regards,
> Chiwan Park
>
>
>
>

Re: Lack of review on PRs

Posted by Chiwan Park <ch...@apache.org>.
Hi Sachin,

I’m sorry for your unsatisfied experience about lack of review.

As you know, because there are few committers (including me) whose interest is ML, the review of PRs could be completed slowly. But I admit my fault about leaving the PRs unreviewed for 3-5 months.

I’ll try to review your PRs as soon as possible.

> On Dec 6, 2015, at 1:00 AM, Sachin Goel <sa...@gmail.com> wrote:
> 
> Hi all
> Sorry about a weekend email.
> 
> This email is to express my displeasure over the lack of any review on my
> PRs on extending the ML library. Five of my PRs have been without any
> review for times varying from 3-5 months now.
> When I took up the task of extending the ML library by implementing core
> algorithms such as Decision Tree and k-means clustering [with several
> initialization schemes], I had hoped that the community will be actively
> involved in it since ML is a very important component of any big data
> system these days. However, it appears I have been wrong.
> Surely, the initial reviews required a lot of changes from my side over
> coding style mistakes [first time programmer in Scala], and
> less-than-optimal implementations. I like to think that I have learned a
> lot about maintaining better coding style compatible with Flink code base,
> and spending time to optimize my work due to this.
> However, if a PR requires work, that doesn't automatically disqualify it
> from being reviewed actively, since the author has spent a lot of time on
> it and has voluntarily taken up the task of contributing.
> 
> Machine learning is my core area of interest and I am able to contribute
> much more to the library; however, a lack of review after repeated
> reminders automatically discourages me from picking up more issues.
> 
> However minor some of my commits maybe, I have been actively involved in
> the development work [with a total of 29 commits.]. I have also spent a lot
> of time in release testing and diagnosing-slash-fixing lots of issues with
> Web Dashboard. However, as with any contributor, my main goal is to
> contribute to my area of interest, while also diversifying my work by
> fixing other issues.
> 
> The PRs are 710, 757, 861, 918 and 1032. I propose the following order for
> anyone who wants to review my work:
> 1032 [very simple feature.]
> 918 [very short PR]
> 861 [followed by 710 after a complete rebase] [major work for Histograms
> and Decision Trees]
> 757 [major work for K-Means clustering and initialization schemes]
> 
> If I have come across as rude, I apologize.
> 
> Happy reviewing and thanks for bearing with me. :)
> 
> Cheers!
> Sachin
> 
> -- Sachin Goel
> Computer Science, IIT Delhi
> m. +91-9871457685

Regards,
Chiwan Park