Posted to dev@spark.apache.org by Sean Owen <sr...@gmail.com> on 2018/10/24 19:09:28 UTC

What's a blocker?

Shifting this to dev@. See the PR https://github.com/apache/spark/pull/22144
for more context.

There will be no objective, complete definition of blocker, or even
regression or correctness issue. Many cases are clear, some are not. We can
draw up more guidelines, and feel free to open PRs against the
'contributing' doc. But in general these are the same consensus-driven
decisions we negotiate all the time.

What isn't said that should be is that there is a cost to not releasing.
Keep in mind we have, also, decided on a 'release train' cadence. That does
properly change the calculus about what's a blocker; the right decision
could change within even a week.

I wouldn't mind some verbiage around what a regression is. Since the last
minor release?

We can VOTE on anything we like, but we already VOTE on the release.
Weirdly, technically, the release vote criterion is a simple majority, FWIW:
http://www.apache.org/legal/release-policy.html#release-approval

Yes, actually, it is only the PMC's votes that literally matter. Those
votes are, surely, based on input from others too. But that is actually
working as intended.


Let's understand statements like "X is not a blocker" to mean "I don't
think that X is a blocker". Interpretations not proclamations, backed up by
reasons, not all of which are appeals to policy and precedent.

I find it hard to argue about these in the abstract, because I believe it's
already widely agreed, and written down in ASF policy, that nobody makes
decisions unilaterally. Done, yes.

Practically speaking, the urgent issue is the 2.4 release. I don't see
process failures here that need fixing or debate. I do think those
outstanding issues merit technical discussion. The outcome will be a
judgment call on some subjective issues, not read off of a policy sheet,
and will entail tradeoffs. Let's speak freely about those technical issues and
try to find the consensus position.


On Wed, Oct 24, 2018 at 12:21 PM Mark Hamstra <no...@github.com>
wrote:

> Thanks @tgravescs <https://github.com/tgravescs> for your latest posts --
> they've saved me from posting something similar in many respects but more
> strongly worded.
>
> What is bothering me (not just in the discussion of this PR, but more
> broadly) is that we have individuals making declarative statements about
> whether something can or can't block a release, or that something "is not
> that important to Spark at this point", etc. -- things for which there is
> no supporting PMC vote or declared policy. It may be your opinion,
> @cloud-fan <https://github.com/cloud-fan> , that Hive compatibility
> should no longer be important to the Apache Spark project, and I have no
> problem with you expressing your opinion on the matter. That may even be
> the internal policy at your employer, I don't know. But you are just not in
> a position on your own to make this declaration for the Apache Spark
> project.
>
> I don't mean to single you out, @cloud-fan <https://github.com/cloud-fan>
> , as the only offender, since this isn't a unique instance. For example,
> heading into a recent release we also saw individual declarations that the
> data correctness issue caused by the shuffle replay partitioning issue was
> not a blocker because it was not a regression or that it was not
> significant enough to alter the release schedule. Rather, my point is that
> things like release schedules, the declaration of release candidates,
> labeling JIRA tickets with "blocker", and de facto or even declared policy
> on regressions and release blockers are just tools in the service of the
> PMC. If, as was the case with the shuffle data correctness issue, PMC
> members think that the issue must be fixed before the next release, then
> release schedules, RC-status, other individuals' perceptions of importance
> to the project or of policy ultimately don't matter -- only the vote of the
> PMC does. What is concerning me is that, instead of efforts to persuade the
> PMC members that something should not block the next release or should not
> be important to the project, I am seeing flat declarations that an issue is
> not a blocker or not important. That may serve to stifle work to
> immediately fix a bug, or to discourage other contributions, but I can
> assure that trying to make the PMC serve the tools instead of the other way
> around won't serve to persuade at least some PMC members on how they should
> vote.
>
> Sorry, I guess I can't avoid wording things strongly after all.

Re: What's a blocker?

Posted by Mark Hamstra <ma...@clearstorydata.com>.
Yeah, I can pretty much agree with that. Before we get into release
candidates, it's not as big a deal if something gets labeled as a blocker.
Once we are into an RC, I'd like to see any discussions as to whether
something is or isn't a blocker at least cross-referenced in the RC VOTE
thread so that PMC members can more easily be aware of the discussion and
potentially weigh in.

On Wed, Oct 24, 2018 at 7:12 PM Saisai Shao <sa...@gmail.com> wrote:

> Just my two cents of the past experience. As a release manager of Spark
> 2.3.2, I felt significantly delay during the release by block issues. Vote
> was failed several times by one or two "block issue". I think during the RC
> time, each "block issue" should be carefully evaluated by the related PMCs
> and release manager. Some issues which are not so critical or only matters
> to one or two firms should be carefully marked as blocker, to avoid the
> delay of the release.
>
> Thanks
> Saisai
>

Re: What's a blocker?

Posted by Saisai Shao <sa...@gmail.com>.
Just my two cents from past experience. As the release manager of Spark
2.3.2, I felt the release was significantly delayed by blocker issues. The
vote failed several times because of one or two "blocker" issues. I think
that during the RC period, each "blocker" issue should be carefully
evaluated by the related PMC members and the release manager. Issues which
are not so critical, or which only matter to one or two firms, should be
marked as blockers only with care, to avoid delaying the release.

Thanks
Saisai

Re: What's a blocker?

Posted by Hyukjin Kwon <gu...@gmail.com>.
> Let's understand statements like "X is not a blocker" to mean "I don't
> think that X is a blocker". Interpretations not proclamations, backed up by
> reasons, not all of which are appeals to policy and precedent.

This might not be a big deal and is a bit off topic, but I would rather
people explicitly avoid saying things like "X is not a blocker", though. It
certainly does sound like some kind of proclamation.


On Thu, Oct 25, 2018 at 3:09 AM, Sean Owen <sr...@gmail.com> wrote:

> Shifting this to dev@. See the PR
> https://github.com/apache/spark/pull/22144 for more context.
>
> There will be no objective, complete definition of blocker, or even
> regression or correctness issue. Many cases are clear, some are not. We can
> draw up more guidelines, and feel free to open PRs against the
> 'contributing' doc. But in general these are the same consensus-driven
> decisions we negotiate all the time.
>
> What isn't said that should be is that there is a cost to not releasing.
> Keep in mind we have, also, decided on a 'release train' cadence. That does
> properly change the calculus about what's a blocker; the right decision
> could change within even a week.
>
> I wouldn't mind some verbiage around what a regression is. Since the last
> minor release?
>
> We can VOTE on anything we like, but we already VOTE on the release.
> Weirdly, technically, the release vote criteria is simple majority, FWIW:
> http://www.apache.org/legal/release-policy.html#release-approval
>
> Yes, actually, it is only the PMC's votes that literally matter. Those
> votes are, surely, based on input from others too. But that is actually
> working as intended.
>
>
> Let's understand statements like "X is not a blocker" to mean "I don't
> think that X is a blocker". Interpretations not proclamations, backed up by
> reasons, not all of which are appeals to policy and precedent.
>
> I find it hard to argue about these in the abstract, because I believe
> it's already widely agreed, and written down in ASF policy, that nobody
> makes decisions unilaterally. Done, yes.
>
> Practically speaking, the urgent issue is the 2.4 release. I don't see
> process failures here that need fixing or debate. I do think those
> outstanding issues merit technical discussion. The outcome will be a
> tradeoff of some subjective issues, not read off of a policy sheet, and
> will entail tradeoffs. Let's speak freely about those technical issues and
> try to find the consensus position.
>
>
> On Wed, Oct 24, 2018 at 12:21 PM Mark Hamstra <no...@github.com>
> wrote:
>
>> Thanks @tgravescs <https://github.com/tgravescs> for your latest posts
>> -- they've saved me from posting something similar in many respects but
>> more strongly worded.
>>
>> What is bothering me (not just in the discussion of this PR, but more
>> broadly) is that we have individuals making declarative statements about
>> whether something can or can't block a release, or that something "is not
>> that important to Spark at this point", etc. -- things for which there is
>> no supporting PMC vote or declared policy. It may be your opinion,
>> @cloud-fan <https://github.com/cloud-fan> , that Hive compatibility
>> should no longer be important to the Apache Spark project, and I have no
>> problem with you expressing your opinion on the matter. That may even be
>> the internal policy at your employer, I don't know. But you are just not in
>> a position on your own to make this declaration for the Apache Spark
>> project.
>>
>> I don't mean to single you out, @cloud-fan <https://github.com/cloud-fan>
>> , as the only offender, since this isn't a unique instance. For example,
>> heading into a recent release we also saw individual declarations that the
>> data correctness issue caused by the shuffle replay partitioning issue was
>> not a blocker because it was not a regression or that it was not
>> significant enough to alter the release schedule. Rather, my point is that
>> things like release schedules, the declaration of release candidates,
>> labeling JIRA tickets with "blocker", and de facto or even declared policy
>> on regressions and release blockers are just tools in the service of the
>> PMC. If, as was the case with the shuffle data correctness issue, PMC
>> members think that the issue must be fixed before the next release, then
>> release schedules, RC-status, other individuals' perceptions of importance
>> to the project or of policy ultimately don't matter -- only the vote of the
>> PMC does. What is concerning me is that, instead of efforts to persuade the
>> PMC members that something should not block the next release or should not
>> be important to the project, I am seeing flat declarations that an issue is
>> not a blocker or not important. That may serve to stifle work to
>> immediately fix a bug, or to discourage other contributions, but I can
>> assure that trying to make the PMC serve the tools instead of the other way
>> around won't serve to persuade at least some PMC members on how they should
>> vote.
>>
>> Sorry, I guess I can't avoid wording things strongly after all.
>

Re: What's a blocker?

Posted by Tom Graves <tg...@yahoo.com.INVALID>.
Ignoring everything else in this thread to put a sharper point on one issue. In the PR, multiple people said it's not a blocker on the grounds that it was also a bug/dropped feature in the previous release (note one was phrased slightly differently, as it was stated to be not a regression, which I read as not a regression from the previous feature release). My thought is that if multiple people think this then others may as well, so I think we need a discuss thread on it.
My reason for disagreeing is that it specifically goes against our documented versioning policy. The JIRA claims we essentially broke proper support for Hive UDAFs; we specifically state in our docs that we support Hive UDAFs; I consider that an API; and our versioning docs say we won't break API compatibility in feature releases. It shouldn't matter if that was 1 feature release ago or 10: until we do a major release we shouldn't break or drop that compatibility.
So we should not be using that as a reason to decide if a JIRA is a blocker or not.

Tom 
 
On Thu, Oct 25, 2018 at 9:39 AM, Sean Owen <sr...@gmail.com> wrote:

What does "PMC members aren't saying its a block for reasons other then the actual impact the jira has" mean that isn't already widely agreed? Likewise "Committers and PMC members should not be saying its not a blocker because they personally or their company doesn't care about this feature or api". It sounds like insinuation, and I'd rather make it explicit -- call out the bad actions -- or keep it to observable technical issues.
Likewise one could say there's a problem just because A thinks X should be a blocker and B disagrees. I see no bad faith, process problem, or obvious errors. Do you? I see disagreement, and it's tempting to suspect motives. I have seen what I think are actual bad-faith decisions in the past in this project, too. I don't see it here though and want to stick to 'now'.

(Aside: the implication is that those representing vendors are steam-rolling a release. Actually, the cynical incentives cut the other way here. Blessing the latest changes as OSS Apache Spark is predominantly beneficial to users of OSS, not distros. In fact, it forces distros to make changes. And broadly, vendors have much more accountability for quality of releases, because they're paid to.)

I'm still not sure what specifically the objection is to what here? I understand a lot is in flight and nobody agrees with every decision made, but, what else is new? Concretely: the release is held again to fix a few issues, in the end. For the map_filter issue, that seems like the right call, and there are a few other important issues that could be quickly fixed too. All is well there, yes?
This has surfaced some implicit reasoning about releases that we could make explicit, like:
(Sure, if you want to write down things like, release blockers should be decided in the interests of the project by the PMC, OK)
We have a time-based release schedule, so time matters. There is an opportunity cost to not releasing. The bar for blockers goes up over time.
Not all regressions are blockers. Would you hold a release over a trivial regression? but then which must or should block? There's no objective answer, but a reasonable rule is: non-trivial regressions from minor release x.y to x.{y+1} block releases. Regressions from x.{y-1} to x.{y+1} should, but not necessarily, block the release. We try hard to avoid regressions in x.y.0 releases because these are generally consumed by aggressive upgraders, on x.{y-1}.z now. If a bug exists in x.{y-1}, they're not affected or worked around it. The cautious upgrader goes from maybe x.{y-2}.z to x.y.1 later. They're affected, but not before, maybe, a maintenance release. A crude argument, and it's not an argument that regressions are OK. It's an argument that 'old' regressions matter less. And maybe it's reasonable to draw the "must" vs "should" line between them.



On Thu, Oct 25, 2018 at 8:51 AM Tom Graves <tg...@yahoo.com> wrote:

 So just to clarify a few things in case people didn't read the entire thread in the PR, the discussion is what is the criteria for a blocker and really my concerns are what people are using as criteria for not marking a jira as a blocker.
The only thing we have documented to mark a jira as a blocker is for correctness issues: http://spark.apache.org/contributing.html.  And really I think that is initially mark it as a blocker to bring attention to it. The final decision as to whether something is a blocker is up to the PMC who votes on whether a release passes.  I think it would be impossible to properly define what a blocker is with strict rules.
Personally from this thread I would like to make sure committers and PMC members aren't saying its a block for reasons other then the actual impact the jira has and if its at all in question it should be brought to the PMC's attention for a vote.  I agree with others that if its during an RC it should be talked about on the RC thread.
A few specific things that were said that I disagree with are:
   - its not a blocker because it was also an issue in the last release (meaning feature release).  ie the bug was introduced in 2.2 and now we are doing 2.4 so its automatically not a blocker.  This to me is just wrong.  Lots of things are not found immediately, or aren't reported immediately.   Now I do believe the timeframe its been in there does affect the decision on the impact but just making the decision on this to me is to strict.
   - Committers and PMC members should not be saying its not a blocker because they personally or their company doesn't care about this feature or api, or state that the Spark project as a whole doesn't care about this feature unless that was specifically voted on at the project level. They need to follow the api compatibility we have documented. This is really a broader issue then just marking a jira, it goes to anything checked in and perhaps need to be a separate thread.

For the verbiage of what a regression is, it seems like that should be defined by our versioning documents. It states what we do in maintenance, feature, and major releases (http://spark.apache.org/versioning-policy.html), if its not defined by that we probably need to clarify.   There was a good example we might want to clarify about things like scala or java compatibility in feature releases.  
Obviously this is my opinion and its here for everyone to discuss and come to a consensus on.   
Tom


  
  

Re: What's a blocker?

Posted by Erik Erlandson <ee...@redhat.com>.
I'd like to expand a bit on the phrase "opportunity cost" to try and make
it more concrete: delaying a release means that the community is *not*
receiving various bug fixes (and features).  Just as a particular example,
the wait for 2.3.2 delayed a fix for the Py3.7 iterator breaking change
that was also causing a correctness bug.  It also delays community feedback
from running new releases.  That in and of itself does not give an answer
to block/not-block for any specific case, but it's another way of saying
that blocking a release *prevents* people from getting bug fixes, as well
as potentially fixing bugs.


On Thu, Oct 25, 2018 at 7:39 AM Sean Owen <sr...@gmail.com> wrote:

> What does "PMC members aren't saying its a block for reasons other then
> the actual impact the jira has" mean that isn't already widely agreed?
> Likewise "Committers and PMC members should not be saying its not a
> blocker because they personally or their company doesn't care about this
> feature or api". It sounds like insinuation, and I'd rather make it
> explicit -- call out the bad actions -- or keep it to observable technical
> issues.
>
> Likewise one could say there's a problem just because A thinks X should be
> a blocker and B disagrees. I see no bad faith, process problem, or obvious
> errors. Do you? I see disagreement, and it's tempting to suspect motives. I
> have seen what I think are actual bad-faith decisions in the past in this
> project, too. I don't see it here though and want to stick to 'now'.
>
> (Aside: the implication is that those representing vendors are
> steam-rolling a release. Actually, the cynical incentives cut the other way
> here. Blessing the latest changes as OSS Apache Spark is predominantly
> beneficial to users of OSS, not distros. In fact, it forces distros to make
> changes. And broadly, vendors have much more accountability for quality of
> releases, because they're paid to.)
>
>
> I'm still not sure what specifically the objection is to what here? I
> understand a lot is in flight and nobody agrees with every decision made,
> but, what else is new?
> Concretely: the release is held again to fix a few issues, in the end. For
> the map_filter issue, that seems like the right call, and there are a few
> other important issues that could be quickly fixed too. All is well there,
> yes?
>
> This has surfaced some implicit reasoning about releases that we could
> make explicit, like:
>
> (Sure, if you want to write down things like, release blockers should be
> decided in the interests of the project by the PMC, OK)
>
> We have a time-based release schedule, so time matters. There is an
> opportunity cost to not releasing. The bar for blockers goes up over time.
>
> Not all regressions are blockers. Would you hold a release over a trivial
> regression? but then which must or should block? There's no objective
> answer, but a reasonable rule is: non-trivial regressions from minor
> release x.y to x.{y+1} block releases. Regressions from x.{y-1} to x.{y+1}
> should, but not necessarily, block the release. We try hard to avoid
> regressions in x.y.0 releases because these are generally consumed by
> aggressive upgraders, on x.{y-1}.z now. If a bug exists in x.{y-1}, they're
> not affected or worked around it. The cautious upgrader goes from maybe
> x.{y-2}.z to x.y.1 later. They're affected, but not before, maybe, a
> maintenance release. A crude argument, and it's not an argument that
> regressions are OK. It's an argument that 'old' regressions matter less.
> And maybe it's reasonable to draw the "must" vs "should" line between them.
>
>
>
> On Thu, Oct 25, 2018 at 8:51 AM Tom Graves <tg...@yahoo.com> wrote:
>
>> So just to clarify a few things in case people didn't read the entire
>> thread in the PR, the discussion is what is the criteria for a blocker and
>> really my concerns are what people are using as criteria for not marking a
>> jira as a blocker.
>>
>> The only thing we have documented to mark a jira as a blocker is for
>> correctness issues: http://spark.apache.org/contributing.html.  And
>> really I think that is initially mark it as a blocker to bring attention to
>> it.
>> The final decision as to whether something is a blocker is up to the PMC
>> who votes on whether a release passes.  I think it would be impossible to
>> properly define what a blocker is with strict rules.
>>
>> Personally from this thread I would like to make sure committers and PMC
>> members aren't saying its a block for reasons other then the actual impact
>> the jira has and if its at all in question it should be brought to the
>> PMC's attention for a vote.  I agree with others that if its during an RC
>> it should be talked about on the RC thread.
>>
>> A few specific things that were said that I disagree with are:
>>    - its not a blocker because it was also an issue in the last release
>> (meaning feature release).  ie the bug was introduced in 2.2 and now we are
>> doing 2.4 so its automatically not a blocker.  This to me is just wrong.
>> Lots of things are not found immediately, or aren't reported immediately.
>>  Now I do believe the timeframe its been in there does affect the decision
>> on the impact but just making the decision on this to me is to strict.
>>    - Committers and PMC members should not be saying its not a blocker
>> because they personally or their company doesn't care about this feature or
>> api, or state that the Spark project as a whole doesn't care about this
>> feature unless that was specifically voted on at the project level. They
>> need to follow the api compatibility we have documented. This is really a
>> broader issue then just marking a jira, it goes to anything checked in and
>> perhaps need to be a separate thread.
>>
>>
>> For the verbiage of what a regression is, it seems like that should be
>> defined by our versioning documents. It states what we do in maintenance,
>> feature, and major releases (
>> http://spark.apache.org/versioning-policy.html), if its not defined by
>> that we probably need to clarify.   There was a good example we might want
>> to clarify about things like scala or java compatibility in feature
>> releases.
>>
>> Obviously this is my opinion and its here for everyone to discuss and
>> come to a consensus on.
>>
>> Tom
>>
>>

Re: What's a blocker?

Posted by Sean Owen <sr...@gmail.com>.
What does "PMC members aren't saying its a block for reasons other then the
actual impact the jira has" mean that isn't already widely agreed? Likewise
"Committers and PMC members should not be saying its not a blocker because
they personally or their company doesn't care about this feature or api".
It sounds like insinuation, and I'd rather make it explicit -- call out the
bad actions -- or keep it to observable technical issues.

Likewise one could say there's a problem just because A thinks X should be
a blocker and B disagrees. I see no bad faith, process problem, or obvious
errors. Do you? I see disagreement, and it's tempting to suspect motives. I
have seen what I think are actual bad-faith decisions in the past in this
project, too. I don't see it here though and want to stick to 'now'.

(Aside: the implication is that those representing vendors are
steam-rolling a release. Actually, the cynical incentives cut the other way
here. Blessing the latest changes as OSS Apache Spark is predominantly
beneficial to users of OSS, not distros. In fact, it forces distros to make
changes. And broadly, vendors have much more accountability for quality of
releases, because they're paid to.)


I'm still not sure what specifically the objection is, and to what. I
understand a lot is in flight and nobody agrees with every decision made,
but, what else is new?
Concretely: the release is held again to fix a few issues, in the end. For
the map_filter issue, that seems like the right call, and there are a few
other important issues that could be quickly fixed too. All is well there,
yes?

This has surfaced some implicit reasoning about releases that we could make
explicit, like:

(Sure, if you want to write down things like, release blockers should be
decided in the interests of the project by the PMC, OK)

We have a time-based release schedule, so time matters. There is an
opportunity cost to not releasing. The bar for blockers goes up over time.

Not all regressions are blockers. Would you hold a release over a trivial
regression? But then which must or should block? There's no objective
answer, but a reasonable rule is: non-trivial regressions from minor
release x.y to x.{y+1} block releases. Regressions from x.{y-1} to x.{y+1}
should, but need not necessarily, block the release. We try hard to avoid
regressions in x.y.0 releases because these are generally consumed by
aggressive upgraders, on x.{y-1}.z now. If a bug exists in x.{y-1}, they're
either not affected or have worked around it. The cautious upgrader goes
from maybe x.{y-2}.z to x.y.1 later. They're affected, but not before,
maybe, a maintenance release. A crude argument, and it's not an argument
that regressions are OK. It's an argument that 'old' regressions matter
less. And maybe it's reasonable to draw the "must" vs "should" line between
them.
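
To make the "must" vs "should" line above concrete, here is a rough sketch in
Python. It is an illustration only, not project policy: the names Verdict and
classify_regression are hypothetical, and two choices are assumptions that the
rule as stated does not settle, namely that a trivial regression need not
block, and that anything older than one minor release falls in the "should"
bucket.

```python
from enum import Enum
from typing import Tuple


class Verdict(Enum):
    MUST_BLOCK = "must block"
    SHOULD_BLOCK = "should block"
    NEED_NOT_BLOCK = "need not block"


def classify_regression(
    last_good: Tuple[int, int],
    release: Tuple[int, int],
    trivial: bool = False,
) -> Verdict:
    """Classify a regression for the release being voted on.

    last_good: (major, minor) of the last minor release that behaved correctly.
    release:   (major, minor) of the release under vote.
    trivial:   a subjective call; assumed here to mean "need not block".
    """
    if last_good[0] != release[0]:
        raise ValueError("sketch only covers regressions within one major line")
    gap = release[1] - last_good[1]
    if gap <= 0:
        raise ValueError("last_good must be an earlier minor release")
    if trivial:
        return Verdict.NEED_NOT_BLOCK
    # Broke between x.y and x.{y+1}: a "new" regression.
    if gap == 1:
        return Verdict.MUST_BLOCK
    # Broke earlier (e.g. x.{y-1} vs x.{y+1}): an "old" regression.
    return Verdict.SHOULD_BLOCK
```

For example, something that worked in 2.3 and is broken in the 2.4 candidate
comes out as MUST_BLOCK, while something that last worked in 2.2 comes out as
SHOULD_BLOCK. The real decision still rests on discussion and the PMC vote;
the sketch only names the two buckets.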



On Thu, Oct 25, 2018 at 8:51 AM Tom Graves <tg...@yahoo.com> wrote:

> So just to clarify a few things in case people didn't read the entire
> thread in the PR, the discussion is what is the criteria for a blocker and
> really my concerns are what people are using as criteria for not marking a
> jira as a blocker.
>
> The only thing we have documented to mark a jira as a blocker is for
> correctness issues: http://spark.apache.org/contributing.html.  And
> really I think that is initially mark it as a blocker to bring attention to
> it.
> The final decision as to whether something is a blocker is up to the PMC
> who votes on whether a release passes.  I think it would be impossible to
> properly define what a blocker is with strict rules.
>
> Personally from this thread I would like to make sure committers and PMC
> members aren't saying its a block for reasons other then the actual impact
> the jira has and if its at all in question it should be brought to the
> PMC's attention for a vote.  I agree with others that if its during an RC
> it should be talked about on the RC thread.
>
> A few specific things that were said that I disagree with are:
>    - its not a blocker because it was also an issue in the last release
> (meaning feature release).  ie the bug was introduced in 2.2 and now we are
> doing 2.4 so its automatically not a blocker.  This to me is just wrong.
> Lots of things are not found immediately, or aren't reported immediately.
>  Now I do believe the timeframe its been in there does affect the decision
> on the impact but just making the decision on this to me is to strict.
>    - Committers and PMC members should not be saying its not a blocker
> because they personally or their company doesn't care about this feature or
> api, or state that the Spark project as a whole doesn't care about this
> feature unless that was specifically voted on at the project level. They
> need to follow the api compatibility we have documented. This is really a
> broader issue then just marking a jira, it goes to anything checked in and
> perhaps need to be a separate thread.
>
>
> For the verbiage of what a regression is, it seems like that should be
> defined by our versioning documents. It states what we do in maintenance,
> feature, and major releases (
> http://spark.apache.org/versioning-policy.html), if its not defined by
> that we probably need to clarify.   There was a good example we might want
> to clarify about things like scala or java compatibility in feature
> releases.
>
> Obviously this is my opinion and its here for everyone to discuss and come
> to a consensus on.
>
> Tom
>
>

Re: What's a blocker?

Posted by Tom Graves <tg...@yahoo.com.INVALID>.
 So just to clarify a few things in case people didn't read the entire thread in the PR, the discussion is what is the criteria for a blocker and really my concerns are what people are using as criteria for not marking a jira as a blocker.
The only thing we have documented to mark a jira as a blocker is for correctness issues: http://spark.apache.org/contributing.html.  And really I think that is initially mark it as a blocker to bring attention to it.The final decision as to whether something is a blocker is up to the PMC who votes on whether a release passes.  I think it would be impossible to properly define what a blocker is with strict rules.
Personally from this thread I would like to make sure committers and PMC members aren't saying its a block for reasons other then the actual impact the jira has and if its at all in question it should be brought to the PMC's attention for a vote.  I agree with others that if its during an RC it should be talked about on the RC thread.
A few specific things that were said that I disagree with are:   - its not a blocker because it was also an issue in the last release (meaning feature release).  ie the bug was introduced in 2.2 and now we are doing 2.4 so its automatically not a blocker.  This to me is just wrong.  Lots of things are not found immediately, or aren't reported immediately.   Now I do believe the timeframe its been in there does affect the decision on the impact but just making the decision on this to me is to strict.    - Committers and PMC members should not be saying its not a blocker because they personally or their company doesn't care about this feature or api, or state that the Spark project as a whole doesn't care about this feature unless that was specifically voted on at the project level. They need to follow the api compatibility we have documented. This is really a broader issue then just marking a jira, it goes to anything checked in and perhaps need to be a separate thread.

For the verbiage of what a regression is, it seems like that should be defined by our versioning documents. They state what we do in maintenance, feature, and major releases (http://spark.apache.org/versioning-policy.html); if it's not defined by that we probably need to clarify.  There was a good example we might want to clarify about things like scala or java compatibility in feature releases.
Obviously this is my opinion and it's here for everyone to discuss and come to a consensus on.
Tom
    On Wednesday, October 24, 2018, 2:09:49 PM CDT, Sean Owen <sr...@gmail.com> wrote:  
 
 Shifting this to dev@. See the PR https://github.com/apache/spark/pull/22144 for more context.
There will be no objective, complete definition of blocker, or even regression or correctness issue. Many cases are clear, some are not. We can draw up more guidelines, and feel free to open PRs against the 'contributing' doc. But in general these are the same consensus-driven decisions we negotiate all the time.
What isn't said that should be is that there is a cost to not releasing. Keep in mind we have, also, decided on a 'release train' cadence. That does properly change the calculus about what's a blocker; the right decision could change within even a week.

I wouldn't mind some verbiage around what a regression is. Since the last minor release?
We can VOTE on anything we like, but we already VOTE on the release. Weirdly, technically, the release vote criteria is simple majority, FWIW: http://www.apache.org/legal/release-policy.html#release-approval 
Yes, actually, it is only the PMC's votes that literally matter. Those votes are, surely, based on input from others too. But that is actually working as intended.

Let's understand statements like "X is not a blocker" to mean "I don't think that X is a blocker". Interpretations not proclamations, backed up by reasons, not all of which are appeals to policy and precedent.
I find it hard to argue about these in the abstract, because I believe it's already widely agreed, and written down in ASF policy, that nobody makes decisions unilaterally. Done, yes. 
Practically speaking, the urgent issue is the 2.4 release. I don't see process failures here that need fixing or debate. I do think those outstanding issues merit technical discussion. The outcome will be a tradeoff of some subjective issues, not read off of a policy sheet, and will entail tradeoffs. Let's speak freely about those technical issues and try to find the consensus position.

On Wed, Oct 24, 2018 at 12:21 PM Mark Hamstra <no...@github.com> wrote:


Thanks @tgravescs for your latest posts -- they've saved me from posting something similar in many respects but more strongly worded.

What is bothering me (not just in the discussion of this PR, but more broadly) is that we have individuals making declarative statements about whether something can or can't block a release, or that something "is not that important to Spark at this point", etc. -- things for which there is no supporting PMC vote or declared policy. It may be your opinion, @cloud-fan , that Hive compatibility should no longer be important to the Apache Spark project, and I have no problem with you expressing your opinion on the matter. That may even be the internal policy at your employer, I don't know. But you are just not in a position on your own to make this declaration for the Apache Spark project.

I don't mean to single you out, @cloud-fan , as the only offender, since this isn't a unique instance. For example, heading into a recent release we also saw individual declarations that the data correctness issue caused by the shuffle replay partitioning issue was not a blocker because it was not a regression or that it was not significant enough to alter the release schedule. Rather, my point is that things like release schedules, the declaration of release candidates, labeling JIRA tickets with "blocker", and de facto or even declared policy on regressions and release blockers are just tools in the service of the PMC. If, as was the case with the shuffle data correctness issue, PMC members think that the issue must be fixed before the next release, then release schedules, RC-status, other individuals' perceptions of importance to the project or of policy ultimately don't matter -- only the vote of the PMC does. What is concerning me is that, instead of efforts to persuade the PMC members that something should not block the next release or should not be important to the project, I am seeing flat declarations that an issue is not a blocker or not important. That may serve to stifle work to immediately fix a bug, or to discourage other contributions, but I can assure that trying to make the PMC serve the tools instead of the other way around won't serve to persuade at least some PMC members on how they should vote.

Sorry, I guess I can't avoid wording things strongly after all.
