Posted to mapreduce-dev@hadoop.apache.org by Steve Loughran <st...@cloudera.com.INVALID> on 2020/09/23 18:41:51 UTC

the v2 commit algorithm

I've got a PR up to completely remove the v2 commit algorithm

https://github.com/apache/hadoop/pull/2320

That may seem overkill, but while *we* know there's a small window of risk
(task attempt 1 failing partway through a nonatomic commit), that's not
known/appreciated by others.
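
To make that window concrete, here is a minimal sketch. It is not the
FileOutputCommitter code: it uses the local filesystem as a stand-in, and
the names v2_task_commit, SimulatedCrash and fail_after are invented for
the illustration. It assumes a v2-style task commit that moves each file
straight into the job's destination directory.

```python
import os
import tempfile


class SimulatedCrash(Exception):
    """Stands in for a task attempt dying partway through its commit."""


def v2_task_commit(attempt_dir, dest_dir, fail_after=None):
    """Move each file straight into the destination, one rename at a time.

    Each rename is atomic on its own, but the sequence as a whole is not.
    """
    for count, name in enumerate(sorted(os.listdir(attempt_dir))):
        if fail_after is not None and count == fail_after:
            raise SimulatedCrash(f"attempt died after {count} file(s)")
        os.replace(os.path.join(attempt_dir, name),
                   os.path.join(dest_dir, name))


def run_demo():
    with tempfile.TemporaryDirectory() as root:
        dest = os.path.join(root, "output")
        attempt = os.path.join(root, "attempt_1")
        os.makedirs(dest)
        os.makedirs(attempt)
        for name in ("part-00000", "part-00001", "part-00002"):
            with open(os.path.join(attempt, name), "w") as f:
                f.write("data from attempt 1\n")
        try:
            # Hypothetical failure injected after two files have moved.
            v2_task_commit(attempt, dest, fail_after=2)
        except SimulatedCrash:
            pass
        # Whatever is listed here is already visible in the job output.
        return sorted(os.listdir(dest))


visible = run_demo()
print(visible)
```

In the sketch, two of the three files are already in the output directory
when the attempt dies, which is the partial state a retry then has to cope
with.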

The patch removes the v2 codepath from FileOutputCommitter, making it a lot
less complicated, and when v2 is requested, a warning is printed and the
option ignored.

Overkill? Maybe. But it guarantees correctness

Re: the v2 commit algorithm

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
On Wed, 23 Sep 2020 at 20:07, Igor Dvorzhak <id...@google.com.invalid> wrote:

> What will be the solution for object stores to have fast and correct
> commit algorithms?
>

https://github.com/steveloughran/zero-rename-committer/releases/tag/tag_draft_006

There's a plugin point for you to add an explicit committer for GCS.

A key thing is: what atomic operations does your store have?


   1. HDFS has rename and create-no-overwrite
   2. S3 has only PUT/complete multipart upload, and no fail-if-exists
   checks
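
As a rough illustration of why those two primitives matter, here is a
sketch that uses the local filesystem as a stand-in for a store offering
both. The marker name _COMMITTED and the helpers claim_commit and publish
are assumptions made up for this example, not Hadoop APIs.

```python
import os
import tempfile


def claim_commit(marker_path, attempt_id):
    """Try to claim the right to commit by creating a marker file.

    Mode "x" is exclusive creation: open() raises FileExistsError if the
    file already exists, so at most one caller can succeed.
    """
    try:
        with open(marker_path, "x") as f:
            f.write(attempt_id)
        return True
    except FileExistsError:
        return False


def publish(staging_dir, final_dir):
    """Expose a whole directory of output with a single rename."""
    os.rename(staging_dir, final_dir)


with tempfile.TemporaryDirectory() as root:
    # Two attempts race for the same marker; only the first one wins.
    marker = os.path.join(root, "_COMMITTED")
    first = claim_commit(marker, "attempt_1")
    second = claim_commit(marker, "attempt_2")
    with open(marker) as f:
        winner = f.read()

    # Output is written out of sight, then published in one step.
    staging = os.path.join(root, "_temporary")
    final = os.path.join(root, "output")
    os.makedirs(staging)
    with open(os.path.join(staging, "part-00000"), "w") as f:
        f.write("data\n")
    publish(staging, final)
    published = sorted(os.listdir(final))
    staging_gone = not os.path.exists(staging)
```

A store without a fail-if-exists check cannot implement claim_commit this
way, so a committer for it has to build on whatever atomic operation the
store does provide.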




> On Wed, Sep 23, 2020 at 11:42 AM Steve Loughran
> <st...@cloudera.com.invalid> wrote:
>
>> I've got a PR up to completely remove the v2 commit algorithm
>>
>> https://github.com/apache/hadoop/pull/2320
>>
>> That may seem overkill, but while *we* know there's a small window of risk
>> (task attempt 1 failing partway through a nonatomic commit), that's not
>> known/appreciated by others.
>>
>> The patch removes the v2 codepath from FileOutputCommitter, making it a
>> lot
>> less complicated, and when v2 is requested, a warning is printed and the
>> option ignored.
>>
>> Overkill? Maybe. But it guarantees correctness
>>
>

Re: [E] Re: the v2 commit algorithm

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
I think the conclusion is "no change for now", but people do need to
understand the risks better. One thing I'd like to understand is: which
FileOutputFormat subclasses generate unique filenames which are different
in different task attempts? I've heard a mention of Avro here, but have not
looked in the code.

On Thu, 24 Sep 2020 at 17:27, epayne@apache.org <ep...@apache.org> wrote:

> Thanks Steve and Jim for bringing this issue to our attention.
>
> IIUC, serial commit takes minutes with mrv1, whereas with mrv2 it is very
> quick. With this kind of performance difference, is it wise to change the
> default behavior for released versions of Hadoop? Should this be limited to
> trunk?
>
> Thanks,
> -Eric Payne
>
>
> On Wednesday, September 23, 2020, 2:16:14 PM CDT, Jim Brennan
> <ja...@verizonmedia.com.invalid> wrote:
>
> I replied in the Jira.  The speed up provided by the v2 commit algorithm
> is very important to us at Verizon Media (Yahoo).  Please do not remove it.
> I referred to this comment from Jason Lowe on the original Jira:
>
> https://issues.apache.org/jira/browse/MAPREDUCE-4815?focusedCommentId=14271115&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14271115
>
> I think it would be appropriate to better document the limitations of the
> v2 algorithm and possibly make it not be the default, as long as we can
> still use it.
>
> On Wed, Sep 23, 2020 at 2:07 PM Igor Dvorzhak <id...@google.com.invalid>
> wrote:
>
> > What will be the solution for object stores to have fast and correct
> > commit algorithms?
> >
> > On Wed, Sep 23, 2020 at 11:42 AM Steve Loughran
> > <st...@cloudera.com.invalid> wrote:
> >
> >> I've got a PR up to completely remove the v2 commit algorithm
> >>
> >> https://github.com/apache/hadoop/pull/2320
> >>
> >> That may seem overkill, but while *we* know there's a small window of
> >> risk (task attempt 1 failing partway through a nonatomic commit), that's
> >> not known/appreciated by others.
> >>
> >> The patch removes the v2 codepath from FileOutputCommitter, making it a
> >> lot
> >> less complicated, and when v2 is requested, a warning is printed and the
> >> option ignored.
> >>
> >> Overkill? Maybe. But it guarantees correctness
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org
>
>

Re: [E] Re: the v2 commit algorithm

Posted by "epayne@apache.org" <ep...@apache.org>.
Thanks Steve and Jim for bringing this issue to our attention.

IIUC, serial commit takes minutes with mrv1, whereas with mrv2 it is very quick.
With this kind of performance difference, is it wise to change the default
behavior for released versions of Hadoop? Should this be limited to trunk?

Thanks,
-Eric Payne


On Wednesday, September 23, 2020, 2:16:14 PM CDT, Jim Brennan <ja...@verizonmedia.com.invalid> wrote: 

I replied in the Jira.  The speed up provided by the v2 commit algorithm
is very important to us at Verizon Media (Yahoo).  Please do not remove it.
I referred to this comment from Jason Lowe on the original Jira:
https://issues.apache.org/jira/browse/MAPREDUCE-4815?focusedCommentId=14271115&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14271115

I think it would be appropriate to better document the limitations of the
v2 algorithm and possibly make it not be the default, as long as we can
still use it.

On Wed, Sep 23, 2020 at 2:07 PM Igor Dvorzhak <id...@google.com.invalid>
wrote:

> What will be the solution for object stores to have fast and correct
> commit algorithms?
>
> On Wed, Sep 23, 2020 at 11:42 AM Steve Loughran
> <st...@cloudera.com.invalid> wrote:
>
>> I've got a PR up to completely remove the v2 commit algorithm
>>
>> https://github.com/apache/hadoop/pull/2320
>>
>> That may seem overkill, but while *we* know there's a small window of risk
>> (task attempt 1 failing partway through a nonatomic commit), that's not
>> known/appreciated by others.
>>
>> The patch removes the v2 codepath from FileOutputCommitter, making it a
>> lot
>> less complicated, and when v2 is requested, a warning is printed and the
>> option ignored.
>>
>> Overkill? Maybe. But it guarantees correctness
>>
>


Re: [E] Re: the v2 commit algorithm

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
On Wed, 23 Sep 2020 at 20:16, Jim Brennan
<ja...@verizonmedia.com.invalid> wrote:

> I replied in the Jira.   The speed up provided by the v2 commit algorithm
> is very important to us at Verizon Media (Yahoo).  Please do not remove it.
> I referred to this comment from Jason Lowe on the original Jira:
>
> https://issues.apache.org/jira/browse/MAPREDUCE-4815?focusedCommentId=14271115&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14271115
>
> I think it would be appropriate to better document the limitations of the
> v2 algorithm and possibly make it not be the default, as long as we can
> still use it.
>


What about:
- change the default
- log at WARN in job setup (but not in tasks)


People like yourself, aware of and happy with the risk, can carry on, but
everyone else gets a warning of the risk.

I could also have a special log for the warning so you can turn it off...
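
Roughly what I have in mind, sketched in Python rather than as the actual
patch. The logger name committer.v2.risk and the helpers setup_job and
capture are invented for the example; the property name is the real
mapreduce.fileoutputcommitter.algorithm.version option, and the default of
1 reflects the proposed change, not what any release ships.

```python
import io
import logging

# Hypothetical logger name, chosen for this sketch only.
WARN_LOGGER = "committer.v2.risk"
# Real Hadoop property that selects the FileOutputCommitter algorithm.
VERSION_KEY = "mapreduce.fileoutputcommitter.algorithm.version"


def setup_job(conf):
    """Warn once, at job setup, if the v2 algorithm has been requested."""
    version = int(conf.get(VERSION_KEY, 1))  # assumed default: v1
    if version == 2:
        logging.getLogger(WARN_LOGGER).warning(
            "v2 commit algorithm requested: task commit is not atomic")
    return version


def capture(conf, silence):
    """Run setup_job and return whatever the dedicated logger emitted."""
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    logger = logging.getLogger(WARN_LOGGER)
    logger.addHandler(handler)
    logger.propagate = False
    # Raising this one logger's level is how a user opts out of the warning.
    logger.setLevel(logging.ERROR if silence else logging.WARNING)
    try:
        setup_job(conf)
    finally:
        logger.removeHandler(handler)
    return buffer.getvalue()


warned = capture({VERSION_KEY: "2"}, silence=False)
quiet = capture({VERSION_KEY: "2"}, silence=True)
default = capture({}, silence=False)
```

Because the warning has its own logger, turning it off does not hide
anything else the committer logs.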

>
> On Wed, Sep 23, 2020 at 2:07 PM Igor Dvorzhak <id...@google.com.invalid>
> wrote:
>
> > What will be the solution for object stores to have fast and correct
> > commit algorithms?
> >
> > On Wed, Sep 23, 2020 at 11:42 AM Steve Loughran
> > <st...@cloudera.com.invalid> wrote:
> >
> >> I've got a PR up to completely remove the v2 commit algorithm
> >>
> >> https://github.com/apache/hadoop/pull/2320
> >>
> >> That may seem overkill, but while *we* know there's a small window of
> >> risk (task attempt 1 failing partway through a nonatomic commit), that's
> >> not known/appreciated by others.
> >>
> >> The patch removes the v2 codepath from FileOutputCommitter, making it a
> >> lot
> >> less complicated, and when v2 is requested, a warning is printed and the
> >> option ignored.
> >>
> >> Overkill? Maybe. But it guarantees correctness
> >>
> >
>

Re: [E] Re: the v2 commit algorithm

Posted by Jim Brennan <ja...@verizonmedia.com.INVALID>.
I replied in the Jira. The speedup provided by the v2 commit algorithm
is very important to us at Verizon Media (Yahoo). Please do not remove it.
I referred to this comment from Jason Lowe on the original Jira:
https://issues.apache.org/jira/browse/MAPREDUCE-4815?focusedCommentId=14271115&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14271115

I think it would be appropriate to better document the limitations of the
v2 algorithm and possibly make it no longer the default, as long as we can
still use it.

On Wed, Sep 23, 2020 at 2:07 PM Igor Dvorzhak <id...@google.com.invalid>
wrote:

> What will be the solution for object stores to have fast and correct
> commit algorithms?
>
> On Wed, Sep 23, 2020 at 11:42 AM Steve Loughran
> <st...@cloudera.com.invalid> wrote:
>
>> I've got a PR up to completely remove the v2 commit algorithm
>>
>> https://github.com/apache/hadoop/pull/2320
>>
>> That may seem overkill, but while *we* know there's a small window of risk
>> (task attempt 1 failing partway through a nonatomic commit), that's not
>> known/appreciated by others.
>>
>> The patch removes the v2 codepath from FileOutputCommitter, making it a
>> lot
>> less complicated, and when v2 is requested, a warning is printed and the
>> option ignored.
>>
>> Overkill? Maybe. But it guarantees correctness
>>
>

Re: the v2 commit algorithm

Posted by Igor Dvorzhak <id...@google.com.INVALID>.
What will be the solution for object stores to have fast and correct commit
algorithms?

On Wed, Sep 23, 2020 at 11:42 AM Steve Loughran <st...@cloudera.com.invalid>
wrote:

> I've got a PR up to completely remove the v2 commit algorithm
>
> https://github.com/apache/hadoop/pull/2320
>
> That may seem overkill, but while *we* know there's a small window of risk
> (task attempt 1 failing partway through a nonatomic commit), that's not
> known/appreciated by others.
>
> The patch removes the v2 codepath from FileOutputCommitter, making it a lot
> less complicated, and when v2 is requested, a warning is printed and the
> option ignored.
>
> Overkill? Maybe. But it guarantees correctness
>