Posted to mapreduce-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2017/01/09 13:53:58 UTC

[jira] [Updated] (MAPREDUCE-6823) FileOutputFormat to support configurable FileOutputCommitter factory

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated MAPREDUCE-6823:
--------------------------------------
    Status: Open  (was: Patch Available)

Cancelling this PoC; redesigning. To support existing subclasses of FileOutputFormat (FOF), such as the Parquet one, we'll have to hook in at a lower level.

I propose adding a new committer algorithm, "3" (alongside the existing versions 1 and 2), which really means "plug in a new committer of classname X", with another property to define that classname. We can then add an S3 committer which supports this new protocol.

This does mean that we will need to define a committer plugin API, which we can declare as unstable/limited-private, and then implement the S3A committer against it.
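A minimal sketch of how algorithm "3" could dispatch, written in plain Java so it runs without Hadoop on the classpath. It assumes a {{Map}} in place of Hadoop's {{Configuration}}. Only {{mapreduce.fileoutputcommitter.algorithm.version}} is a real Hadoop key; the class-name property {{example.fileoutputcommitter.committer.class}}, the {{CommitterPlugin}} interface and both committer classes are hypothetical placeholders, not an existing API.

```java
import java.util.Map;

/**
 * Sketch of the proposed committer "algorithm 3". A Map stands in for
 * Hadoop's Configuration; every name except ALGORITHM_VERSION is hypothetical.
 */
public final class CommitterSelector {

  /** Existing Hadoop key choosing the FileOutputCommitter algorithm (1 or 2). */
  public static final String ALGORITHM_VERSION =
      "mapreduce.fileoutputcommitter.algorithm.version";

  /** Hypothetical key naming the committer class to load when the version is 3. */
  public static final String COMMITTER_CLASS =
      "example.fileoutputcommitter.committer.class";

  /** Hypothetical minimal plugin contract, to be declared unstable/limited-private. */
  public interface CommitterPlugin {
    void commitTask();
    void commitJob();
  }

  /** Placeholder for the existing rename-based committer (versions 1 and 2). */
  public static final class RenameCommitter implements CommitterPlugin {
    public final int version;
    RenameCommitter(int version) { this.version = version; }
    @Override public void commitTask() { /* no-op in this sketch */ }
    @Override public void commitJob() { /* no-op in this sketch */ }
  }

  /** Placeholder for a plugin such as the planned S3A committer. */
  public static final class ExampleS3Committer implements CommitterPlugin {
    public ExampleS3Committer() {}
    @Override public void commitTask() { /* no-op in this sketch */ }
    @Override public void commitJob() { /* no-op in this sketch */ }
  }

  /** Returns the committer for the configured version (defaults to 1 in this sketch). */
  public static CommitterPlugin select(Map<String, String> conf) {
    int version = Integer.parseInt(conf.getOrDefault(ALGORITHM_VERSION, "1"));
    switch (version) {
      case 1:
      case 2:
        return new RenameCommitter(version);
      case 3: {
        String className = conf.get(COMMITTER_CLASS);
        if (className == null) {
          throw new IllegalArgumentException("algorithm 3 requires " + COMMITTER_CLASS);
        }
        try {
          // Load the named class, check it implements the plugin contract,
          // and create it through its public no-argument constructor.
          return Class.forName(className)
              .asSubclass(CommitterPlugin.class)
              .getDeclaredConstructor()
              .newInstance();
        } catch (ReflectiveOperationException e) {
          throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
      }
      default:
        throw new IllegalArgumentException("unknown committer algorithm: " + version);
    }
  }

  public static void main(String[] args) {
    CommitterPlugin committer = select(Map.of(
        ALGORITHM_VERSION, "3",
        COMMITTER_CLASS, ExampleS3Committer.class.getName()));
    System.out.println(committer.getClass().getSimpleName()); // prints ExampleS3Committer
  }
}
```

Loading by class name keeps the core committer code free of any compile-time dependency on S3A; the trade-off is that a misconfigured name only fails when the committer is created, which is why the sketch raises explicit errors.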

> FileOutputFormat to support configurable FileOutputCommitter factory
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6823
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6823
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 3.0.0-alpha2
>         Environment: Targeting S3 as the output of work
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch
>
>
> In HADOOP-13786 I'm adding a custom subclass of FileOutputFormat, one which can talk directly to the S3A filesystem for more efficient operations, better failure modes and, most critically (as part of HADOOP-13345), atomic commit of output. The normal committer relies on directory rename() being atomic to achieve this; S3 doesn't offer that luxury, since a "rename" there is a copy followed by a delete.
> To support a custom committer, we need to be able to tell FileOutputFormat (and, implicitly, all subclasses which don't have their own custom committer) to use our new {{S3AOutputCommitter}}.
> I propose: 
> # {{FileOutputFormat}} takes a factory to create committers.
> # The factory takes a URI and a {{TaskAttemptContext}} and returns a committer.
> # The default implementation always returns a {{FileOutputCommitter}}.
> # A configuration option allows a new factory to be named.
> # An {{S3AOutputCommitterFactory}} returns a {{FileOutputCommitter}} or a new {{S3AOutputCommitter}}, depending on the URI of the destination.
> Note that MRv1 already supports configurable committers; this proposal only covers the V2 API.
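The five-point factory proposal can be sketched in plain Java, assuming a {{Map}} in place of Hadoop's {{Configuration}} and a hypothetical {{TaskContext}} stand-in for {{TaskAttemptContext}}. The {{CommitterFactory}} interface, the factory and committer classes, and the configuration key {{example.outputcommitter.factory.class}} are illustrative names only, not an existing Hadoop API; the committers merely report a name so the dispatch logic is visible.

```java
import java.net.URI;
import java.util.Map;

/** Sketch of a pluggable committer factory; all names here are hypothetical. */
public final class CommitterFactories {

  /** Hypothetical stand-in for Hadoop's TaskAttemptContext. */
  public static final class TaskContext {
    public final String taskAttemptId;
    public TaskContext(String taskAttemptId) { this.taskAttemptId = taskAttemptId; }
  }

  /** Minimal committer abstraction: just reports which committer was chosen. */
  public interface Committer { String name(); }

  public static final class FileCommitter implements Committer {
    @Override public String name() { return "FileOutputCommitter"; }
  }

  public static final class S3ACommitter implements Committer {
    @Override public String name() { return "S3AOutputCommitter"; }
  }

  /** Items 1 and 2: given the destination URI and task context, return a committer. */
  public interface CommitterFactory {
    Committer createCommitter(URI destination, TaskContext context);
  }

  /** Item 3: the default factory always returns the classic committer. */
  public static final class DefaultCommitterFactory implements CommitterFactory {
    public DefaultCommitterFactory() {}
    @Override public Committer createCommitter(URI destination, TaskContext context) {
      return new FileCommitter();
    }
  }

  /** Item 5: pick the S3A committer only when the destination scheme is s3a. */
  public static final class S3ACommitterFactory implements CommitterFactory {
    public S3ACommitterFactory() {}
    @Override public Committer createCommitter(URI destination, TaskContext context) {
      return "s3a".equalsIgnoreCase(destination.getScheme())
          ? new S3ACommitter()
          : new FileCommitter();
    }
  }

  /** Item 4: hypothetical configuration key naming the factory class. */
  public static final String FACTORY_CLASS = "example.outputcommitter.factory.class";

  /** Instantiates the configured factory, falling back to the default one. */
  public static CommitterFactory factoryFor(Map<String, String> conf) {
    String className = conf.get(FACTORY_CLASS);
    if (className == null) {
      return new DefaultCommitterFactory();
    }
    try {
      return Class.forName(className)
          .asSubclass(CommitterFactory.class)
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("cannot instantiate " + className, e);
    }
  }

  public static void main(String[] args) {
    CommitterFactory factory =
        factoryFor(Map.of(FACTORY_CLASS, S3ACommitterFactory.class.getName()));
    TaskContext ctx = new TaskContext("task-attempt-1"); // placeholder id
    System.out.println(factory.createCommitter(URI.create("s3a://bucket/out"), ctx).name());
    System.out.println(factory.createCommitter(URI.create("hdfs://nn/out"), ctx).name());
  }
}
```

Dispatching on the URI scheme means the S3A factory can be configured globally: jobs writing to other filesystems still get the classic rename-based committer.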



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
