Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/08/25 17:45:47 UTC

[jira] [Commented] (FLINK-2394) HadoopOutFormat OutputCommitter is default to FileOutputCommiter

    [ https://issues.apache.org/jira/browse/FLINK-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711474#comment-14711474 ] 

ASF GitHub Bot commented on FLINK-2394:
---------------------------------------

GitHub user fhueske opened a pull request:

    https://github.com/apache/flink/pull/1056

    [FLINK-2394] [fix] HadoopOutputFormats use correct OutputCommitters.

    Right now, Flink's wrappers for Hadoop OutputFormats always use a `FileOutputCommitter`.
    
    - In the `mapreduce` API, Hadoop OutputFormats have a method `getOutputCommitter()` which can be overridden and returns a `FileOutputCommitter` by default.
    - In the `mapred` API, the `OutputCommitter` should be obtained from the `JobConf`. If nothing custom is set, a `FileOutputCommitter` is returned.
    
    This PR uses the respective methods to obtain the correct `OutputCommitter`. Since `FileOutputCommitter` is the default in both cases, the original semantics are preserved if the user neither implements nor sets a custom committer.
    I also added convenience constructors to the `mapred` wrappers that set the `OutputCommitter` in the `JobConf`.
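
    As a rough illustration of the change described above, the sketch below contrasts a wrapper that hard-codes the file-based committer with one that asks the output format for its committer. All types here (`Committer`, `FileCommitter`, `CollectionCommitter`, `Format`, `CollectionFormat`) are simplified stand-ins invented for this example; they are not the Hadoop or Flink classes, and the actual patch may differ in detail.

```java
// Simplified stand-ins for illustration only; NOT the real Hadoop or Flink classes.
final class CommitterLookupSketch {

    /** Stand-in for an output committer. */
    interface Committer {
        String commitJob();
    }

    /** Stand-in for the file-based default committer. */
    static final class FileCommitter implements Committer {
        @Override
        public String commitJob() {
            return "files moved to final output directory";
        }
    }

    /** Hypothetical custom committer for a store that is not file based. */
    static final class CollectionCommitter implements Committer {
        @Override
        public String commitJob() {
            return "records committed to collection";
        }
    }

    /** Stand-in for an output format that exposes a committer hook. */
    static class Format {
        Committer getOutputCommitter() {
            return new FileCommitter(); // default unless a subclass overrides it
        }
    }

    /** Hypothetical format that overrides the hook to supply its own committer. */
    static final class CollectionFormat extends Format {
        @Override
        Committer getOutputCommitter() {
            return new CollectionCommitter();
        }
    }

    /** Old behaviour: the wrapper ignores whatever committer the format provides. */
    static String finalizeHardCoded(Format format) {
        return new FileCommitter().commitJob();
    }

    /** New behaviour: the wrapper asks the format for its committer. */
    static String finalizeFromFormat(Format format) {
        return format.getOutputCommitter().commitJob();
    }

    public static void main(String[] args) {
        Format custom = new CollectionFormat();
        System.out.println(finalizeHardCoded(custom));        // custom committer is skipped
        System.out.println(finalizeFromFormat(custom));       // custom committer runs
        System.out.println(finalizeFromFormat(new Format())); // default is unchanged
    }
}
```

    With the lookup in place, a format that does not override the hook still gets the file-based default, which is why the original semantics are preserved.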

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhueske/flink hadoopOutCommitter

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1056.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1056
    
----
commit a632203a948f2e7973339a0eab88750f7ce70cc5
Author: Fabian Hueske <fh...@apache.org>
Date:   2015-07-30T19:47:01Z

    [FLINK-2394] [fix] HadoopOutputFormats use correct OutputCommitters.

----


> HadoopOutFormat OutputCommitter is default to FileOutputCommiter
> ----------------------------------------------------------------
>
>                 Key: FLINK-2394
>                 URL: https://issues.apache.org/jira/browse/FLINK-2394
>             Project: Flink
>          Issue Type: Bug
>          Components: Hadoop Compatibility
>    Affects Versions: 0.9.0
>            Reporter: Stefano Bortoli
>            Assignee: Fabian Hueske
>             Fix For: 0.10, 0.9.1
>
>
> MongoOutputFormat does not write back to the collection because the HadoopOutputFormat wrapper does not allow setting the MongoOutputCommitter and always defaults to FileOutputCommitter. Therefore, when close and finalizeGlobal are executed, the commit does not happen and the Mongo collection stays untouched.
> A simple solution would be to:
> 1 - create a constructor of HadoopOutputFormatBase and HadoopOutputFormat that gets the OutputCommitter as a parameter
> 2 - change the outputCommitter field of HadoopOutputFormatBase to be a generic OutputCommitter
> 3 - remove the default assignment of FileOutputCommitter() to the outputCommitter in open() and finalizeGlobal(), or keep it as a default when no specific committer is assigned.
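
The three steps proposed in the issue can be sketched as follows: a wrapper that accepts a committer through an extra constructor, stores it in a field of the general committer type, and falls back to the file-based default when nothing is set. The names `Conf`, `Wrapper`, `Committer`, `FileCommitter`, and `CustomCommitter` are hypothetical stand-ins for this example, not Hadoop or Flink classes, and the configuration-based lookup is only an assumption about how such a fallback could be arranged.

```java
import java.util.function.Supplier;

// Simplified stand-ins for illustration only; NOT the real Hadoop or Flink classes.
final class CommitterConfigSketch {

    /** Stand-in for an output committer. */
    interface Committer {
        String name();
    }

    static final class FileCommitter implements Committer {
        @Override
        public String name() {
            return "file";
        }
    }

    /** Hypothetical user-supplied committer. */
    static final class CustomCommitter implements Committer {
        @Override
        public String name() {
            return "custom";
        }
    }

    /** Stand-in for a job configuration that may carry a committer factory. */
    static final class Conf {
        private Supplier<Committer> committerFactory; // null means nothing custom was set

        void setOutputCommitter(Supplier<Committer> factory) {
            this.committerFactory = factory;
        }

        Committer getOutputCommitter() {
            // Fall back to the file-based default when no committer was configured.
            return committerFactory == null ? new FileCommitter() : committerFactory.get();
        }
    }

    /** Stand-in for the output format wrapper. */
    static final class Wrapper {
        private final Conf conf;
        private Committer committer; // general committer type, resolved in open()

        Wrapper(Conf conf) {
            this.conf = conf;
        }

        /** Convenience constructor: records the committer in the configuration. */
        Wrapper(Supplier<Committer> factory, Conf conf) {
            conf.setOutputCommitter(factory);
            this.conf = conf;
        }

        void open() {
            committer = conf.getOutputCommitter();
        }

        String committerName() {
            return committer.name();
        }
    }

    public static void main(String[] args) {
        Wrapper withDefault = new Wrapper(new Conf());
        withDefault.open();
        System.out.println(withDefault.committerName());

        Wrapper withCustom = new Wrapper(CustomCommitter::new, new Conf());
        withCustom.open();
        System.out.println(withCustom.committerName());
    }
}
```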



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)