Posted to mapreduce-issues@hadoop.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2014/12/27 13:55:13 UTC

[jira] [Commented] (MAPREDUCE-6208) There should be an input format for MapFiles which can be configured so that only a fraction of the input data is used for the MR process

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259360#comment-14259360 ] 

Hadoop QA commented on MAPREDUCE-6208:
--------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12689223/MAPREDUCE-6208.001.patch
  against trunk revision 1454efe.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 2 new or modified test files.

    {color:red}-1 javac{color}.  The applied patch generated 1223 javac compiler warnings (more than the trunk's current 1219 warnings).

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5095//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5095//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5095//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5095//console

This message is automatically generated.

> There should be an input format for MapFiles which can be configured so that only a fraction of the input data is used for the MR process
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6208
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6208
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: trunk
>            Reporter: Jens Rabe
>              Labels: inputformat, mapfile
>         Attachments: MAPREDUCE-6208.001.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In some cases, large amounts of data are organized in MapFiles, e.g., as output of previous MapReduce tasks, and only a fraction of that data is to be processed in an MR task. The current approach, as I understand it, is to reorganize the data into suitable partitions using folders on HDFS, use only the relevant folders as input paths, and possibly do some additional filtering in the Map task. However, sometimes the input data cannot easily be partitioned that way, for example when processing large amounts of measured data where additional data for a time period already in HDFS arrives later.
> There should be an input format that accepts folders containing MapFiles, with an option to specify an input key range so that InputSplits are generated only for data that falls within that range.
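
For illustration only, the key-range selection requested above could be sketched as follows. This is not the code in the attached patch, and it deliberately uses plain Java with no Hadoop classes. The names KeyRangeSplitFilter, FileKeyRange and selectOverlapping are hypothetical, and the sketch assumes long keys and that the smallest and largest key of each MapFile are already known.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: pick only the files whose key span overlaps a requested key range. */
public class KeyRangeSplitFilter {

    /** Stand-in for one MapFile: its path plus the smallest and largest key it holds (assumed known). */
    public static final class FileKeyRange {
        final String path;
        final long firstKey;
        final long lastKey;

        public FileKeyRange(String path, long firstKey, long lastKey) {
            this.path = path;
            this.firstKey = firstKey;
            this.lastKey = lastKey;
        }
    }

    /**
     * Returns the paths of all files whose [firstKey, lastKey] interval
     * overlaps the inclusive range [minKey, maxKey]. In a real input format,
     * only these files would be turned into input splits.
     */
    public static List<String> selectOverlapping(List<FileKeyRange> files, long minKey, long maxKey) {
        if (minKey > maxKey) {
            throw new IllegalArgumentException("minKey must not exceed maxKey");
        }
        List<String> selected = new ArrayList<>();
        for (FileKeyRange f : files) {
            // Two closed intervals overlap when each one starts before the other ends.
            if (f.firstKey <= maxKey && f.lastKey >= minKey) {
                selected.add(f.path);
            }
        }
        return selected;
    }
}
```

With three files covering keys 0-99, 100-199 and 200-299, a requested range of 150 to 250 would select only the second and third file. Records outside the range inside a selected file would still need to be skipped by the record reader or filtered in the Map task.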



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)