Posted to mapreduce-issues@hadoop.apache.org by "Tom White (Created) (JIRA)" <ji...@apache.org> on 2011/12/29 06:56:31 UTC

[jira] [Created] (MAPREDUCE-3607) Port missing new API mapreduce lib classes to 1.x

Port missing new API mapreduce lib classes to 1.x
-------------------------------------------------

                 Key: MAPREDUCE-3607
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3607
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: client
    Affects Versions: 1.0.0
            Reporter: Tom White
            Assignee: Tom White


There are a number of classes under mapreduce.lib that are not present in the 1.x series. Including these would help users and downstream projects using the new MapReduce API migrate to later versions of Hadoop in the future.

A few examples of where this would help:
* Sqoop uses mapreduce.lib.db.DBWritable and mapreduce.lib.input.CombineFileInputFormat (SQOOP-384).
* Mahout uses mapreduce.lib.output.MultipleOutputs (MAHOUT-822).
* HBase has a backport of mapreduce.lib.partition.InputSampler and TotalOrderPartitioner (in org.apache.hadoop.hbase.mapreduce.hadoopbackport) - it would be better if it used the ones in Hadoop.
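A note on the last example, for readers who have not used these classes: InputSampler picks split points from a sample of the input keys, and TotalOrderPartitioner uses them to send each key to a reducer so that the concatenated reducer outputs are globally sorted. The stdlib-only Java below is a minimal sketch of the partitioning rule alone. It assumes the split points are already available as sorted strings. TotalOrderSketch and partition() are hypothetical names, not Hadoop's API. The real partitioner reads its split points from a partition file, and for binary-comparable keys it may use a trie rather than binary search.

```java
import java.util.Arrays;

// Hypothetical illustration of total-order partitioning; not Hadoop's TotalOrderPartitioner.
class TotalOrderSketch {

    /**
     * Returns the reducer index for key, given n sorted split points (n + 1 reducers).
     * Keys below splits[0] go to 0, keys in [splits[i-1], splits[i]) go to i,
     * and keys at or above splits[n-1] go to n.
     */
    static int partition(String key, String[] sortedSplits) {
        int pos = Arrays.binarySearch(sortedSplits, key);
        // For an absent key, binarySearch returns -(insertionPoint) - 1.
        // A key equal to a split point belongs to the partition on its right.
        return pos < 0 ? -(pos + 1) : pos + 1;
    }

    public static void main(String[] args) {
        String[] splits = {"g", "p"};  // 2 split points -> 3 reducers
        for (String key : new String[] {"apple", "grape", "kiwi", "pear", "zebra"}) {
            System.out.println(key + " -> reducer " + partition(key, splits));
        }
    }
}
```

Because every key sent to reducer i sorts before every key sent to reducer i + 1, reading the reducer outputs in order yields a totally ordered result.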


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3607) Port missing new API mapreduce lib classes to 1.x

Posted by "Tom White (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-3607:
---------------------------------

    Attachment: MAPREDUCE-3607.patch

Here's a new patch which adds FieldSelectionMapper/Reducer, NLineInputFormat, SequenceFile input/output formats, JobControl, and partition classes, along with tests for all of the classes.
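NLineInputFormat is perhaps the least self-explanatory class in that list. Rather than giving each map task a block-sized byte range, it gives each task a fixed number N of input lines. The sketch below is a hypothetical, in-memory illustration of that grouping rule only (NLineSplitSketch is not a Hadoop class). The real input format computes FileSplits described by byte offsets instead of copying lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the NLineInputFormat grouping rule; not Hadoop's implementation.
class NLineSplitSketch {

    /** Groups lines into consecutive chunks of at most n lines; each chunk stands for one split. */
    static List<List<String>> split(List<String> lines, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += n) {
            int end = Math.min(start + n, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a", "b", "c", "d", "e");
        System.out.println(split(lines, 2));
    }
}
```

This layout is typically useful when each line describes a comparatively expensive unit of work, so balancing tasks by line count makes more sense than balancing them by bytes.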

The results of test-patch:

{noformat}
     [exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 100 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     -1 findbugs.  The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.
{noformat}

Note that the Findbugs warnings are also present in trunk, since this is a backport. The tests pass.

I would like this to be considered for inclusion in 1.1.0.
                

[jira] [Commented] (MAPREDUCE-3607) Port missing new API mapreduce lib classes to 1.x

Posted by "Kihwal Lee (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263656#comment-13263656 ] 

Kihwal Lee commented on MAPREDUCE-3607:
---------------------------------------

Sorry for generating traffic on a closed JIRA.

I would like to find out why the following line was added. Some people have been complaining about it. If there is a good reason to keep it, that reason will probably convince them as well. Otherwise, I will file a JIRA to remove the line.

{code:title=FileInputFormat.java.diff}
--- hadoop/common/branches/branch-1.0/src/mapred/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java	2011/11/27 21:31:26	1206848
+++ hadoop/common/branches/branch-1.0/src/mapred/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java	2012/01/24 23:30:12	1235551
@@ -422,6 +422,7 @@
    */
   public static Path[] getInputPaths(JobContext context) {
     String dirs = context.getConfiguration().get("mapred.input.dir", "");
+    System.out.println("****" + dirs);
     String [] list = StringUtils.split(dirs);
     Path[] result = new Path[list.length];
     for (int i = 0; i < list.length; i++) {
{code}
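For context, the method only has to turn the comma-separated mapred.input.dir value into paths, and nothing in it needs to write to standard output. The following stdlib-only sketch shows that logic without the stray print. It assumes simplified escaping rules: a backslash escapes the next character, so an escaped comma stays inside a path. InputPathsSketch and its helpers are hypothetical stand-ins for Hadoop's StringUtils.split and StringUtils.unEscapeString, and the sketch returns strings rather than Path objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified sketch of getInputPaths() minus the debug print; not Hadoop's code.
class InputPathsSketch {
    private static final char ESCAPE = '\\';
    private static final char SEPARATOR = ',';

    /** Splits on commas that are not preceded by the escape character; escapes are kept. */
    static List<String> splitUnescaped(String dirs) {
        List<String> parts = new ArrayList<>();
        if (dirs.isEmpty()) {
            return parts;
        }
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (char c : dirs.toCharArray()) {
            if (escaped) {
                current.append(c);  // character following an escape is always literal
                escaped = false;
            } else if (c == ESCAPE) {
                current.append(c);  // keep the escape for the unescape step
                escaped = true;
            } else if (c == SEPARATOR) {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    /** Drops escape characters, e.g. "b\,c" becomes "b,c". */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        boolean escaped = false;
        for (char c : s.toCharArray()) {
            if (!escaped && c == ESCAPE) {
                escaped = true;
            } else {
                out.append(c);
                escaped = false;
            }
        }
        return out.toString();
    }

    static List<String> getInputPaths(String dirs) {
        List<String> result = new ArrayList<>();
        for (String part : splitUnescaped(dirs)) {
            result.add(unescape(part));
        }
        // No System.out.println: library code should not write to every client's stdout.
        return result;
    }

    public static void main(String[] args) {
        System.out.println(getInputPaths("/in/a,/in/b\\,c"));
    }
}
```

If the value is genuinely useful for troubleshooting, logging it at debug level through a logger, rather than printing it unconditionally, keeps it out of users' consoles.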
                

[jira] [Resolved] (MAPREDUCE-3607) Port missing new API mapreduce lib classes to 1.x

Posted by "Tom White (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White resolved MAPREDUCE-3607.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0.1
     Hadoop Flags: Reviewed

Thanks for the review, Mahadev. I've committed this to branch-1 and branch-1.0.
                

[jira] [Updated] (MAPREDUCE-3607) Port missing new API mapreduce lib classes to 1.x

Posted by "Matt Foley (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated MAPREDUCE-3607:
----------------------------------

    Target Version/s: 1.0.1  (was: 1.1.0)

Please commit to both branch-1 and branch-1.0.  Thank you.
                

[jira] [Issue Comment Edited] (MAPREDUCE-3607) Port missing new API mapreduce lib classes to 1.x

Posted by "Matt Foley (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188594#comment-13188594 ] 

Matt Foley edited comment on MAPREDUCE-3607 at 1/18/12 6:29 PM:
----------------------------------------------------------------

Tom, SQOOP-384 lists four mapreduce APIs needed by Sqoop, and you've included all four of them in this patch. However, Sqoop also needs a different signature of org.apache.hadoop.conf.Configuration.getInstances, as discussed in [SQOOP-384 comment 13166568 | https://issues.apache.org/jira/browse/SQOOP-384?focusedCommentId=13166568&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13166568] and shown in the patch attached to that JIRA.

Can you add that API to this patch, please?
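The exact signature Sqoop needs is in the patch attached to SQOOP-384 and is not reproduced here. As an assumption, the sketch below models the generic form found in later Hadoop versions, <U> List<U> getInstances(String name, Class<U> xface), which instantiates each class named in a comma-separated property and returns a typed list. MiniConf and the property name example.builders are hypothetical; MiniConf is a map-backed stand-in for Configuration written with the standard library only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration, illustrating a generic
// getInstances(String, Class<U>) that returns List<U>. Not Hadoop's implementation.
class MiniConf {
    private final Map<String, String> props = new HashMap<>();

    void set(String name, String value) {
        props.put(name, value);
    }

    /** Instantiates each comma-separated class name under `name`, checking it implements xface. */
    <U> List<U> getInstances(String name, Class<U> xface) {
        List<U> result = new ArrayList<>();
        String value = props.get(name);
        if (value == null || value.trim().isEmpty()) {
            return result;
        }
        for (String className : value.split(",")) {
            try {
                Class<?> cls = Class.forName(className.trim());
                if (!xface.isAssignableFrom(cls)) {
                    throw new RuntimeException(cls + " does not implement " + xface);
                }
                // xface.cast yields a U without an unchecked cast.
                result.add(xface.cast(cls.getDeclaredConstructor().newInstance()));
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        MiniConf conf = new MiniConf();
        conf.set("example.builders", "java.lang.StringBuilder, java.lang.StringBuffer");
        List<CharSequence> builders = conf.getInstances("example.builders", CharSequence.class);
        System.out.println(builders.size());
    }
}
```

The explicit isAssignableFrom check rejects a configured class that does not implement the requested interface with a readable message, before any instance is created.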
                
                  

[jira] [Updated] (MAPREDUCE-3607) Port missing new API mapreduce lib classes to 1.x

Posted by "Tom White (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-3607:
---------------------------------

    Attachment: MAPREDUCE-3607.patch

Here's an initial patch which adds support (and tests) for the DB classes, CombineFileInputFormat, KeyValueTextInputFormat, MultipleInputs, MultipleOutputs, and BinaryPartitioner.

This is a work in progress - I intend to add more classes.
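Among these, KeyValueTextInputFormat has a simple record rule worth spelling out: each line is split at the first occurrence of a separator (a tab by default) into a key and a value, and a line with no separator becomes a key with an empty value. The hypothetical KeyValueLineSketch below illustrates that rule with plain Java strings. Hadoop's record reader works on Text bytes and reads the separator from the job configuration.

```java
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical illustration of the key/value line rule; not Hadoop's KeyValueLineRecordReader.
class KeyValueLineSketch {

    /** Splits at the first occurrence of separator; with no separator, the value is empty. */
    static Map.Entry<String, String> parse(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            return new AbstractMap.SimpleImmutableEntry<>(line, "");
        }
        // Only the first separator matters; later ones stay inside the value.
        return new AbstractMap.SimpleImmutableEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        System.out.println(parse("user42\tclicked\tpage7", '\t'));
        System.out.println(parse("no-separator-here", '\t'));
    }
}
```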
                

[jira] [Commented] (MAPREDUCE-3607) Port missing new API mapreduce lib classes to 1.x

Posted by "Kihwal Lee (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264069#comment-13264069 ] 

Kihwal Lee commented on MAPREDUCE-3607:
---------------------------------------

Tom: Thanks for the clarification. MAPREDUCE-4207 has been filed.
                

[jira] [Commented] (MAPREDUCE-3607) Port missing new API mapreduce lib classes to 1.x

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263725#comment-13263725 ] 

Tom White commented on MAPREDUCE-3607:
--------------------------------------

Kihwal - adding this line was clearly a mistake and so it should be removed. Please go ahead and file a JIRA.
                

[jira] [Commented] (MAPREDUCE-3607) Port missing new API mapreduce lib classes to 1.x

Posted by "Mahadev konar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190074#comment-13190074 ] 

Mahadev konar commented on MAPREDUCE-3607:
------------------------------------------

+1 the changes look good to me.
                

[jira] [Updated] (MAPREDUCE-3607) Port missing new API mapreduce lib classes to 1.x

Posted by "Tom White (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-3607:
---------------------------------

    Attachment: MAPREDUCE-3607.patch

I updated the patch to add Configuration.getInstances(). I also tested Sqoop against a build of Hadoop with this patch applied, and all of Sqoop's unit tests passed (see SQOOP-384).
                