Posted to mapreduce-issues@hadoop.apache.org by "Chris Douglas (JIRA)" <ji...@apache.org> on 2009/09/08 04:12:58 UTC

[jira] Commented: (MAPREDUCE-830) Providing BZip2 splitting support for Text data

    [ https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752304#action_12752304 ] 

Chris Douglas commented on MAPREDUCE-830:
-----------------------------------------

(also includes a workaround for MAPREDUCE-959, which was getting irritating, and updates the unit tests to JUnit4 semantics)

> Providing BZip2 splitting support for Text data
> -----------------------------------------------
>
>                 Key: MAPREDUCE-830
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: Abdul Qadeer
>            Assignee: Abdul Qadeer
>             Fix For: 0.21.0
>
>         Attachments: M830-2.patch, M830-3.patch, MapReduce-830-version1.patch
>
>
> HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) provides support for handling BZip2-compressed data such that a compressed input file can be split at arbitrary points. This JIRA uses that functionality in LineRecordReader. The benefit of this work is that, if a user provides BZip2-compressed Text data, Hadoop will split it so that it is processed by multiple mappers, and BZip2-compressed data can therefore make full use of the cluster. Currently a BZip2-compressed Text file goes to a single mapper and is not split. The enhancement in this JIRA therefore provides splitting support and a considerable performance gain.
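
For readers unfamiliar with how a line-oriented reader copes with splits that begin at arbitrary byte offsets, here is a minimal sketch in plain Java. It is not the code in the attached patches and it does not touch BZip2 at all: it works on an uncompressed byte array and only illustrates one common boundary convention, assumed here for illustration, in which a split owns every line whose first byte falls inside [start, end). The class and method names (SplitLineSketch, readSplit) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Hadoop's LineRecordReader.
class SplitLineSketch {

    // Returns the lines owned by the byte range [start, end) of data.
    // Assumed convention: a split owns each line whose first byte lies in [start, end).
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Back up one byte and discard everything through the next newline.
            // If the byte before 'start' is itself a newline, nothing useful is
            // discarded and the line beginning exactly at 'start' is kept.
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }

        List<String> lines = new ArrayList<>();
        // Keep reading while the next line *begins* inside this split; the last
        // line may run past 'end', which is what keeps it from being lost.
        while (pos < end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            lines.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++; // step past the newline
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\ndelta\n".getBytes(StandardCharsets.UTF_8);
        int splitSize = 10; // deliberately cuts through the middle of lines
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            System.out.println("[" + start + "," + end + ") -> " + readSplit(data, start, end));
        }
    }
}
```

Under this convention every line is emitted by exactly one split even though the split boundaries fall mid-line. For compressed input the reader additionally has to resynchronize on a compression block boundary before it can look for a newline; that part is what HADOOP-4012 supplies and is not modelled here.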
