Posted to common-dev@hadoop.apache.org by "lohit vijayarenu (JIRA)" <ji...@apache.org> on 2008/03/17 19:26:24 UTC
[jira] Commented: (HADOOP-1823) want InputFormat for bzip2 files
[ https://issues.apache.org/jira/browse/HADOOP-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579545#action_12579545 ]
lohit vijayarenu commented on HADOOP-1823:
------------------------------------------
I was able to use this bzip2.jar with streaming. This would be a very useful addition.
> want InputFormat for bzip2 files
> --------------------------------
>
> Key: HADOOP-1823
> URL: https://issues.apache.org/jira/browse/HADOOP-1823
> Project: Hadoop Core
> Issue Type: New Feature
> Components: mapred
> Reporter: Doug Cutting
> Attachments: bzip2.jar
>
>
> Unlike gzip, the bzip2 file format supports splitting. Compression is done in blocks (900k by default), and blocks are separated by a synchronization marker (a 48-bit approximation of Pi). This would let very large compressed files be split across multiple map tasks, which is currently not possible without using a Hadoop-specific file format.
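To make the splitting idea concrete, here is a minimal standalone sketch. It is not Hadoop code and not the attached bzip2.jar. It only shows how a reader could locate block boundaries. In the bzip2 format each block header begins with the 48-bit magic 0x314159265359 (the digits of Pi in BCD), and the stream ends with 0x177245385090 (sqrt(Pi)). Blocks are bit-aligned rather than byte-aligned, so the scan has to advance one bit at a time. The function name `find_marker_bit_offsets` is hypothetical. Python's standard `bz2` module is used only to produce sample data.

```python
import bz2
import random
from typing import List

# 48-bit bzip2 block-header magic: BCD digits of Pi (3.14159265359).
BLOCK_MAGIC = 0x314159265359
# 48-bit bzip2 end-of-stream magic: BCD digits of sqrt(Pi) (1.77245385090).
EOS_MAGIC = 0x177245385090
MARKER_BITS = 48


def find_marker_bit_offsets(data: bytes, marker: int = BLOCK_MAGIC) -> List[int]:
    """Return every bit offset (MSB-first) at which `marker` starts in `data`.

    Hypothetical helper: keeps a rolling 48-bit window and shifts in one bit
    at a time, because bzip2 blocks need not start on a byte boundary.
    """
    mask = (1 << MARKER_BITS) - 1
    window = 0
    offsets = []
    bit_index = 0
    for byte in data:
        for shift in range(7, -1, -1):
            window = ((window << 1) | ((byte >> shift) & 1)) & mask
            bit_index += 1
            # Only compare once the window holds a full 48 bits.
            if bit_index >= MARKER_BITS and window == marker:
                offsets.append(bit_index - MARKER_BITS)
    return offsets


if __name__ == "__main__":
    rng = random.Random(0)
    # Random bytes barely compress, so level 1 (100k blocks) yields several blocks.
    payload = bytes(rng.getrandbits(8) for _ in range(250_000))
    compressed = bz2.compress(payload, compresslevel=1)
    print("block starts (bit offsets):", find_marker_bit_offsets(compressed))
    print("end of stream (bit offset):", find_marker_bit_offsets(compressed, EOS_MAGIC))
```

Under this assumption, a split that begins at an arbitrary byte offset would scan forward to the first block magic and start decoding there. The reader for the previous split would keep reading past its nominal end until it reached that same marker. A production reader would also need to guard against a false match inside compressed data, for example by checking the block CRC.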
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.