Posted to mapreduce-dev@hadoop.apache.org by "Greg Roelofs (JIRA)" <ji...@apache.org> on 2010/05/18 03:20:42 UTC
[jira] Created: (MAPREDUCE-1795) add error option if file-based record-readers fail to consume all input (e.g., concatenated gzip, bzip2)
add error option if file-based record-readers fail to consume all input (e.g., concatenated gzip, bzip2)
--------------------------------------------------------------------------------------------------------
Key: MAPREDUCE-1795
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1795
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Greg Roelofs
Assignee: Ravi Gummadi
When running MapReduce with concatenated gzip files as input, only the first part is read, which is confusing, to say the least. Concatenated gzip is described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage and in http://www.ietf.org/rfc/rfc1952.txt. (See original report at http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)
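
For illustration only, here is a minimal Python sketch of the symptom and of the kind of error option this issue proposes. It is not Hadoop code: `read_first_member`, its `strict` flag, and `UnconsumedInputError` are hypothetical names, and Python's zlib stands in for whatever decompressor the record reader uses. A single zlib decompress object stops at the end of the first gzip member and leaves the rest of the input untouched, which is roughly the behavior described above.

```python
import gzip
import zlib


class UnconsumedInputError(Exception):
    """Hypothetical error: input remained after the first gzip member."""


def read_first_member(data: bytes, strict: bool = False) -> bytes:
    # 16 + MAX_WBITS selects gzip framing. One decompress object handles
    # exactly one member; anything after it ends up in unused_data.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    out = d.decompress(data) + d.flush()
    if strict and d.unused_data:
        # The "error option": fail loudly instead of dropping data silently.
        raise UnconsumedInputError(
            f"{len(d.unused_data)} bytes of input were not consumed"
        )
    return out


# Equivalent to `cat a.gz b.gz > both.gz`: two members back to back.
concatenated = gzip.compress(b"first\n") + gzip.compress(b"second\n")

# Default behavior: the second member is silently ignored.
first_only = read_first_member(concatenated)

# With the option enabled, the leftover input is reported as an error.
try:
    read_first_member(concatenated, strict=True)
    strict_raised = False
except UnconsumedInputError:
    strict_raised = True
```

With `strict` left off, the caller sees only the data from the first member and has no indication that anything was dropped.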
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1795) add error option if file-based record-readers fail to consume all input (e.g., concatenated gzip, bzip2)
Posted by "Greg Roelofs (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Greg Roelofs resolved MAPREDUCE-1795.
-------------------------------------
Resolution: Won't Fix
Per previous comment, we're going to fix the underlying issue instead (i.e., make decompressors support concatenated streams). See MAPREDUCE-469.
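
As a rough sketch of what "support concatenated streams" means in practice (an assumption-laden Python illustration, not the MAPREDUCE-469 patch; `decompress_all_members` is a hypothetical helper), a decompressor can restart on whatever input is left after each gzip member until nothing remains:

```python
import gzip
import zlib


def decompress_all_members(data: bytes) -> bytes:
    """Decompress every gzip member in `data`, in order."""
    chunks = []
    while data:
        # A fresh decompress object per member; 16 + MAX_WBITS = gzip framing.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        chunks.append(d.decompress(data))
        chunks.append(d.flush())
        if not d.eof:
            # Input ended before the member's trailer was seen.
            raise ValueError("truncated gzip member")
        # Whatever follows this member becomes the input for the next pass.
        data = d.unused_data
    return b"".join(chunks)


concatenated = gzip.compress(b"first\n") + gzip.compress(b"second\n")
all_data = decompress_all_members(concatenated)
```

This sketch holds the whole input in memory and treats any trailing non-gzip bytes as an error; a streaming reader would need to make its own choices on both points.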
> add error option if file-based record-readers fail to consume all input (e.g., concatenated gzip, bzip2)
> --------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1795
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1795
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Greg Roelofs
> Assignee: Greg Roelofs
>
> When running MapReduce with concatenated gzip files as input, only the first part ("member" in gzip spec parlance, http://www.ietf.org/rfc/rfc1952.txt) is read; the remainder is silently ignored. As a first step toward fixing that, this issue will add a configurable option to throw an error in such cases.
> MAPREDUCE-469 is the tracker for the more complete fix/feature, whenever that occurs.