Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2006/09/26 07:42:51 UTC
[jira] Resolved: (HADOOP-374) native support for gzipped text files
[ http://issues.apache.org/jira/browse/HADOOP-374?page=all ]
Owen O'Malley resolved HADOOP-374.
----------------------------------
Fix Version/s: 0.6.2
Resolution: Fixed
> native support for gzipped text files
> -------------------------------------
>
> Key: HADOOP-374
> URL: http://issues.apache.org/jira/browse/HADOOP-374
> Project: Hadoop
> Issue Type: New Feature
> Components: mapred
> Reporter: Yoram Arnon
> Fix For: 0.6.2
>
>
> In many cases it is convenient to store text files in DFS as gzip-compressed files.
> It would be good to have built-in support for processing these files in a MapReduce job.
> The getSplits implementation should return a single split per input file, ignoring the numSplits parameter.
> One can probably subclass InputFormatBase, and the getSplits method can simply call listPaths()
> and then construct and return a single split per path returned.
> The code for reading would look something like (courtesy of Vijay Murthy):
>   public RecordReader getRecordReader(FileSystem fs, FileSplit split,
>                                       JobConf job, Reporter reporter)
>     throws IOException {
>     final BufferedReader in =
>       new BufferedReader(new InputStreamReader
>         (new GZIPInputStream(fs.open(split.getPath()))));
>     return new RecordReader() {
>       long position;
>       public synchronized boolean next(Writable key, Writable value)
>         throws IOException {
>         String line = in.readLine();
>         if (line != null) {
>           position += line.length();
>           ((UTF8)value).set(line);
>           return true;
>         }
>         return false;
>       }
>       public synchronized long getPos() throws IOException {
>         return position;
>       }
>       public synchronized void close() throws IOException {
>         in.close();
>       }
>     };
>   }
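
The two ideas in the description (one split per input file, and reading lines through a GZIPInputStream) can be tried outside Hadoop with nothing but the JDK. The following is only a rough sketch under stated assumptions, not the code committed for this issue: the class GzipLineDemo and its methods planSplits, readLines and gzip are hypothetical names, a "split" is represented by a plain path string rather than a FileSplit, and UTF-8 is assumed as the text encoding. A likely reason for one split per file is that a gzip stream generally cannot be decoded starting from an arbitrary byte offset.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-alone illustration; none of these names exist in Hadoop.
final class GzipLineDemo {

    /** Returns one split per input path; numSplits is accepted but ignored. */
    static List<String> planSplits(List<String> paths, int numSplits) {
        return new ArrayList<>(paths);
    }

    /** Reads every line of a gzip-compressed stream (UTF-8 assumed). */
    static List<String> readLines(InputStream compressed) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    /** Demo helper: gzip-compresses a string in memory. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = gzip("first line\nsecond line\n");
        for (String line : readLines(new ByteArrayInputStream(data))) {
            System.out.println(line);
        }
        // Two input files yield two splits, whatever numSplits asks for.
        List<String> splits = planSplits(Arrays.asList("a.gz", "b.gz"), 10);
        System.out.println(splits.size() + " splits");
    }
}
```

Naming the charset explicitly avoids depending on the platform default, which the InputStreamReader in the quoted snippet would use.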