Posted to mapreduce-issues@hadoop.apache.org by "Pradeep Kamath (JIRA)" <ji...@apache.org> on 2009/10/22 00:22:59 UTC

[jira] Created: (MAPREDUCE-1130) Provide a way to open and read a side file using an existing InputFormat

Provide a way to open and read a side file using an existing InputFormat
------------------------------------------------------------------------

                 Key: MAPREDUCE-1130
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1130
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
            Reporter: Pradeep Kamath


The Pig subproject needs to open a side file in order to implement map-side joins. In some cases the entire file must be read as a side file; in others, the file must be read from a particular split through to the last split. To do this with existing InputFormats, the Pig code would have to mimic Hadoop: call InputFormat.getSplits, then for each split call InputFormat.createRecordReader and RecordReader.initialize(), then call RecordReader.nextKeyValue() repeatedly until the end of the split is reached, and then continue with the next split. It would be good if Hadoop provided utility methods for this, covering both cases: reading the file from a given split to the end, and reading it in its entirety.
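
For illustration, the loop described above could look roughly like the sketch below. It is not a proposed patch and it deliberately does not compile against Hadoop: Split, Reader and Format are simplified, hypothetical stand-ins that mirror the shape of InputSplit, RecordReader and InputFormat in org.apache.hadoop.mapreduce, with the JobContext / TaskAttemptContext arguments left out. The names SideFileReader, readFrom, readAll and ListFormat are likewise made up for this example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

public class SideFileReader {

    // Hypothetical stand-ins for Hadoop's InputSplit, RecordReader and
    // InputFormat. The real methods also take job/task context arguments.
    interface Split {}

    interface Reader<K, V> extends AutoCloseable {
        void initialize(Split split) throws IOException;
        boolean nextKeyValue() throws IOException;
        K getCurrentKey();
        V getCurrentValue();
        @Override void close() throws IOException;
    }

    interface Format<K, V> {
        List<Split> getSplits() throws IOException;
        Reader<K, V> createRecordReader(Split split) throws IOException;
    }

    /** Reads every record from split index firstSplit through the last split. */
    static <K, V> void readFrom(Format<K, V> format, int firstSplit,
                                BiConsumer<K, V> sink) throws IOException {
        List<Split> splits = format.getSplits();
        for (int i = firstSplit; i < splits.size(); i++) {
            Split split = splits.get(i);
            // One reader per split, closed before moving to the next split.
            try (Reader<K, V> reader = format.createRecordReader(split)) {
                reader.initialize(split);
                while (reader.nextKeyValue()) {
                    sink.accept(reader.getCurrentKey(), reader.getCurrentValue());
                }
            }
        }
    }

    /** Reads the whole side file, i.e. every split from the first one. */
    static <K, V> void readAll(Format<K, V> format, BiConsumer<K, V> sink)
            throws IOException {
        readFrom(format, 0, sink);
    }

    /** Demo-only format: each split is an in-memory list of lines. */
    static final class ListFormat implements Format<Integer, String> {
        private static final class ListSplit implements Split {
            final List<String> lines;
            ListSplit(List<String> lines) { this.lines = lines; }
        }

        private final List<Split> splits = new ArrayList<>();

        ListFormat(List<List<String>> chunks) {
            for (List<String> chunk : chunks) {
                splits.add(new ListSplit(chunk));
            }
        }

        @Override public List<Split> getSplits() { return splits; }

        @Override public Reader<Integer, String> createRecordReader(Split split) {
            return new Reader<Integer, String>() {
                private List<String> lines;
                private int pos = -1;
                @Override public void initialize(Split s) { lines = ((ListSplit) s).lines; }
                @Override public boolean nextKeyValue() { return ++pos < lines.size(); }
                @Override public Integer getCurrentKey() { return pos; }
                @Override public String getCurrentValue() { return lines.get(pos); }
                @Override public void close() {}
            };
        }
    }

    public static void main(String[] args) throws IOException {
        ListFormat format = new ListFormat(Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c"), Arrays.asList("d", "e")));
        List<String> seen = new ArrayList<>();
        readFrom(format, 1, (k, v) -> seen.add(v)); // skip split 0
        System.out.println(seen); // prints [c, d, e]
    }
}
```

readFrom covers the "from a particular split to the last split" case and readAll the "entire file" case. A real utility would also have to decide how the starting split is identified (an index is only an assumption here) and how the job and task contexts are supplied.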

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.