Posted to mapreduce-user@hadoop.apache.org by Pedro Costa <ps...@gmail.com> on 2010/10/13 21:00:17 UTC

How read index and data file?

Hi,

I would like to create an example to read an index file and the data
file that is produced as output in the map function. Can anyone give
me an example, please?

Thanks,
-- 
Pedro

Re: How read index and data file?

Posted by Gregory Lawrence <gr...@yahoo-inc.com>.

Pedro,

Could you explain what you mean by index file? Generally speaking, mapper output files are written as text files, sequence files, or some other format. What format uses an additional index file? In my experience, examining the contents of a text or sequence file can be accomplished by typing:

hadoop fs -text filename.txt

This should print out the contents in a human-readable format.
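
If you do mean the per-partition index described in the quoted message below (start offset, raw length, compressed length), here is a minimal sketch of how such an index could be parsed and used to cut one segment out of the data file. It uses only the JDK. The layout is an assumption on my part: three big-endian 8-byte longs per partition, followed by an optional trailer such as a checksum. I believe the classes to check this against are SpillRecord and IndexRecord in org.apache.hadoop.mapred, so please verify it for your Hadoop version. The names SpillIndexSketch, IndexEntry, parseIndex, segment and trailerBytes are made up for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpillIndexSketch {

    /** One index entry: where a partition's segment starts and how long it is. */
    public static final class IndexEntry {
        public final long startOffset; // byte offset of the segment in the data file
        public final long rawLength;   // length of the segment before compression
        public final long partLength;  // length of the segment as stored (compressed)

        IndexEntry(long startOffset, long rawLength, long partLength) {
            this.startOffset = startOffset;
            this.rawLength = rawLength;
            this.partLength = partLength;
        }
    }

    // ASSUMPTION: each entry is three big-endian longs, i.e. 24 bytes.
    static final int ENTRY_BYTES = 3 * Long.BYTES;

    /** Parses the index bytes, ignoring trailerBytes at the end (e.g. a checksum). */
    public static List<IndexEntry> parseIndex(byte[] index, int trailerBytes) {
        int usable = index.length - trailerBytes;
        if (usable < 0 || usable % ENTRY_BYTES != 0) {
            throw new IllegalArgumentException("unexpected index length: " + index.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(index, 0, usable); // big-endian by default
        List<IndexEntry> entries = new ArrayList<>();
        while (buf.remaining() >= ENTRY_BYTES) {
            entries.add(new IndexEntry(buf.getLong(), buf.getLong(), buf.getLong()));
        }
        return entries;
    }

    /** Returns the stored bytes of one segment; decoding keys and values is not attempted. */
    public static byte[] segment(byte[] data, IndexEntry e) {
        int start = Math.toIntExact(e.startOffset);
        int end = Math.toIntExact(e.startOffset + e.partLength);
        if (start < 0 || end < start || end > data.length) {
            throw new IllegalArgumentException("segment outside data file");
        }
        return Arrays.copyOfRange(data, start, end);
    }

    public static void main(String[] args) {
        // Synthetic index with two partitions and an 8-byte trailer.
        ByteBuffer b = ByteBuffer.allocate(2 * ENTRY_BYTES + 8);
        b.putLong(0).putLong(10).putLong(6);
        b.putLong(6).putLong(20).putLong(4);
        b.putLong(0); // placeholder trailer
        for (IndexEntry e : parseIndex(b.array(), 8)) {
            System.out.println(e.startOffset + " " + e.rawLength + " " + e.partLength);
        }
    }
}
```

The bytes returned by segment are still in whatever record format (and compression) the map task used, so reading individual keys and values would need a reader that knows the key and value types.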

Regards,
Greg Lawrence

On 10/14/10 11:35 AM, "Pedro Costa" <ps...@gmail.com> wrote:

- I am asking because I would like to read the map output data file
and I don't know how.
By that I mean: I know that the index file contains the start offset,
the raw length, and the compressed length for the data file, and that
to read the data file I also have to pay attention to the key and
value types stored in it. I would just like to build an example that
reads the data file with the help of the index file, and I don't know
how to do it.

- What is the difference between
org.apache.hadoop.mapred.IFile.Reader and
org.apache.hadoop.fs.FSDataInputStream?

Thanks,






--
Pedro

Re: How read index and data file?

Posted by Pedro Costa <ps...@gmail.com>.

- I am asking because I would like to read the map output data file
and I don't know how.
By that I mean: I know that the index file contains the start offset,
the raw length, and the compressed length for the data file, and that
to read the data file I also have to pay attention to the key and
value types stored in it. I would just like to build an example that
reads the data file with the help of the index file, and I don't know
how to do it.

- What is the difference between
org.apache.hadoop.mapred.IFile.Reader and
org.apache.hadoop.fs.FSDataInputStream?

Thanks,




On Thu, Oct 14, 2010 at 6:21 PM, Gregory Lawrence <gr...@yahoo-inc.com> wrote:
> Pedro,
>
> I’m not sure I fully understand your question but if you are asking how to
> read in an index file in addition to the standard job input, you should look
> into writing your own setup function. It may look something like the
> following:
>
> public void setup(Context context) throws IOException, InterruptedException
> {
>      Configuration conf = context.getConfiguration();
>      initialize(conf);
>
>      Path path = new Path(fileName);
>      FileSystem fs = path.getFileSystem(conf);
>      BufferedReader reader = new BufferedReader(new
> InputStreamReader(fs.open(path)));
>      ...
>
> The setup function should also initialize any necessary data structures
> (e.g., hash tables). This, of course, assumes that your index file is small
> enough to fit in memory. You should also look into using the distributed
> cache option, as it should speed things up, especially when multiple
> Mapper/Reducer tasks run in sequence on the same machine.
>
> Regards,
> Greg Lawrence
>
> On 10/13/10 12:00 PM, "Pedro Costa" <ps...@gmail.com> wrote:
>
> Hi,
>
> I would like to create an example to read an index file and the data
> file that is produced as output in the map function. Can anyone give
> me an example, please?
>
> Thanks,
> --
> Pedro
>
>



-- 
Pedro

Re: How read index and data file?

Posted by Gregory Lawrence <gr...@yahoo-inc.com>.

Pedro,

I'm not sure I fully understand your question but if you are asking how to read in an index file in addition to the standard job input, you should look into writing your own setup function. It may look something like the following:

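// Runs once per task, before the first map() call.
// Needs java.io.* plus Configuration, Path and FileSystem from org.apache.hadoop.
// fileName and initialize(conf) are placeholders you define yourself.
@Override
public void setup(Context context) throws IOException, InterruptedException {
     Configuration conf = context.getConfiguration();
     initialize(conf);

     // Open the side file through the file system that owns its path.
     Path path = new Path(fileName);
     FileSystem fs = path.getFileSystem(conf);
     BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(path)));
     ...
}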

The setup function should also initialize any necessary data structures (e.g., hash tables). This, of course, assumes that your index file is small enough to fit in memory. You should also look into using the distributed cache option, as it should speed things up, especially when multiple Mapper/Reducer tasks run in sequence on the same machine.
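
As a rough illustration of the "initialize a hash table" step, here is a small JDK-only sketch that fills a HashMap from a BufferedReader like the one opened above. The file format is an assumption (one "key<TAB>value" pair per line), and LookupLoader and load are hypothetical names, not part of Hadoop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class LookupLoader {

    // ASSUMPTION: each line holds "key<TAB>value"; lines without a tab are skipped.
    public static Map<String, String> load(BufferedReader reader) throws IOException {
        Map<String, String> table = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // no separator on this line
            }
            table.put(line.substring(0, tab), line.substring(tab + 1));
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        // In a real task the reader would wrap fs.open(path); a string stands in here.
        try (BufferedReader r = new BufferedReader(new StringReader("a\t1\nb\t2\n"))) {
            Map<String, String> table = load(r);
            System.out.println(table.get("b"));
        }
    }
}
```

In setup you would keep the returned map in a field so that each map() call can look keys up without touching the file again.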

Regards,
Greg Lawrence

On 10/13/10 12:00 PM, "Pedro Costa" <ps...@gmail.com> wrote:

Hi,

I would like to create an example to read an index file and the data
file that is produced as output in the map function. Can anyone give
me an example, please?

Thanks,
--
Pedro