Posted to common-user@hadoop.apache.org by Muhammad Arshad <mu...@yahoo.com> on 2009/03/12 05:23:04 UTC
How to read output files over HDFS
Hi,
I am running multiple MapReduce jobs which write their output to directories named output0, output1, output2, etc. Once these jobs complete, I want to read the output stored in these files (line by line) automatically from Java code.
Kindly tell me how I can do this.
I do not want to use the 'hadoop dfs -get ... ...' command to first copy the output files to a local directory. I would be grateful if somebody could write me a snippet of code for this task.
thanks,
--umer
Re: How to read output files over HDFS
Posted by lohit <lo...@yahoo.com>.
http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample
Lohit
----- Original Message ----
From: Amandeep Khurana <am...@gmail.com>
To: core-user@hadoop.apache.org
Sent: Wednesday, March 11, 2009 9:46:09 PM
Subject: Re: How to read output files over HDFS
Two ways that I can think of:
1. Write another MR job without a reducer. The mapper can do whatever
processing you want.
OR
2. Get an instance of the DistributedFileSystem class in your Java code and
use it to read the file from HDFS.
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
Re: How to read output files over HDFS
Posted by Amandeep Khurana <am...@gmail.com>.
Two ways that I can think of:
1. Write another MR job without a reducer. The mapper can do whatever
processing you want.
OR
2. Get an instance of the DistributedFileSystem class in your Java code and
use it to read the file from HDFS.
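A minimal sketch of the second option, not taken from the thread. It assumes the Hadoop client libraries and the cluster configuration files are on the classpath, that the jobs write text output into files whose names start with "part-", and that the directories are output0, output1, ... relative to the user's HDFS home directory. The class name ReadJobOutput, the helper isPartFile and the jobCount value are made up for illustration. The handle is obtained through FileSystem.get(conf), which returns the DistributedFileSystem implementation when the default file system in the configuration points at HDFS.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadJobOutput {

    // Hypothetical helper: true for the data files a job writes
    // (part-00000, part-r-00000, ...); false for entries such as _logs.
    static boolean isPartFile(String name) {
        return name.startsWith("part-");
    }

    public static void main(String[] args) throws IOException {
        // Reads the cluster settings from the configuration files on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        int jobCount = 3; // assumption: number of outputN directories to read

        for (int i = 0; i < jobCount; i++) {
            // A relative path resolves against the user's working directory on HDFS.
            Path dir = new Path("output" + i);
            FileStatus[] entries = fs.listStatus(dir);
            if (entries == null) {
                continue; // some older releases return null for a missing directory
            }
            for (FileStatus entry : entries) {
                Path file = entry.getPath();
                if (!isPartFile(file.getName())) {
                    continue;
                }
                // fs.open streams the file from HDFS; nothing is copied to local disk.
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        System.out.println(file.getName() + ": " + line);
                    }
                }
            }
        }
    }
}
```

With the default text output format each line should be a key and a value separated by a tab, so the loop body is the place to split and process records. For the first option, a map-only job is usually obtained by setting the number of reduce tasks to zero (setNumReduceTasks(0)) when configuring the job.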
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz