Posted to common-user@hadoop.apache.org by Boyu Zhang <bz...@cs.utsa.edu> on 2009/09/11 23:10:11 UTC

Hadoop Input Files Directory

Dear All,

 

I have an input directory tree of depth 3, and the actual files are at the deepest
level (something like /data/user/dir_0/file0, /data/user/dir_1/file0,
/data/user/dir_2/file0). I want to write a MapReduce job to process the files at
that deepest level.

One way of doing so is to specify the input paths as the directories that
contain the files, like /data/user/dir_0, /data/user/dir_1,
/data/user/dir_2. But this is not feasible once I have many more
directories, as I eventually will. I tried specifying the input path as /data/user,
but I get the error "cannot open filename /data/user/dir_0".

My question is: is there any way to process all the files in the hierarchy
with the input path set to the top-level directory?

 

Thanks a lot for your time!

 

Boyu Zhang

University of Delaware 


RE: Hadoop Input Files Directory

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
An alternative would be to use the Hadoop FileSystem APIs to recursively list the file statuses and pass those as the input files. This is slightly more involved, but it gives you more control and might help with debugging as well.
Just a thought.
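
For what it's worth, a minimal sketch of that approach (not from the original
message): it assumes the old org.apache.hadoop.mapred API of that era, the
/data/user root from the question, and a hypothetical driver class name.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class RecursiveInputLister {

        // Walk the tree under dir and collect every plain file.
        static void collectFiles(FileSystem fs, Path dir, List<Path> out)
                throws IOException {
            for (FileStatus status : fs.listStatus(dir)) {
                if (status.isDir()) {
                    collectFiles(fs, status.getPath(), out);
                } else {
                    out.add(status.getPath());
                }
            }
        }

        public static void main(String[] args) throws IOException {
            // Sketch only: job setup (mapper, reducer, output path) is omitted.
            JobConf conf = new JobConf(RecursiveInputLister.class);
            FileSystem fs = FileSystem.get(conf);

            List<Path> files = new ArrayList<Path>();
            collectFiles(fs, new Path("/data/user"), files);

            // Register each leaf file as an input instead of the top-level directory.
            for (Path file : files) {
                FileInputFormat.addInputPath(conf, file);
            }
            // ... then JobClient.runJob(conf)
        }
    }

Listing the files up front also lets you log or filter exactly which paths the
job will read, which is where the extra control comes in.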

Thanks,
Amogh

-----Original Message-----
From: Amandeep Khurana [mailto:amansk@gmail.com] 
Sent: Saturday, September 12, 2009 3:03 AM
To: common-user@hadoop.apache.org
Subject: Re: Hadoop Input Files Directory

You can give something like /path/to/directories/*/*/*


Re: Hadoop Input Files Directory

Posted by Amandeep Khurana <am...@gmail.com>.
You can give something like /path/to/directories/*/*/*
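
A minimal sketch of that (not from the original message), assuming the old
org.apache.hadoop.mapred API and the /data/user layout from the question; the
class name is hypothetical.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class GlobInputExample {
        public static void main(String[] args) throws Exception {
            // Sketch only: job setup (mapper, reducer, output path) is omitted.
            JobConf conf = new JobConf(GlobInputExample.class);

            // With the layout from the question (/data/user/dir_N/file0), the
            // files sit two levels below /data/user, so /data/user/*/* matches
            // them. FileInputFormat expands the glob when it computes splits.
            FileInputFormat.setInputPaths(conf, new Path("/data/user/*/*"));

            // ... then JobClient.runJob(conf)
        }
    }

The same pattern can also be supplied as a command-line argument and passed
straight to setInputPaths.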

