Posted to common-user@hadoop.apache.org by Boyu Zhang <bo...@gmail.com> on 2009/09/11 23:17:45 UTC

Hadoop Input File Directory

Dear all,

I have an input file hierarchy of depth 3, something like
/data/user/dir_0/file0, /data/user/dir_1/file0, /data/user/dir_2/file0. I
want to run a MapReduce job that processes all the files at the deepest
level.

One way to do this is to add each directory as a separate input path
(/data/user/dir_0, /data/user/dir_1, /data/user/dir_2), but this becomes
infeasible as the hierarchy grows.
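
For concreteness, here is a minimal sketch of that explicit approach (my
own illustration using the classic org.apache.hadoop.mapred API; the job
class name is a placeholder and the mapper/reducer setup is omitted):

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.JobConf;

  public class ProcessLeafFiles {
      public static void main(String[] args) throws Exception {
          // Placeholder configuration; mapper/reducer setup omitted.
          JobConf conf = new JobConf(ProcessLeafFiles.class);
          // One addInputPath call per directory: workable for three
          // directories, unmanageable once the hierarchy grows.
          FileInputFormat.addInputPath(conf, new Path("/data/user/dir_0"));
          FileInputFormat.addInputPath(conf, new Path("/data/user/dir_1"));
          FileInputFormat.addInputPath(conf, new Path("/data/user/dir_2"));
      }
  }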

I also tried specifying the top-level directory /data/user as the single
input path, but I got errors like: cannot open filename /data/user/dir_0.
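
To show exactly what I did (continuing the sketch above; the comment is
my reading of the failure, assuming FileInputFormat lists the input
directory only one level deep and does not recurse):

  // Single top-level input path instead of one path per directory.
  FileInputFormat.setInputPaths(conf, new Path("/data/user"));
  // Listing /data/user yields dir_0, dir_1, dir_2; since nothing
  // recurses into them, those directory entries get handed on as if
  // they were input files, and opening a directory fails with
  // "cannot open filename /data/user/dir_0".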

My question is: is there any way to process all the files while
specifying only the top-level directory as the input path?
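
One thing I have been considering (my own guess, not something I have
confirmed) is letting FileInputFormat expand a glob over the
subdirectories, again continuing the sketch above:

  // Glob over the subdirectories: matches dir_0, dir_1, dir_2, and the
  // files inside each matched directory become the job's input.
  FileInputFormat.setInputPaths(conf, new Path("/data/user/*"));
  // Or match the leaf files directly for this depth-3 layout:
  FileInputFormat.setInputPaths(conf, new Path("/data/user/*/file*"));

Would a glob like this be the right way to handle nested input, or is
there a better-supported option?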

Thanks a lot!

Boyu Zhang

University of Delaware