Posted to common-user@hadoop.apache.org by Jaydeep Ayachit <ja...@persistent.co.in> on 2010/11/11 11:55:38 UTC

Recursive file search for FileInputPath

Can a MapReduce job recursively browse through all files and select them for processing when a higher-level folder is set in FileInputPath?

For example,
Dir-1
|___      Dir-2
                |____   Dir-3

If Dir-1 is given as the FileInputPath, does the job include files from Dir-2 and Dir-3?

Regards
Jaydeep




Re: Recursive file search for FileInputPath

Posted by Harsh J <qw...@gmail.com>.
Hi,

On Thu, Nov 11, 2010 at 4:25 PM, Jaydeep Ayachit
<ja...@persistent.co.in> wrote:
> Can a MapReduce job recursively browse through all files and select them for processing when a higher-level folder is set in FileInputPath?
>
> For example,
> Dir-1
> |___      Dir-2
>                |____   Dir-3
>
> If Dir-1 is given as the FileInputPath, does the job include files from Dir-2 and Dir-3?
>

Not directly, no. You need to implement this logic yourself: look at
what the FileInputFormat.listStatus method does and override it to
recurse into subdirectories as needed.
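
To illustrate just the recursion, here is a minimal sketch that walks
the local filesystem with java.nio rather than HDFS. RecursiveLister
and listFilesRecursively are hypothetical names, not part of Hadoop.
In an actual listStatus override you would presumably do the same walk
with FileSystem.listStatus(Path) and FileStatus.isDir() in place of the
java.nio calls.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not a Hadoop class): collects every plain file
// found at or below a given path, descending into subdirectories.
class RecursiveLister {

    // Returns all non-directory entries under root, in sorted order.
    static List<Path> listFilesRecursively(Path root) throws IOException {
        List<Path> files = new ArrayList<>();
        collect(root, files);
        Collections.sort(files); // stable, predictable ordering
        return files;
    }

    private static void collect(Path path, List<Path> out) throws IOException {
        if (!Files.isDirectory(path)) {
            out.add(path); // a plain file: keep it as an input
            return;
        }
        // A directory: visit each child, e.g. Dir-1 -> Dir-2 -> Dir-3.
        try (DirectoryStream<Path> children = Files.newDirectoryStream(path)) {
            for (Path child : children) {
                collect(child, out);
            }
        }
    }
}
```

Calling RecursiveLister.listFilesRecursively on Dir-1 would return the
files directly in Dir-1 together with those in Dir-2 and Dir-3, which
is the behaviour the stock listStatus does not give you.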

In the next release, FileInputFormat itself will provide this,
controlled by a property you can set through the Configuration. See
Zheng Shao's patch for that feature at MAPREDUCE-1501 :)

-- 
Harsh J
www.harshj.com