Posted to common-user@hadoop.apache.org by Sonal Goyal <so...@gmail.com> on 2010/05/04 16:56:54 UTC

Re: having a directory as input split

One way to do this would be:

Create a DirectoryInputFormat that accepts the list of directories as
input and emits each directory path as its own split. Your custom RecordReader
can then read that split and generate the appropriate input for your mapper.
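The suggestion above can be sketched roughly as follows. This is a minimal, self-contained model using plain java.nio.file on a local filesystem, not Hadoop's own classes. DirectoryInputFormatSketch, DirectorySplit, getSplits and DirectoryRecordReader are hypothetical names chosen to mirror the roles of InputFormat.getSplits() and RecordReader. It assumes each directory is flat (subdirectories are skipped) and small enough for a single task, as the question states.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-filesystem model of a "one directory per split" input format.
// It mirrors the roles of Hadoop's InputFormat.getSplits() and RecordReader,
// but uses only java.nio.file, so it is not Hadoop API code.
final class DirectoryInputFormatSketch {

    /** One split covers one whole directory (plays the role of an InputSplit). */
    static final class DirectorySplit {
        final Path directory;

        DirectorySplit(Path directory) {
            this.directory = directory;
        }
    }

    /** Like getSplits(): emit exactly one split per input directory, never one per file. */
    static List<DirectorySplit> getSplits(List<Path> inputDirs) {
        List<DirectorySplit> splits = new ArrayList<>();
        for (Path dir : inputDirs) {
            if (!Files.isDirectory(dir)) {
                throw new IllegalArgumentException("not a directory: " + dir);
            }
            splits.add(new DirectorySplit(dir));
        }
        return splits;
    }

    /**
     * Like a RecordReader: reads the regular files of one directory split and
     * yields (file name, file contents) records in sorted path order.
     */
    static final class DirectoryRecordReader {
        private final Iterator<Path> files;
        private String currentKey;
        private byte[] currentValue;

        DirectoryRecordReader(DirectorySplit split) throws IOException {
            this.files = listRegularFiles(split.directory).iterator();
        }

        // Subdirectories are skipped; this sketch assumes each input directory is flat.
        private static List<Path> listRegularFiles(Path dir) throws IOException {
            try (Stream<Path> entries = Files.list(dir)) {
                return entries.filter(Files::isRegularFile)
                              .sorted()
                              .collect(Collectors.toList());
            }
        }

        /** Mirrors nextKeyValue(): advance to the next file, or return false when done. */
        boolean nextKeyValue() throws IOException {
            if (!files.hasNext()) {
                return false;
            }
            Path file = files.next();
            currentKey = file.getFileName().toString();
            currentValue = Files.readAllBytes(file);
            return true;
        }

        String getCurrentKey() {
            return currentKey;
        }

        byte[] getCurrentValue() {
            return currentValue;
        }
    }
}
```

Because getSplits() returns one split per directory, each map task would receive exactly one directory, and the reader decides how that directory's files become records. In an actual Hadoop job the same shape would presumably be expressed by overriding getSplits() to return one split whose path is the directory, and having the custom RecordReader open the files under that path through the FileSystem API; treat that mapping as an outline rather than tested code.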

Thanks and Regards,
Sonal
www.meghsoft.com


On Fri, Apr 30, 2010 at 11:48 AM, akhil1988 <ak...@gmail.com> wrote:

>
> How can I make a directory an InputSplit rather than a file? I want the
> input split given to a map task to be a directory, not a file. I will
> implement my own record reader, which will read the appropriate data from
> the directory and pass the records to the map tasks.
>
> To put it another way:
> I have a list of directories distributed over HDFS, and I know that each of
> these directories is small enough to fit on a single node. I want each map
> task to be given one whole directory rather than the individual files in it.
> How can I do this?
>
> Thanks,
>  Akhil
> --
> View this message in context:
> http://old.nabble.com/having-a-directory-as-input-split-tp28408886p28408886.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>