Posted to user@hadoop.apache.org by Terry Healy <th...@bnl.gov> on 2012/10/08 16:21:44 UTC

One file per mapper?

Hello-

I know that it is contrary to normal Hadoop operation, but how can I
configure my M/R job to send one complete file to each mapper task? This
is intended to be used on many files in the 1.5 MB range as the first
step in a chain of processes.

thanks.

Re: One file per mapper?

Posted by Terry Healy <th...@bnl.gov>.
thanks Bejoy.

...Feeling a bit foolish as Tom White's book was 2 feet away....

On 10/08/2012 10:28 AM, Bejoy Ks wrote:
> Hi Terry
> 
> If your files are smaller than the HDFS block size and you are using
> the default TextInputFormat with the default split-size properties,
> each mapper will get exactly one file.
> 
> If your files are larger than an HDFS block, take a look at the sample
> implementation of 'WholeFileInputFormat' in 'Hadoop - The Definitive
> Guide' by Tom White, which keeps a file from being split across mappers.
> http://books.google.co.in/books?id=Nff49D7vnJcC&pg=PA206&lpg=PA206&dq=wholefileinputformat&source=bl&ots=IifzWlbwQs&sig=9CDmS45S8pGDOaCYl6xGXnyDFE8&hl=en&sa=X&ei=VeJyUKfEE4rMrQe654G4DA&ved=0CCsQ6AEwAg#v=onepage&q=wholefileinputformat&f=false
> 
> 
> 
> On Mon, Oct 8, 2012 at 7:51 PM, Terry Healy <thealy@bnl.gov
> <ma...@bnl.gov>> wrote:
> 
>     Hello-
> 
>     I know that it is contrary to normal Hadoop operation, but how can I
>     configure my M/R job to send one complete file to each mapper task? This
>     is intended to be used on many files in the 1.5 MB range as the first
>     step in a chain of processes.
> 
>     thanks.
> 
> 

-- 
Terry Healy / thealy@bnl.gov
Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973

Re: One file per mapper?

Posted by Bejoy Ks <be...@gmail.com>.
Hi Terry

If your files are smaller than the HDFS block size and you are using the
default TextInputFormat with the default split-size properties, each mapper
will get exactly one file.

If your files are larger than an HDFS block, take a look at the sample
implementation of 'WholeFileInputFormat' in 'Hadoop - The Definitive Guide'
by Tom White, which keeps a file from being split across mappers.
http://books.google.co.in/books?id=Nff49D7vnJcC&pg=PA206&lpg=PA206&dq=wholefileinputformat&source=bl&ots=IifzWlbwQs&sig=9CDmS45S8pGDOaCYl6xGXnyDFE8&hl=en&sa=X&ei=VeJyUKfEE4rMrQe654G4DA&ved=0CCsQ6AEwAg#v=onepage&q=wholefileinputformat&f=false
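For reference, here is a minimal sketch of that pattern against the
org.apache.hadoop.mapreduce API; it follows the shape of the book's example,
but check the book for the exact code. The idea is (1) make isSplitable()
return false so a file is never split, and (2) use a RecordReader that emits
the whole file as a single record:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Input format that hands each mapper one whole file as a single record.
public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // never split a file across mappers
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }
}

// Reader that emits exactly one (key, value) pair: the file's full contents.
class WholeFileRecordReader
        extends RecordReader<NullWritable, BytesWritable> {

    private FileSplit fileSplit;
    private Configuration conf;
    private final BytesWritable value = new BytesWritable();
    private boolean processed = false;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) {
        this.fileSplit = (FileSplit) split;
        this.conf = context.getConfiguration();
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        if (processed) {
            return false; // only one record per file
        }
        byte[] contents = new byte[(int) fileSplit.getLength()];
        Path file = fileSplit.getPath();
        FileSystem fs = file.getFileSystem(conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(file);
            IOUtils.readFully(in, contents, 0, contents.length);
            value.set(contents, 0, contents.length);
        } finally {
            IOUtils.closeStream(in);
        }
        processed = true;
        return true;
    }

    @Override
    public NullWritable getCurrentKey() { return NullWritable.get(); }

    @Override
    public BytesWritable getCurrentValue() { return value; }

    @Override
    public float getProgress() { return processed ? 1.0f : 0.0f; }

    @Override
    public void close() { /* stream already closed in nextKeyValue() */ }
}
```

In the driver you would then call
job.setInputFormatClass(WholeFileInputFormat.class), and each 1.5 MB file
arrives at its mapper as one BytesWritable value. Note this loads the whole
file into memory, so it suits small files like yours, not block-sized ones.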



On Mon, Oct 8, 2012 at 7:51 PM, Terry Healy <th...@bnl.gov> wrote:

> Hello-
>
> I know that it is contrary to normal Hadoop operation, but how can I
> configure my M/R job to send one complete file to each mapper task? This
> is intended to be used on many files in the 1.5 MB range as the first
> step in a chain of processes.
>
> thanks.
>
