Posted to user@hadoop.apache.org by Ramasubramanian Narayanan <ra...@gmail.com> on 2012/11/07 17:11:20 UTC

Regarding MapReduce Input Format

Hi,

I came across the question below and I feel 'D' is the correct answer, but
on some sites it is mentioned that 'B' is the correct answer. Could you
please tell me which one is right, with an explanation?

In a MapReduce job, you want each of your input files processed by a single
map task. How do you configure a MapReduce job so that a single map task
processes each input file, regardless of how many blocks the input file
occupies?
A. Increase the parameter that controls minimum split size in the job
configuration.
B. Write a custom MapRunner that iterates over all key-value pairs in the
entire file.
C. Set the number of mappers equal to the number of input files you want to
process.
D. Write a custom FileInputFormat and override the method isSplitable to
always return false.

regards,
Rams

Re: Regarding MapReduce Input Format

Posted by Harsh J <ha...@cloudera.com>.
You are correct. (D) automatically does (B): once isSplitable returns
false, the whole file becomes a single split, so the one map task ends up
iterating over all of the file's key-value pairs anyway. (B) on its own
does not work, because the framework has already split the file across
multiple map tasks before a custom MapRunner ever runs.
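
For reference, here is a minimal sketch (not from this thread) of what (D)
looks like with the new mapreduce API; the class name
WholeFileTextInputFormat is just illustrative:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Never split: each input file becomes exactly one InputSplit,
        // and therefore one map task, however many HDFS blocks it spans.
        return false;
    }
}

Wire it into the job with
job.setInputFormatClass(WholeFileTextInputFormat.class); the
LineRecordReader inherited from TextInputFormat then reads the entire file
inside that single task.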

On Wed, Nov 7, 2012 at 9:41 PM, Ramasubramanian Narayanan
<ra...@gmail.com> wrote:
> Hi,
>
> I came across the question below and I feel 'D' is the correct answer, but
> on some sites it is mentioned that 'B' is the correct answer. Could you
> please tell me which one is right, with an explanation?
>
> In a MapReduce job, you want each of your input files processed by a single
> map task. How do you configure a MapReduce job so that a single map task
> processes each input file, regardless of how many blocks the input file
> occupies?
> A. Increase the parameter that controls minimum split size in the job
> configuration.
> B. Write a custom MapRunner that iterates over all key-value pairs in the
> entire file.
> C. Set the number of mappers equal to the number of input files you want to
> process.
> D. Write a custom FileInputFormat and override the method isSplitable to
> always return false.
>
> regards,
> Rams



-- 
Harsh J
