Posted to mapreduce-user@hadoop.apache.org by Daniel Garcia <da...@danielgarcia.info> on 2009/12/03 17:25:34 UTC

Rewriting an image resizing program in terms of map reduce

Hello!

I'm trying to rewrite an image resizing program in terms of map/reduce. The
problem I see is that the job is not broken up into small enough tasks. If
I have only one input file with 10,000 URLs (the file is much smaller than
the HDFS block size), how can I ensure that the job is distributed amongst
all the nodes? In other words, how can I ensure that the task size is small
enough that every node processes a proportional share of the input?

Regards,
Daniel

Re: Rewriting an image resizing program in terms of map reduce

Posted by Chris Douglas <ch...@gmail.com>.
Consider NLineInputFormat. -C
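
A minimal sketch of how NLineInputFormat might be wired into a job driver,
assuming a recent Hadoop release with the org.apache.hadoop.mapreduce API
(the 0.20-era equivalent is org.apache.hadoop.mapred.lib.NLineInputFormat
plus the mapred.line.input.format.linespermap setting). The driver and
mapper class names below are made up for illustration:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ResizeDriver {

  // Hypothetical mapper: each call receives one line (one URL) of its split.
  public static class ResizeMapper
      extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable offset, Text url, Context context)
        throws IOException, InterruptedException {
      // Real code would fetch the image at 'url' and resize it here;
      // this sketch just records which URL the task handled.
      context.write(url, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "image-resize");
    job.setJarByClass(ResizeDriver.class);

    // Each map task gets at most 10 input lines (URLs), so a 10,000-line
    // file becomes roughly 1,000 splits spread across the cluster.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 10);
    FileInputFormat.addInputPath(job, new Path(args[0]));

    job.setMapperClass(ResizeMapper.class);
    job.setNumReduceTasks(0);               // map-only job
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}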

On Fri, Dec 4, 2009 at 5:34 PM, Ted Xu <te...@gmail.com> wrote:
> Hi Daniel,
>
> I think there are better solutions, but simply chopping the input file into
> pieces (e.g., 10 URLs per file) should work.
>
> 2009/12/4 Daniel Garcia <da...@danielgarcia.info>
>>
>> Hello!
>> I'm trying to rewrite an image resizing program in terms of map/reduce.
>> The problem I see is that the job is not broken up into small enough tasks.
>> If I have only one input file with 10,000 URLs (the file is much smaller
>> than the HDFS block size), how can I ensure that the job is distributed
>> amongst all the nodes? In other words, how can I ensure that the task size
>> is small enough that every node processes a proportional share of the input?
>> Regards,
>> Daniel
>
> Best Regards,
>
> Ted Xu
>

Re: Rewriting an image resizing program in terms of map reduce

Posted by Ted Xu <te...@gmail.com>.
Hi Daniel,

I think there are better solutions, but simply chopping the input file into
pieces (e.g., 10 URLs per file) should work.
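
A rough sketch of that pre-splitting step in plain Java; the file names
(urls.txt, url-chunks) and the chunk size of 10 are assumptions, and the
resulting chunk files would then serve as the job's input directory:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical helper: splits one URL-per-line file into many small files
// so Hadoop creates one input split (and one map task) per chunk file.
public class UrlFileSplitter {
  public static void main(String[] args) throws IOException {
    String input = args.length > 0 ? args[0] : "urls.txt";    // assumed name
    String outDir = args.length > 1 ? args[1] : "url-chunks"; // assumed name
    int linesPerChunk = 10;

    new File(outDir).mkdirs();
    BufferedReader in = new BufferedReader(new FileReader(input));
    PrintWriter out = null;
    int lineNo = 0, chunkNo = 0;
    String line;
    while ((line = in.readLine()) != null) {
      // Start a new chunk file every linesPerChunk lines.
      if (lineNo % linesPerChunk == 0) {
        if (out != null) out.close();
        out = new PrintWriter(new FileWriter(outDir + "/chunk-" + (chunkNo++)));
      }
      out.println(line);
      lineNo++;
    }
    if (out != null) out.close();
    in.close();
    // The chunk files would then be copied into HDFS (e.g. with
    // `hadoop fs -put url-chunks/ ...`) and used as the job's input path.
  }
}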

2009/12/4 Daniel Garcia <da...@danielgarcia.info>

> Hello!
>
> I'm trying to rewrite an image resizing program in terms of map/reduce. The
> problem I see is that the job is not broken up into small enough tasks. If
> I have only one input file with 10,000 URLs (the file is much smaller than
> the HDFS block size), how can I ensure that the job is distributed amongst
> all the nodes? In other words, how can I ensure that the task size is small
> enough that every node processes a proportional share of the input?
>
> Regards,
> Daniel
>
>
Best Regards,

Ted Xu