Posted to general@hadoop.apache.org by Marcos Pinto <ma...@gmail.com> on 2010/08/05 14:56:22 UTC
Problem: when I run a pig's script I got one reduce task
Hi guys, how are you doing?
I am learning how to use Hadoop and I have run into this problem:
I set up a cluster with 5 nodes (4 datanodes and 1 namenode), and I used the
same configuration for the jobtracker and tasktracker.
When I run a Pig script I get many maps (around 15) but just 1 reduce!
This kills all the parallel processing. For example, I have a 1 GB file,
and when I run the Pig script on the cluster it takes about 50 minutes to
process. =(
So I would really appreciate it if someone could help with any tip. Thanks
for your time.
Re: Problem: when I run a pig's script I got one reduce task
Posted by Marcos Pinto <ma...@gmail.com>.
Oh man, thanks! Your tip came in handy. =) I had tried to set some
properties, but that hadn't worked out; now that I use the PARALLEL clause
it works very well. Thanks, and have a nice day.
On Thu, Aug 5, 2010 at 10:38 AM, Gibbon, Robert, VF-Group <
Robert.Gibbon@vodafone.com> wrote:
>
> Use the PARALLEL clause, of course!
>
> PARALLEL n
>
> Increases the parallelism of a job by specifying the number of reduce
> tasks, n. The default value for n is 1 (one reduce task). Note the
> following:
>
> * PARALLEL only affects the number of reduce tasks. Map parallelism
> is determined by the input file: one map for each HDFS block.
> * If you don't specify PARALLEL, you still get the same map
> parallelism but only one reduce task.
>
> For more information, see the Pig Cookbook.
RE: Problem: when I run a pig's script I got one reduce task
Posted by "Gibbon, Robert, VF-Group" <Ro...@vodafone.com>.
Use the PARALLEL clause, of course!

PARALLEL n

Increases the parallelism of a job by specifying the number of reduce
tasks, n. The default value for n is 1 (one reduce task). Note the
following:

* PARALLEL only affects the number of reduce tasks. Map parallelism
is determined by the input file: one map for each HDFS block.
* If you don't specify PARALLEL, you still get the same map
parallelism but only one reduce task.
For more information, see the Pig Cookbook.
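To make this concrete, here is a minimal Pig Latin sketch (the file paths,
schema, and relation names are made up for illustration) showing PARALLEL
attached to a reduce-side operator:

```pig
-- Hypothetical input: one (user, bytes) record per line.
logs    = LOAD '/data/access_log' AS (user:chararray, bytes:long);

-- GROUP triggers a reduce phase; PARALLEL 10 requests 10 reduce
-- tasks instead of the default 1.
grouped = GROUP logs BY user PARALLEL 10;
totals  = FOREACH grouped GENERATE group, SUM(logs.bytes);

STORE totals INTO '/data/bytes_per_user';
```

Depending on your Pig version, you can also put `set default_parallel 10;`
at the top of the script to apply a default to every reduce-side operator
(GROUP, JOIN, ORDER, DISTINCT, etc.) that doesn't specify its own PARALLEL.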