Posted to user@pig.apache.org by hyoung jun kim <ha...@gmail.com> on 2008/05/07 09:57:32 UTC
how many map tasks?
Hi all,
I read "pig_hadoopsummit.pdf" and tried it.
I made a 320MB file (visit) in dir1 and a 20MB file (page) in dir2.
And ran this script.
Visits = load '/dir1/visit' as (user, url, time);
Visits = foreach Visits generate user, url, time;
Pages = load '/dir2/page' as (url, pagerank);
VP = join Visits by url, Pages by url;
UserVisits = group VP by user;  -- grouping step implied by the foreach below
Results = foreach UserVisits generate group, AVG(VP.pagerank) as avgpr;
store Results into '/data/users';
I expected 6 map tasks (320MB / 64MB) plus 1 map task (20MB).
But Hadoop created 2 map tasks and 1 reduce task.
Why did Hadoop create only 2 map tasks?
Test environment:
- 5-node Hadoop cluster
- Hadoop 0.16.3
- Pig updated from the SVN repository on May 7
Re: how many map tasks?
Posted by hyoung jun kim <ha...@gmail.com>.
Thanks, but I checked the Hadoop configuration. The "dfs.block.size" value is
67108864, and I also checked the input file's number of blocks.
The input file has 6 blocks.
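
One way to double-check this from the command line (assuming the paths from the
original post) is HDFS fsck, which is available in Hadoop 0.16; the -files -blocks
output lists every block of each file along with its length:

  hadoop fsck /dir1/visit -files -blocks
  hadoop fsck /dir2/page -files -blocks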
Re: how many map tasks?
Posted by Vitthal Gogate <go...@yahoo-inc.com>.
Sorry, I meant: check the "dfs.block.size" parameter in the hadoop-site.xml file in
the $HADOOP_HOME/conf directory; it may be configured as 128MB.
Sorry, in my earlier reply I kind of assumed the default size was 128MB :)
regards
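
For reference, the stanza to look for in $HADOOP_HOME/conf/hadoop-site.xml looks
like the following (67108864 is 64MB, 134217728 would be 128MB; the value shown is
only illustrative). If the property is not set there, the Hadoop 0.16 default of
64MB from hadoop-default.xml applies.

  <property>
    <name>dfs.block.size</name>
    <value>67108864</value>
  </property>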
Re: how many map tasks?
Posted by Vitthal Gogate <go...@yahoo-inc.com>.
I assume the block size is 128MB; I guess it is specified in the Hadoop site
configuration file. Also, for a join, the map/reduce job creates the number of
map tasks based on the combined size of both tables/files being joined.
-regards, Suhas
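
To make that sizing rule concrete, here is a rough, self-contained sketch of how the
old-API FileInputFormat of that era turned input size, block size, and the
mapred.map.tasks hint into a split size. It is simplified from memory rather than
copied from the Hadoop source, and the class and method names are invented. With the
round file sizes from the original post and 64MB blocks it predicts about 6 map tasks
in total (5 for visit + 1 for page), which is why the observed 2 is surprising.

  // Simplified, illustrative sketch (not the actual Hadoop source) of how the
  // old-API FileInputFormat sized its input splits around Hadoop 0.16.
  public class SplitSizing {

      // splitSize = max(minSplitSize, min(goalSize, blockSize)),
      // where goalSize = totalInputBytes / requestedMapTasks (the mapred.map.tasks hint).
      static long splitSize(long totalBytes, int requestedMaps, long minSplitSize, long blockSize) {
          long goalSize = totalBytes / Math.max(requestedMaps, 1);
          return Math.max(minSplitSize, Math.min(goalSize, blockSize));
      }

      // Each input file is chopped independently into pieces of splitSize.
      static long splitsForFile(long fileBytes, long splitSize) {
          return (fileBytes + splitSize - 1) / splitSize;  // ceiling division
      }

      public static void main(String[] args) {
          long MB = 1024L * 1024L;
          long total = 320 * MB + 20 * MB;               // visit + page, round numbers
          long split = splitSize(total, 1, 1, 64 * MB);  // 64MB blocks -> 64MB splits
          long maps  = splitsForFile(320 * MB, split) + splitsForFile(20 * MB, split);
          System.out.println("split size = " + split + ", expected map tasks = " + maps);
      }
  }

The exact count depends on the real byte lengths (the visit file reportedly spans 6
blocks, so it is slightly over 320MB), and this sketch ignores the small slack factor
Hadoop allows for the last split of a file.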