Posted to user@hive.apache.org by "Daniel,Wu" <ha...@163.com> on 2011/08/23 15:51:22 UTC

Why a sql only use one map task?

I ran the following simple SQL query:
select count(*) from sales;
The job information shows that it used only one map task.

The underlying Hadoop cluster has 3 data nodes, so I expected Hive to kick off 3 map tasks, one on each node. What could make Hive run only one map task? Do I need to set something to get multiple map tasks? I have not changed the Hive config.

Re:Re: Why a sql only use one map task?

Posted by "Daniel,Wu" <ha...@163.com>.
No, I didn't use zip; it's just a plain CSV file, which I loaded into Hive with the command
       load data  local inpath '/home/oracle/sales.csv' into table test;
I am not sure whether this command alone distributes the file evenly across the cluster (3 nodes), so I ran the following command in the hope of spreading the file across the cluster:
     create table sales as select * from test;

But when I check the map task, it shows 8 input splits, all on node test1. If I run the SQL
   select period_key,count(*) from sales group by period_key;
it kicks off ONE map task and 3 reduce tasks, so it looks like it always uses one map task. I have 2 questions:
1: Why doesn't Hadoop distribute the input splits evenly across the nodes? Shouldn't it put 3 splits on each of 2 nodes and 2 splits on the third (3*2 + 2 = 8 splits)?
2: How do I create multiple map tasks?



Input Split Locations
/default-rack/test1
/default-rack/test1
/default-rack/test1
/default-rack/test1
/default-rack/test1
/default-rack/test1
/default-rack/test1
/default-rack/test1

At 2011-08-23 21:58:04,"Vikas Srivastava" <vi...@one97.net> wrote:
Hey, are you storing the data in a zipped (compressed) format?

If so, that is why it gets only a single split, handled by a single map task.




Re: Why a sql only use one map task?

Posted by Vikas Srivastava <vi...@one97.net>.
Hey, are you storing the data in a zipped (compressed) format?

If so, that is why it gets only a single split, handled by a single map task.

2011/8/23 Daniel,Wu <ha...@163.com>

>   I run the following simple sql
> select count(*) from sales;
> And the job information shows it only uses one map task.
>
> The underlying hadoop has 3 data/data nodes. So I expect hive should kick
> off 3 map tasks, one on each task nodes. What can make hive only run one map
> task? Do I need to set something to kick off multiple map task?  in my
> config, I didn't change hive config.
>
>
>


-- 
With Regards
Vikas Srivastava

DWH & Analytics Team
Mob:+91 9560885900
One97 | Let's get talking !

Re: Re:Re: Re: RE: Why a sql only use one map task?

Posted by be...@yahoo.com.
Hi Daniel
         In the Hadoop ecosystem, the number of map tasks is decided by the job, based on the number of input splits. Setting mapred.map.tasks does not guarantee that exactly that many map tasks are triggered. What worked for you here is that, by setting a value for mapred.min.split.size, you specified the minimum data volume that a map task should process.
 So in your case there were really 9 input splits, but when you imposed a constraint on the minimum data that a map task should handle, the number of map tasks came down to 3.
Regards
Bejoy K S
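
[Editor's sketch] The behaviour described in this reply can be illustrated with a short Python sketch. It assumes the split-size rule of the classic org.apache.hadoop.mapred.FileInputFormat: split size = max(min split size, min(goal size, block size)), where goal size = total size / requested maps, with a 10% slack allowed on the last split. The 500 MiB file size and 64 MiB block size come from figures quoted elsewhere in this thread; the function names are hypothetical and this is not Hadoop's own code.

```python
MIB = 1024 * 1024

def split_size(total_size, block_size, requested_maps=1, min_split_size=1):
    # mapred.map.tasks only feeds the "goal size", so it acts as a hint:
    # the block size still caps the split size unless the minimum is raised.
    goal_size = total_size // max(requested_maps, 1)
    return max(min_split_size, min(goal_size, block_size))

def count_splits(total_size, size, slop=1.1):
    # Cut full-size splits while more than `slop` splits' worth of data remains;
    # whatever is left over becomes the final split.
    splits, remaining = 0, total_size
    while remaining / size > slop:
        splits += 1
        remaining -= size
    if remaining > 0:
        splits += 1
    return splits

file_size = 500 * MIB   # assumption: the thread only says "about 500M"
block_size = 64 * MIB   # assumption: default block size cited in the thread

# Asking for 2 maps does not reduce the split count below what the block size gives.
hinted = count_splits(file_size, split_size(file_size, block_size, requested_maps=2))

# Raising the minimum split size to 200000000 bytes does reduce it.
with_min = count_splits(
    file_size,
    split_size(file_size, block_size, requested_maps=2, min_split_size=200_000_000),
)

print(hinted, with_min)  # prints: 8 3
```

Under these assumptions the sketch gives 8 splits with the hint alone and 3 splits with the raised minimum. The 9 map tasks reported in the thread would be consistent with a file somewhat larger than the 500 MiB assumed here.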

-----Original Message-----
From: "Daniel,Wu" <ha...@163.com>
Date: Thu, 25 Aug 2011 20:02:43 
To: <us...@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Re:Re:Re: Re: RE: Why a sql only use one map task?

after I set
set mapred.min.split.size=200000000;

Then it will kick off 3 map tasks (the file I have is 500M).  So looks like we need to set mapred.min.split.size instead of mapred.map.tasks to control how many maps to kick off.


Re:Re:Re: Re: RE: Why a sql only use one map task?

Posted by "Daniel,Wu" <ha...@163.com>.
After I set
set mapred.min.split.size=200000000;

it kicks off 3 map tasks (the file I have is 500M). So it looks like we need to set mapred.min.split.size, rather than mapred.map.tasks, to control how many map tasks are kicked off.


At 2011-08-25 19:38:30,"Daniel,Wu" <ha...@163.com> wrote:

It works, after I set as you said, but looks like I can't control the map task, it always use 9 maps, even if I set
set mapred.map.tasks=2;


Kind | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts
map | 100.00% | 9 | 0 | 0 | 9 | 0 | 0 / 0
reduce | 100.00% | 1 | 0 | 0 | 1 | 0 | 0 / 0




Re:Re: Re: RE: Why a sql only use one map task?

Posted by "Daniel,Wu" <ha...@163.com>.
It works after I set it as you said, but it looks like I can't control the number of map tasks: it always uses 9 maps, even if I set
set mapred.map.tasks=2;


Kind | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts
map | 100.00% | 9 | 0 | 0 | 9 | 0 | 0 / 0
reduce | 100.00% | 1 | 0 | 0 | 1 | 0 | 0 / 0



At 2011-08-25 06:35:38,"Ashutosh Chauhan" <ha...@apache.org> wrote:
This may be because CombineHiveInputFormat is combining your splits in one map task. If you don't want that to happen, do:
hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat



Re: Re: RE: Why a sql only use one map task?

Posted by Ashutosh Chauhan <ha...@apache.org>.
This may be because CombineHiveInputFormat is combining your splits in one
map task. If you don't want that to happen, do:
hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat
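
[Editor's sketch] To illustrate why a combining input format can end up with a single map task, here is a deliberately simplified Python model of greedy split combining. It is not the actual CombineHiveInputFormat code: it assumes all blocks live on one node (as the "Input Split Locations" listing in this thread suggests), and the function name and the 64 MiB block size are illustrative assumptions. The rule modelled is: keep adding blocks to the current combined split, and close it only when a maximum split size is configured and has been reached.

```python
MIB = 1024 * 1024

def combine_splits(block_sizes, max_split_size=None):
    """Greedily pack blocks into combined splits.

    max_split_size=None models "no maximum configured": nothing ever
    closes the current combined split, so everything lands in one.
    """
    combined = []
    current, current_size = [], 0
    for size in block_sizes:
        current.append(size)
        current_size += size
        if max_split_size is not None and current_size >= max_split_size:
            combined.append(current)
            current, current_size = [], 0
    if current:
        combined.append(current)
    return combined

blocks = [64 * MIB] * 8  # assumption: 8 blocks of 64 MiB on one node

no_limit = combine_splits(blocks)                 # no maximum split size set
limited = combine_splits(blocks, 128 * MIB)       # hypothetical 128 MiB maximum

print(len(no_limit), len(limited))  # prints: 1 4
```

In this model, leaving the maximum unset yields one combined split (hence one map task), while a 128 MiB maximum yields four. Switching hive.input.format to HiveInputFormat, as suggested above, avoids the combining step altogether.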

2011/8/24 Daniel,Wu <ha...@163.com>

> I pasted the inform I pasted blow, the map capacity is 6. And no matter how
> I set  mapred.map.tasks, such as 3,  it doesn't work, as it always use 1 map
> task (please see the completed job information).
>
> Cluster Summary (Heap Size is 16.81 MB/966.69 MB)
> Running Map Tasks: 0
> Running Reduce Tasks: 0
> Total Submissions: 6
> Nodes: 3
> Occupied Map Slots: 0
> Occupied Reduce Slots: 0
> Reserved Map Slots: 0
> Reserved Reduce Slots: 0
> Map Task Capacity: 6
> Reduce Task Capacity: 6
> Avg. Tasks/Node: 4.00
> Blacklisted Nodes: 0
> Excluded Nodes: 0
> ------------------------------
> Completed Jobs
> Jobid | Priority | User | Name | Map % Complete | Map Total | Maps Completed | Reduce % Complete | Reduce Total | Reduces Completed | Job Scheduling Information | Diagnostic Info
> job_201108242119_0001 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00% | 0 | 0 | 100.00% | 1 | 1 | NA | NA
> job_201108242119_0002 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00% | 1 | 1 | 100.00% | 1 | 1 | NA | NA
> job_201108242119_0003 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00% | 1 | 1 | 100.00% | 1 | 1 | NA | NA
> job_201108242119_0004 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA
> job_201108242119_0005 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA
> job_201108242119_0006 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA
> ------------------------------
>
>

Re:Re: RE: Why a sql only use one map task?

Posted by "Daniel,Wu" <ha...@163.com>.
I have pasted the information below; the map capacity is 6. No matter how I set mapred.map.tasks (for example, to 3), it doesn't work: it always uses 1 map task (please see the completed job information).



Cluster Summary (Heap Size is 16.81 MB/966.69 MB)
Running Map Tasks: 0
Running Reduce Tasks: 0
Total Submissions: 6
Nodes: 3
Occupied Map Slots: 0
Occupied Reduce Slots: 0
Reserved Map Slots: 0
Reserved Reduce Slots: 0
Map Task Capacity: 6
Reduce Task Capacity: 6
Avg. Tasks/Node: 4.00
Blacklisted Nodes: 0
Excluded Nodes: 0


Completed Jobs
Jobid | Priority | User | Name | Map % Complete | Map Total | Maps Completed | Reduce % Complete | Reduce Total | Reduces Completed | Job Scheduling Information | Diagnostic Info
job_201108242119_0001 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00% | 0 | 0 | 100.00% | 1 | 1 | NA | NA
job_201108242119_0002 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00% | 1 | 1 | 100.00% | 1 | 1 | NA | NA
job_201108242119_0003 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00% | 1 | 1 | 100.00% | 1 | 1 | NA | NA
job_201108242119_0004 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA
job_201108242119_0005 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA
job_201108242119_0006 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00% | 1 | 1 | 100.00% | 3 | 3 | NA | NA



At 2011-08-24 18:19:38,wd <wd...@wdicc.com> wrote:
>What about your total Map Task Capacity?
>you may check it from http://your_jobtracker:50030/jobtracker.jsp

Re: RE: Why a sql only use one map task?

Posted by wd <wd...@wdicc.com>.
What is your total Map Task Capacity?
You can check it at http://your_jobtracker:50030/jobtracker.jsp

2011/8/24 Daniel,Wu <ha...@163.com>:
> I checked my setting, all are with the default value.So per the book of
> "Hadoop the definitive guide", the split size should be 64M. And the file
> size is about 500M, so that's about 8 splits. And from the map job
> information (after the map job is done), I can see it gets 8 split from one
> node. But anyhow it starts only one map task.

RE: Re:RE: Why a sql only use one map task?

Posted by Steven Wong <sw...@netflix.com>.
I think mapred.max.split.size is not set by default. The max split size is not the same as the HDFS block size.


From: Daniel,Wu [mailto:hadoop_wu@163.com]
Sent: Tuesday, August 23, 2011 11:44 PM
To: user@hive.apache.org
Subject: Re:RE: Why a sql only use one map task?

I checked my setting, all are with the default value.So per the book of "Hadoop the definitive guide", the split size should be 64M. And the file size is about 500M, so that's about 8 splits. And from the map job information (after the map job is done), I can see it gets 8 split from one node. But anyhow it starts only one map task.



Re:RE: Why a sql only use one map task?

Posted by "Daniel,Wu" <ha...@163.com>.
I checked my settings; all have their default values. So, per the book "Hadoop: The Definitive Guide", the split size should be 64M. The file size is about 500M, so that's about 8 splits. From the map job information (after the map job is done), I can see it gets 8 splits from one node. But it still starts only one map task.
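
[Editor's sketch] The "about 8 splits" estimate can be checked with a few lines of Python. This is only the block arithmetic, assuming exactly 500 MB of data and a 64 MB block size as stated in the message; the function name is hypothetical.

```python
def block_ranges(file_size_mb, block_size_mb):
    """Return (offset, length) pairs, in MB, for each block of the file."""
    ranges = []
    offset = 0
    while offset < file_size_mb:
        length = min(block_size_mb, file_size_mb - offset)
        ranges.append((offset, length))
        offset += length
    return ranges

ranges = block_ranges(500, 64)  # assumption: exactly 500 MB, 64 MB blocks
print(len(ranges))   # prints: 8
print(ranges[-1])    # prints: (448, 52)
```

Seven full 64 MB blocks plus one final 52 MB block give the 8 splits mentioned in the message.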




At 2011-08-24 02:28:18,"Aggarwal, Vaibhav" <va...@amazon.com> wrote:


If you actually have splittable files you can set the following setting to create more splits:

 

mapred.max.split.size appropriately.

 

Thanks

Vaibhav

 


RE: Why a sql only use one map task?

Posted by "Aggarwal, Vaibhav" <va...@amazon.com>.
If your files are actually splittable, you can create more splits by setting the following property appropriately:

mapred.max.split.size

Thanks
Vaibhav

From: Daniel,Wu [mailto:hadoop_wu@163.com]
Sent: Tuesday, August 23, 2011 6:51 AM
To: hive
Subject: Why a sql only use one map task?

  I run the following simple sql
select count(*) from sales;
And the job information shows it only uses one map task.

The underlying hadoop has 3 data/data nodes. So I expect hive should kick off 3 map tasks, one on each task nodes. What can make hive only run one map task? Do I need to set something to kick off multiple map task?  in my config, I didn't change hive config.