Posted to common-user@hadoop.apache.org by Arindam Choudhury <ar...@gmail.com> on 2013/02/26 12:09:29 UTC
Running terasort with 1 map task
Hi all,
I am trying to run TeraSort using one map and one reduce, so I generated
the input data using:
hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
-Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
Then I launched the hadoop terasort job using:
hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
-Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
I thought it would run the job using 1 map and 1 reduce, but when I inspected
the job statistics I found:
hadoop job -history /user/hadoop/output1
Task Summary
============================
Kind     Total  Successful  Failed  Killed  StartTime             FinishTime
Setup    1      1           0       0       26-Feb-2013 10:57:47  26-Feb-2013 10:57:55 (8sec)
Map      24     24          0       0       26-Feb-2013 10:57:57  26-Feb-2013 11:05:37 (7mins, 40sec)
Reduce   1      1           0       0       26-Feb-2013 10:58:21  26-Feb-2013 11:08:31 (10mins, 10sec)
Cleanup  1      1           0       0       26-Feb-2013 11:08:32  26-Feb-2013 11:08:36 (4sec)
============================
So, although I asked for one map task, there are 24 of them.
How can I solve this problem? How can I tell Hadoop to launch only one map?
Thanks,
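For context, in Hadoop 1.x the number of map tasks is driven by the number of input splits, and the old-API FileInputFormat treats mapred.map.tasks only as a hint: it derives a per-split "goal" size from it and then clamps that by the block size. A rough sketch of that arithmetic (assuming teragen's 100-byte rows; the formula mirrors Hadoop 1.x FileInputFormat's splitSize = max(minSize, min(goalSize, blockSize))):

```python
import math

def num_map_tasks(total_bytes, requested_maps, block_size, min_split=1):
    # Old-API FileInputFormat: mapred.map.tasks only sets a per-split goal size,
    # which is then clamped by the HDFS block size.
    goal = total_bytes // max(requested_maps, 1)
    split = max(min_split, min(goal, block_size))
    return math.ceil(total_bytes / split)

# 32,000,000 teragen rows at 100 bytes each, 128 MB blocks, 1 map requested:
total = 32_000_000 * 100
print(num_map_tasks(total, 1, 128 * 1024 * 1024))  # 24 splits -> 24 map tasks
```

With a 3.2 GB input and 128 MB blocks, the clamp wins and the one-map request is ignored, which matches the 24 maps in the job history above.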
Re: Running terasort with 1 map task
Posted by Mahesh Balija <ba...@gmail.com>.
Does passing dfs.block.size=134217728 resolve your issue, or was it
something else that fixed your problem?
On Tue, Feb 26, 2013 at 6:04 PM, Arindam Choudhury <
arindamchoudhury0@gmail.com> wrote:
> sorry my bad, it solved
>
>
> On Tue, Feb 26, 2013 at 1:22 PM, Arindam Choudhury <
> arindamchoudhury0@gmail.com> wrote:
>
>> In my $HADOOP_HOME/conf/hdfs-site.xml, I have mentioned the data-block
>> size
>>
>> <property>
>> <name>dfs.block.size</name>
>> <value>134217728</value>
>> <final>true</final>
>> </property>
>>
>> While running teragen I specify it again, to be sure:
>>
>> hadoop jar /opt/hadoop-1.0.4/hadoop-examples-1.0.4.jar teragen
>> -Dmapred.map.tasks=1 -Dmapred.reduce.tasks=1 -Ddfs.block.size=134217728
>> 320000 /user/hadoop/input
>>
>> but it generates 3 blocks:
>>
>> hadoop fsck -blocks -files -locations /user/hadoop/input
>> Status: HEALTHY
>> Total size: 32029543 B
>> Total dirs: 3
>> Total files: 4
>> Total blocks (validated): 3 (avg. block size 10676514 B)
>> Minimally replicated blocks: 3 (100.0 %)
>>
>> What am I doing wrong? How can I generate only one block?
>>
>>
>>
>> On Tue, Feb 26, 2013 at 12:52 PM, Arindam Choudhury <
>> arindamchoudhury0@gmail.com> wrote:
>>
>>> Thanks. As Julien said, I want to do a performance measurement.
>>>
>>> Actually,
>>>
>>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>>
>>> has generated:
>>> Total size: 3200029737 B
>>> Total dirs: 3
>>> Total files: 5
>>> Total blocks (validated): 27 (avg. block size 118519619 B)
>>>
>>> That's why so many maps.
>>>
>>>
>>> On Tue, Feb 26, 2013 at 12:46 PM, Julien Muller <julien.muller@ezako.com
>>> > wrote:
>>>
>>>> Maybe your goal is to have a baseline for performance measurement?
>>>> In that case, you might want to consider running only one TaskTracker.
>>>> You would have multiple tasks, but running on only one machine. Also, you
>>>> could make mappers run serially by configuring only one map slot on your
>>>> one-node cluster.
>>>>
>>>> Nevertheless, I agree with Bertrand: this is not really a realistic use
>>>> case (or maybe you can give us more clues).
>>>>
>>>> Julien
>>>>
>>>>
>>>> 2013/2/26 Bertrand Dechoux <de...@gmail.com>
>>>>
>>>>> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>>>>
>>>>> It is possible to have a single mapper if the input is not splittable
>>>>> BUT it is rarely seen as a feature.
>>>>> One could ask why you want to use a platform for distributed computing
>>>>> for a job that shouldn't be distributed.
>>>>>
>>>>> Regards
>>>>>
>>>>> Bertrand
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
>>>>> arindamchoudhury0@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am trying to run terasort using one map and one reduce. so, I
>>>>>> generated the input data using:
>>>>>>
>>>>>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>>>>>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>>>>>
>>>>>> Then I launched the hadoop terasort job using:
>>>>>>
>>>>>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
>>>>>> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>>>>>
>>>>>> I thought it will run the job using 1 map and 1 reduce, but when
>>>>>> inspect the job statistics I found:
>>>>>>
>>>>>> hadoop job -history /user/hadoop/output1
>>>>>>
>>>>>> Task Summary
>>>>>> ============================
>>>>>> Kind Total Successful Failed Killed StartTime
>>>>>> FinishTime
>>>>>>
>>>>>> Setup 1 1 0 0 26-Feb-2013 10:57:47 26-Feb-2013
>>>>>> 10:57:55 (8sec)
>>>>>> Map 24 24 0 0 26-Feb-2013 10:57:57 26-Feb-2013
>>>>>> 11:05:37 (7mins, 40sec)
>>>>>> Reduce 1 1 0 0 26-Feb-2013 10:58:21 26-Feb-2013
>>>>>> 11:08:31 (10mins, 10sec)
>>>>>> Cleanup 1 1 0 0 26-Feb-2013 11:08:32
>>>>>> 26-Feb-2013 11:08:36 (4sec)
>>>>>> ============================
>>>>>>
>>>>>> so, though I mentioned to launch one map tasks, there are 24 of them.
>>>>>>
>>>>>> How to solve this problem. How to tell hadoop to launch only one map.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
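As a sanity check on the second teragen run quoted above: 320,000 rows at 100 bytes each is about 32 MB, which fits in a single 128 MB block, so the data file itself should occupy exactly one block. The extra blocks fsck reports plausibly belong to the _logs files that teragen also writes under the output directory (an assumption; fsck counted 4 files but only one is the part file). A small sketch of the per-file block count:

```python
import math

def blocks_for_file(file_bytes, block_size=128 * 1024 * 1024):
    # HDFS stores a non-empty file in ceil(size / block_size) blocks;
    # an empty file occupies no blocks.
    return max(1, math.ceil(file_bytes / block_size)) if file_bytes else 0

print(blocks_for_file(320_000 * 100))  # 32 MB data file -> 1 block
```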
Re: Running terasort with 1 map task
Posted by Arindam Choudhury <ar...@gmail.com>.
Sorry, my bad. It's solved.
On Tue, Feb 26, 2013 at 1:22 PM, Arindam Choudhury <
arindamchoudhury0@gmail.com> wrote:
> In my $HADOOP_HOME/conf/hdfs-site.xml, I have mentioned the data-block
> size
>
> <property>
> <name>dfs.block.size</name>
> <value>134217728</value>
> <final>true</final>
> </property>
>
> While running the teragen I am again specifying it to be sure:
>
> hadoop jar /opt/hadoop-1.0.4/hadoop-examples-1.0.4.jar teragen
> -Dmapred.map.tasks=1 -Dmapred.reduce.tasks=1 -Ddfs.block.size=134217728
> 320000 /user/hadoop/input
>
> but it generates 3 blocks:
>
> hadoop fsck -blocks -files -locations /user/hadoop/input
> Status: HEALTHY
> Total size: 32029543 B
> Total dirs: 3
> Total files: 4
> Total blocks (validated): 3 (avg. block size 10676514 B)
> Minimally replicated blocks: 3 (100.0 %)
>
> What I am doing wrong? How can I generate only one block?
>
>
>
> On Tue, Feb 26, 2013 at 12:52 PM, Arindam Choudhury <
> arindamchoudhury0@gmail.com> wrote:
>
>> Thanks . As Julien said I want to do a performance measurement.
>>
>> Actually,
>>
>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>
>> has generated:
>> Total size: 3200029737 B
>> Total dirs: 3
>> Total files: 5
>> Total blocks (validated): 27 (avg. block size 118519619 B)
>>
>> Thats why so many maps.
>>
>>
>> On Tue, Feb 26, 2013 at 12:46 PM, Julien Muller <ju...@ezako.com>wrote:
>>
>>> Maybe your goal is to have a baseline for performance measurement?
>>> In that case, you might want to consider running only one taskTracker?
>>> You would have multiple tasks but running on only 1 machine. Also, you
>>> could make mappers run serially, by configuring only one map slot on your 1
>>> node cluster.
>>>
>>> Nevertheless I agree with Bertrand, this is not really a realistic use
>>> case (or maybe you can give us more clues).
>>>
>>> Julien
>>>
>>>
>>> 2013/2/26 Bertrand Dechoux <de...@gmail.com>
>>>
>>>> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>>>
>>>> It is possible to have a single mapper if the input is not splittable
>>>> BUT it is rarely seen as a feature.
>>>> One could ask why you want to use a platform for distributed computing
>>>> for a job that shouldn't be distributed.
>>>>
>>>> Regards
>>>>
>>>> Bertrand
>>>>
>>>>
>>>>
>>>> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
>>>> arindamchoudhury0@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am trying to run terasort using one map and one reduce. so, I
>>>>> generated the input data using:
>>>>>
>>>>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>>>>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>>>>
>>>>> Then I launched the hadoop terasort job using:
>>>>>
>>>>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
>>>>> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>>>>
>>>>> I thought it will run the job using 1 map and 1 reduce, but when
>>>>> inspect the job statistics I found:
>>>>>
>>>>> hadoop job -history /user/hadoop/output1
>>>>>
>>>>> Task Summary
>>>>> ============================
>>>>> Kind Total Successful Failed Killed StartTime
>>>>> FinishTime
>>>>>
>>>>> Setup 1 1 0 0 26-Feb-2013 10:57:47 26-Feb-2013
>>>>> 10:57:55 (8sec)
>>>>> Map 24 24 0 0 26-Feb-2013 10:57:57 26-Feb-2013
>>>>> 11:05:37 (7mins, 40sec)
>>>>> Reduce 1 1 0 0 26-Feb-2013 10:58:21 26-Feb-2013
>>>>> 11:08:31 (10mins, 10sec)
>>>>> Cleanup 1 1 0 0 26-Feb-2013 11:08:32 26-Feb-2013
>>>>> 11:08:36 (4sec)
>>>>> ============================
>>>>>
>>>>> so, though I mentioned to launch one map tasks, there are 24 of them.
>>>>>
>>>>> How to solve this problem. How to tell hadoop to launch only one map.
>>>>>
>>>>> Thanks,
>>>>>
>>>>
>>>>
>>>
>>
>
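Julien's suggestion in the quote above, serializing mappers by limiting map slots, would look roughly like this in mapred-site.xml on a one-node Hadoop 1.x cluster (a sketch: this caps concurrent map tasks per TaskTracker at one; it does not reduce the total number of map tasks):

```xml
<!-- mapred-site.xml: run at most one map task at a time per TaskTracker -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```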
Re: Running terasort with 1 map task
Posted by Arindam Choudhury <ar...@gmail.com>.
sorry my bad, it solved
On Tue, Feb 26, 2013 at 1:22 PM, Arindam Choudhury <
arindamchoudhury0@gmail.com> wrote:
> In my $HADOOP_HOME/conf/hdfs-site.xml, I have mentioned the data-block
> size
>
> <property>
> <name>dfs.block.size</name>
> <value>134217728</value>
> <final>true</final>
> </property>
>
> While running the teragen I am again specifying it to be sure:
>
> hadoop jar /opt/hadoop-1.0.4/hadoop-examples-1.0.4.jar teragen
> -Dmapred.map.tasks=1 -Dmapred.reduce.tasks=1 -Ddfs.block.size=134217728
> 320000 /user/hadoop/input
>
> but it generates 3 blocks:
>
> hadoop fsck -blocks -files -locations /user/hadoop/input
> Status: HEALTHY
> Total size: 32029543 B
> Total dirs: 3
> Total files: 4
> Total blocks (validated): 3 (avg. block size 10676514 B)
> Minimally replicated blocks: 3 (100.0 %)
>
> What I am doing wrong? How can I generate only one block?
>
>
>
> On Tue, Feb 26, 2013 at 12:52 PM, Arindam Choudhury <
> arindamchoudhury0@gmail.com> wrote:
>
>> Thanks . As Julien said I want to do a performance measurement.
>>
>> Actually,
>>
>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>
>> has generated:
>> Total size: 3200029737 B
>> Total dirs: 3
>> Total files: 5
>> Total blocks (validated): 27 (avg. block size 118519619 B)
>>
>> Thats why so many maps.
>>
>>
>> On Tue, Feb 26, 2013 at 12:46 PM, Julien Muller <ju...@ezako.com>wrote:
>>
>>> Maybe your goal is to have a baseline for performance measurement?
>>> In that case, you might want to consider running only one taskTracker?
>>> You would have multiple tasks but running on only 1 machine. Also, you
>>> could make mappers run serially, by configuring only one map slot on your 1
>>> node cluster.
>>>
>>> Nevertheless I agree with Bertrand, this is not really a realistic use
>>> case (or maybe you can give us more clues).
>>>
>>> Julien
>>>
>>>
>>> 2013/2/26 Bertrand Dechoux <de...@gmail.com>
>>>
>>>> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>>>
>>>> It is possible to have a single mapper if the input is not splittable
>>>> BUT it is rarely seen as a feature.
>>>> One could ask why you want to use a platform for distributed computing
>>>> for a job that shouldn't be distributed.
>>>>
>>>> Regards
>>>>
>>>> Bertrand
>>>>
>>>>
>>>>
>>>> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
>>>> arindamchoudhury0@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am trying to run terasort using one map and one reduce. so, I
>>>>> generated the input data using:
>>>>>
>>>>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>>>>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>>>>
>>>>> Then I launched the hadoop terasort job using:
>>>>>
>>>>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
>>>>> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>>>>
>>>>> I thought it will run the job using 1 map and 1 reduce, but when
>>>>> inspect the job statistics I found:
>>>>>
>>>>> hadoop job -history /user/hadoop/output1
>>>>>
>>>>> Task Summary
>>>>> ============================
>>>>> Kind Total Successful Failed Killed StartTime
>>>>> FinishTime
>>>>>
>>>>> Setup 1 1 0 0 26-Feb-2013 10:57:47 26-Feb-2013
>>>>> 10:57:55 (8sec)
>>>>> Map 24 24 0 0 26-Feb-2013 10:57:57 26-Feb-2013
>>>>> 11:05:37 (7mins, 40sec)
>>>>> Reduce 1 1 0 0 26-Feb-2013 10:58:21 26-Feb-2013
>>>>> 11:08:31 (10mins, 10sec)
>>>>> Cleanup 1 1 0 0 26-Feb-2013 11:08:32 26-Feb-2013
>>>>> 11:08:36 (4sec)
>>>>> ============================
>>>>>
>>>>> so, though I mentioned to launch one map tasks, there are 24 of them.
>>>>>
>>>>> How to solve this problem. How to tell hadoop to launch only one map.
>>>>>
>>>>> Thanks,
>>>>>
>>>>
>>>>
>>>
>>
>
Re: Running terasort with 1 map task
Posted by Arindam Choudhury <ar...@gmail.com>.
sorry my bad, it solved
On Tue, Feb 26, 2013 at 1:22 PM, Arindam Choudhury <
arindamchoudhury0@gmail.com> wrote:
> In my $HADOOP_HOME/conf/hdfs-site.xml, I have mentioned the data-block
> size
>
> <property>
> <name>dfs.block.size</name>
> <value>134217728</value>
> <final>true</final>
> </property>
>
> While running the teragen I am again specifying it to be sure:
>
> hadoop jar /opt/hadoop-1.0.4/hadoop-examples-1.0.4.jar teragen
> -Dmapred.map.tasks=1 -Dmapred.reduce.tasks=1 -Ddfs.block.size=134217728
> 320000 /user/hadoop/input
>
> but it generates 3 blocks:
>
> hadoop fsck -blocks -files -locations /user/hadoop/input
> Status: HEALTHY
> Total size: 32029543 B
> Total dirs: 3
> Total files: 4
> Total blocks (validated): 3 (avg. block size 10676514 B)
> Minimally replicated blocks: 3 (100.0 %)
>
> What I am doing wrong? How can I generate only one block?
>
>
>
> On Tue, Feb 26, 2013 at 12:52 PM, Arindam Choudhury <
> arindamchoudhury0@gmail.com> wrote:
>
>> Thanks . As Julien said I want to do a performance measurement.
>>
>> Actually,
>>
>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>
>> has generated:
>> Total size: 3200029737 B
>> Total dirs: 3
>> Total files: 5
>> Total blocks (validated): 27 (avg. block size 118519619 B)
>>
>> Thats why so many maps.
>>
>>
>> On Tue, Feb 26, 2013 at 12:46 PM, Julien Muller <ju...@ezako.com>wrote:
>>
>>> Maybe your goal is to have a baseline for performance measurement?
>>> In that case, you might want to consider running only one taskTracker?
>>> You would have multiple tasks but running on only 1 machine. Also, you
>>> could make mappers run serially, by configuring only one map slot on your 1
>>> node cluster.
>>>
>>> Nevertheless I agree with Bertrand, this is not really a realistic use
>>> case (or maybe you can give us more clues).
>>>
>>> Julien
>>>
>>>
>>> 2013/2/26 Bertrand Dechoux <de...@gmail.com>
>>>
>>>> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>>>
>>>> It is possible to have a single mapper if the input is not splittable
>>>> BUT it is rarely seen as a feature.
>>>> One could ask why you want to use a platform for distributed computing
>>>> for a job that shouldn't be distributed.
>>>>
>>>> Regards
>>>>
>>>> Bertrand
>>>>
>>>>
>>>>
>>>> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
>>>> arindamchoudhury0@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am trying to run terasort using one map and one reduce. so, I
>>>>> generated the input data using:
>>>>>
>>>>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>>>>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>>>>
>>>>> Then I launched the hadoop terasort job using:
>>>>>
>>>>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
>>>>> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>>>>
>>>>> I thought it will run the job using 1 map and 1 reduce, but when
>>>>> inspect the job statistics I found:
>>>>>
>>>>> hadoop job -history /user/hadoop/output1
>>>>>
>>>>> Task Summary
>>>>> ============================
>>>>> Kind Total Successful Failed Killed StartTime
>>>>> FinishTime
>>>>>
>>>>> Setup 1 1 0 0 26-Feb-2013 10:57:47 26-Feb-2013
>>>>> 10:57:55 (8sec)
>>>>> Map 24 24 0 0 26-Feb-2013 10:57:57 26-Feb-2013
>>>>> 11:05:37 (7mins, 40sec)
>>>>> Reduce 1 1 0 0 26-Feb-2013 10:58:21 26-Feb-2013
>>>>> 11:08:31 (10mins, 10sec)
>>>>> Cleanup 1 1 0 0 26-Feb-2013 11:08:32 26-Feb-2013
>>>>> 11:08:36 (4sec)
>>>>> ============================
>>>>>
>>>>> so, though I mentioned to launch one map tasks, there are 24 of them.
>>>>>
>>>>> How to solve this problem. How to tell hadoop to launch only one map.
>>>>>
>>>>> Thanks,
>>>>>
>>>>
>>>>
>>>
>>
>
Re: Running terasort with 1 map task
Posted by Arindam Choudhury <ar...@gmail.com>.
sorry my bad, it solved
On Tue, Feb 26, 2013 at 1:22 PM, Arindam Choudhury <
arindamchoudhury0@gmail.com> wrote:
> In my $HADOOP_HOME/conf/hdfs-site.xml, I have mentioned the data-block
> size
>
> <property>
> <name>dfs.block.size</name>
> <value>134217728</value>
> <final>true</final>
> </property>
>
> While running the teragen I am again specifying it to be sure:
>
> hadoop jar /opt/hadoop-1.0.4/hadoop-examples-1.0.4.jar teragen
> -Dmapred.map.tasks=1 -Dmapred.reduce.tasks=1 -Ddfs.block.size=134217728
> 320000 /user/hadoop/input
>
> but it generates 3 blocks:
>
> hadoop fsck -blocks -files -locations /user/hadoop/input
> Status: HEALTHY
> Total size: 32029543 B
> Total dirs: 3
> Total files: 4
> Total blocks (validated): 3 (avg. block size 10676514 B)
> Minimally replicated blocks: 3 (100.0 %)
>
> What I am doing wrong? How can I generate only one block?
>
>
>
> On Tue, Feb 26, 2013 at 12:52 PM, Arindam Choudhury <
> arindamchoudhury0@gmail.com> wrote:
>
>> Thanks . As Julien said I want to do a performance measurement.
>>
>> Actually,
>>
>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>
>> has generated:
>> Total size: 3200029737 B
>> Total dirs: 3
>> Total files: 5
>> Total blocks (validated): 27 (avg. block size 118519619 B)
>>
>> Thats why so many maps.
>>
>>
>> On Tue, Feb 26, 2013 at 12:46 PM, Julien Muller <ju...@ezako.com>wrote:
>>
>>> Maybe your goal is to have a baseline for performance measurement?
>>> In that case, you might want to consider running only one taskTracker?
>>> You would have multiple tasks but running on only 1 machine. Also, you
>>> could make mappers run serially, by configuring only one map slot on your 1
>>> node cluster.
>>>
>>> Nevertheless I agree with Bertrand, this is not really a realistic use
>>> case (or maybe you can give us more clues).
>>>
>>> Julien
>>>
>>>
>>> 2013/2/26 Bertrand Dechoux <de...@gmail.com>
>>>
>>>> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>>>
>>>> It is possible to have a single mapper if the input is not splittable
>>>> BUT it is rarely seen as a feature.
>>>> One could ask why you want to use a platform for distributed computing
>>>> for a job that shouldn't be distributed.
>>>>
>>>> Regards
>>>>
>>>> Bertrand
>>>>
>>>>
>>>>
>>>> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
>>>> arindamchoudhury0@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am trying to run terasort using one map and one reduce. So, I
>>>>> generated the input data using:
>>>>>
>>>>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>>>>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>>>>
>>>>> Then I launched the hadoop terasort job using:
>>>>>
>>>>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
>>>>> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>>>>
>>>>> I thought it would run the job using 1 map and 1 reduce, but when I
>>>>> inspected the job statistics I found:
>>>>>
>>>>> hadoop job -history /user/hadoop/output1
>>>>>
>>>>> Task Summary
>>>>> ============================
>>>>> Kind Total Successful Failed Killed StartTime
>>>>> FinishTime
>>>>>
>>>>> Setup 1 1 0 0 26-Feb-2013 10:57:47 26-Feb-2013
>>>>> 10:57:55 (8sec)
>>>>> Map 24 24 0 0 26-Feb-2013 10:57:57 26-Feb-2013
>>>>> 11:05:37 (7mins, 40sec)
>>>>> Reduce 1 1 0 0 26-Feb-2013 10:58:21 26-Feb-2013
>>>>> 11:08:31 (10mins, 10sec)
>>>>> Cleanup 1 1 0 0 26-Feb-2013 11:08:32 26-Feb-2013
>>>>> 11:08:36 (4sec)
>>>>> ============================
>>>>>
>>>>> So, although I specified one map task, there are 24 of them.
>>>>>
>>>>> How can I solve this problem? How can I tell Hadoop to launch only one map?
>>>>>
>>>>> Thanks,
>>>>>
>>>>
>>>>
>>>
>>
>
Re: Running terasort with 1 map task
Posted by Arindam Choudhury <ar...@gmail.com>.
In my $HADOOP_HOME/conf/hdfs-site.xml, I have set the data block size:
<property>
<name>dfs.block.size</name>
<value>134217728</value>
<final>true</final>
</property>
While running the teragen I am again specifying it to be sure:
hadoop jar /opt/hadoop-1.0.4/hadoop-examples-1.0.4.jar teragen
-Dmapred.map.tasks=1 -Dmapred.reduce.tasks=1 -Ddfs.block.size=134217728
320000 /user/hadoop/input
but it generates 3 blocks:
hadoop fsck -blocks -files -locations /user/hadoop/input
Status: HEALTHY
Total size: 32029543 B
Total dirs: 3
Total files: 4
Total blocks (validated): 3 (avg. block size 10676514 B)
Minimally replicated blocks: 3 (100.0 %)
What am I doing wrong? How can I generate only one block?
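[Editor's note: fsck counts blocks across every file under the given path, including the small job _logs files, and HDFS files never share blocks, so the single ~32 MB part file occupies one 128 MB block while the extra files account for the rest. A rough sketch of that arithmetic (the two small file sizes below are made up for illustration; only the totals come from the fsck output above):]

```python
import math

BLOCK_SIZE = 134217728  # dfs.block.size = 128 MB

def blocks_for(file_size: int) -> int:
    """Blocks one HDFS file of this size occupies (files never share blocks)."""
    return max(1, math.ceil(file_size / BLOCK_SIZE))

# One ~32 MB part file -> one block; the remaining blocks belong to the
# small _logs files that fsck also walked (sizes here are hypothetical).
sizes = [32000000, 20000, 9543]
print([blocks_for(s) for s in sizes])  # [1, 1, 1] -> 3 blocks total
```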
On Tue, Feb 26, 2013 at 12:52 PM, Arindam Choudhury <
arindamchoudhury0@gmail.com> wrote:
> Thanks. As Julien said, I want to do a performance measurement.
>
> Actually,
>
> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>
> has generated:
> Total size: 3200029737 B
> Total dirs: 3
> Total files: 5
> Total blocks (validated): 27 (avg. block size 118519619 B)
>
> That's why there are so many maps.
>
>
> On Tue, Feb 26, 2013 at 12:46 PM, Julien Muller <ju...@ezako.com> wrote:
>
>> Maybe your goal is to have a baseline for performance measurement?
>> In that case, you might want to consider running only one taskTracker?
>> You would have multiple tasks but running on only 1 machine. Also, you
>> could make mappers run serially, by configuring only one map slot on your 1
>> node cluster.
>>
>> Nevertheless I agree with Bertrand, this is not really a realistic use
>> case (or maybe you can give us more clues).
>>
>> Julien
>>
>>
>> 2013/2/26 Bertrand Dechoux <de...@gmail.com>
>>
>>> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>>
>>> It is possible to have a single mapper if the input is not splittable
>>> BUT it is rarely seen as a feature.
>>> One could ask why you want to use a platform for distributed computing
>>> for a job that shouldn't be distributed.
>>>
>>> Regards
>>>
>>> Bertrand
>>>
>>>
>>>
>>> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
>>> arindamchoudhury0@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am trying to run terasort using one map and one reduce. So, I
>>>> generated the input data using:
>>>>
>>>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>>>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>>>
>>>> Then I launched the hadoop terasort job using:
>>>>
>>>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
>>>> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>>>
>>>> I thought it would run the job using 1 map and 1 reduce, but when I
>>>> inspected the job statistics I found:
>>>>
>>>> hadoop job -history /user/hadoop/output1
>>>>
>>>> Task Summary
>>>> ============================
>>>> Kind Total Successful Failed Killed StartTime
>>>> FinishTime
>>>>
>>>> Setup 1 1 0 0 26-Feb-2013 10:57:47 26-Feb-2013
>>>> 10:57:55 (8sec)
>>>> Map 24 24 0 0 26-Feb-2013 10:57:57 26-Feb-2013
>>>> 11:05:37 (7mins, 40sec)
>>>> Reduce 1 1 0 0 26-Feb-2013 10:58:21 26-Feb-2013
>>>> 11:08:31 (10mins, 10sec)
>>>> Cleanup 1 1 0 0 26-Feb-2013 11:08:32 26-Feb-2013
>>>> 11:08:36 (4sec)
>>>> ============================
>>>>
>>>> So, although I specified one map task, there are 24 of them.
>>>>
>>>> How can I solve this problem? How can I tell Hadoop to launch only one map?
>>>>
>>>> Thanks,
>>>>
>>>
>>>
>>
>
Re: Running terasort with 1 map task
Posted by Arindam Choudhury <ar...@gmail.com>.
Thanks. As Julien said, I want to do a performance measurement.
Actually,
hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
-Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
has generated:
Total size: 3200029737 B
Total dirs: 3
Total files: 5
Total blocks (validated): 27 (avg. block size 118519619 B)
That's why there are so many maps.
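[Editor's note: the map count follows from how Hadoop 1.x FileInputFormat sizes splits: mapred.map.tasks is only a hint, and the split size is capped at the block size, so a ~3.2 GB input always yields about 24 splits. A simplified sketch of that formula (the real code also allows ~10% slop on the last split):]

```python
import math

def num_splits(total_size: int, block_size: int, requested_maps: int = 1,
               min_split_size: int = 1) -> int:
    """Simplified Hadoop 1.x FileInputFormat logic:
    splitSize = max(minSize, min(totalSize / requestedMaps, blockSize))."""
    goal_size = total_size // max(1, requested_maps)
    split_size = max(min_split_size, min(goal_size, block_size))
    return math.ceil(total_size / split_size)

total, block = 3200029737, 134217728
print(num_splits(total, block, requested_maps=1))      # 24: the map-count hint is ignored
# Forcing one map: make the minimum split larger than the whole input.
print(num_splits(total, block, min_split_size=total))  # 1
```

This is why raising the minimum split size (or, as in the resolved thread, the block size) reduces the map count, while -Dmapred.map.tasks=1 alone cannot.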
On Tue, Feb 26, 2013 at 12:46 PM, Julien Muller <ju...@ezako.com> wrote:
> Maybe your goal is to have a baseline for performance measurement?
> In that case, you might want to consider running only one taskTracker?
> You would have multiple tasks but running on only 1 machine. Also, you
> could make mappers run serially, by configuring only one map slot on your 1
> node cluster.
>
> Nevertheless I agree with Bertrand, this is not really a realistic use
> case (or maybe you can give us more clues).
>
> Julien
>
>
> 2013/2/26 Bertrand Dechoux <de...@gmail.com>
>
>> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>
>> It is possible to have a single mapper if the input is not splittable BUT
>> it is rarely seen as a feature.
>> One could ask why you want to use a platform for distributed computing
>> for a job that shouldn't be distributed.
>>
>> Regards
>>
>> Bertrand
>>
>>
>>
>> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
>> arindamchoudhury0@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I am trying to run terasort using one map and one reduce. So, I
>>> generated the input data using:
>>>
>>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>>
>>> Then I launched the hadoop terasort job using:
>>>
>>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
>>> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>>
>>> I thought it would run the job using 1 map and 1 reduce, but when I
>>> inspected the job statistics I found:
>>>
>>> hadoop job -history /user/hadoop/output1
>>>
>>> Task Summary
>>> ============================
>>> Kind Total Successful Failed Killed StartTime
>>> FinishTime
>>>
>>> Setup 1 1 0 0 26-Feb-2013 10:57:47 26-Feb-2013
>>> 10:57:55 (8sec)
>>> Map 24 24 0 0 26-Feb-2013 10:57:57 26-Feb-2013
>>> 11:05:37 (7mins, 40sec)
>>> Reduce 1 1 0 0 26-Feb-2013 10:58:21 26-Feb-2013
>>> 11:08:31 (10mins, 10sec)
>>> Cleanup 1 1 0 0 26-Feb-2013 11:08:32 26-Feb-2013
>>> 11:08:36 (4sec)
>>> ============================
>>>
>>> So, although I specified one map task, there are 24 of them.
>>>
>>> How can I solve this problem? How can I tell Hadoop to launch only one map?
>>>
>>> Thanks,
>>>
>>
>>
>
Re: Running terasort with 1 map task
Posted by Julien Muller <ju...@ezako.com>.
Maybe your goal is to have a baseline for performance measurement? In that
case, you might want to consider running only one TaskTracker: you would
still have multiple tasks, but they would all run on one machine. You could
also make the mappers run serially by configuring only one map slot on your
one-node cluster.
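[Editor's note: a sketch of that one-map-slot configuration, assuming a Hadoop 1.x mapred-site.xml:]

```xml
<!-- mapred-site.xml: allow at most one concurrent map task per TaskTracker -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```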
Nevertheless, I agree with Bertrand: this is not really a realistic use case
(or maybe you can give us more clues).
Julien
2013/2/26 Bertrand Dechoux <de...@gmail.com>
> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> It is possible to have a single mapper if the input is not splittable BUT
> it is rarely seen as a feature.
> One could ask why you want to use a platform for distributed computing for
> a job that shouldn't be distributed.
>
> Regards
>
> Bertrand
>
>
>
> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
> arindamchoudhury0@gmail.com> wrote:
>
>> Hi all,
>>
>> I am trying to run terasort using one map and one reduce. So, I generated
>> the input data using:
>>
>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>
>> Then I launched the hadoop terasort job using:
>>
>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>
>> I thought it would run the job using 1 map and 1 reduce, but when I
>> inspected the job statistics I found:
>>
>> hadoop job -history /user/hadoop/output1
>>
>> Task Summary
>> ============================
>> Kind Total Successful Failed Killed StartTime FinishTime
>>
>> Setup 1 1 0 0 26-Feb-2013 10:57:47 26-Feb-2013
>> 10:57:55 (8sec)
>> Map 24 24 0 0 26-Feb-2013 10:57:57 26-Feb-2013
>> 11:05:37 (7mins, 40sec)
>> Reduce 1 1 0 0 26-Feb-2013 10:58:21 26-Feb-2013
>> 11:08:31 (10mins, 10sec)
>> Cleanup 1 1 0 0 26-Feb-2013 11:08:32 26-Feb-2013
>> 11:08:36 (4sec)
>> ============================
>>
>> So, although I specified one map task, there are 24 of them.
>>
>> How can I solve this problem? How can I tell Hadoop to launch only one map?
>>
>> Thanks,
>>
>
>
Re: Running terasort with 1 map task
Posted by Julien Muller <ju...@ezako.com>.
Maybe your goal is to have a baseline for performance measurement?
In that case, you might want to consider running only one taskTracker? You
would have multiple tasks but running on only 1 machine. Also, you could
make mappers run serially, by configuring only one map slot on your 1 node
cluster.
Nevertheless I agree with Bertrand, this is not really a realistic use case
(or maybe you can give us more clues).
Julien
2013/2/26 Bertrand Dechoux <de...@gmail.com>
> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> It is possible to have a single mapper if the input is not splittable BUT
> it is rarely seen as a feature.
> One could ask why you want to use a platform for distributed computing for
> a job that shouldn't be distributed.
>
> Regards
>
> Bertrand
>
>
>
> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
> arindamchoudhury0@gmail.com> wrote:
>
>> Hi all,
>>
>> I am trying to run terasort using one map and one reduce. so, I generated
>> the input data using:
>>
>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>
>> Then I launched the hadoop terasort job using:
>>
>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>
>> I thought it will run the job using 1 map and 1 reduce, but when inspect
>> the job statistics I found:
>>
>> hadoop job -history /user/hadoop/output1
>>
>> Task Summary
>> ============================
>> Kind Total Successful Failed Killed StartTime FinishTime
>>
>> Setup 1 1 0 0 26-Feb-2013 10:57:47 26-Feb-2013
>> 10:57:55 (8sec)
>> Map 24 24 0 0 26-Feb-2013 10:57:57 26-Feb-2013
>> 11:05:37 (7mins, 40sec)
>> Reduce 1 1 0 0 26-Feb-2013 10:58:21 26-Feb-2013
>> 11:08:31 (10mins, 10sec)
>> Cleanup 1 1 0 0 26-Feb-2013 11:08:32 26-Feb-2013
>> 11:08:36 (4sec)
>> ============================
>>
>> so, though I mentioned to launch one map tasks, there are 24 of them.
>>
>> How to solve this problem. How to tell hadoop to launch only one map.
>>
>> Thanks,
>>
>
>
Re: Running terasort with 1 map task
Posted by Julien Muller <ju...@ezako.com>.
Maybe your goal is to have a baseline for performance measurement?
In that case, you might want to consider running only one taskTracker? You
would have multiple tasks but running on only 1 machine. Also, you could
make mappers run serially, by configuring only one map slot on your 1 node
cluster.
Nevertheless I agree with Bertrand, this is not really a realistic use case
(or maybe you can give us more clues).
Julien
2013/2/26 Bertrand Dechoux <de...@gmail.com>
> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> It is possible to have a single mapper if the input is not splittable BUT
> it is rarely seen as a feature.
> One could ask why you want to use a platform for distributed computing for
> a job that shouldn't be distributed.
>
> Regards
>
> Bertrand
>
>
>
> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
> arindamchoudhury0@gmail.com> wrote:
>
>> Hi all,
>>
>> I am trying to run terasort using one map and one reduce, so I generated
>> the input data using:
>>
>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>
>> Then I launched the Hadoop terasort job using:
>>
>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>
>> I thought it would run the job using 1 map and 1 reduce, but when I
>> inspected the job statistics I found:
>>
>> hadoop job -history /user/hadoop/output1
>>
>> Task Summary
>> ============================
>> Kind     Total  Successful  Failed  Killed  StartTime             FinishTime
>> Setup    1      1           0       0       26-Feb-2013 10:57:47  26-Feb-2013 10:57:55 (8sec)
>> Map      24     24          0       0       26-Feb-2013 10:57:57  26-Feb-2013 11:05:37 (7mins, 40sec)
>> Reduce   1      1           0       0       26-Feb-2013 10:58:21  26-Feb-2013 11:08:31 (10mins, 10sec)
>> Cleanup  1      1           0       0       26-Feb-2013 11:08:32  26-Feb-2013 11:08:36 (4sec)
>> ============================
>>
>> So, although I asked for one map task, there are 24 of them.
>>
>> How can I solve this problem? How do I tell Hadoop to launch only one map?
>>
>> Thanks,
>>
>
>
Re: Running terasort with 1 map task
Posted by Bertrand Dechoux <de...@gmail.com>.
http://wiki.apache.org/hadoop/HowManyMapsAndReduces
It is possible to have a single mapper if the input is not splittable, but
that is rarely seen as a feature.
One could ask why you would want to use a distributed computing platform
for a job that shouldn't be distributed.
Regards
Bertrand
On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
arindamchoudhury0@gmail.com> wrote:
> Hi all,
>
> I am trying to run terasort using one map and one reduce, so I generated
> the input data using:
>
> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>
> Then I launched the Hadoop terasort job using:
>
> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>
> I thought it would run the job using 1 map and 1 reduce, but when I
> inspected the job statistics I found:
>
> hadoop job -history /user/hadoop/output1
>
> Task Summary
> ============================
> Kind     Total  Successful  Failed  Killed  StartTime             FinishTime
> Setup    1      1           0       0       26-Feb-2013 10:57:47  26-Feb-2013 10:57:55 (8sec)
> Map      24     24          0       0       26-Feb-2013 10:57:57  26-Feb-2013 11:05:37 (7mins, 40sec)
> Reduce   1      1           0       0       26-Feb-2013 10:58:21  26-Feb-2013 11:08:31 (10mins, 10sec)
> Cleanup  1      1           0       0       26-Feb-2013 11:08:32  26-Feb-2013 11:08:36 (4sec)
> ============================
>
> So, although I asked for one map task, there are 24 of them.
>
> How can I solve this problem? How do I tell Hadoop to launch only one map?
>
> Thanks,
>
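As a concrete follow-up to the non-splittable-input point: one way to force a single map without changing the data is to raise the minimum split size above the total input size (~3.2 GB here). This is a sketch; whether TeraSort's TeraInputFormat in 1.0.4 honors mapred.min.split.size is an assumption worth verifying:

```xml
<!-- Per-job override (can also be passed on the command line as
     -Dmapred.min.split.size=3200000000). A minimum split size larger
     than the input file makes FileInputFormat produce one split,
     and therefore one map task. -->
<property>
  <name>mapred.min.split.size</name>
  <value>3200000000</value>
</property>
```

Expect the single map to be slow: one task must then read the whole 3.2 GB input.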