Posted to common-user@hadoop.apache.org by Sai Sai <sa...@yahoo.in> on 2013/03/04 10:42:20 UTC

Re: Unknown processes unable to terminate

I have a list of processes given below, and I am trying to kill process 13082 using:

kill 13082

It's not terminating RunJar.

I have done a stop-all.sh hoping it would stop all the processes, but it only stopped the Hadoop-related processes.
I am just wondering if it is necessary to stop all other processes before starting the Hadoop processes, and how to stop these other processes.

Here is the list of processes which are appearing:



30969 FileSystemCat
30877 FileSystemCat
5647 StreamCompressor
32200 DataNode
25015 Jps
2227 URLCat
5563 StreamCompressor
5398 StreamCompressor
13082 RunJar
32578 JobTracker
7215 
385 TaskTracker
31884 NameNode
32489 SecondaryNameNode


Thanks
Sai

Re: Increase the number of mappers in PM mode

Posted by Harsh J <ha...@cloudera.com>.
In MR2, to have more mappers executed per NM, your memory request for each
map should be set such that the NM's configured memory allowance can fit
multiple requests. For example, if my NM memory is set to 16 GB, assuming
just one NM in the cluster, and I submit a job with mapreduce.map.memory.mb and
yarn.app.mapreduce.am.resource.mb both set to 1 GB, then the NM can execute
15 maps in parallel, consuming up to 1 GB of memory each (while using the
remaining 1 GB for the AM to coordinate those executions).
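
A minimal sketch of such a submission (the jar, driver class, and paths are
placeholders; it assumes the driver uses ToolRunner so the -D generic options
are parsed, and that the NM capacity itself was set via
yarn.nodemanager.resource.memory-mb=16384 in yarn-site.xml):

$ hadoop jar yourjob.jar YourDriver \
    -D mapreduce.map.memory.mb=1024 \
    -D yarn.app.mapreduce.am.resource.mb=1024 \
    input output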


On Sat, Mar 16, 2013 at 10:16 AM, yypvsxf19870706 <yypvsxf19870706@gmail.com
> wrote:

> Hi:
>    I think I have got it. Thank you.
>
> Sent from my iPhone
>
> On 2013-3-15, at 18:32, Zheyi RONG <ro...@gmail.com> wrote:
>
> Indeed you cannot explicitly set the number of mappers, but still you can
> gain some control over it, by setting mapred.max.split.size, or
> mapred.min.split.size.
>
> For example, if you have a file of 10GB (10737418240 B), you would like 10
> mappers, then each mapper has to deal with 1GB data.
> According to "splitsize = max(minimumSize, min(maximumSize, blockSize))",
> you can set mapred.min.split.size=1073741824 (1GB), i.e.
> $hadoop jar -Dmapred.min.split.size=1073741824 yourjar yourargs
>
> It is well explained in thread:
> http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop
> .
>
> Regards,
> Zheyi.
>
> On Fri, Mar 15, 2013 at 8:49 AM, YouPeng Yang <yy...@gmail.com> wrote:
>
>> s
>
>
>
>


-- 
Harsh J

Re: Increase the number of mappers in PM mode

Posted by yypvsxf19870706 <yy...@gmail.com>.
Hi:
   I think I have got it. Thank you.

Sent from my iPhone

On 2013-3-15, at 18:32, Zheyi RONG <ro...@gmail.com> wrote:

> Indeed you cannot explicitly set the number of mappers, but still you can gain some control over it, by setting mapred.max.split.size, or mapred.min.split.size.
> 
> For example, if you have a file of 10GB (10737418240 B), you would like 10 mappers, then each mapper has to deal with 1GB data.
> According to "splitsize = max(minimumSize, min(maximumSize, blockSize))", you can set mapred.min.split.size=1073741824 (1GB), i.e.    
> $hadoop jar -Dmapred.min.split.size=1073741824 yourjar yourargs
> 
> It is well explained in thread: http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop.
> 
> Regards,
> Zheyi.
> 
> On Fri, Mar 15, 2013 at 8:49 AM, YouPeng Yang <yy...@gmail.com> wrote:
>> s
> 
> 

Re: Increase the number of mappers in PM mode

Posted by Zheyi RONG <ro...@gmail.com>.
Indeed you cannot explicitly set the number of mappers, but you can still
gain some control over it by setting mapred.max.split.size or
mapred.min.split.size.

For example, if you have a 10 GB file (10737418240 B) and would like 10
mappers, then each mapper has to deal with 1 GB of data.
According to "splitsize = max(minimumSize, min(maximumSize, blockSize))",
you can set mapred.min.split.size=1073741824 (1 GB), i.e.
$ hadoop jar yourjar -D mapred.min.split.size=1073741824 yourargs
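
To check the arithmetic (a sketch assuming a 128 MB HDFS block size; pinning
both bounds makes the 1 GB split size independent of the block size):

# splitsize = max(minimumSize, min(maximumSize, blockSize))
#           = max(1073741824, min(1073741824, 134217728)) = 1073741824 B (1 GB)
# 10 GB input / 1 GB per split => 10 splits => 10 map tasks
$ hadoop jar yourjar -D mapred.min.split.size=1073741824 \
    -D mapred.max.split.size=1073741824 yourargs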

It is well explained in thread:
http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop.

Regards,
Zheyi.

On Fri, Mar 15, 2013 at 8:49 AM, YouPeng Yang <yy...@gmail.com> wrote:

> s

Re: Increase the number of mappers in PM mode

Posted by YouPeng Yang <yy...@gmail.com>.
Hi:
  I found these interview questions by doing some googling:

Q29. How can you set an arbitrary number of mappers to be created for a job
in Hadoop?

This is a trick question. You cannot set it.

 >> The above test proves you cannot set an arbitrary number of mappers.

Q30. How can you set an arbitrary number of reducers to be created for a job
in Hadoop?

You can either do it programmatically by using the setNumReduceTasks method
in the JobConf class, or set it up as a configuration setting.


 I tested Q30, and it seems right.

 my logs:

[hadoop@Hadoop01 bin]$ ./hadoop jar \
    ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.1.2.jar \
    wordcount -D mapreduce.job.reduces=2 \
    -D mapreduce.jobtracker.address=10.167.14.221:50030 \
    /user/hadoop/yyp/input /user/hadoop/yyp/output3

===================================

Job Counters

Launched map tasks=1

Launched reduce tasks=2  -----> it actually changed.

Rack-local map tasks=1

Total time spent by all maps in occupied slots (ms)=60356

Total time spent by all reduces in occupied slots (ms)=135224

============================




regards





2013/3/14 YouPeng Yang <yy...@gmail.com>

> Hi
>   the docs only have a property
> : mapreduce.input.fileinputformat.split.minsize (default value is 0)
>   does it matter?
>
>
>
> 2013/3/14 Zheyi RONG <ro...@gmail.com>
>
>> Have you considered changing mapred.max.split.size?
>> As in:
>> http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop
>>
>> Zheyi
>>
>>
>> On Thu, Mar 14, 2013 at 3:27 PM, YouPeng Yang <yy...@gmail.com> wrote:
>>
>>> Hi
>>>
>>>
>>>   I have done some tests in my Pseudo Mode (CDH4.1.2) with MRv2 YARN:
>>>   According to the doc:
>>>   *mapreduce.jobtracker.address :*The host and port that the MapReduce
>>> job tracker runs at. If "local", then jobs are run in-process as a single
>>> map and reduce task.
>>>   *mapreduce.job.maps (default value is 2)* :The default number of map
>>> tasks per job. Ignored when mapreduce.jobtracker.address is "local".
>>>
>>>   I changed the mapreduce.jobtracker.address = Hadoop:50031.
>>>
>>>   And then run the wordcount examples:
>>>   hadoop jar  hadoop-mapreduce-examples-2.0.0-cdh4.1.2.jar wordcount
>>> input output
>>>
>>>   the output logs are as follows:
>>>         ....
>>>    Job Counters
>>> Launched map tasks=1
>>>  Launched reduce tasks=1
>>> Data-local map tasks=1
>>>  Total time spent by all maps in occupied slots (ms)=60336
>>> Total time spent by all reduces in occupied slots (ms)=63264
>>>      Map-Reduce Framework
>>> Map input records=5
>>>  Map output records=7
>>> Map output bytes=56
>>> Map output materialized bytes=76
>>>         ....
>>>
>>>  It does not seem to work.
>>>
>>>  I thought maybe my input file is small, just 5 records. Is that right?
>>>
>>> regards
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> 2013/3/14 Sai Sai <sa...@yahoo.in>
>>>
>>>>
>>>>
>>>>  In Pseudo Mode, where is the setting to increase the number of mappers,
>>>> or is this not possible?
>>>> Thanks
>>>> Sai
>>>>
>>>
>>>
>>
>

Re: Increase the number of mappers in PM mode

Posted by YouPeng Yang <yy...@gmail.com>.
Hi,
  The docs only have the property
mapreduce.input.fileinputformat.split.minsize (default value is 0).
  Does it matter?



2013/3/14 Zheyi RONG <ro...@gmail.com>

> Have you considered changing mapred.max.split.size?
> As in:
> http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop
>
> Zheyi
>
>
> On Thu, Mar 14, 2013 at 3:27 PM, YouPeng Yang <yy...@gmail.com> wrote:
>
>> Hi
>>
>>
>>   I have done some tests in my Pseudo Mode (CDH4.1.2) with MRv2 YARN:
>>   According to the doc:
>>   *mapreduce.jobtracker.address :*The host and port that the MapReduce
>> job tracker runs at. If "local", then jobs are run in-process as a single
>> map and reduce task.
>>   *mapreduce.job.maps (default value is 2)* :The default number of map
>> tasks per job. Ignored when mapreduce.jobtracker.address is "local".
>>
>>   I changed the mapreduce.jobtracker.address = Hadoop:50031.
>>
>>   And then run the wordcount examples:
>>   hadoop jar  hadoop-mapreduce-examples-2.0.0-cdh4.1.2.jar wordcount
>> input output
>>
>>   the output logs are as follows:
>>         ....
>>    Job Counters
>> Launched map tasks=1
>>  Launched reduce tasks=1
>> Data-local map tasks=1
>>  Total time spent by all maps in occupied slots (ms)=60336
>> Total time spent by all reduces in occupied slots (ms)=63264
>>      Map-Reduce Framework
>> Map input records=5
>>  Map output records=7
>> Map output bytes=56
>> Map output materialized bytes=76
>>         ....
>>
>>  It does not seem to work.
>>
>>  I thought maybe my input file is small, just 5 records. Is that right?
>>
>> regards
>>
>>
>>
>>
>>
>>
>>
>> 2013/3/14 Sai Sai <sa...@yahoo.in>
>>
>>>
>>>
>>>  In Pseudo Mode, where is the setting to increase the number of mappers,
>>> or is this not possible?
>>> Thanks
>>> Sai
>>>
>>
>>
>

Re: Increase the number of mappers in PM mode

Posted by Zheyi RONG <ro...@gmail.com>.
Have you considered changing mapred.max.split.size?
As in:
http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop

Zheyi

On Thu, Mar 14, 2013 at 3:27 PM, YouPeng Yang <yy...@gmail.com> wrote:

> Hi
>
>
>   I have done some tests in my Pseudo Mode (CDH4.1.2) with MRv2 YARN:
>   According to the doc:
>   *mapreduce.jobtracker.address :*The host and port that the MapReduce
> job tracker runs at. If "local", then jobs are run in-process as a single
> map and reduce task.
>   *mapreduce.job.maps (default value is 2)* :The default number of map
> tasks per job. Ignored when mapreduce.jobtracker.address is "local".
>
>   I changed the mapreduce.jobtracker.address = Hadoop:50031.
>
>   And then run the wordcount examples:
>   hadoop jar  hadoop-mapreduce-examples-2.0.0-cdh4.1.2.jar wordcount
> input output
>
>   the output logs are as follows:
>         ....
>    Job Counters
> Launched map tasks=1
>  Launched reduce tasks=1
> Data-local map tasks=1
>  Total time spent by all maps in occupied slots (ms)=60336
> Total time spent by all reduces in occupied slots (ms)=63264
>      Map-Reduce Framework
> Map input records=5
>  Map output records=7
> Map output bytes=56
> Map output materialized bytes=76
>         ....
>
>  It does not seem to work.
>
>  I thought maybe my input file is small, just 5 records. Is that right?
>
> regards
>
>
>
>
>
>
>
> 2013/3/14 Sai Sai <sa...@yahoo.in>
>
>>
>>
>>  In Pseudo Mode, where is the setting to increase the number of mappers,
>> or is this not possible?
>> Thanks
>> Sai
>>
>
>

Re: Increase the number of mappers in PM mode

Posted by YouPeng Yang <yy...@gmail.com>.
Hi


  I have done some tests in my Pseudo Mode (CDH4.1.2) with MRv2 YARN:
  According to the doc:
  *mapreduce.jobtracker.address :*The host and port that the MapReduce job
tracker runs at. If "local", then jobs are run in-process as a single map
and reduce task.
  *mapreduce.job.maps (default value is 2)* :The default number of map
tasks per job. Ignored when mapreduce.jobtracker.address is "local".

  I changed the mapreduce.jobtracker.address = Hadoop:50031.

  And then run the wordcount examples:
  hadoop jar hadoop-mapreduce-examples-2.0.0-cdh4.1.2.jar wordcount input output

  the output logs are as follows:
        ....
   Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=60336
Total time spent by all reduces in occupied slots (ms)=63264
     Map-Reduce Framework
Map input records=5
Map output records=7
Map output bytes=56
Map output materialized bytes=76
        ....

 It does not seem to work.

 I thought maybe it is because my input file is small, just 5 records. Is that right?
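
One quick way to test that theory (the sizes and paths here are illustrative):

$ seq 1 5000000 > big.txt      # roughly 38 MB of synthetic lines
$ hadoop fs -put big.txt /user/hadoop/yyp/input-big
$ hadoop jar hadoop-mapreduce-examples-2.0.0-cdh4.1.2.jar wordcount \
    -D mapred.max.split.size=8388608 \
    /user/hadoop/yyp/input-big /user/hadoop/yyp/output-big

With an 8 MB maximum split size this should launch several map tasks instead
of one.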

regards







2013/3/14 Sai Sai <sa...@yahoo.in>

>
>
> In Pseudo Mode, where is the setting to increase the number of mappers,
> or is this not possible?
> Thanks
> Sai
>

Re: Increase the number of mappers in PM mode

Posted by Sai Sai <sa...@yahoo.in>.


In Pseudo Mode, where is the setting to increase the number of mappers, or is this not possible?
Thanks
Sai

Re: Block vs FileSplit vs record vs line

Posted by Sai Sai <sa...@yahoo.in>.
Just wondering if this is the right way to understand this:
A large file is split into multiple blocks, each block is split into multiple file splits, each file split has multiple records, and each record has multiple lines. Each line is processed by one instance of the mapper.
Any help is appreciated.
Thanks
Sai

RE: Unknown processes unable to terminate

Posted by Leo Leung <ll...@ddn.com>.
Hi Sai,

   The RunJar process is normally the result of someone or something running “hadoop jar <something>”
   (i.e., org.apache.hadoop.util.RunJar <something>).

   You probably want to find out who/what is running it, with more detailed info, via ps -ef | grep RunJar.
   <stop|start>-all.sh deal with the HDFS/MapReduce-specific processes only, so they will not stop any other Java process reported by jps.
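
   For example (the PID below is the one from your listing):

   $ ps -ef | grep [R]unJar    # see who launched it, and with what command line
   $ jps -lm                   # list JVMs with full main class and arguments
   $ kill 13082                # polite SIGTERM first
   $ kill -9 13082             # SIGKILL if it refuses to exit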

Cheers.


From: Sai Sai [mailto:saigraph@yahoo.in]
Sent: Monday, March 04, 2013 1:42 AM
To: user@hadoop.apache.org
Subject: Re: Unknown processes unable to terminate

I have a list of processes given below, and I am trying to kill process 13082 using:

kill 13082

It's not terminating RunJar.

I have done a stop-all.sh hoping it would stop all the processes, but it only stopped the Hadoop-related processes.
I am just wondering if it is necessary to stop all other processes before starting the Hadoop processes, and how to stop these other processes.

Here is the list of processes which are appearing:


30969 FileSystemCat
30877 FileSystemCat
5647 StreamCompressor
32200 DataNode
25015 Jps
2227 URLCat
5563 StreamCompressor
5398 StreamCompressor
13082 RunJar
32578 JobTracker
7215
385 TaskTracker
31884 NameNode
32489 SecondaryNameNode

Thanks
Sai

Re: Unknown processes unable to terminate

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Sai,

Are you fine with killing all those processes on this machine? If you need
ALL of those processes to be killed, and they are all Java processes,
you can use killall -9 java. That will kill ALL the Java processes under
this user.
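
For example (a blunt instrument, so double-check with jps first):

$ jps                          # confirm what is actually still running
$ killall -9 java              # SIGKILL every process named "java" you own
$ pkill -9 -u "$USER" java     # equivalent, explicit about the user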

JM

2013/3/4 shashwat shriparv <dw...@gmail.com>:
> You can use kill -9 13082
>
> Is there an Eclipse or NetBeans project running? That may be this process.
>
>
>
> ∞
>
> Shashwat Shriparv
>
>
>
> On Mon, Mar 4, 2013 at 3:12 PM, Sai Sai <sa...@yahoo.in> wrote:
>>
>> I have a list of processes given below, and I am trying to kill
>> process 13082 using:
>>
>> kill 13082
>>
>> It's not terminating RunJar.
>>
>> I have done a stop-all.sh hoping it would stop all the processes, but it only
>> stopped the Hadoop-related processes.
>> I am just wondering if it is necessary to stop all other processes before
>> starting the Hadoop processes, and how to stop these other processes.
>>
>> Here is the list of processes which are appearing:
>>
>>
>> 30969 FileSystemCat
>> 30877 FileSystemCat
>> 5647 StreamCompressor
>> 32200 DataNode
>> 25015 Jps
>> 2227 URLCat
>> 5563 StreamCompressor
>> 5398 StreamCompressor
>> 13082 RunJar
>> 32578 JobTracker
>> 7215
>> 385 TaskTracker
>> 31884 NameNode
>> 32489 SecondaryNameNode
>>
>> Thanks
>> Sai
>
>

Re: Unknown processes unable to terminate

Posted by shashwat shriparv <dw...@gmail.com>.
You can use kill -9 13082

Is there an Eclipse or NetBeans project running? That may be this process.



∞
Shashwat Shriparv



On Mon, Mar 4, 2013 at 3:12 PM, Sai Sai <sa...@yahoo.in> wrote:

> I have a list of processes given below, and I am trying to kill
> process 13082 using:
>
> kill 13082
>
> It's not terminating RunJar.
>
> I have done a stop-all.sh hoping it would stop all the processes, but it only
> stopped the Hadoop-related processes.
> I am just wondering if it is necessary to stop all other processes before
> starting the Hadoop processes, and how to stop these other processes.
>
> Here is the list of processes which are appearing:
>
>
> 30969 FileSystemCat
> 30877 FileSystemCat
> 5647 StreamCompressor
> 32200 DataNode
> 25015 Jps
> 2227 URLCat
> 5563 StreamCompressor
> 5398 StreamCompressor
> 13082 RunJar
> 32578 JobTracker
> 7215
> 385 TaskTracker
> 31884 NameNode
> 32489 SecondaryNameNode
>
> Thanks
> Sai
>
