Posted to common-user@hadoop.apache.org by YouPeng Yang <yy...@gmail.com> on 2013/04/19 18:30:02 UTC

Map's number with NLineInputFormat

Hi All

 I use NLineInputFormat as the text input format with the following code:
 NLineInputFormat.setNumLinesPerSplit(job, 10);
 NLineInputFormat.addInputPath(job, new Path(args[0].toString()));

 My input file contains 1000 rows, so I thought it would launch
100 (1000/10) maps. However, I got 4 maps.

  I'm confused by the number of maps that were launched, as shown in the
running log [1].
 How does it distribute maps when using NLineInputFormat?
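
 A minimal, self-contained sketch of how the expected split count can be
checked against NLineInputFormat directly; the class name NLineSplitCheck and
the job name are illustrative and not part of the original job:

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

 public class NLineSplitCheck {
   public static void main(String[] args) throws Exception {
     Job job = Job.getInstance(new Configuration(), "nline-split-check");

     // Same settings as in the post above.
     NLineInputFormat.setNumLinesPerSplit(job, 10);
     NLineInputFormat.addInputPath(job, new Path(args[0]));

     // NLineInputFormat creates one split per N input lines, so a 1000-line
     // file with N = 10 should yield 100 splits, and hence 100 map tasks,
     // provided the job is actually configured to use NLineInputFormat.
     int splits = new NLineInputFormat().getSplits(job).size();
     System.out.println("NLineInputFormat splits: " + splits);
   }
 }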


Regards



[1]=======================================================
....
....
2013-04-19 23:56:20,377 INFO  mapreduce.Job
(Job.java:monitorAndPrintJob(1286)) - Job job_local_0001 running in uber
mode : false
2013-04-19 23:56:20,377 INFO  mapreduce.Job
(Job.java:monitorAndPrintJob(1293)) -  map 25% reduce 0%
2013-04-19 23:56:20,381 INFO  mapred.MapTask
(MapTask.java:sortAndSpill(1597)) - Finished spill 0
2013-04-19 23:56:20,384 INFO  mapred.Task (Task.java:done(979)) -
Task:attempt_local_0001_m_000001_0 is done. And is in the process of
committing
2013-04-19 23:56:20,388 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:statusUpdate(501)) - map
2013-04-19 23:56:20,389 INFO  mapred.Task (Task.java:sendDone(1099)) - Task
'attempt_local_0001_m_000001_0' done.
2013-04-19 23:56:20,389 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:run(238)) - Finishing task:
attempt_local_0001_m_000001_0
2013-04-19 23:56:20,389 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:run(213)) - Starting task:
attempt_local_0001_m_000002_0
2013-04-19 23:56:20,391 INFO  mapred.Task (Task.java:initialize(565)) -
 Using ResourceCalculatorPlugin :
org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@36bf7916
2013-04-19 23:56:20,486 INFO  mapred.MapTask
(MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(923)) -
mapreduce.task.io.sort.mb: 100
2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(924)) -
soft limit at 83886080
2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(925)) -
bufstart = 0; bufvoid = 104857600
2013-04-19 23:56:20,487 INFO  mapred.MapTask (MapTask.java:<init>(926)) -
kvstart = 26214396; length = 6553600
2013-04-19 23:56:20,515 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:statusUpdate(501)) -
2013-04-19 23:56:20,515 INFO  mapred.MapTask (MapTask.java:flush(1389)) -
Starting flush of map output
2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1408)) -
Spilling map output
2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1409)) -
bufstart = 0; bufend = 336; bufvoid = 104857600
2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1411)) -
kvstart = 26214396(104857584); kvend = 26214208(104856832); length =
189/6553600
2013-04-19 23:56:20,523 INFO  mapred.MapTask
(MapTask.java:sortAndSpill(1597)) - Finished spill 0
2013-04-19 23:56:20,552 INFO  mapred.Task (Task.java:done(979)) -
Task:attempt_local_0001_m_000002_0 is done. And is in the process of
committing
2013-04-19 23:56:20,555 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:statusUpdate(501)) - map
2013-04-19 23:56:20,556 INFO  mapred.Task (Task.java:sendDone(1099)) - Task
'attempt_local_0001_m_000002_0' done.
2013-04-19 23:56:20,556 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:run(238)) - Finishing task:
attempt_local_0001_m_000002_0
2013-04-19 23:56:20,556 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:run(213)) - Starting task:
attempt_local_0001_m_000003_0
2013-04-19 23:56:20,558 INFO  mapred.Task (Task.java:initialize(565)) -
 Using ResourceCalculatorPlugin :
org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@746a63d3
2013-04-19 23:56:20,666 INFO  mapred.MapTask
(MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(923)) -
mapreduce.task.io.sort.mb: 100
2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(924)) -
soft limit at 83886080
2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(925)) -
bufstart = 0; bufvoid = 104857600
2013-04-19 23:56:20,667 INFO  mapred.MapTask (MapTask.java:<init>(926)) -
kvstart = 26214396; length = 6553600
2013-04-19 23:56:20,690 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:statusUpdate(501)) -
2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1389)) -
Starting flush of map output
2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1408)) -
Spilling map output
2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1409)) -
bufstart = 0; bufend = 329; bufvoid = 104857600
2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1411)) -
kvstart = 26214396(104857584); kvend = 26214212(104856848); length =
185/6553600
2013-04-19 23:56:20,695 INFO  mapred.MapTask
(MapTask.java:sortAndSpill(1597)) - Finished spill 0
2013-04-19 23:56:20,697 INFO  mapred.Task (Task.java:done(979)) -
Task:attempt_local_0001_m_000003_0 is done. And is in the process of
committing
2013-04-19 23:56:20,717 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:statusUpdate(501)) - map
2013-04-19 23:56:20,718 INFO  mapred.Task (Task.java:sendDone(1099)) - Task
'attempt_local_0001_m_000003_0' done.
2013-04-19 23:56:20,718 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:run(238)) - Finishing task:
attempt_local_0001_m_000003_0
2013-04-19 23:56:20,718 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:run(394)) - Map task executor complete.
2013-04-19 23:56:20,752 INFO  mapred.Task (Task.java:initialize(565)) -
 Using ResourceCalculatorPlugin :
org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@52cd19d
2013-04-19 23:56:20,760 INFO  mapred.Merger (Merger.java:merge(549)) -
Merging 4 sorted segments
2013-04-19 23:56:20,767 INFO  mapred.Merger (Merger.java:merge(648)) - Down
to the last merge-pass, with 4 segments left of total size: 8532 bytes
2013-04-19 23:56:20,768 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:statusUpdate(501)) -
2013-04-19 23:56:20,807 WARN  conf.Configuration
(Configuration.java:warnOnceIfDeprecated(808)) - mapred.skip.on is
deprecated. Instead, use mapreduce.job.skiprecords
2013-04-19 23:56:21,129 INFO  mapred.Task (Task.java:done(979)) -
Task:attempt_local_0001_r_000000_0 is done. And is in the process of
committing
2013-04-19 23:56:21,131 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:statusUpdate(501)) -
2013-04-19 23:56:21,131 INFO  mapred.Task (Task.java:commit(1140)) - Task
attempt_local_0001_r_000000_0 is allowed to commit now
2013-04-19 23:56:21,138 INFO  output.FileOutputCommitter
(FileOutputCommitter.java:commitTask(432)) - Saved output of task
'attempt_local_0001_r_000000_0' to
hdfs://Hadoop01:8040/user/hadoop/d/multi9/_temporary/0/task_local_0001_r_000000
2013-04-19 23:56:21,139 INFO  mapred.LocalJobRunner
(LocalJobRunner.java:statusUpdate(501)) - reduce > reduce
2013-04-19 23:56:21,139 INFO  mapred.Task (Task.java:sendDone(1099)) - Task
'attempt_local_0001_r_000000_0' done.
2013-04-19 23:56:21,381 INFO  mapreduce.Job
(Job.java:monitorAndPrintJob(1293)) -  map 100% reduce 100%
2013-04-19 23:56:21,381 INFO  mapreduce.Job
(Job.java:monitorAndPrintJob(1304)) - Job job_local_0001 completed
successfully
2013-04-19 23:56:21,427 INFO  mapreduce.Job
(Job.java:monitorAndPrintJob(1311)) - Counters: 32
File System Counters
FILE: Number of bytes read=483553
FILE: Number of bytes written=1313962
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=296769
HDFS: Number of bytes written=284
HDFS: Number of read operations=66
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Map-Reduce Framework
Map input records=1000
Map output records=1000
Map output bytes=6543
Map output materialized bytes=8567
Input split bytes=516
Combine input records=0
Combine output records=0
Reduce input groups=12
Reduce shuffle bytes=0
Reduce input records=1000
Reduce output records=0
Spilled Records=2000
Shuffled Maps =0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=7
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=1773993984
File Input Format Counters
Bytes Read=68723
File Output Format Counters
Bytes Written=0

Re: Map's number with NLineInputFormat

Posted by yypvsxf19870706 <yy...@gmail.com>.
Hi Harsh

   Thank you for the suggestion. I did indeed miss the call that sets the input format.
    Now it works.
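
For reference, a minimal driver sketch with that call added. The class name,
mapper, and output settings are illustrative placeholders, since the actual
job code is not shown in this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NLineDriver {

  // Placeholder mapper: emits each input line with a count of 1.
  public static class LineMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws java.io.IOException, InterruptedException {
      context.write(value, new LongWritable(1L));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "nline-example");
    job.setJarByClass(NLineDriver.class);

    // The call that was missing: without it the job keeps the default
    // TextInputFormat, so the number of splits follows the input files and
    // their block/split size rather than the 10-lines-per-split setting.
    job.setInputFormatClass(NLineInputFormat.class);

    NLineInputFormat.setNumLinesPerSplit(job, 10);
    NLineInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(LineMapper.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}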


Thanks

Regards 

Sent from my iPhone

On 2013-4-21, 1:04, Harsh J <ha...@cloudera.com> wrote:

> Do you also ensure setting your desired input format class via the
> setInputFormat*(…) API?
> 
> On Sat, Apr 20, 2013 at 6:48 AM, yypvsxf19870706
> <yy...@gmail.com> wrote:
>> Hi
>>   I thought it would be different when adopting NLineInputFormat.
>>   So here is my conclusion: the number of maps has nothing to do with
>> NLineInputFormat. NLineInputFormat only decides the number of rows given to
>> each map, while the maps themselves are generated according to the split size.
>> 
>>    Am I getting the point?
>> 
>> 
>> Regards
>> 
>> Sent from my iPhone
>> 
>> On 2013-4-20, 8:39, "姚吉龙" <ge...@gmail.com> wrote:
>> 
>> The number of maps is decided by the block size and your raw data
>> 
>> ―
>> Sent from Mailbox for iPhone
>> 
>> 
>> On Sat, Apr 20, 2013 at 12:30 AM, YouPeng Yang <yy...@gmail.com>
>> wrote:
>>> 
>>> Hi All
>>> 
>>> I use NLineInputFormat as the text input format with the following
>>> code:
>>> NLineInputFormat.setNumLinesPerSplit(job, 10);
>>> NLineInputFormat.addInputPath(job, new Path(args[0].toString()));
>>> 
>>> My input file contains 1000 rows, so I thought it would launch
>>> 100 (1000/10) maps. However, I got 4 maps.
>>> 
>>>  I'm confused by the number of maps that were launched, as shown in the
>>> running log [1].
>>> How does it distribute maps when using NLineInputFormat?
>>> 
>>> 
>>> Regards
>>> 
>>> 
>>> 
> 
> 
> 
> -- 
> Harsh J

Re: Map's number with NLineInputFormat

Posted by Harsh J <ha...@cloudera.com>.
Do you also ensure setting your desired input format class via the
setInputFormat*(…) API?
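
For the new (mapreduce) API used in the quoted snippet below, that call would
look roughly like the following sketch, assuming the Job object is named job:

// Sketch only: explicitly select NLineInputFormat; otherwise the job keeps
// its default input format.
job.setInputFormatClass(org.apache.hadoop.mapreduce.lib.input.NLineInputFormat.class);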

On Sat, Apr 20, 2013 at 6:48 AM, yypvsxf19870706
<yy...@gmail.com> wrote:
> Hi
>    I thought it would be different when adopting NLineInputFormat.
>    So here is my conclusion: the number of maps has nothing to do with
> NLineInputFormat. NLineInputFormat only decides the number of rows given to
> each map, while the maps themselves are generated according to the split size.
>
>     Am I getting the point?
>
>
> Regards
>
> Sent from my iPhone
>
> On 2013-4-20, 8:39, "姚吉龙" <ge...@gmail.com> wrote:
>
> The number of maps is decided by the block size and your raw data
>
> ―
> Sent from Mailbox for iPhone
>
>
> On Sat, Apr 20, 2013 at 12:30 AM, YouPeng Yang <yy...@gmail.com>
> wrote:
>>
>> Hi All
>>
>>  I use NLineInputFormat as the text input format with the following
>> code:
>>  NLineInputFormat.setNumLinesPerSplit(job, 10);
>>  NLineInputFormat.addInputPath(job, new Path(args[0].toString()));
>>
>>  My input file contains 1000 rows, so I thought it would launch
>> 100 (1000/10) maps. However, I got 4 maps.
>>
>>   I'm confused by the number of maps that were launched, as shown in the
>> running log [1].
>>  How does it distribute maps when using NLineInputFormat?
>>
>>
>> Regards
>>
>>
>>
>>
>>
>>
>



-- 
Harsh J
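
For reference, a minimal driver sketch of the point raised in Harsh J's reply above: NLineInputFormat.setNumLinesPerSplit(...) and addInputPath(...) only store configuration, so the ten-lines-per-split setting takes effect only if NLineInputFormat is also registered as the job's input format via setInputFormatClass(...). The NLineDriver class name, the output-path argument, and the reliance on the default (identity) mapper and reducer are illustrative assumptions, not code from this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NLineDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "nline-demo");
        job.setJarByClass(NLineDriver.class);

        // Register the input format itself; without this line the job falls back
        // to the default TextInputFormat and the lines-per-split setting is ignored.
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.setNumLinesPerSplit(job, 10);           // 10 lines per split
        NLineInputFormat.addInputPath(job, new Path(args[0]));   // same call as in the original post

        // NLineInputFormat emits LongWritable offsets as keys and the line text as values;
        // the default identity mapper and reducer simply pass them through.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}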

Re: Map‘s number with NLineInputFormat

Posted by yypvsxf19870706 <yy...@gmail.com>.
Hi
   I thought the number of maps would be different when adopting NLineInputFormat.
   So here is my conclusion: the distribution of maps has nothing to do with
NLineInputFormat itself. NLineInputFormat only decides the number of rows fed to
each map, while the maps themselves are generated according to the split size.

    Am I getting the point?
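
If it helps, below is a minimal driver sketch of a job where the number of map
tasks does follow setNumLinesPerSplit. The class names NLineDriver and LineMapper
are made up for illustration and are not from this thread; the key assumption is
that job.setInputFormatClass(NLineInputFormat.class) is called explicitly, since
otherwise the default input format's block-based splitting decides the split count
on its own.

// Minimal sketch: a job whose map count follows NLineInputFormat's
// lines-per-split setting. Class names here are illustrative only.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NLineDriver {

  public static class LineMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws java.io.IOException, InterruptedException {
      // Each map task sees at most the configured number of input lines.
      context.write(value, key);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "nline-example");
    job.setJarByClass(NLineDriver.class);
    job.setMapperClass(LineMapper.class);
    job.setNumReduceTasks(1);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);

    // Without this call the job keeps the default input format, and
    // setNumLinesPerSplit() alone does not change how splits are built.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 10);
    NLineInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With 1000 input lines and 10 lines per split, a setup like this should launch
around 100 map tasks (the last split may be smaller).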


Regards

Sent from my iPhone

On 2013-4-20, at 8:39, "姚吉龙" <ge...@gmail.com> wrote:

> The number of maps is decided by the block size and your raw data.
> 
> ―
> Sent from Mailbox for iPhone
> 
> 
> On Sat, Apr 20, 2013 at 12:30 AM, YouPeng Yang <yy...@gmail.com> wrote:
> 
>> Hi All
>>    
>>  I  take NLineInputFormat  as the Text Input Format with the following code :
>>  NLineInputFormat.setNumLinesPerSplit(job, 10);
>>  NLineInputFormat.addInputPath(job,new Path(args[0].toString()));
>> 
>>  My input file contains 1000 rows,so I thought it will distribute 100(1000/10) maps.However I got 4 maps.
>> 
>>   I'm confued by the number of Map that was distributed according to the running log[1].
>>  How it distribute  maps when using NLineInputFormat
>> 
>> 
>> Regards
>> 
>> 
>> 
>> [1]=======================================================
>> ....
>> ....
>> 2013-04-19 23:56:20,377 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1286)) - Job job_local_0001 running in uber mode : false
>> 2013-04-19 23:56:20,377 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1293)) -  map 25% reduce 0%
>> 2013-04-19 23:56:20,381 INFO  mapred.MapTask (MapTask.java:sortAndSpill(1597)) - Finished spill 0
>> 2013-04-19 23:56:20,384 INFO  mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_m_000001_0 is done. And is in the process of committing
>> 2013-04-19 23:56:20,388 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - map
>> 2013-04-19 23:56:20,389 INFO  mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_m_000001_0' done.
>> 2013-04-19 23:56:20,389 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(238)) - Finishing task: attempt_local_0001_m_000001_0
>> 2013-04-19 23:56:20,389 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(213)) - Starting task: attempt_local_0001_m_000002_0
>> 2013-04-19 23:56:20,391 INFO  mapred.Task (Task.java:initialize(565)) -  Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@36bf7916
>> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
>> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(923)) - mapreduce.task.io.sort.mb: 100
>> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(924)) - soft limit at 83886080
>> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(925)) - bufstart = 0; bufvoid = 104857600
>> 2013-04-19 23:56:20,487 INFO  mapred.MapTask (MapTask.java:<init>(926)) - kvstart = 26214396; length = 6553600
>> 2013-04-19 23:56:20,515 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - 
>> 2013-04-19 23:56:20,515 INFO  mapred.MapTask (MapTask.java:flush(1389)) - Starting flush of map output
>> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1408)) - Spilling map output
>> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1409)) - bufstart = 0; bufend = 336; bufvoid = 104857600
>> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1411)) - kvstart = 26214396(104857584); kvend = 26214208(104856832); length = 189/6553600
>> 2013-04-19 23:56:20,523 INFO  mapred.MapTask (MapTask.java:sortAndSpill(1597)) - Finished spill 0
>> 2013-04-19 23:56:20,552 INFO  mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_m_000002_0 is done. And is in the process of committing
>> 2013-04-19 23:56:20,555 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - map
>> 2013-04-19 23:56:20,556 INFO  mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_m_000002_0' done.
>> 2013-04-19 23:56:20,556 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(238)) - Finishing task: attempt_local_0001_m_000002_0
>> 2013-04-19 23:56:20,556 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(213)) - Starting task: attempt_local_0001_m_000003_0
>> 2013-04-19 23:56:20,558 INFO  mapred.Task (Task.java:initialize(565)) -  Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@746a63d3
>> 2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
>> 2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(923)) - mapreduce.task.io.sort.mb: 100
>> 2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(924)) - soft limit at 83886080
>> 2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(925)) - bufstart = 0; bufvoid = 104857600
>> 2013-04-19 23:56:20,667 INFO  mapred.MapTask (MapTask.java:<init>(926)) - kvstart = 26214396; length = 6553600
>> 2013-04-19 23:56:20,690 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - 
>> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1389)) - Starting flush of map output
>> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1408)) - Spilling map output
>> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1409)) - bufstart = 0; bufend = 329; bufvoid = 104857600
>> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1411)) - kvstart = 26214396(104857584); kvend = 26214212(104856848); length = 185/6553600
>> 2013-04-19 23:56:20,695 INFO  mapred.MapTask (MapTask.java:sortAndSpill(1597)) - Finished spill 0
>> 2013-04-19 23:56:20,697 INFO  mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_m_000003_0 is done. And is in the process of committing
>> 2013-04-19 23:56:20,717 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - map
>> 2013-04-19 23:56:20,718 INFO  mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_m_000003_0' done.
>> 2013-04-19 23:56:20,718 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(238)) - Finishing task: attempt_local_0001_m_000003_0
>> 2013-04-19 23:56:20,718 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(394)) - Map task executor complete.
>> 2013-04-19 23:56:20,752 INFO  mapred.Task (Task.java:initialize(565)) -  Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@52cd19d
>> 2013-04-19 23:56:20,760 INFO  mapred.Merger (Merger.java:merge(549)) - Merging 4 sorted segments
>> 2013-04-19 23:56:20,767 INFO  mapred.Merger (Merger.java:merge(648)) - Down to the last merge-pass, with 4 segments left of total size: 8532 bytes
>> 2013-04-19 23:56:20,768 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - 
>> 2013-04-19 23:56:20,807 WARN  conf.Configuration (Configuration.java:warnOnceIfDeprecated(808)) - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
>> 2013-04-19 23:56:21,129 INFO  mapred.Task (Task.java:done(979)) - Task:attempt_local_0001_r_000000_0 is done. And is in the process of committing
>> 2013-04-19 23:56:21,131 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - 
>> 2013-04-19 23:56:21,131 INFO  mapred.Task (Task.java:commit(1140)) - Task attempt_local_0001_r_000000_0 is allowed to commit now
>> 2013-04-19 23:56:21,138 INFO  output.FileOutputCommitter (FileOutputCommitter.java:commitTask(432)) - Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://Hadoop01:8040/user/hadoop/d/multi9/_temporary/0/task_local_0001_r_000000
>> 2013-04-19 23:56:21,139 INFO  mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(501)) - reduce > reduce
>> 2013-04-19 23:56:21,139 INFO  mapred.Task (Task.java:sendDone(1099)) - Task 'attempt_local_0001_r_000000_0' done.
>> 2013-04-19 23:56:21,381 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1293)) -  map 100% reduce 100%
>> 2013-04-19 23:56:21,381 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1304)) - Job job_local_0001 completed successfully
>> 2013-04-19 23:56:21,427 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1311)) - Counters: 32
>> 	File System Counters
>> 		FILE: Number of bytes read=483553
>> 		FILE: Number of bytes written=1313962
>> 		FILE: Number of read operations=0
>> 		FILE: Number of large read operations=0
>> 		FILE: Number of write operations=0
>> 		HDFS: Number of bytes read=296769
>> 		HDFS: Number of bytes written=284
>> 		HDFS: Number of read operations=66
>> 		HDFS: Number of large read operations=0
>> 		HDFS: Number of write operations=8
>> 	Map-Reduce Framework
>> 		Map input records=1000
>> 		Map output records=1000
>> 		Map output bytes=6543
>> 		Map output materialized bytes=8567
>> 		Input split bytes=516
>> 		Combine input records=0
>> 		Combine output records=0
>> 		Reduce input groups=12
>> 		Reduce shuffle bytes=0
>> 		Reduce input records=1000
>> 		Reduce output records=0
>> 		Spilled Records=2000
>> 		Shuffled Maps =0
>> 		Failed Shuffles=0
>> 		Merged Map outputs=0
>> 		GC time elapsed (ms)=7
>> 		CPU time spent (ms)=0
>> 		Physical memory (bytes) snapshot=0
>> 		Virtual memory (bytes) snapshot=0
>> 		Total committed heap usage (bytes)=1773993984
>> 	File Input Format Counters 
>> 		Bytes Read=68723
>> 	File Output Format Counters 
>> 		Bytes Written=0
> 

Re: Map‘s number with NLineInputFormat

Posted by 姚吉龙 <ge...@gmail.com>.
The number of maps is decided by the block size and your raw data.
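
For reference, a rough sketch of the block-based split arithmetic behind that
statement is below. SplitEstimate is a made-up name; the 128 MB block size and
the min/max split settings are assumptions, and the file length is just the
"Bytes Read" counter from the quoted log.

// Rough, illustrative arithmetic only; not the exact Hadoop code path.
public class SplitEstimate {
  public static void main(String[] args) {
    long blockSize = 128L * 1024 * 1024; // assumed HDFS block size
    long minSize = 1L;                   // assumed split.minsize
    long maxSize = Long.MAX_VALUE;       // assumed split.maxsize
    long fileLength = 68723L;            // "Bytes Read" counter from the log

    // FileInputFormat-style split size: max(minSize, min(maxSize, blockSize))
    long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
    long splits = (fileLength + splitSize - 1) / splitSize;

    // A file much smaller than one block comes out as a single split,
    // so with block-based splitting the map count tracks blocks and files.
    System.out.println("splitSize=" + splitSize + ", splits=" + splits);
  }
}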
—
Sent from Mailbox for iPhone

On Sat, Apr 20, 2013 at 12:30 AM, YouPeng Yang <yy...@gmail.com>
wrote:

> Hi All
>  I  take NLineInputFormat  as the Text Input Format with the following code
> :
>  NLineInputFormat.setNumLinesPerSplit(job, 10);
>  NLineInputFormat.addInputPath(job,new Path(args[0].toString()));
>  My input file contains 1000 rows,so I thought it will distribute
> 100(1000/10) maps.However I got 4 maps.
>   I'm confued by the number of Map that was distributed according to the
> running log[1].
>  How it distribute  maps when using NLineInputFormat
> Regards
> [1]=======================================================
> ....
> ....
> 2013-04-19 23:56:20,377 INFO  mapreduce.Job
> (Job.java:monitorAndPrintJob(1286)) - Job job_local_0001 running in uber
> mode : false
> 2013-04-19 23:56:20,377 INFO  mapreduce.Job
> (Job.java:monitorAndPrintJob(1293)) -  map 25% reduce 0%
> 2013-04-19 23:56:20,381 INFO  mapred.MapTask
> (MapTask.java:sortAndSpill(1597)) - Finished spill 0
> 2013-04-19 23:56:20,384 INFO  mapred.Task (Task.java:done(979)) -
> Task:attempt_local_0001_m_000001_0 is done. And is in the process of
> committing
> 2013-04-19 23:56:20,388 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) - map
> 2013-04-19 23:56:20,389 INFO  mapred.Task (Task.java:sendDone(1099)) - Task
> 'attempt_local_0001_m_000001_0' done.
> 2013-04-19 23:56:20,389 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(238)) - Finishing task:
> attempt_local_0001_m_000001_0
> 2013-04-19 23:56:20,389 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(213)) - Starting task:
> attempt_local_0001_m_000002_0
> 2013-04-19 23:56:20,391 INFO  mapred.Task (Task.java:initialize(565)) -
>  Using ResourceCalculatorPlugin :
> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@36bf7916
> 2013-04-19 23:56:20,486 INFO  mapred.MapTask
> (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(923)) -
> mapreduce.task.io.sort.mb: 100
> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(924)) -
> soft limit at 83886080
> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(925)) -
> bufstart = 0; bufvoid = 104857600
> 2013-04-19 23:56:20,487 INFO  mapred.MapTask (MapTask.java:<init>(926)) -
> kvstart = 26214396; length = 6553600
> 2013-04-19 23:56:20,515 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) -
> 2013-04-19 23:56:20,515 INFO  mapred.MapTask (MapTask.java:flush(1389)) -
> Starting flush of map output
> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1408)) -
> Spilling map output
> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1409)) -
> bufstart = 0; bufend = 336; bufvoid = 104857600
> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1411)) -
> kvstart = 26214396(104857584); kvend = 26214208(104856832); length =
> 189/6553600
> 2013-04-19 23:56:20,523 INFO  mapred.MapTask
> (MapTask.java:sortAndSpill(1597)) - Finished spill 0
> 2013-04-19 23:56:20,552 INFO  mapred.Task (Task.java:done(979)) -
> Task:attempt_local_0001_m_000002_0 is done. And is in the process of
> committing
> 2013-04-19 23:56:20,555 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) - map
> 2013-04-19 23:56:20,556 INFO  mapred.Task (Task.java:sendDone(1099)) - Task
> 'attempt_local_0001_m_000002_0' done.
> 2013-04-19 23:56:20,556 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(238)) - Finishing task:
> attempt_local_0001_m_000002_0
> 2013-04-19 23:56:20,556 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(213)) - Starting task:
> attempt_local_0001_m_000003_0
> 2013-04-19 23:56:20,558 INFO  mapred.Task (Task.java:initialize(565)) -
>  Using ResourceCalculatorPlugin :
> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@746a63d3
> 2013-04-19 23:56:20,666 INFO  mapred.MapTask
> (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
> 2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(923)) -
> mapreduce.task.io.sort.mb: 100
> 2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(924)) -
> soft limit at 83886080
> 2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(925)) -
> bufstart = 0; bufvoid = 104857600
> 2013-04-19 23:56:20,667 INFO  mapred.MapTask (MapTask.java:<init>(926)) -
> kvstart = 26214396; length = 6553600
> 2013-04-19 23:56:20,690 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) -
> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1389)) -
> Starting flush of map output
> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1408)) -
> Spilling map output
> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1409)) -
> bufstart = 0; bufend = 329; bufvoid = 104857600
> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1411)) -
> kvstart = 26214396(104857584); kvend = 26214212(104856848); length =
> 185/6553600
> 2013-04-19 23:56:20,695 INFO  mapred.MapTask
> (MapTask.java:sortAndSpill(1597)) - Finished spill 0
> 2013-04-19 23:56:20,697 INFO  mapred.Task (Task.java:done(979)) -
> Task:attempt_local_0001_m_000003_0 is done. And is in the process of
> committing
> 2013-04-19 23:56:20,717 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) - map
> 2013-04-19 23:56:20,718 INFO  mapred.Task (Task.java:sendDone(1099)) - Task
> 'attempt_local_0001_m_000003_0' done.
> 2013-04-19 23:56:20,718 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(238)) - Finishing task:
> attempt_local_0001_m_000003_0
> 2013-04-19 23:56:20,718 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(394)) - Map task executor complete.
> 2013-04-19 23:56:20,752 INFO  mapred.Task (Task.java:initialize(565)) -
>  Using ResourceCalculatorPlugin :
> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@52cd19d
> 2013-04-19 23:56:20,760 INFO  mapred.Merger (Merger.java:merge(549)) -
> Merging 4 sorted segments
> 2013-04-19 23:56:20,767 INFO  mapred.Merger (Merger.java:merge(648)) - Down
> to the last merge-pass, with 4 segments left of total size: 8532 bytes
> 2013-04-19 23:56:20,768 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) -
> 2013-04-19 23:56:20,807 WARN  conf.Configuration
> (Configuration.java:warnOnceIfDeprecated(808)) - mapred.skip.on is
> deprecated. Instead, use mapreduce.job.skiprecords
> 2013-04-19 23:56:21,129 INFO  mapred.Task (Task.java:done(979)) -
> Task:attempt_local_0001_r_000000_0 is done. And is in the process of
> committing
> 2013-04-19 23:56:21,131 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) -
> 2013-04-19 23:56:21,131 INFO  mapred.Task (Task.java:commit(1140)) - Task
> attempt_local_0001_r_000000_0 is allowed to commit now
> 2013-04-19 23:56:21,138 INFO  output.FileOutputCommitter
> (FileOutputCommitter.java:commitTask(432)) - Saved output of task
> 'attempt_local_0001_r_000000_0' to
> hdfs://Hadoop01:8040/user/hadoop/d/multi9/_temporary/0/task_local_0001_r_000000
> 2013-04-19 23:56:21,139 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) - reduce > reduce
> 2013-04-19 23:56:21,139 INFO  mapred.Task (Task.java:sendDone(1099)) - Task
> 'attempt_local_0001_r_000000_0' done.
> 2013-04-19 23:56:21,381 INFO  mapreduce.Job
> (Job.java:monitorAndPrintJob(1293)) -  map 100% reduce 100%
> 2013-04-19 23:56:21,381 INFO  mapreduce.Job
> (Job.java:monitorAndPrintJob(1304)) - Job job_local_0001 completed
> successfully
> 2013-04-19 23:56:21,427 INFO  mapreduce.Job
> (Job.java:monitorAndPrintJob(1311)) - Counters: 32
> File System Counters
> FILE: Number of bytes read=483553
> FILE: Number of bytes written=1313962
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=296769
> HDFS: Number of bytes written=284
> HDFS: Number of read operations=66
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=8
> Map-Reduce Framework
> Map input records=1000
> Map output records=1000
> Map output bytes=6543
> Map output materialized bytes=8567
> Input split bytes=516
> Combine input records=0
> Combine output records=0
> Reduce input groups=12
> Reduce shuffle bytes=0
> Reduce input records=1000
> Reduce output records=0
> Spilled Records=2000
> Shuffled Maps =0
> Failed Shuffles=0
> Merged Map outputs=0
> GC time elapsed (ms)=7
> CPU time spent (ms)=0
> Physical memory (bytes) snapshot=0
> Virtual memory (bytes) snapshot=0
> Total committed heap usage (bytes)=1773993984
> File Input Format Counters
> Bytes Read=68723
> File Output Format Counters
> Bytes Written=0

Re: Map‘s number with NLineInputFormat

Posted by 姚吉龙 <ge...@gmail.com>.
The num of map is decided by the block size and your rawdata 
—
Sent from Mailbox for iPhone

On Sat, Apr 20, 2013 at 12:30 AM, YouPeng Yang <yy...@gmail.com>
wrote:

> Hi All
>  I  take NLineInputFormat  as the Text Input Format with the following code
> :
>  NLineInputFormat.setNumLinesPerSplit(job, 10);
>  NLineInputFormat.addInputPath(job,new Path(args[0].toString()));
>  My input file contains 1000 rows,so I thought it will distribute
> 100(1000/10) maps.However I got 4 maps.
>   I'm confued by the number of Map that was distributed according to the
> running log[1].
>  How it distribute  maps when using NLineInputFormat
> Regards
> [1]=======================================================
> ....
> ....
> 2013-04-19 23:56:20,377 INFO  mapreduce.Job
> (Job.java:monitorAndPrintJob(1286)) - Job job_local_0001 running in uber
> mode : false
> 2013-04-19 23:56:20,377 INFO  mapreduce.Job
> (Job.java:monitorAndPrintJob(1293)) -  map 25% reduce 0%
> 2013-04-19 23:56:20,381 INFO  mapred.MapTask
> (MapTask.java:sortAndSpill(1597)) - Finished spill 0
> 2013-04-19 23:56:20,384 INFO  mapred.Task (Task.java:done(979)) -
> Task:attempt_local_0001_m_000001_0 is done. And is in the process of
> committing
> 2013-04-19 23:56:20,388 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) - map
> 2013-04-19 23:56:20,389 INFO  mapred.Task (Task.java:sendDone(1099)) - Task
> 'attempt_local_0001_m_000001_0' done.
> 2013-04-19 23:56:20,389 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(238)) - Finishing task:
> attempt_local_0001_m_000001_0
> 2013-04-19 23:56:20,389 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(213)) - Starting task:
> attempt_local_0001_m_000002_0
> 2013-04-19 23:56:20,391 INFO  mapred.Task (Task.java:initialize(565)) -
>  Using ResourceCalculatorPlugin :
> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@36bf7916
> 2013-04-19 23:56:20,486 INFO  mapred.MapTask
> (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(923)) -
> mapreduce.task.io.sort.mb: 100
> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(924)) -
> soft limit at 83886080
> 2013-04-19 23:56:20,486 INFO  mapred.MapTask (MapTask.java:<init>(925)) -
> bufstart = 0; bufvoid = 104857600
> 2013-04-19 23:56:20,487 INFO  mapred.MapTask (MapTask.java:<init>(926)) -
> kvstart = 26214396; length = 6553600
> 2013-04-19 23:56:20,515 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) -
> 2013-04-19 23:56:20,515 INFO  mapred.MapTask (MapTask.java:flush(1389)) -
> Starting flush of map output
> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1408)) -
> Spilling map output
> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1409)) -
> bufstart = 0; bufend = 336; bufvoid = 104857600
> 2013-04-19 23:56:20,516 INFO  mapred.MapTask (MapTask.java:flush(1411)) -
> kvstart = 26214396(104857584); kvend = 26214208(104856832); length =
> 189/6553600
> 2013-04-19 23:56:20,523 INFO  mapred.MapTask
> (MapTask.java:sortAndSpill(1597)) - Finished spill 0
> 2013-04-19 23:56:20,552 INFO  mapred.Task (Task.java:done(979)) -
> Task:attempt_local_0001_m_000002_0 is done. And is in the process of
> committing
> 2013-04-19 23:56:20,555 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) - map
> 2013-04-19 23:56:20,556 INFO  mapred.Task (Task.java:sendDone(1099)) - Task
> 'attempt_local_0001_m_000002_0' done.
> 2013-04-19 23:56:20,556 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(238)) - Finishing task:
> attempt_local_0001_m_000002_0
> 2013-04-19 23:56:20,556 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(213)) - Starting task:
> attempt_local_0001_m_000003_0
> 2013-04-19 23:56:20,558 INFO  mapred.Task (Task.java:initialize(565)) -
>  Using ResourceCalculatorPlugin :
> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@746a63d3
> 2013-04-19 23:56:20,666 INFO  mapred.MapTask
> (MapTask.java:setEquator(1127)) - (EQUATOR) 0 kvi 26214396(104857584)
> 2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(923)) -
> mapreduce.task.io.sort.mb: 100
> 2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(924)) -
> soft limit at 83886080
> 2013-04-19 23:56:20,666 INFO  mapred.MapTask (MapTask.java:<init>(925)) -
> bufstart = 0; bufvoid = 104857600
> 2013-04-19 23:56:20,667 INFO  mapred.MapTask (MapTask.java:<init>(926)) -
> kvstart = 26214396; length = 6553600
> 2013-04-19 23:56:20,690 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) -
> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1389)) -
> Starting flush of map output
> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1408)) -
> Spilling map output
> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1409)) -
> bufstart = 0; bufend = 329; bufvoid = 104857600
> 2013-04-19 23:56:20,690 INFO  mapred.MapTask (MapTask.java:flush(1411)) -
> kvstart = 26214396(104857584); kvend = 26214212(104856848); length =
> 185/6553600
> 2013-04-19 23:56:20,695 INFO  mapred.MapTask
> (MapTask.java:sortAndSpill(1597)) - Finished spill 0
> 2013-04-19 23:56:20,697 INFO  mapred.Task (Task.java:done(979)) -
> Task:attempt_local_0001_m_000003_0 is done. And is in the process of
> committing
> 2013-04-19 23:56:20,717 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) - map
> 2013-04-19 23:56:20,718 INFO  mapred.Task (Task.java:sendDone(1099)) - Task
> 'attempt_local_0001_m_000003_0' done.
> 2013-04-19 23:56:20,718 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(238)) - Finishing task:
> attempt_local_0001_m_000003_0
> 2013-04-19 23:56:20,718 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:run(394)) - Map task executor complete.
> 2013-04-19 23:56:20,752 INFO  mapred.Task (Task.java:initialize(565)) -
>  Using ResourceCalculatorPlugin :
> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@52cd19d
> 2013-04-19 23:56:20,760 INFO  mapred.Merger (Merger.java:merge(549)) -
> Merging 4 sorted segments
> 2013-04-19 23:56:20,767 INFO  mapred.Merger (Merger.java:merge(648)) - Down
> to the last merge-pass, with 4 segments left of total size: 8532 bytes
> 2013-04-19 23:56:20,768 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) -
> 2013-04-19 23:56:20,807 WARN  conf.Configuration
> (Configuration.java:warnOnceIfDeprecated(808)) - mapred.skip.on is
> deprecated. Instead, use mapreduce.job.skiprecords
> 2013-04-19 23:56:21,129 INFO  mapred.Task (Task.java:done(979)) -
> Task:attempt_local_0001_r_000000_0 is done. And is in the process of
> committing
> 2013-04-19 23:56:21,131 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) -
> 2013-04-19 23:56:21,131 INFO  mapred.Task (Task.java:commit(1140)) - Task
> attempt_local_0001_r_000000_0 is allowed to commit now
> 2013-04-19 23:56:21,138 INFO  output.FileOutputCommitter
> (FileOutputCommitter.java:commitTask(432)) - Saved output of task
> 'attempt_local_0001_r_000000_0' to
> hdfs://Hadoop01:8040/user/hadoop/d/multi9/_temporary/0/task_local_0001_r_000000
> 2013-04-19 23:56:21,139 INFO  mapred.LocalJobRunner
> (LocalJobRunner.java:statusUpdate(501)) - reduce > reduce
> 2013-04-19 23:56:21,139 INFO  mapred.Task (Task.java:sendDone(1099)) - Task
> 'attempt_local_0001_r_000000_0' done.
> 2013-04-19 23:56:21,381 INFO  mapreduce.Job
> (Job.java:monitorAndPrintJob(1293)) -  map 100% reduce 100%
> 2013-04-19 23:56:21,381 INFO  mapreduce.Job
> (Job.java:monitorAndPrintJob(1304)) - Job job_local_0001 completed
> successfully
> 2013-04-19 23:56:21,427 INFO  mapreduce.Job
> (Job.java:monitorAndPrintJob(1311)) - Counters: 32
> File System Counters
> FILE: Number of bytes read=483553
> FILE: Number of bytes written=1313962
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=296769
> HDFS: Number of bytes written=284
> HDFS: Number of read operations=66
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=8
> Map-Reduce Framework
> Map input records=1000
> Map output records=1000
> Map output bytes=6543
> Map output materialized bytes=8567
> Input split bytes=516
> Combine input records=0
> Combine output records=0
> Reduce input groups=12
> Reduce shuffle bytes=0
> Reduce input records=1000
> Reduce output records=0
> Spilled Records=2000
> Shuffled Maps =0
> Failed Shuffles=0
> Merged Map outputs=0
> GC time elapsed (ms)=7
> CPU time spent (ms)=0
> Physical memory (bytes) snapshot=0
> Virtual memory (bytes) snapshot=0
> Total committed heap usage (bytes)=1773993984
> File Input Format Counters
> Bytes Read=68723
> File Output Format Counters
> Bytes Written=0
