Posted to user@hadoop.apache.org by - <co...@ymail.com> on 2013/11/08 22:03:00 UTC

Hadoop 2.02 multiple merges in map phase

Hi All,

In my map logs I see the "Merging 2 sorted segments" message twice, and each merge takes a noticeable amount of time. Why is it merging twice? And where do the merge sizes "69954754" and "69631730" come from?

My expected result would be a single merge of 2 sorted segments with a total size equal to the file split size (which is 128 MB in my case).
Thanks!

...
INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 100
INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 83886080
INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 104857600
INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 26214396; length = 6553600
INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
INFO [SpillThread] org.apache.hadoop.mapred.MapTask: Finished spill 0
INFO [main] org.apache.hadoop.mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 75355282; bufend = 34888140; bufvoid = 104857600
INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313704(65254816); length = 2525113/6553600
INFO [main] org.apache.hadoop.mapred.MapTask: Finished spill 1
INFO [main] org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 69954754 bytes
INFO [main] org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 69631730 bytes
INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1383580088156_0010_m_000035_0 is done. And is in the process of committing
INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1383580088156_0010_m_000035_0' done.
...
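
For reference, the arithmetic behind the logged numbers can be sketched as follows. This is only a rough check, not an authoritative reading of the MapTask internals: it assumes that bufstart and bufend are byte offsets into a circular buffer of bufvoid bytes, so a spill whose bufend is smaller than its bufstart has wrapped around. The helper name ring_span is made up for this illustration.

```python
# Values copied from the MapTask and Merger log lines above.
BUFVOID = 104857600      # bufvoid: buffer capacity in bytes (100 MB)
SOFT_LIMIT = 83886080    # "soft limit at" value from the log


def ring_span(start, end, capacity):
    """Bytes between start and end in a circular buffer (assumed layout)."""
    return (end - start) % capacity


# Spill 0: bufstart = 0, bufend = 72511698
spill0 = ring_span(0, 72511698, BUFVOID)
# Spill 1: bufstart = 75355282, bufend = 34888140 (wraps past bufvoid)
spill1 = ring_span(75355282, 34888140, BUFVOID)

spilled_total = spill0 + spill1
merge_total = 69954754 + 69631730  # the two "total size" values

print("spill 0 bytes:    ", spill0)
print("spill 1 bytes:    ", spill1)
print("spilled in total: ", spilled_total)
print("merged in total:  ", merge_total)
print("difference:       ", merge_total - spilled_total)
print("soft limit ratio: ", SOFT_LIMIT / BUFVOID)
```

Under that assumption, the two merge sizes together come out within about 2% of the bytes written by the two spills, so each merge size looks like a share of the map output rather than the input split size. One plausible explanation for the two messages, not confirmed anywhere in this thread, is that the map-side merge runs once per output partition, which would produce two merges of two segments each for a job with two reducers.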

Re: Hadoop 2.02 multiple merges in map phase

Posted by - <co...@ymail.com>.
Correction*: version is 2.2.0

Thanks!



