Posted to dev@tez.apache.org by Jonathan Eagles <je...@gmail.com> on 2016/01/20 23:06:34 UTC

[DISCUSS] Merge OOM improvements

A new pattern of Pig jobs that succeed with MR but fail with Tez has been
identified. I'm hoping to brainstorm ideas here so we can pin down the
issues and file targeted jiras.

Here is a typical stack trace, though sometimes the failure occurs during
the final merge instead (since the in-memory segment overhead is larger than
the map-output overhead):

Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.hadoop.io.DataInputBuffer.<init>(DataInputBuffer.java:68)
at
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.InMemoryReader.<init>(InMemoryReader.java:42)
at
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.createInMemorySegments(MergeManager.java:837)
at
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.access$200(MergeManager.java:75)
at
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager$InMemoryMerger.merge(MergeManager.java:642)
at
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeThread.run(MergeThread.java:89)

*Details*
  Around 1,000,000 spills were fetched, committing around 100MB to the
memory budget (500,000 of them held in memory). However, the actual memory
used for those 500,000 segments (50-350 bytes each) is 480MB, against an
expected 100-200MB.
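
Rough arithmetic on those numbers, as a quick Java sketch (decimal MB,
nothing Tez-specific; it just shows how much unbudgeted overhead each
segment must be carrying):

// Back-of-envelope: how much unbudgeted memory each in-memory segment is
// really costing, based on the figures above.
public class SegmentMemoryEstimate {
  public static void main(String[] args) {
    long segments = 500_000L;
    long committedBytes = 100L * 1000 * 1000;  // ~100MB of payload committed to the budget
    long observedBytes = 480L * 1000 * 1000;   // ~480MB actually resident

    long payloadPerSegment = committedBytes / segments;                  // ~200 bytes of real data
    long observedPerSegment = observedBytes / segments;                  // ~960 bytes actually used
    long unbudgetedPerSegment = observedPerSegment - payloadPerSegment;  // ~760 bytes

    System.out.printf("payload/segment    ~%d bytes%n", payloadPerSegment);
    System.out.printf("observed/segment   ~%d bytes%n", observedPerSegment);
    System.out.printf("unbudgeted/segment ~%d bytes%n", unbudgetedPerSegment);
  }
}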

*Mapout overhead is not budgeted*
  Each MapOutput needs around 50 bytes in addition to the data it holds.

*In-memory segment overhead is not budgeted*
  Each in-memory segment needs around 80 bytes in addition to the data.
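
A minimal sketch of what charging both of those overheads against the
shuffle budget could look like. The constants come from the estimates above;
OverheadAwareBudget and reserve() are made-up names for illustration, not
current MergeManager code:

// Illustrative only: account for fixed per-entry overhead, not just payload
// bytes, when deciding whether another map output still fits in memory.
final class OverheadAwareBudget {
  static final long MAP_OUTPUT_OVERHEAD = 50;         // ~50 bytes per MapOutput (estimate above)
  static final long IN_MEMORY_SEGMENT_OVERHEAD = 80;  // ~80 bytes per in-memory segment (estimate above)

  private final long memoryLimit;
  private long usedMemory;

  OverheadAwareBudget(long memoryLimit) {
    this.memoryLimit = memoryLimit;
  }

  // Returns true if a map output of payloadBytes fits once overhead is
  // counted; the caller would otherwise stall the fetcher or trigger a merge.
  synchronized boolean reserve(long payloadBytes) {
    long cost = payloadBytes + MAP_OUTPUT_OVERHEAD + IN_MEMORY_SEGMENT_OVERHEAD;
    if (usedMemory + cost > memoryLimit) {
      return false;
    }
    usedMemory += cost;
    return true;
  }
}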

*Interaction with auto-reduce parallelism*
  In this scenario, the upstream vertex assumed 999 downstream tasks (Pig's
default hint to enable auto-reduce parallelism). However, the downstream
vertex was reduced to 24 tasks by auto-reduce parallelism, which puts
roughly 40 times more segments onto each downstream task. Should auto-reduce
parallelism consider merge overhead when calculating parallelism?
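
The fan-in math as a tiny sketch, using the 999 and 24 from this job:

// Each surviving downstream task now merges the partitions that ~42 tasks
// would have handled at the planned parallelism.
public class AutoReduceFanIn {
  public static void main(String[] args) {
    int plannedParallelism = 999;  // what the upstream vertex partitioned for
    int actualParallelism = 24;    // after auto-reduce kicked in

    int partitionsPerTask =
        (plannedParallelism + actualParallelism - 1) / actualParallelism;  // ceil(999/24)
    System.out.println("partitions merged per downstream task: " + partitionsPerTask);  // 42
  }
}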

*Legacy default sorter empty segments*
  The default sorter does not optimize away empty segments the way the
pipelined sorter does, so it shows this symptom more often.
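
Not actual Tez code, but the kind of guard I have in mind, sketched against
a made-up FetchedOutput holder: drop zero-length outputs before they become
in-memory segments and start paying per-segment overhead.

import java.util.ArrayList;
import java.util.List;

final class EmptySegmentFilter {
  // Minimal stand-in for a fetched map output; only the decompressed length
  // matters for this sketch.
  static final class FetchedOutput {
    final long rawLength;
    FetchedOutput(long rawLength) { this.rawLength = rawLength; }
  }

  // Keep only outputs that actually carry data; empty spills add overhead
  // but contribute nothing to the merge.
  static List<FetchedOutput> dropEmpty(List<FetchedOutput> fetched) {
    List<FetchedOutput> kept = new ArrayList<>(fetched.size());
    for (FetchedOutput o : fetched) {
      if (o.rawLength > 0) {
        kept.add(o);
      }
    }
    return kept;
  }
}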


2016-01-10 11:46:01,208 [INFO] [fetcher {scope_601} #7]
|orderedgrouped.MergeManager|: closeInMemoryFile -> map-output of size:
116, inMemoryMapOutputs.size() -> 571831, commitMemory -> 91503730,
usedMemory ->91503846, mapOutput=MapOutput( AttemptIdentifier:
InputAttemptIdentifier [inputIdentifier=Input
Identifier [inputIndex=763962], attemptNumber=0,
pathComponent=attempt_1444791925832_10460712_1_00_017766_0_10003,
spillType=0, spillId=-1], Type: MEMORY)
2016-01-10 11:46:01,208 [INFO] [fetcher {scope_601} #7]
|orderedgrouped.ShuffleScheduler|: Completed fetch for attempt: {763962, 0,
attempt_1444791925832_10460712_1_00_017766_0_10003} to MEMORY, csize=128,
dsize=116, EndTime=1452426361208, TimeTaken=0, Rate=0.00 MB/s
2016-01-10 11:46:01,209 [INFO] [fetcher {scope_601} #7]
|orderedgrouped.ShuffleScheduler|: scope_601: All inputs fetched for input
vertex : scope-601
2016-01-10 11:46:01,209 [INFO] [fetcher {scope_601} #7]
|orderedgrouped.ShuffleScheduler|: copy(1091856 (spillsFetched=1091856) of
1091856. Transfer rate (CumulativeDataFetched/TimeSinceInputStarted)) 0.68
MB/s)

Re: [DISCUSS] Merge OOM improvements

Posted by Jonathan Eagles <je...@gmail.com>.
Thanks for the recommendation. I can definitely run this job with the
proposed settings. In addition, I created a patch in
https://issues.apache.org/jira/browse/TEZ-3076 that reduces the memory
needed per MapOutput and InMemoryReader, which allows this job to run
without enabling tez.runtime.shuffle.memory-to-memory.enable. I'll update
the jira with the overall reduction per MapOutput entry and InMemoryReader.
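
For those numbers, this is roughly how I plan to measure the per-entry
footprint. It is only a sketch: GC noise makes it approximate (a jmap -histo
before/after is more precise), and the byte[] allocation is just a stand-in
for constructing the real MapOutput/InMemoryReader objects:

// Run with enough heap (e.g. -Xmx1g) so the probe itself does not OOM.
public class FootprintProbe {
  public static void main(String[] args) throws Exception {
    final int n = 1_000_000;
    Object[] keep = new Object[n];  // hold references so nothing gets collected

    long before = usedHeap();
    for (int i = 0; i < n; i++) {
      keep[i] = new byte[116];      // stand-in for a ~116-byte map-output payload
    }
    long after = usedHeap();

    System.out.printf("approx bytes per instance: %d%n", (after - before) / n);
    System.out.println("kept " + keep.length + " objects");  // keep the array live
  }

  private static long usedHeap() throws InterruptedException {
    System.gc();
    Thread.sleep(200);              // give the collector a moment to settle
    Runtime rt = Runtime.getRuntime();
    return rt.totalMemory() - rt.freeMemory();
  }
}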

Please have a look.

Jon


On Wed, Jan 20, 2016 at 5:18 PM, Gopal Vijayaraghavan <go...@apache.org>
wrote:

>
> >  Around 1,000,000 spills were fetched, committing around 100MB to the
> >memory budget (500,000 of them held in memory). However, the actual memory
> >used for those 500,000 segments (50-350 bytes each) is 480MB, against an
> >expected 100-200MB.
>
> This is effectively the problem the mem-to-mem merger solves, but it is
> not enabled by default.
>
> I noticed that this build-up of >100 segments in memory is generally a bad
> thing, and merging them back into 1 in-memory segment was a significant
> perf boost when producing the iterators for the reducers.
>
> Can you re-run the scenario with in-mem merge enabled and io.sort.factor =
> 100?
>
> Cheers,
> Gopal
>
>
>

Re: [DISCUSS] Merge OOM improvements

Posted by Gopal Vijayaraghavan <go...@apache.org>.
>  Around 1,000,000 spills were fetched, committing around 100MB to the
>memory budget (500,000 of them held in memory). However, the actual memory
>used for those 500,000 segments (50-350 bytes each) is 480MB, against an
>expected 100-200MB.

This is effectively the problem the mem-to-mem merger solves, but it is not
enabled by default.

I noticed that this build-up of >100 segments in memory is generally a bad
thing, and merging them back into 1 in-memory segment was a significant perf
boost when producing the iterators for the reducers.

Can you re-run the scenario with in-mem merge enabled and io.sort.factor =
100?
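
In config terms that is roughly the following sketch (property names as I
remember them; please double-check against TezRuntimeConfiguration before
relying on it):

import org.apache.hadoop.conf.Configuration;

public class InMemMergeSettings {
  // Applies the settings suggested above to a job's runtime configuration.
  public static Configuration apply(Configuration conf) {
    // Enable the memory-to-memory merger so in-memory segments get collapsed
    // back into a single segment instead of piling up.
    conf.setBoolean("tez.runtime.shuffle.memory-to-memory.enable", true);
    // Merge up to 100 segments per pass.
    conf.setInt("tez.runtime.io.sort.factor", 100);
    return conf;
  }
}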

Cheers,
Gopal