Posted to user@flink.apache.org by Gábor Hermann <ma...@gaborhermann.com> on 2016/11/16 16:29:12 UTC

Any way to increase sort buffer size?

Hi all,

Is there any way to increase the sort buffer size other than increasing 
the overall TaskManager memory?
The following error comes up when running a job with huge matrix block 
objects on a cluster:

Error obtaining the sorted input: Thread 'SortMerger Reading Thread' 
terminated due to an exception: The record exceeds the maximum size of a 
sort buffer (current maximum: 100499456 bytes).

Every TaskManager has at least 40 GB of memory, while the maximum sort buffer 
size is only about 100 MB. What is the reason for this limit? Sorry if I'm missing 
something, but I have not found any related discussion or documentation yet.
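To put the reported limit in perspective, here is a rough back-of-the-envelope check (a sketch, not part of Flink). It assumes a matrix block is a dense n x n array of 8-byte doubles and ignores serialization overhead; it converts the limit to MiB and finds the largest square block whose raw payload fits. The class and method names are illustrative only.

```java
// Rough check of whether a dense n x n block of doubles fits under the
// sort-buffer limit reported in the error message. Assumes 8 bytes per
// double and ignores serialization overhead (keys, array lengths, headers),
// so the usable block size in practice is somewhat smaller.
public class SortBufferLimit {
    // Limit copied from the error message.
    static final long MAX_RECORD_BYTES = 100_499_456L;

    // Raw payload size of a dense n x n matrix of doubles.
    static long denseBlockBytes(long n) {
        return 8L * n * n;
    }

    // Largest n such that an n x n dense double block fits into maxBytes.
    static long largestSquareBlock(long maxBytes) {
        long n = (long) Math.sqrt(maxBytes / 8.0);
        // Correct possible floating-point rounding in either direction.
        while (denseBlockBytes(n + 1) <= maxBytes) n++;
        while (n > 0 && denseBlockBytes(n) > maxBytes) n--;
        return n;
    }

    public static void main(String[] args) {
        System.out.printf("limit = %.2f MiB%n", MAX_RECORD_BYTES / (1024.0 * 1024.0));
        System.out.println("largest square block side = " + largestSquareBlock(MAX_RECORD_BYTES));
    }
}
```

With the limit from the error message (100499456 bytes, about 95.84 MiB), the largest square block is 3544 x 3544 doubles. Because the record size presumably also includes the key and other serialized fields, blocks should stay somewhat below that.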

Cheers,
Gabor


Re: Any way to increase sort buffer size?

Posted by Stephan Ewen <se...@apache.org>.
Posting an update here, because it came up again:

Have a look at https://issues.apache.org/jira/browse/FLINK-17192, specifically
this comment:

> There is a hidden/experimental feature in the sorter to offload large
> records, but it is not active by default.
>
> You can try and add "taskmanager.runtime.large-record-handler: true" to
> the config.
>
> The reason why it is a "hidden feature" is that it has some restrictions:
> The key must be serializable by Flink's default serializers and
> recognized by the TypeExtractor, meaning you cannot use a custom
> serializer or a specific type information.
>
> For keys that are string, int, long, arrays, simple POJOs etc. it should
> be fine. For keys that are Avro types with a specific schema,
> or types with custom serializers (including custom Kryo serializers) it
> might not work.
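The comment does not say how the offloading works internally. As a purely conceptual sketch (an assumption, not Flink's implementation): records above a size threshold are moved out of the sort buffer into a separate store, only small (key, position) entries are sorted for them, and the two key-sorted streams are merged at the end. Since the key travels separately from the record in such a scheme, the sorter has to handle the key type on its own, which may be related to the key-type restriction quoted above. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Conceptual sketch of offloading large records during a sort (not Flink code).
public class LargeRecordOffload {
    // Hypothetical record: a sort key plus a serialized payload.
    static final class Rec {
        final long key;
        final byte[] payload;

        Rec(long key, byte[] payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    static List<Rec> sort(List<Rec> input, int thresholdBytes) {
        List<Rec> small = new ArrayList<>();        // sorted "in the buffer"
        List<Rec> offloaded = new ArrayList<>();    // stands in for a spill file
        List<long[]> largeIndex = new ArrayList<>(); // {key, position in offloaded}
        for (Rec r : input) {
            if (r.payload.length > thresholdBytes) {
                largeIndex.add(new long[] {r.key, offloaded.size()});
                offloaded.add(r);
            } else {
                small.add(r);
            }
        }
        small.sort(Comparator.comparingLong(r -> r.key));
        largeIndex.sort(Comparator.comparingLong(e -> e[0]));

        // Merge both key-sorted streams; large records are fetched by position.
        List<Rec> out = new ArrayList<>(input.size());
        int i = 0;
        int j = 0;
        while (i < small.size() || j < largeIndex.size()) {
            boolean takeSmall = j == largeIndex.size()
                    || (i < small.size() && small.get(i).key <= largeIndex.get(j)[0]);
            if (takeSmall) {
                out.add(small.get(i++));
            } else {
                out.add(offloaded.get((int) largeIndex.get(j++)[1]));
            }
        }
        return out;
    }
}
```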

Re: Any way to increase sort buffer size?

Posted by Gábor Hermann <ma...@gaborhermann.com>.
Hi Fabian,

Thanks for your answer!

I see that it's not a lightweight change. I guess it's easier to work 
around the problem by using smaller objects.
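One way to use smaller objects, sketched here under the assumption that a block is a dense, rectangular double[][]: cut each large block into fixed-size tiles and emit every tile as its own record, so that no single record approaches the sort-buffer limit. The Tile class and its fields are hypothetical, and downstream matrix operations would have to be adapted to work on tiles.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockSplitter {
    // One tile of the original block: its position in the tile grid and its values.
    static final class Tile {
        final int tileRow;
        final int tileCol;
        final double[][] values;

        Tile(int tileRow, int tileCol, double[][] values) {
            this.tileRow = tileRow;
            this.tileCol = tileCol;
            this.values = values;
        }
    }

    // Splits a rectangular dense block into tiles of at most tileSize x tileSize;
    // tiles on the right and bottom edges may be smaller.
    static List<Tile> split(double[][] block, int tileSize) {
        if (tileSize <= 0) {
            throw new IllegalArgumentException("tileSize must be positive");
        }
        int rows = block.length;
        int cols = rows == 0 ? 0 : block[0].length;
        List<Tile> tiles = new ArrayList<>();
        for (int r0 = 0; r0 < rows; r0 += tileSize) {
            for (int c0 = 0; c0 < cols; c0 += tileSize) {
                int h = Math.min(tileSize, rows - r0);
                int w = Math.min(tileSize, cols - c0);
                double[][] values = new double[h][w];
                for (int i = 0; i < h; i++) {
                    // Copy one row segment of the tile from the source block.
                    System.arraycopy(block[r0 + i], c0, values[i], 0, w);
                }
                tiles.add(new Tile(r0 / tileSize, c0 / tileSize, values));
            }
        }
        return tiles;
    }
}
```

For example, a tile size of 2048 keeps each tile's raw payload at 8 x 2048 x 2048 bytes = 32 MiB, well below the roughly 96 MiB limit from the error message. The tile coordinates, together with an id of the original block, could then serve as the key for later grouping or joining.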

Cheers,
Gabor


On 2016-11-18 11:02, Fabian Hueske wrote:
> Hi Gabor,
>
> I don't think there is a way to tune the memory settings for specific 
> operators.
>
> For that you would need to change the memory allocation in the
> optimizer, which is possible but not a lightweight change either.
> If you want to get something working, you could add a method to the
> API to manually specify a memory fraction. The information could be
> passed through the API to the optimizer, which takes the explicitly
> specified fraction into account when assigning memory budgets (see
> Optimizer.java [1] and PlanFinalizer.java [2] for how that works).
>
> Cheers,
> Fabian
>
> [1] 
> https://github.com/apache/flink/blob/master/flink-optimizer/src/main/java/org/apache/flink/optimizer/Optimizer.java 
>
> [2] 
> https://github.com/apache/flink/blob/master/flink-optimizer/src/main/java/org/apache/flink/optimizer/traversals/PlanFinalizer.java


Re: Any way to increase sort buffer size?

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Gabor,

I don't think there is a way to tune the memory settings for specific
operators.

For that you would need to change the memory allocation in the optimizer,
which is possible but not a lightweight change either.
If you want to get something working, you could add a method to the API to
manually specify a memory fraction. The information could be passed through
the API to the optimizer, which takes the explicitly specified fraction into
account when assigning memory budgets (see Optimizer.java [1] and
PlanFinalizer.java [2] for how that works).
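To make the suggestion more concrete, here is a hypothetical sketch of how an explicitly requested fraction could be honored when budgets are assigned: operators that request a fraction get exactly that share of the available memory, and the remainder is split evenly among the other operators. It is not taken from Optimizer.java or PlanFinalizer.java; the class, method, and operator names are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalDouble;

public class MemoryBudgets {
    // Assigns a memory budget to every operator. Operators with an explicit
    // fraction get exactly that share; the others split the remainder evenly.
    static Map<String, Long> assign(long totalBytes, Map<String, OptionalDouble> requested) {
        double explicitSum = 0.0;
        int implicitCount = 0;
        for (OptionalDouble fraction : requested.values()) {
            if (fraction.isPresent()) {
                double f = fraction.getAsDouble();
                if (f <= 0.0 || f > 1.0) {
                    throw new IllegalArgumentException("fraction must be in (0, 1]: " + f);
                }
                explicitSum += f;
            } else {
                implicitCount++;
            }
        }
        if (explicitSum > 1.0) {
            throw new IllegalArgumentException("explicit fractions sum to more than 1");
        }
        // Share of the remaining memory for each operator without a request.
        double implicitShare = implicitCount == 0 ? 0.0 : (1.0 - explicitSum) / implicitCount;

        Map<String, Long> budgets = new LinkedHashMap<>();
        for (Map.Entry<String, OptionalDouble> e : requested.entrySet()) {
            double f = e.getValue().orElse(implicitShare);
            budgets.put(e.getKey(), (long) (totalBytes * f));
        }
        return budgets;
    }
}
```

For example, with 1,000 bytes available, a sort that requests 0.5 gets 500 bytes, and two operators without an explicit request get 250 bytes each.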

Cheers,
Fabian

[1]
https://github.com/apache/flink/blob/master/flink-optimizer/src/main/java/org/apache/flink/optimizer/Optimizer.java
[2]
https://github.com/apache/flink/blob/master/flink-optimizer/src/main/java/org/apache/flink/optimizer/traversals/PlanFinalizer.java