Posted to issues@tez.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2020/04/01 06:45:00 UTC
[jira] [Commented] (TEZ-4130) Config for hard limiting the number of splits
[ https://issues.apache.org/jira/browse/TEZ-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072437#comment-17072437 ]
László Bodor commented on TEZ-4130:
-----------------------------------
[~belugabehr]: I'm writing a unit test for this; the current implementation is not correct.
The real grouping logic starts only after desiredNumSplits has been capped at the configured maximum, so the final number of splits can still exceed the configured limit.
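To illustrate the problem described in the comment above, here is a minimal sketch in Java. It is a toy model and not the actual Tez grouping code: the class SplitCapSketch, the Split type, and the methods capDesired, group and demoSplits are all hypothetical names invented for this example. The assumption is that capping desiredNumSplits only changes the target length per group, while the grouping itself works host by host and emits whatever is left over on each host as an extra group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT the Tez implementation. All names here are hypothetical.
class SplitCapSketch {

    // A simplified split: one host location and a length in bytes.
    static final class Split {
        final String host;
        final long length;

        Split(String host, long length) {
            this.host = host;
            this.length = length;
        }
    }

    // The "hard limit" applied up front: cap the desired number of groups.
    static int capDesired(int desiredNumSplits, int maxGroups) {
        return Math.min(desiredNumSplits, maxGroups);
    }

    // Greedy per-host grouping driven by a target length per group.
    // Leftover splits on a host that never reach the target still form a group.
    static List<List<Split>> group(List<Split> splits, int desiredNumSplits) {
        long total = 0;
        for (Split s : splits) {
            total += s.length;
        }
        long lengthPerGroup = Math.max(1, total / desiredNumSplits);

        Map<String, List<Split>> byHost = new LinkedHashMap<>();
        for (Split s : splits) {
            byHost.computeIfAbsent(s.host, h -> new ArrayList<>()).add(s);
        }

        List<List<Split>> groups = new ArrayList<>();
        for (List<Split> hostSplits : byHost.values()) {
            List<Split> current = new ArrayList<>();
            long currentLength = 0;
            for (Split s : hostSplits) {
                current.add(s);
                currentLength += s.length;
                if (currentLength >= lengthPerGroup) {
                    groups.add(current);
                    current = new ArrayList<>();
                    currentLength = 0;
                }
            }
            if (!current.isEmpty()) {
                groups.add(current); // leftover group: not bounded by the cap
            }
        }
        return groups;
    }

    // Three hosts, three 100-byte splits each (900 bytes in total).
    static List<Split> demoSplits() {
        List<Split> splits = new ArrayList<>();
        for (String host : new String[] {"hostA", "hostB", "hostC"}) {
            for (int i = 0; i < 3; i++) {
                splits.add(new Split(host, 100));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        int capped = capDesired(9, 2); // configured limit: 2 groups
        List<List<Split>> groups = group(demoSplits(), capped);
        System.out.println("limit=" + capped + " actual groups=" + groups.size());
    }
}
```

In this toy setup the cap yields a target of 450 bytes per group, but no single host holds more than 300 bytes, so each host ends up as its own group and the result is 3 groups against a limit of 2. A unit test for a hard limit would therefore need to assert on the final number of groups, not on the capped desiredNumSplits.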
> Config for hard limiting the number of splits
> ---------------------------------------------
>
> Key: TEZ-4130
> URL: https://issues.apache.org/jira/browse/TEZ-4130
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Attachments: TEZ-4130.01.patch, TEZ-4130.02.patch
>
>
> During the investigation of a customer issue, I found that Tez generated a DAG plan containing >4k tasks. It failed in Hive because of the bucket number limitation (4k). This can be avoided with proper configuration, e.g. bigger splits (tez.grouping.min-size), but it might be more convenient for users to configure a hard limit on the number of splits.