Posted to issues@tez.apache.org by "Jonathan Turner Eagles (Jira)" <ji...@apache.org> on 2021/04/23 21:29:00 UTC

[jira] [Commented] (TEZ-4245) Optimise split grouping when locality information is set to null/empty

    [ https://issues.apache.org/jira/browse/TEZ-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331064#comment-17331064 ] 

Jonathan Turner Eagles commented on TEZ-4245:
---------------------------------------------

[~rajesh.balamohan], is this patch ready for review?

> Optimise split grouping when locality information is set to null/empty
> ----------------------------------------------------------------------
>
>                 Key: TEZ-4245
>                 URL: https://issues.apache.org/jira/browse/TEZ-4245
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Priority: Major
>         Attachments: TEZ-4245.1.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In object stores like S3, locality information always shows up as "localhost". Keeping this information in the input split slows down scheduling, as explained in https://issues.apache.org/jira/browse/HIVE-14060. Systems like Hive remove the "localhost" information from splits.
>  
> Splits without any locality information (localhost/null/empty) should all be treated equally, so that split grouping can do meaningful grouping based on cluster size. This is to avoid creating small split groups, which can significantly increase runtime due to sequential processing (i.e., the same map task gets lots of inputs and the system ends up spending its time in open/seek/close calls on object stores).
>  
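The idea in the description above can be illustrated with a minimal sketch. This is not the code from TEZ-4245.1.patch: the class `LocalityFreeGrouping` and the methods `hasNoLocality` and `groupEvenly` are hypothetical names chosen for illustration, and the round-robin distribution is only one assumed way of sizing groups from a target count derived from cluster size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Tez split grouper implementation.
public class LocalityFreeGrouping {

    /**
     * Returns true when a split's locations carry no usable locality:
     * null, an empty array, or only null/empty/"localhost" entries.
     */
    static boolean hasNoLocality(String[] locations) {
        if (locations == null || locations.length == 0) {
            return true;
        }
        for (String location : locations) {
            if (location != null && !location.isEmpty() && !"localhost".equals(location)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Distributes split indices 0..splitCount-1 round-robin into at most
     * desiredGroups groups. desiredGroups is assumed to be derived from
     * cluster size by the caller.
     */
    static List<List<Integer>> groupEvenly(int splitCount, int desiredGroups) {
        List<List<Integer>> result = new ArrayList<>();
        if (splitCount <= 0) {
            return result;
        }
        int groups = Math.max(1, Math.min(desiredGroups, splitCount));
        for (int g = 0; g < groups; g++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < splitCount; i++) {
            result.get(i % groups).add(i);
        }
        return result;
    }
}
```

With this sketch, splits reporting `null`, an empty array, or `{"localhost"}` all fall into the same no-locality bucket, so they can be grouped by count alone rather than by a misleading host name.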



--
This message was sent by Atlassian Jira
(v8.3.4#803005)