Posted to issues@tez.apache.org by "Attila Magyar (Jira)" <ji...@apache.org> on 2020/05/06 09:52:00 UTC

[jira] [Comment Edited] (TEZ-4170) RootInputInitializerManager could make use of ThreadPool from appContext

    [ https://issues.apache.org/jira/browse/TEZ-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17100579#comment-17100579 ] 

Attila Magyar edited comment on TEZ-4170 at 5/6/20, 9:51 AM:
-------------------------------------------------------------

Hey [~rajesh.balamohan],

??Also, initialization part of InputInitializer could be moved inside this thread. For example, in certain cases like HiveSplitGenerator, it ends up with some heavy operations which could be offloaded from the central dispatcher thread instead of blocking it (e.g. unpacking payloads, running kryo deserialization)??

 

Do you mean moving the {{initializer = createInitializer(input, context);}} part onto the {{appContext}} executor's thread?

If I do that, the initializerMap won't be populated immediately after runInputInitializers() returns. That might cause problems in VertexImpl when it handles pending events right after runInputInitializers() was called.
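To make the ordering concern concrete, here is a minimal sketch using only {{java.util.concurrent}}. {{FakeInitializer}}, {{InitializerManagerSketch}}, {{runCreateSync}} and {{runCreateAsync}} are hypothetical names, not Tez classes, and the latch only simulates a pool that has not yet picked up the task. When the initializer is created on the calling thread, its map entry exists as soon as the call returns; when creation moves into the pooled task, a lookup made right after the call can miss it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an input initializer (not the Tez InputInitializer class).
class FakeInitializer {
  final String inputName;

  FakeInitializer(String inputName) { this.inputName = inputName; }

  void initialize() {
    // Expensive work (e.g. split generation) would run here.
  }
}

// Hypothetical manager illustrating the two orderings discussed above.
class InitializerManagerSketch {
  final Map<String, FakeInitializer> initializerMap = new ConcurrentHashMap<>();
  private final ExecutorService executor;

  InitializerManagerSketch(ExecutorService executor) { this.executor = executor; }

  // Create on the caller's thread and register before returning;
  // only initialize() is handed to the pool.
  void runCreateSync(String input) {
    FakeInitializer init = new FakeInitializer(input);
    initializerMap.put(input, init);
    executor.submit(init::initialize);
  }

  // Create inside the pooled task: the map entry appears only when the task runs.
  // The latch simulates a pool that has not picked the task up yet.
  void runCreateAsync(String input, CountDownLatch notYetScheduled) {
    executor.submit(() -> {
      awaitQuietly(notYetScheduled);
      FakeInitializer init = new FakeInitializer(input);
      initializerMap.put(input, init);
      init.initialize();
    });
  }

  private static void awaitQuietly(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Returns {sync entry visible right away, async entry visible right away,
  //          async entry visible after the pool has drained}.
  static boolean[] demo() {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    InitializerManagerSketch m = new InitializerManagerSketch(pool);
    CountDownLatch gate = new CountDownLatch(1);

    m.runCreateSync("inputA");
    boolean syncVisible = m.initializerMap.containsKey("inputA");

    m.runCreateAsync("inputB", gate);
    boolean asyncVisible = m.initializerMap.containsKey("inputB");

    gate.countDown(); // let the pooled task proceed
    pool.shutdown();
    try {
      pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    boolean asyncEventually = m.initializerMap.containsKey("inputB");
    return new boolean[] {syncVisible, asyncVisible, asyncEventually};
  }
}
```

In this sketch {{demo()}} yields {{true, false, true}}: the synchronously created entry is visible at once, while the asynchronously created one only shows up after the task has run, which is the window in which pending-event handling could look it up and find nothing.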

If the problem is that instance creation is expensive in certain cases like HiveSplitGenerator, could we move the expensive part out of HiveSplitGenerator's constructor and into its initialize() method instead?
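A minimal sketch of that split, assuming a hypothetical {{LazySplitGenerator}} that does not reproduce HiveSplitGenerator's real API: the constructor only keeps a reference to the raw payload, and the decoding (a cheap UTF-8 placeholder for payload unpacking or kryo deserialization) happens in {{initialize()}}. The object can then still be created synchronously, keeping the initializerMap populated, while the heavy step runs on a pool thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical split generator; it only mirrors the constructor/initialize() split
// suggested above and does not reproduce HiveSplitGenerator's real API.
class LazySplitGenerator {
  private final byte[] rawPayload; // constructor only keeps a reference
  private volatile String decoded; // filled in by initialize()

  LazySplitGenerator(byte[] rawPayload) {
    this.rawPayload = rawPayload; // no unpacking or deserialization here
  }

  boolean isDecoded() {
    return decoded != null;
  }

  // The expensive step lives here. String decoding is a cheap placeholder for
  // the real work (payload unpacking, kryo deserialization).
  List<String> initialize() {
    decoded = new String(rawPayload, StandardCharsets.UTF_8);
    List<String> splits = new ArrayList<>();
    for (String part : decoded.split(",")) {
      splits.add("split:" + part.trim());
    }
    return splits;
  }

  // Construct on the calling (dispatcher-like) thread, run initialize() on a pool.
  static List<String> constructHereInitializeOnPool(byte[] payload) {
    LazySplitGenerator gen = new LazySplitGenerator(payload); // cheap, safe inline
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Future<List<String>> f = pool.submit(gen::initialize); // heavy work off this thread
      return f.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```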

cc [~abstractdog]

 



> RootInputInitializerManager could make use of ThreadPool from appContext
> ------------------------------------------------------------------------
>
>                 Key: TEZ-4170
>                 URL: https://issues.apache.org/jira/browse/TEZ-4170
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Attila Magyar
>            Priority: Major
>         Attachments: Screenshot 2020-05-06 at 6.26.34 AM.png, TEZ-4170.1.patch
>
>
> [https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/dag/RootInputInitializerManager.java#L106]
>  
> This could make use of the executor from {{appContext}} instead of spinning up one for every root input.
>  
> Also, initialization part of InputInitializer could be moved inside this thread. For example, in certain cases like HiveSplitGenerator, it ends up with some heavy operations which could be offloaded from the central dispatcher thread instead of blocking it (e.g. unpacking payloads, running kryo deserialization)
>  
> !Screenshot 2020-05-06 at 6.26.34 AM.png|width=972,height=740!
>  
>  
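
The first suggestion in the issue description (reuse one executor rather than creating one per root input) can be sketched as follows. {{SharedPoolInitializerManager}}, {{submitAll}} and {{distinctThreads}} are hypothetical names, and the plain {{ExecutorService}} passed to the constructor only stands in for whatever executor {{appContext}} actually exposes. Each dummy task reports the thread it ran on, so the reuse is observable.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical manager that borrows an application-wide executor (standing in for
// the one held by appContext) instead of creating an executor per root input.
class SharedPoolInitializerManager {
  private final ExecutorService sharedPool; // owned and shut down by the application, not here

  SharedPoolInitializerManager(ExecutorService sharedPool) {
    this.sharedPool = sharedPool;
  }

  // One task per root input, all on the same pool. Each task returns
  // "<input>@<thread name>" as a placeholder for real initializer work.
  List<Future<String>> submitAll(List<String> inputs) {
    List<Future<String>> futures = new ArrayList<>();
    for (String input : inputs) {
      futures.add(sharedPool.submit(() -> input + "@" + Thread.currentThread().getName()));
    }
    return futures;
  }

  // Runs inputCount dummy initializers on a shared pool of poolSize threads
  // and returns how many distinct threads served them.
  static int distinctThreads(int poolSize, int inputCount) {
    ExecutorService appPool = Executors.newFixedThreadPool(poolSize);
    try {
      SharedPoolInitializerManager mgr = new SharedPoolInitializerManager(appPool);
      List<String> inputs = new ArrayList<>();
      for (int i = 0; i < inputCount; i++) {
        inputs.add("input-" + i);
      }
      Set<String> threads = new HashSet<>();
      for (Future<String> f : mgr.submitAll(inputs)) {
        String result = f.get();
        threads.add(result.substring(result.indexOf('@') + 1)); // keep the thread name
      }
      return threads.size();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      appPool.shutdown(); // the "application" that created the pool shuts it down
    }
  }
}
```

With {{distinctThreads(2, 8)}} at most two threads serve all eight inputs, whereas an executor per input would start a thread per input. Because the pool is shared, the manager in this sketch never shuts it down itself; only the code that created it does.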



--
This message was sent by Atlassian Jira
(v8.3.4#803005)