Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2013/08/13 20:38:52 UTC

[jira] [Commented] (TEZ-359) Allow splits to be specified via the vertex payload

    [ https://issues.apache.org/jira/browse/TEZ-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738639#comment-13738639 ] 

Bikas Saha commented on TEZ-359:
--------------------------------

This sounds good, but what problem are we trying to solve? For example, is the goal to save the time spent localizing the split file?
There are some downsides to this approach.
1) A larger user payload sent over RPC could cause network congestion on the AM side. TEZ-307 tracks sending larger user payloads as a local resource instead. How big can the split information get?
2) It is currently memory-efficient for the AM to keep one copy of the user payload on the vertex that is shared across all of its task attempts. If each task attempt now has a different user payload, that may use up significant memory.
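
To make the memory concern in point 2 concrete, here is a rough back-of-the-envelope sketch. It is not Tez code: the class name, method names, and the sizes used in main are all assumptions chosen only for illustration, and real payload sizes would need to be measured.

```java
// Hypothetical estimate of AM memory held by user payloads.
// Nothing here is a Tez API; all names and numbers are illustrative.
class PayloadFootprintSketch {

    // One payload stored on the vertex and shared by every task attempt.
    static long sharedFootprint(long payloadBytes) {
        return payloadBytes;
    }

    // Every task attempt holds its own payload: the common part plus its split.
    static long perAttemptFootprint(long commonBytes, long splitBytes, int attempts) {
        return (long) attempts * (commonBytes + splitBytes);
    }

    public static void main(String[] args) {
        long commonBytes = 10 * 1024; // assumed size of the shared configuration
        long splitBytes = 200;        // assumed size of one serialized split
        int attempts = 1000;          // assumed number of task attempts

        System.out.println("shared:      " + sharedFootprint(commonBytes));
        System.out.println("per attempt: "
                + perAttemptFootprint(commonBytes, splitBytes, attempts));
    }
}
```

Under these assumed numbers the per-attempt total grows linearly with the number of attempts, while the shared copy stays constant, which is the trade-off the question above is asking about.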
                
> Allow splits to be specified via the vertex payload
> ---------------------------------------------------
>
>                 Key: TEZ-359
>                 URL: https://issues.apache.org/jira/browse/TEZ-359
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>
> Hive should be able to specify splits as part of the Vertex payload.
> Instead of tasks expecting the entire split file to be localized for each and every task, the AM could read the user payload and send specific split information to individual tasks.
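
The idea in the description, where the AM reads the vertex payload and hands each task only its own split, could look roughly like the sketch below. This is only an illustration under stated assumptions: the length-prefixed layout, the class name, and the method names are hypothetical and are not the actual Tez or Hive format or API.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits packed into one vertex payload, then picked out per task.
// Assumed layout: [int count], then for each split [int length][bytes].
class SplitDispatchSketch {

    // Client side (e.g. the query planner): pack all serialized splits into one payload.
    static byte[] encode(List<byte[]> splits) {
        int size = Integer.BYTES;
        for (byte[] s : splits) {
            size += Integer.BYTES + s.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(splits.size());
        for (byte[] s : splits) {
            buf.putInt(s.length);
            buf.put(s);
        }
        return buf.array();
    }

    // AM side: unpack the payload back into individual serialized splits.
    static List<byte[]> decode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        int count = buf.getInt();
        List<byte[]> splits = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] s = new byte[buf.getInt()];
            buf.get(s);
            splits.add(s);
        }
        return splits;
    }

    // AM side: select the split for one task instead of shipping the whole file to it.
    static byte[] splitForTask(byte[] vertexPayload, int taskIndex) {
        return decode(vertexPayload).get(taskIndex);
    }
}
```

In this sketch the AM decodes the payload on every lookup for simplicity; a real implementation would presumably decode once per vertex and keep the list.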
