Posted to issues@hive.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2017/12/05 03:33:00 UTC

[jira] [Comment Edited] (HIVE-18153) refactor reopen and file management in TezTask

    [ https://issues.apache.org/jira/browse/HIVE-18153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277959#comment-16277959 ] 

Sergey Shelukhin edited comment on HIVE-18153 at 12/5/17 3:32 AM:
------------------------------------------------------------------

This moves Hive resources out of the Tez scratch dir and consolidates the resource logic that was split between TezTask (which had most, but not all, of the conf resources' logic) and TezSessionState (which had most, but not all, of the non-conf resources' logic). The session now localizes all the resources and does all the checks, AM calls, and so on.
Additionally, resources are tracked in a separate POJO that can be reused on reopen. It keeps track of both the directory and what has been localized into it; that can potentially reduce the number of FS calls for conf-based resources, which are currently refreshed blindly on every use of the same session.
https://reviews.apache.org/r/64324/
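
To illustrate the resource-tracking idea, here is a minimal sketch of such a POJO. It is not the class from the patch: the name TrackedResources, its methods, and the use of a string "signature" (e.g. a hash or a size plus modification time) to detect changes are all assumptions made for illustration, and the sketch deliberately avoids any Hadoop or Tez types.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder, for illustration only; not the class from the HIVE-18153 patch.
final class TrackedResources {
  // Directory the resources were localized into (kept as a plain string here).
  private final String dir;
  // Resource name -> signature of the version that was last localized.
  private final Map<String, String> localized = new HashMap<>();

  TrackedResources(String dir) {
    this.dir = dir;
  }

  String getDir() {
    return dir;
  }

  // True if the resource was never localized, or was localized with a different signature.
  boolean needsLocalization(String name, String signature) {
    return !signature.equals(localized.get(name));
  }

  // Record that the resource has been localized with the given signature.
  void markLocalized(String name, String signature) {
    localized.put(name, signature);
  }

  // Read-only view of what has been localized so far.
  Map<String, String> getLocalized() {
    return Collections.unmodifiableMap(localized);
  }
}
```

Under these assumptions, a session that is reopened could be handed the same TrackedResources instance and skip the FS calls for any resource where needsLocalization returns false, instead of refreshing everything blindly.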

cc [~sseth] [~prasanth_j]



> refactor reopen and file management in TezTask
> ----------------------------------------------
>
>                 Key: HIVE-18153
>                 URL: https://issues.apache.org/jira/browse/HIVE-18153
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-18153.patch
>
>
> TezTask reopen relies on getting the same session object in terms of setup; WM reopen returns a new session from the pool. 
> The former has the advantage of not having to re-upload files and so on, but the object reuse results in a lot of ugly code, and reopen might also be slower on average with the session pool than just getting a session from the pool. Either WM needs to do the object-preserving reopen, or TezTask needs to be refactored. It looks like the DAG would have to be rebuilt to do the latter, because some paths are tied to a directory of the old session. Let me see if I can get around that; if not, we can do the former; and then, if the former results in too much ugly code in WM to account for object reuse for a different Tez client, I'd do the latter anyway, since it's a failure path :)


