Posted to issues@hive.apache.org by "Zoltan Haindrich (Jira)" <ji...@apache.org> on 2020/07/06 12:35:00 UTC

[jira] [Commented] (HIVE-22301) Hive lineage is not generated for insert overwrite queries on partitioned tables

    [ https://issues.apache.org/jira/browse/HIVE-22301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151990#comment-17151990 ] 

Zoltan Haindrich commented on HIVE-22301:
-----------------------------------------

The [removal of these entities|https://github.com/apache/hive/blob/ae6976b0b0a4b04ea1b2ee5906a3980ce7e1cddd/ql/src/java/org/apache/hadoop/hive/ql/Executor.java#L480] seems to have been introduced intentionally in HIVE-1781.

It seems they wanted to avoid changing the "postexec" hooks...
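
For readers following along, here is a rough, self-contained toy model of the ordering described in this issue: the output is visible to the pre-execution hook, is removed afterwards, and the post-execution hook sees nothing. It does not use Hive classes. `HookOrderSketch` and `SketchContext` are hypothetical names, and the exact point at which the entity is dropped is an assumption based on the linked line, not a copy of the Hive code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class HookOrderSketch {
    /** Hypothetical stand-in for a hook context; this is NOT Hive's HookContext. */
    static final class SketchContext {
        final Set<String> outputs = new LinkedHashSet<>();
    }

    /** Runs the toy pipeline and returns what each hook observed. */
    static List<String> run() {
        List<String> seen = new ArrayList<>();
        SketchContext ctx = new SketchContext();
        ctx.outputs.add("table_t"); // output entity registered before execution

        // Pre-execution hook: the output is still present.
        seen.add("pre: " + ctx.outputs);

        // Assumed behaviour: the entity is dropped after execution,
        // before the post-execution hooks are invoked.
        ctx.outputs.remove("table_t");

        // Post-execution hook: no output left to derive lineage from.
        seen.add("post: " + ctx.outputs);
        return seen;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

If this model matches what happens, a lineage hook registered only as a post-execution hook would never see table_t as an output, which is consistent with the screenshots in the description.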

> Hive lineage is not generated for insert overwrite queries on partitioned tables
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-22301
>                 URL: https://issues.apache.org/jira/browse/HIVE-22301
>             Project: Hive
>          Issue Type: Bug
>          Components: lineage
>    Affects Versions: 3.1.2
>            Reporter: Sidharth Kumar Mishra
>            Assignee: Zoltan Haindrich
>            Priority: Major
>         Attachments: ScreenShot HookContext.png, ScreenShot RunPostExecHook.png, ScreenShot runBeforeExecution.png
>
>
> Problem: When I run the queries below, the last query should produce proper Hive lineage info (through HookContext) from table_b to table_t, but it does not.
>  * Create table table_t (id int) partitioned by (dob date);
>  * Create table table_b (id int) partitioned by (dob date);
>  * from table_b a insert overwrite table table_t select a.id,a.dob;
> Note: For a CTAS query from a partitioned table, this issue is not seen. It occurs only for insert queries such as insert into <table> select * from <table> and for queries like the one above.
>  
> Technical Observations:
> The HookContext (passed from hive.ql.Driver to the Atlas Hive hook through the hookRunner.runPostExecHooks call) contains no outputs. See the IntelliJ screenshot below.
> !ScreenShot RunPostExecHook.png|width=728,height=427!
>  
> I found that the PrivateHookContext is initially created with the proper outputs value, as shown below:
>   !ScreenShot HookContext.png|width=714,height=541!
> The same is passed properly to runBeforeExecutionHook as shown below:
> !ScreenShot runBeforeExecution.png|width=719,height=620!
>  
> Later, when the HookContext is passed to runPostExecHooks, no outputs are populated. Kindly check the reason and let me know if you need any further information from my end.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)