Posted to dev@atlas.apache.org by af...@163.com on 2018/11/22 03:51:25 UTC

Re: Review Request 68883: ATLAS-2893: Atlas Column Lineage of Hive Hook to support the hive old version

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68883/#review210780
-----------------------------------------------------------


Ship it!




Ship It!

- aflyary


On Sept. 29, 2018, 7:04 a.m., aflyary wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68883/
> -----------------------------------------------------------
> 
> (Updated Sept. 29, 2018, 7:04 a.m.)
> 
> 
> Review request for atlas, Apoorv Naik, Ashutosh Mestry, keval bhatt, Nixon Rodrigues, and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-2893
>     https://issues.apache.org/jira/browse/ATLAS-2893
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Right now, column lineage support in the Atlas hook for Apache Hive is limited: per the docs, only "Column level lineage works with Hive version 1.2.1 after the patch for HIVE-13112 is applied to Hive source". In some production environments the Hive version is still very old, and we should support Hive column lineage there as well.
> 
> 
> Diffs
> -----
> 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/CreateHiveProcess.java ae01d504d 
> 
> 
> Diff: https://reviews.apache.org/r/68883/diff/1/
> 
> 
> Testing
> -------
> 
> I referred to the Hive source code. When the Atlas hook class 'org.apache.atlas.hive.hook.HiveHook' is set in hive-site.xml, the Hive optimizer does not put the lineage info into the hookContext; the patch for HIVE-13112 already fixes this. Earlier Hive versions still do not work, though, so I dug into the lineageState of SessionState. We can read the private info of its LineageCtx to work out the lineage dependencies and, from those, derive the column lineage.
>    
>    In the Atlas hive-bridge module, I modified the CreateHiveProcess class and added a method that gets the column lineage from the lineageState of SessionState. This works with Hive versions released before hive-1.2.1. I tested the code with Hive 1.2.0 and 0.12.0, and it produced the Hive column lineage correctly.
>    
>    The Hive SQL statements I tested are below:
> 
>    CREATE TABLE table1(id int, name string, age int, address string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
>    CREATE TABLE table2 AS SELECT id,name,age,address FROM table1;
> 
> 
> Thanks,
> 
> aflyary
> 
>
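The Testing notes describe reading lineage data that Hive keeps in a private field of the session's lineage state and turning it into column lineage. The Java sketch below illustrates only that generic reflective-access idea, under stated assumptions: `LineageStateStub` and its private `index` field are hypothetical stand-ins, not Hive's real `LineageState` or `LineageCtx`, and this is not the code in the CreateHiveProcess.java diff. The stub is pre-filled with the dependencies the CTAS test above should yield (each table2 column derived from the same-named table1 column).

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LineageReflectionSketch {

    // Hypothetical stand-in for a Hive class that keeps lineage in a private
    // field with no public getter. This is NOT Hive's real LineageState.
    static final class LineageStateStub {
        private final Map<String, List<String>> index = new LinkedHashMap<>();

        LineageStateStub() {
            // Dependencies for: CREATE TABLE table2 AS SELECT id,name,age,address FROM table1
            // Each target column comes from the same-named source column.
            for (String col : List.of("id", "name", "age", "address")) {
                index.put("table2." + col, List.of("table1." + col));
            }
        }
    }

    // Reads a private field by name via reflection and casts it to the
    // assumed map type (target column -> source columns).
    @SuppressWarnings("unchecked")
    static Map<String, List<String>> readPrivateIndex(Object state, String fieldName)
            throws ReflectiveOperationException {
        Field field = state.getClass().getDeclaredField(fieldName);
        field.setAccessible(true); // bypass the private modifier
        return (Map<String, List<String>>) field.get(state);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Map<String, List<String>> lineage =
                readPrivateIndex(new LineageStateStub(), "index");
        // Print one "target <- sources" line per column.
        lineage.forEach((target, sources) ->
                System.out.println(target + " <- " + String.join(", ", sources)));
    }
}
```

Running it prints one `table2.<col> <- table1.<col>` line per column. Against a real Hive release, the field name and type would have to be checked in that version's source, since private members are not a stable API and may differ between 0.12.0 and 1.2.x.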