Posted to dev@atlas.apache.org by af...@163.com on 2018/09/29 06:56:03 UTC

Review Request 68883: ATLAS-2893: Atlas Column Lineage of Hive Hook to support the hive old version

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68883/
-----------------------------------------------------------

Review request for atlas.


Repository: atlas


Description
-------

Currently, column-level lineage in the Atlas hook for Apache Hive is limited: "Column level lineage works with Hive version 1.2.1 after the patch for HIVE-13112 is applied to Hive source", and no other setup supports it. Some production environments still run much older Hive versions, and the hook should support column lineage there as well.


Diffs
-----

  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/CreateHiveProcess.java ae01d504d 


Diff: https://reviews.apache.org/r/68883/diff/1/


Testing
-------

From the Hive source code: when the Atlas hook class 'org.apache.atlas.hive.hook.HiveHook' is configured in hive-site.xml, the Hive optimizer does not set the lineage info in the hookContext. The HIVE-13112 patch fixes this, but earlier Hive versions still do not work. So I looked into the lineageState of the Hive SessionState: the idea is to read the private info of its LineageCtx, work out the lineage dependencies from it, and then derive the column lineage.
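
The approach relies on reading data that Hive keeps in private fields. Below is a minimal sketch of that general reflection technique, not the code from the patch. FakeLineageState and its field name "index" are hypothetical stand-ins: they are not Hive classes, and the real field names inside Hive's LineageState/LineageCtx may differ between Hive versions.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Sketch of the reflection technique only. FakeLineageState and its "index" field are
// HYPOTHETICAL stand-ins, not Hive classes; Hive's real field names may differ by version.
class PrivateFieldSketch {

    // Hypothetical object that keeps lineage data in a private field.
    static final class FakeLineageState {
        private final Map<String, String> index = new HashMap<>();

        FakeLineageState() {
            // output column -> input column (illustrative data only)
            index.put("table2.id", "table1.id");
        }
    }

    // Reads a declared (possibly private) field, searching superclasses too.
    static Object readPrivateField(Object target, String fieldName)
            throws NoSuchFieldException, IllegalAccessException {
        for (Class<?> c = target.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field field = c.getDeclaredField(fieldName);
                field.setAccessible(true); // bypass the private modifier
                return field.get(target);
            } catch (NoSuchFieldException ignored) {
                // Not declared on this class; try its superclass.
            }
        }
        throw new NoSuchFieldException(fieldName);
    }

    public static void main(String[] args) throws Exception {
        Object index = readPrivateField(new FakeLineageState(), "index");
        System.out.println(((Map<?, ?>) index).get("table2.id")); // prints table1.id
    }
}
```

Because this depends on private, version-specific internals, any such lookup should tolerate NoSuchFieldException and skip column lineage rather than fail the query hook.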
   
In the Atlas hive-bridge module, I modified the CreateHiveProcess class and added a method that gets the column lineage from the lineageState of the Hive SessionState. This works with Hive releases earlier than 1.2.1: I tested the code with Hive 1.2.0 and 0.12.0, and with both versions it captured the Hive column lineage correctly.
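
One way such a method could be wired in, sketched under assumptions rather than taken from the patch: prefer the lineage that Hive already puts into the hook context, and only rebuild it from the SessionState when that is empty. The two suppliers are hypothetical placeholders for the Hive-specific code, and lineage is modeled as a map from an output column name to the set of input column names it depends on.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// HYPOTHETICAL wiring sketch, not the code from the patch. Lineage is modeled as
// output column name -> names of the input columns it depends on.
class ColumnLineageFallback {

    static Map<String, Set<String>> resolve(
            Supplier<Map<String, Set<String>>> fromHookContext,
            Supplier<Map<String, Set<String>>> fromSessionState) {
        Map<String, Set<String>> lineage = fromHookContext.get();
        if (lineage == null || lineage.isEmpty()) {
            // Older Hive: the optimizer left the hook context without lineage,
            // so rebuild it from the SessionState instead.
            lineage = fromSessionState.get();
        }
        return lineage == null ? Collections.<String, Set<String>>emptyMap() : lineage;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> lineage = resolve(
                Collections::emptyMap, // e.g. an older Hive without HIVE-13112
                () -> Collections.singletonMap("table2.id", Collections.singleton("table1.id")));
        System.out.println(lineage); // prints {table2.id=[table1.id]}
    }
}
```

Passing Suppliers keeps the reflection-based path lazy, so it only runs when the hook context carries no lineage.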
   
The Hive SQL statements I tested are:

   CREATE TABLE table1(id int, name string, age int, address string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
   CREATE TABLE table2 AS SELECT id,name,age,address FROM table1;
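
For the CTAS statement above, which copies columns unchanged, each column of table2 should depend on exactly one same-named column of table1. A small sketch that builds this expected mapping, which could be compared against what the hook reports; the helper name expectedLineage is hypothetical, and it only handles plain column references (no expressions or aliases).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL helper: expected lineage for "CREATE TABLE target AS SELECT c1, c2, ...
// FROM source" when every selected item is a plain column reference.
class ExpectedCtasLineage {

    static Map<String, String> expectedLineage(String source, String target, List<String> columns) {
        Map<String, String> expected = new LinkedHashMap<>(); // keeps SELECT-list order
        for (String column : columns) {
            // Each output column depends only on the same-named input column.
            expected.put(target + "." + column, source + "." + column);
        }
        return expected;
    }

    public static void main(String[] args) {
        Map<String, String> expected = expectedLineage(
                "table1", "table2", Arrays.asList("id", "name", "age", "address"));
        expected.forEach((out, in) -> System.out.println(out + " <- " + in));
    }
}
```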


Thanks,

aflyary


Re: Review Request 68883: ATLAS-2893: Atlas Column Lineage of Hive Hook to support the hive old version

Posted by af...@163.com.

(Updated Sept. 29, 2018, 6:56 a.m.)




Summary (updated)
-----------------

ATLAS-2893: Atlas Column Lineage of Hive Hook to support the hive old version



Thanks,

aflyary