Posted to dev@atlas.apache.org by Vimal Sharma <vi...@hortonworks.com> on 2016/09/20 06:49:45 UTC

Review Request 52077: Column level lineage in Hive

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/
-----------------------------------------------------------

Review request for atlas.


Repository: atlas


Description
-------

After a CTAS (CREATE TABLE AS SELECT) query, the lineage relationship between the source columns and each destination column can be captured. This information can be used to create a column lineage process.


Diffs
-----

  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java a3464a0 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataModelGenerator.java 45f0bc9 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataTypes.java e094cb6 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java a5838b4 

Diff: https://reviews.apache.org/r/52077/diff/


Testing
-------


Thanks,

Vimal Sharma


Re: Review Request 52077: Column level lineage in Hive

Posted by Vimal Sharma <vi...@hortonworks.com>.

> On Sept. 21, 2016, 9:01 p.m., Suma Shivaprasad wrote:
> > addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java, line 628
> > <https://reviews.apache.org/r/52077/diff/2/?file=1505618#file1505618line628>
> >
> >     Shouldn't the lineage process refer to the same set of column instances that are already part of the table reference? Why are we recreating them?

Column Referenceable instances are not re-created. The column Referenceable instances that are part of the table reference are stored in the map columnQNameToRef, and the function createColumnLineageObjects then uses this map to set up the column lineage process.

The name createColumnLineageObjects is misleading, though, so I have renamed it to createColumnLineageProcessInstances.
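
As a rough illustration of the reuse described above (a sketch only, not the code in this patch): Ref is a hypothetical stand-in for Atlas's Referenceable, and indexColumns, resolve, and the key format are assumptions made for the example. Only the map name columnQNameToRef comes from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ColumnRefReuseSketch {

    /** Hypothetical stand-in for an Atlas Referenceable; not the real class. */
    static final class Ref {
        final String typeName;
        final String name;

        Ref(String typeName, String name) {
            this.typeName = typeName;
            this.name = name;
        }
    }

    /** Indexes the table's existing column instances by a string key; creates no new columns. */
    static Map<String, Ref> indexColumns(String tableKey, List<Ref> tableColumns) {
        Map<String, Ref> columnQNameToRef = new LinkedHashMap<>();
        for (Ref column : tableColumns) {
            columnQNameToRef.put(tableKey + "." + column.name, column);
        }
        return columnQNameToRef;
    }

    /** Resolves lineage keys to the indexed instances; keys with no match are skipped. */
    static List<Ref> resolve(Map<String, Ref> columnQNameToRef, List<String> keys) {
        List<Ref> refs = new ArrayList<>();
        for (String key : keys) {
            Ref ref = columnQNameToRef.get(key);
            if (ref != null) {
                refs.add(ref);
            }
        }
        return refs;
    }

    public static void main(String[] args) {
        Ref id = new Ref("hive_column", "id");
        Ref name = new Ref("hive_column", "name");
        Map<String, Ref> index = indexColumns("default.src", Arrays.asList(id, name));

        List<Ref> inputs = resolve(index, Arrays.asList("default.src.id"));
        // The lineage input is the very object that the table already holds.
        System.out.println(inputs.get(0) == id);
    }
}
```

The identity comparison in main is the point of the sketch: the lineage process gets the same instance that the table reference holds, not a copy.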


- Vimal


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/#review149897
-----------------------------------------------------------


On Sept. 20, 2016, 9:07 a.m., Vimal Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52077/
> -----------------------------------------------------------
> 
> (Updated Sept. 20, 2016, 9:07 a.m.)
> 
> 
> Review request for atlas.
> 
> 
> Bugs: ATLAS-247
>     https://issues.apache.org/jira/browse/ATLAS-247
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> After a CTAS (CREATE TABLE AS SELECT) query, the lineage relationship between the source columns and each destination column can be captured. This information can be used to create a column lineage process.
> 
> 
> Diffs
> -----
> 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java PRE-CREATION 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java a3464a0 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataModelGenerator.java 45f0bc9 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataTypes.java e094cb6 
>   addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java a5838b4 
> 
> Diff: https://reviews.apache.org/r/52077/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vimal Sharma
> 
>


Re: Review Request 52077: Column level lineage in Hive

Posted by Suma Shivaprasad <su...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/#review149897
-----------------------------------------------------------




addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java (line 628)
<https://reviews.apache.org/r/52077/#comment217664>

    Shouldn't the lineage process refer to the same set of column instances that are already part of the table reference? Why are we recreating them?


- Suma Shivaprasad


On Sept. 20, 2016, 9:07 a.m., Vimal Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52077/
> -----------------------------------------------------------
> 
> (Updated Sept. 20, 2016, 9:07 a.m.)
> 
> 


Re: Review Request 52077: Column level lineage in Hive

Posted by Suma Shivaprasad <su...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/#review150215
-----------------------------------------------------------




addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java (line 101)
<https://reviews.apache.org/r/52077/#comment218104>

    This should be replaced with HMSB.getTableQualifiedName (HMSB is HiveMetaStoreBridge).


- Suma Shivaprasad


On Sept. 22, 2016, 11:53 a.m., Vimal Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52077/
> -----------------------------------------------------------
> 
> (Updated Sept. 22, 2016, 11:53 a.m.)
> 
> 
> Review request for atlas.
> 
> 
> Bugs: ATLAS-247
>     https://issues.apache.org/jira/browse/ATLAS-247
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> After a CTAS (CREATE TABLE AS SELECT) query, the lineage relationship between the source columns and each destination column can be captured. This information can be used to create a column lineage process.
> 
> 
> Diffs
> -----
> 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java PRE-CREATION 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java a3464a0 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataModelGenerator.java 45f0bc9 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataTypes.java e094cb6 
>   addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java a5838b4 
> 
> Diff: https://reviews.apache.org/r/52077/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vimal Sharma
> 
>


Re: Review Request 52077: Column level lineage in Hive

Posted by Vimal Sharma <vi...@hortonworks.com>.

> On Sept. 26, 2016, 10:29 a.m., Shwetha GS wrote:
> > 1. Can you add a test with a lineage query on a column?
> > 2. ReservedTypesRegistrar should switch to updateTypes, so that upgrades work.
> > 3. Once you have made sure the tests work, disable them so that they don't break with Apache Hive 1.2.

1. Added a test with a lineage API query.
2. The ReservedTypesRegistrar changes will be addressed in ATLAS-1184. I will upload a patch for ATLAS-1184 shortly.
3. Disabled the test after verifying that it works.


- Vimal


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/#review150383
-----------------------------------------------------------


On Sept. 26, 2016, 1:06 p.m., Vimal Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52077/
> -----------------------------------------------------------
> 
> (Updated Sept. 26, 2016, 1:06 p.m.)
> 
> 
> Review request for atlas.
> 
> 
> Bugs: ATLAS-1184 and ATLAS-247
>     https://issues.apache.org/jira/browse/ATLAS-1184
>     https://issues.apache.org/jira/browse/ATLAS-247
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> After a CTAS (CREATE TABLE AS SELECT) query, the lineage relationship between the source columns and each destination column can be captured. This information can be used to create a column lineage process.
> 
> 
> Diffs
> -----
> 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java PRE-CREATION 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java a3464a0 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataModelGenerator.java 45f0bc9 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataTypes.java e094cb6 
>   addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java a5838b4 
> 
> Diff: https://reviews.apache.org/r/52077/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vimal Sharma
> 
>


Re: Review Request 52077: Column level lineage in Hive

Posted by Shwetha GS <ss...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/#review150383
-----------------------------------------------------------



1. Can you add a test with a lineage query on a column?
2. ReservedTypesRegistrar should switch to updateTypes, so that upgrades work.
3. Once you have made sure the tests work, disable them so that they don't break with Apache Hive 1.2.


addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataModelGenerator.java (line 334)
<https://reviews.apache.org/r/52077/#comment218361>

    Rename to query/command, as this class type is also a process.


- Shwetha GS


On Sept. 22, 2016, 11:53 a.m., Vimal Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52077/
> -----------------------------------------------------------
> 
> (Updated Sept. 22, 2016, 11:53 a.m.)
> 
> 


Re: Review Request 52077: Column level lineage in Hive

Posted by Shwetha GS <ss...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/#review150990
-----------------------------------------------------------


Ship it!

- Shwetha GS


On Sept. 29, 2016, 8:15 a.m., Vimal Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52077/
> -----------------------------------------------------------
> 
> (Updated Sept. 29, 2016, 8:15 a.m.)
> 
> 
> Review request for atlas.
> 
> 
> Bugs: ATLAS-1184 and ATLAS-247
>     https://issues.apache.org/jira/browse/ATLAS-1184
>     https://issues.apache.org/jira/browse/ATLAS-247
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> After a CTAS (CREATE TABLE AS SELECT) query, the lineage relationship between the source columns and each destination column can be captured. This information can be used to create a column lineage process.
> 
> 
> Diffs
> -----
> 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java PRE-CREATION 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java a3464a0 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataModelGenerator.java 45f0bc9 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataTypes.java e094cb6 
>   addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java a5838b4 
> 
> Diff: https://reviews.apache.org/r/52077/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vimal Sharma
> 
>


Re: Review Request 52077: Column level lineage in Hive

Posted by Vimal Sharma <vi...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/
-----------------------------------------------------------

(Updated Sept. 29, 2016, 8:15 a.m.)


Review request for atlas.


Changes
-------

Addressed Shwetha's review comments. I think it would make sense to address Type update changes in ATLAS-1184. Marked ATLAS-1184 as required for this patch.


Bugs: ATLAS-1184 and ATLAS-247
    https://issues.apache.org/jira/browse/ATLAS-1184
    https://issues.apache.org/jira/browse/ATLAS-247


Repository: atlas


Description
-------

After a CTAS (CREATE TABLE AS SELECT) query, the lineage relationship between the source columns and each destination column can be captured. This information can be used to create a column lineage process.


Diffs (updated)
-----

  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java PRE-CREATION 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java a3464a0 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataModelGenerator.java 45f0bc9 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataTypes.java e094cb6 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java a5838b4 

Diff: https://reviews.apache.org/r/52077/diff/


Testing
-------


Thanks,

Vimal Sharma


Re: Review Request 52077: Column level lineage in Hive

Posted by Shwetha GS <ss...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/#review150821
-----------------------------------------------------------


Fix it, then Ship it!





addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java (line 634)
<https://reviews.apache.org/r/52077/#comment218885>

    Change to warn, as we continue even without it.



addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java (line 1049)
<https://reviews.apache.org/r/52077/#comment218887>

    remove toString



addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java (line 1051)
<https://reviews.apache.org/r/52077/#comment218886>

    Change to warn



addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java (line 1110)
<https://reviews.apache.org/r/52077/#comment218891>

    Add a comment on why it's disabled and when the test can be enabled.



addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java (line 1163)
<https://reviews.apache.org/r/52077/#comment218892>

    add assert that vertices contains a_guid and b_guid



addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java (line 1169)
<https://reviews.apache.org/r/52077/#comment218893>

    Add assert that vertices contains sourceTableGUID


- Shwetha GS


On Sept. 26, 2016, 1:06 p.m., Vimal Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52077/
> -----------------------------------------------------------
> 
> (Updated Sept. 26, 2016, 1:06 p.m.)
> 
> 


Re: Review Request 52077: Column level lineage in Hive

Posted by Vimal Sharma <vi...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/
-----------------------------------------------------------

(Updated Sept. 26, 2016, 1:06 p.m.)


Review request for atlas.


Changes
-------

Addressed Shwetha's review comments. I think it would make sense to address Type update changes in ATLAS-1184. Marked ATLAS-1184 as required for this patch.


Bugs: ATLAS-1184 and ATLAS-247
    https://issues.apache.org/jira/browse/ATLAS-1184
    https://issues.apache.org/jira/browse/ATLAS-247


Repository: atlas


Description
-------

After a CTAS (CREATE TABLE AS SELECT) query, the lineage relationship between the source columns and each destination column can be captured. This information can be used to create a column lineage process.


Diffs (updated)
-----

  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java PRE-CREATION 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java a3464a0 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataModelGenerator.java 45f0bc9 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataTypes.java e094cb6 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java a5838b4 

Diff: https://reviews.apache.org/r/52077/diff/


Testing
-------


Thanks,

Vimal Sharma


Re: Review Request 52077: Column level lineage in Hive

Posted by Vimal Sharma <vi...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/
-----------------------------------------------------------

(Updated Sept. 22, 2016, 11:53 a.m.)


Review request for atlas.


Changes
-------

Fixed review comments from Shwetha and Suma. Added a try/catch so that Atlas still registers the CTAS process when column lineage can't be set because lineage data is absent from the Hive hook.
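
A minimal sketch of the fallback described above, assuming hypothetical names throughout: ColumnLineageBuilder, entitiesToRegister, and a plain list standing in for a warn-level logger are all invented for the example. It is not the HiveHook code itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LineageFallbackSketch {

    /** Hypothetical hook point that builds column lineage entities and may fail. */
    interface ColumnLineageBuilder {
        List<String> build() throws Exception;
    }

    /**
     * Returns the entities to register. The CTAS process is always included;
     * column lineage is added only if it could be built.
     */
    static List<String> entitiesToRegister(String ctasProcess,
                                           ColumnLineageBuilder builder,
                                           List<String> warnings) {
        List<String> entities = new ArrayList<>();
        entities.add(ctasProcess);
        try {
            entities.addAll(builder.build());
        } catch (Exception e) {
            // Recorded as a warning rather than an error, because registration continues.
            warnings.add("Column lineage skipped: " + e.getMessage());
        }
        return entities;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();

        System.out.println(entitiesToRegister(
                "ctas_process", () -> Arrays.asList("column_lineage_1"), warnings));

        System.out.println(entitiesToRegister(
                "ctas_process",
                () -> { throw new IllegalStateException("no lineage info from Hive"); },
                warnings));

        System.out.println(warnings);
    }
}
```

Adding the CTAS process before the try block is what keeps it in the result even when the lineage step throws.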


Bugs: ATLAS-247
    https://issues.apache.org/jira/browse/ATLAS-247


Repository: atlas


Description
-------

After a CTAS (CREATE TABLE AS SELECT) query, the lineage relationship between the source columns and each destination column can be captured. This information can be used to create a column lineage process.


Diffs (updated)
-----

  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java PRE-CREATION 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java a3464a0 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataModelGenerator.java 45f0bc9 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataTypes.java e094cb6 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java a5838b4 

Diff: https://reviews.apache.org/r/52077/diff/


Testing
-------


Thanks,

Vimal Sharma


Re: Review Request 52077: Column level lineage in Hive

Posted by Suma Shivaprasad <su...@gmail.com>.

> On Sept. 21, 2016, 8:58 p.m., Suma Shivaprasad wrote:
> > addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java, line 97
> > <https://reviews.apache.org/r/52077/diff/2/?file=1505617#file1505617line97>
> >
> >     Why is the column qualifiedName different from the convention we use for the hive_column instances that the table refers to? Why is the cluster name removed?
> 
> Vimal Sharma wrote:
>     Cluster information is not available in the lineage information provided by Hive. Further, the qualifiedName in this patch is used only while setting column lineage, not for communication with the rest of the Atlas codebase.
> 
> Suma Shivaprasad wrote:
>     If we do not provide the same qualifiedName as the current HMSB.getColumnQualifiedName(), another entity will be created for the columns. Cluster information is available from HMSB.getClusterName().
> 
> Vimal Sharma wrote:
>     In the function populateColumnReferenceableMap, we set up a mapping from a column string identifier (named the column qualified name) to its corresponding column Referenceable object in Atlas. No new column Referenceable entity is created.
>     
>     Further, in buildLineageMap, we set up a mapping from each destination column qualified name to the list of its source column qualified names. The key-value pairs of type (LineageInfo.DependencyKey, LineageInfo.Dependency) in Hive's LineageInfo carry no cluster information, so here we can't use the same pattern for the column qualified name as HMSB.getColumnQualifiedName does.
>     
>     If we used HMSB.getColumnQualifiedName as the column string identifier in populateColumnReferenceableMap, we would not be able to look up the column Referenceable objects in that map from HiveHook when setting up the column lineage process in createColumnLineageProcessInstances (lines 803 and 812).

Sounds good


- Suma


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/#review149896
-----------------------------------------------------------


On Sept. 29, 2016, 8:15 a.m., Vimal Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52077/
> -----------------------------------------------------------
> 
> (Updated Sept. 29, 2016, 8:15 a.m.)
> 
> 


Re: Review Request 52077: Column level lineage in Hive

Posted by Suma Shivaprasad <su...@gmail.com>.

> On Sept. 21, 2016, 8:58 p.m., Suma Shivaprasad wrote:
> > addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java, line 97
> > <https://reviews.apache.org/r/52077/diff/2/?file=1505617#file1505617line97>
> >
> >     Why is the column qualifiedName different from the convention we use for the hive_column instances that the table refers to? Why is the cluster name removed?
> 
> Vimal Sharma wrote:
>     Cluster information is not available in the lineage information provided by Hive. Further, the qualifiedName in this patch is used only while setting column lineage, not for communication with the rest of the Atlas codebase.

If we do not provide the same qualifiedName as the current HMSB.getColumnQualifiedName(), another entity will be created for the columns. Cluster information is available from HMSB.getClusterName().


- Suma


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/#review149896
-----------------------------------------------------------


On Sept. 22, 2016, 11:53 a.m., Vimal Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52077/
> -----------------------------------------------------------
> 
> (Updated Sept. 22, 2016, 11:53 a.m.)
> 
> 


Re: Review Request 52077: Column level lineage in Hive

Posted by Vimal Sharma <vi...@hortonworks.com>.

> On Sept. 21, 2016, 8:58 p.m., Suma Shivaprasad wrote:
> > addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java, line 97
> > <https://reviews.apache.org/r/52077/diff/2/?file=1505617#file1505617line97>
> >
> >     Why is the column qualifiedName different from the convention we use for the hive_column instances that the table refers to? Why is the cluster name removed?
> 
> Vimal Sharma wrote:
>     Cluster information is not available in the lineage information provided by Hive. Further, the qualifiedName in this patch is used only while setting column lineage, not for communication with the rest of the Atlas codebase.
> 
> Suma Shivaprasad wrote:
>     If we do not provide the same qualifiedName as the current HMSB.getColumnQualifiedName(), another entity will be created for the columns. Cluster information is available from HMSB.getClusterName().

In the function populateColumnReferenceableMap, we set up a mapping from a column string identifier (named the column qualified name) to its corresponding column Referenceable object in Atlas. No new column Referenceable entity is created.

Further, in buildLineageMap, we set up a mapping from each destination column qualified name to the list of its source column qualified names. The key-value pairs of type (LineageInfo.DependencyKey, LineageInfo.Dependency) in Hive's LineageInfo carry no cluster information, so here we can't use the same pattern for the column qualified name as HMSB.getColumnQualifiedName does.

If we used HMSB.getColumnQualifiedName as the column string identifier in populateColumnReferenceableMap, we would not be able to look up the column Referenceable objects in that map from HiveHook when setting up the column lineage process in createColumnLineageProcessInstances (lines 803 and 812).
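
To make the key mismatch concrete, here is a small sketch under stated assumptions: lineageKey, clusterQualifiedName, the edge representation, and the "@cluster" suffix are all hypothetical and chosen only for illustration. Only the name buildLineageMap comes from the patch, and this is not its actual signature or body.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class LineageKeySketch {

    /** Key built only from what the lineage data carries: db, table, column (no cluster). */
    static String lineageKey(String db, String table, String column) {
        return (db + "." + table + "." + column).toLowerCase(Locale.ROOT);
    }

    /** Assumed cluster-qualified form, shown only for contrast with lineageKey. */
    static String clusterQualifiedName(String db, String table, String column, String cluster) {
        return lineageKey(db, table, column) + "@" + cluster;
    }

    /** Maps each destination column key to its source column keys; an edge is {destKey, sourceKey}. */
    static Map<String, List<String>> buildLineageMap(List<String[]> edges) {
        Map<String, List<String>> lineage = new LinkedHashMap<>();
        for (String[] edge : edges) {
            lineage.computeIfAbsent(edge[0], k -> new ArrayList<>()).add(edge[1]);
        }
        return lineage;
    }

    public static void main(String[] args) {
        Map<String, List<String>> lineage = buildLineageMap(Arrays.asList(
                new String[] {lineageKey("default", "dst", "total"), lineageKey("default", "src", "a")},
                new String[] {lineageKey("default", "dst", "total"), lineageKey("default", "src", "b")}));
        System.out.println(lineage);

        // A cluster-qualified key matches nothing in a map keyed without the cluster.
        System.out.println(lineage.containsKey(
                clusterQualifiedName("default", "dst", "total", "primary")));
    }
}
```

Both maps have to share one key format for lookups to succeed, which is why the cluster-free form is used on both sides.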


- Vimal


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/#review149896
-----------------------------------------------------------


On Sept. 22, 2016, 11:53 a.m., Vimal Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52077/
> -----------------------------------------------------------
> 
> (Updated Sept. 22, 2016, 11:53 a.m.)
> 
> 


Re: Review Request 52077: Column level lineage in Hive

Posted by Vimal Sharma <vi...@hortonworks.com>.

> On Sept. 21, 2016, 8:58 p.m., Suma Shivaprasad wrote:
> > addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java, line 97
> > <https://reviews.apache.org/r/52077/diff/2/?file=1505617#file1505617line97>
> >
> >     Why is the column qualifiedName different from the convention we use for the hive_column instances that the table refers to? Why is the cluster name removed?

Cluster information is not available in the lineage information provided by Hive. Further, the qualifiedName in this patch is used only while setting column lineage, not for communication with the rest of the Atlas codebase.


- Vimal


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/#review149896
-----------------------------------------------------------


On Sept. 20, 2016, 9:07 a.m., Vimal Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52077/
> -----------------------------------------------------------
> 
> (Updated Sept. 20, 2016, 9:07 a.m.)
> 
> 


Re: Review Request 52077: Column level lineage in Hive

Posted by Suma Shivaprasad <su...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/#review149896
-----------------------------------------------------------




addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java (line 94)
<https://reviews.apache.org/r/52077/#comment217662>

    Use a constant for "columns". Also, .getValuesMap can be removed; .get should work directly on Referenceable.



addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java (line 97)
<https://reviews.apache.org/r/52077/#comment217663>

    Why is the column qualifiedName different from the convention we use for hive_column instances that are referred to from the table? Why is the cluster name removed?


- Suma Shivaprasad




Re: Review Request 52077: Column level lineage in Hive

Posted by Shwetha GS <ss...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/#review149804
-----------------------------------------------------------




addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java (line 73)
<https://reviews.apache.org/r/52077/#comment217525>

    Don't add toString() to arguments



addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java (line 634)
<https://reviews.apache.org/r/52077/#comment217532>

    Why have you changed the entity update request to an entity create request? It should be an update.



addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java (line 819)
<https://reviews.apache.org/r/52077/#comment217533>

    Is e.key() the output column name? e.key() doesn't contain the cluster name, so the process might get de-duped against other lineage processes.
    
    The qualified name for the lineage process should be
    'qualified name of hive command process : column name'.
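
A small sketch of the naming scheme suggested above. The helper name is hypothetical, the ":" separator is taken from the comment, and the process names are placeholders rather than real hive_process qualified names.

```java
// Hypothetical helper: builds a column lineage process name from the qualified name
// of the Hive command process plus the output column name.
class ColumnLineageNameSketch {

    static String columnLineageQualifiedName(String hiveProcessQualifiedName, String outputColumnName) {
        return hiveProcessQualifiedName + ":" + outputColumnName;
    }

    public static void main(String[] args) {
        // Two different processes writing a column called "id" get distinct names.
        System.out.println(columnLineageQualifiedName("processA", "id"));
        System.out.println(columnLineageQualifiedName("processB", "id"));
    }
}
```

Because the process qualified name is part of the result, the same output column name under different Hive processes no longer collides.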



addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java (line 1044)
<https://reviews.apache.org/r/52077/#comment217522>

    Use the syntax LOG.debug("Column Lineage Map  - {}", this.lineageInfo.entrySet()); so that toString() is not evaluated when the debug log level is disabled.
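
To illustrate the point, here is a self-contained sketch. It uses java.util.logging as a stand-in for SLF4J so that it runs without extra jars (note the {0} placeholder instead of {}), and CountingValue is a hypothetical substitute for the lineage map.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustration only: java.util.logging stands in for SLF4J here.
class LazyLogSketch {
    private static final Logger LOG = Logger.getLogger(LazyLogSketch.class.getName());

    // Hypothetical stand-in for the lineage map; counts how often toString() runs.
    static final class CountingValue {
        int toStringCalls = 0;

        @Override
        public String toString() {
            toStringCalls++;
            return "{dest_col=[src_col]}";
        }
    }

    // Eager style: the message string is built before the logger checks the level.
    static int eagerCalls() {
        LOG.setLevel(Level.INFO); // FINE (the debug-like level) is disabled
        CountingValue value = new CountingValue();
        LOG.fine("Column Lineage Map - " + value.toString());
        return value.toStringCalls;
    }

    // Parameterized style: the object is handed over unformatted and is only
    // turned into text if the level is enabled.
    static int parameterizedCalls() {
        LOG.setLevel(Level.INFO);
        CountingValue value = new CountingValue();
        LOG.log(Level.FINE, "Column Lineage Map - {0}", value);
        return value.toStringCalls;
    }

    public static void main(String[] args) {
        System.out.println("eager toString() calls: " + eagerCalls());
        System.out.println("parameterized toString() calls: " + parameterizedCalls());
    }
}
```

With the level set to INFO, the eager variant still pays for one toString() call, while the parameterized variant pays for none.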



addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java (line 1121)
<https://reviews.apache.org/r/52077/#comment217523>

    Rename to testColumnLevelLineage.



addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java (line 1143)
<https://reviews.apache.org/r/52077/#comment217524>

    Also assert on the lineage API response for the columns.


- Shwetha GS




Re: Review Request 52077: Column level lineage in Hive

Posted by Vimal Sharma <vi...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52077/
-----------------------------------------------------------

(Updated Sept. 20, 2016, 9:07 a.m.)


Review request for atlas.


Changes
-------

The newly added file was missing from the previous patch. Updating the patch.


Bugs: ATLAS-247
    https://issues.apache.org/jira/browse/ATLAS-247


Repository: atlas


Description
-------

After a CTAS query, the lineage relationship between the source columns and the destination column can be captured. This information can be used to create a column lineage process.


Diffs (updated)
-----

  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/ColumnLineageUtils.java PRE-CREATION 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java a3464a0 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataModelGenerator.java 45f0bc9 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataTypes.java e094cb6 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java a5838b4 

Diff: https://reviews.apache.org/r/52077/diff/


Testing
-------


Thanks,

Vimal Sharma