Posted to dev@atlas.apache.org by Suma Shivaprasad <su...@gmail.com> on 2016/04/06 01:54:19 UTC

Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/
-----------------------------------------------------------

Review request for atlas.


Bugs: ATLAS-527
    https://issues.apache.org/jira/browse/ATLAS-527


Repository: atlas


Description
-------

Added support to track lineage between HDFS paths and Hive tables in the following operations:

a. LOAD (at table and partition level): the input is an HDFS path and the output is the table. Even though we don't create partition entities, we still track lineage at the table level for partitions. This could be an issue if there is a large number of partition queries; this is not addressed in this JIRA (see https://issues.apache.org/jira/browse/ATLAS-619). Refer to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
b. IMPORT and EXPORT to and from HDFS paths. Refer to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
c. CREATE EXTERNAL TABLE: the input is the HDFS path and the output is the table.
d. ALTER TABLE ... SET LOCATION for an external table: the input is the new HDFS path and the output is the table.
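The lineage directions listed above can be summarized in a small sketch. It is a hypothetical model, not the HiveHook code: the Operation enum, the Edge record and lineageFor are illustrative names. It assumes EXPORT is the only listed operation in which data flows from the table to the HDFS path, and it records partition-level LOADs against the table, as the description says.

```java
import java.util.List;

// Hypothetical model of the lineage directions listed above; these names are
// illustrative and are not the HiveHook API.
public class LineageDirection {

    enum Operation { LOAD, IMPORT, EXPORT, CREATE_EXTERNAL_TABLE, ALTER_TABLE_SET_LOCATION }

    /** A lineage edge: data flows from every input to every output. */
    record Edge(List<String> inputs, List<String> outputs) { }

    /**
     * Returns the edge for an operation involving one HDFS path and one table.
     * Partition-level LOADs are passed the table name, since no partition
     * entities are created.
     */
    static Edge lineageFor(Operation op, String hdfsPath, String table) {
        switch (op) {
            case EXPORT:
                // EXPORT writes the table's data out to an HDFS path.
                return new Edge(List.of(table), List.of(hdfsPath));
            case LOAD:
            case IMPORT:
            case CREATE_EXTERNAL_TABLE:
            case ALTER_TABLE_SET_LOCATION: // hdfsPath is the new location
                return new Edge(List.of(hdfsPath), List.of(table));
            default:
                throw new IllegalArgumentException("Unsupported operation: " + op);
        }
    }

    public static void main(String[] args) {
        Edge e = lineageFor(Operation.EXPORT, "hdfs://nn:8020/tmp/export", "default.sales");
        System.out.println(e.inputs() + " -> " + e.outputs());
        // prints [default.sales] -> [hdfs://nn:8020/tmp/export]
    }
}
```

The switch keeps the direction rule in one place: every operation except EXPORT maps path to table.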


Diffs
-----

  addons/hdfs-model/src/main/java/org/apache/atlas/fs/model/FSDataModelGenerator.java 555d565 
  addons/hdfs-model/src/main/scala/org/apache/atlas/fs/model/FSDataModel.scala c964f73 
  addons/hive-bridge/pom.xml e125f18 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java 3a802d7 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 68e32ff 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e17afb8 
  addons/storm-bridge/src/main/java/org/apache/atlas/storm/hook/StormAtlasHook.java 5665856 
  repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 0a04c5f 
  repository/src/main/java/org/apache/atlas/services/ReservedTypesRegistrar.java 430bb6b 

Diff: https://reviews.apache.org/r/45784/diff/


Testing
-------


Thanks,

Suma Shivaprasad


Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

Posted by Shwetha GS <ss...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/#review127565
-----------------------------------------------------------


Ship it!




Ship It!

- Shwetha GS



Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

Posted by Suma Shivaprasad <su...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/
-----------------------------------------------------------

(Updated April 6, 2016, 7:08 p.m.)


Review request for atlas.


Changes
-------

Removed the clusterName attribute, since its value may be incorrect.


Bugs: ATLAS-527
    https://issues.apache.org/jira/browse/ATLAS-527


Repository: atlas


Diffs (updated)
-----

  addons/hdfs-model/src/main/java/org/apache/atlas/fs/model/FSDataModelGenerator.java 555d565 
  addons/hdfs-model/src/main/scala/org/apache/atlas/fs/model/FSDataModel.scala c964f73 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java 3a802d7 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 68e32ff 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e17afb8 
  addons/storm-bridge/src/main/java/org/apache/atlas/storm/hook/StormAtlasHook.java 5665856 
  client/src/main/java/org/apache/atlas/AtlasClient.java c3b4ba9 
  repository/src/main/java/org/apache/atlas/services/ReservedTypesRegistrar.java 430bb6b 

Diff: https://reviews.apache.org/r/45784/diff/


Testing
-------

Added tests in HiveHookIT


Thanks,

Suma Shivaprasad


Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

Posted by Suma Shivaprasad <su...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/
-----------------------------------------------------------

(Updated April 6, 2016, 6:10 p.m.)


Review request for atlas.


Bugs: ATLAS-527
    https://issues.apache.org/jira/browse/ATLAS-527


Repository: atlas


Diffs (updated)
-----

  addons/hdfs-model/src/main/java/org/apache/atlas/fs/model/FSDataModelGenerator.java 555d565 
  addons/hdfs-model/src/main/scala/org/apache/atlas/fs/model/FSDataModel.scala c964f73 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java 3a802d7 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 68e32ff 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e17afb8 
  addons/storm-bridge/src/main/java/org/apache/atlas/storm/hook/StormAtlasHook.java 5665856 
  client/src/main/java/org/apache/atlas/AtlasClient.java c3b4ba9 
  repository/src/main/java/org/apache/atlas/services/ReservedTypesRegistrar.java 430bb6b 

Diff: https://reviews.apache.org/r/45784/diff/


Testing
-------

Added tests in HiveHookIT


Thanks,

Suma Shivaprasad


Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

Posted by Suma Shivaprasad <su...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/
-----------------------------------------------------------

(Updated April 6, 2016, 6:08 p.m.)


Review request for atlas.


Changes
-------

Removed extra constants from DefaultMetadataService (DMS).


Bugs: ATLAS-527
    https://issues.apache.org/jira/browse/ATLAS-527


Repository: atlas


Diffs (updated)
-----

  addons/hdfs-model/src/main/java/org/apache/atlas/fs/model/FSDataModelGenerator.java 555d565 
  addons/hdfs-model/src/main/scala/org/apache/atlas/fs/model/FSDataModel.scala c964f73 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java 3a802d7 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 68e32ff 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e17afb8 
  addons/storm-bridge/src/main/java/org/apache/atlas/storm/hook/StormAtlasHook.java 5665856 
  client/src/main/java/org/apache/atlas/AtlasClient.java c3b4ba9 
  repository/src/main/java/org/apache/atlas/services/ReservedTypesRegistrar.java 430bb6b 

Diff: https://reviews.apache.org/r/45784/diff/


Testing
-------

Added tests in HiveHookIT


Thanks,

Suma Shivaprasad


Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

Posted by Suma Shivaprasad <su...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/
-----------------------------------------------------------

(Updated April 6, 2016, 5:44 p.m.)


Review request for atlas.


Changes
-------

Addressed review comments.


Bugs: ATLAS-527
    https://issues.apache.org/jira/browse/ATLAS-527


Repository: atlas


Diffs (updated)
-----

  addons/hdfs-model/src/main/java/org/apache/atlas/fs/model/FSDataModelGenerator.java 555d565 
  addons/hdfs-model/src/main/scala/org/apache/atlas/fs/model/FSDataModel.scala c964f73 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java 3a802d7 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 68e32ff 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e17afb8 
  addons/storm-bridge/src/main/java/org/apache/atlas/storm/hook/StormAtlasHook.java 5665856 
  client/src/main/java/org/apache/atlas/AtlasClient.java c3b4ba9 
  repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 0a04c5f 
  repository/src/main/java/org/apache/atlas/services/ReservedTypesRegistrar.java 430bb6b 

Diff: https://reviews.apache.org/r/45784/diff/


Testing
-------

Added tests in HiveHookIT


Thanks,

Suma Shivaprasad


Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

Posted by Suma Shivaprasad <su...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/
-----------------------------------------------------------

(Updated April 6, 2016, 5:03 p.m.)


Review request for atlas.


Bugs: ATLAS-527
    https://issues.apache.org/jira/browse/ATLAS-527


Repository: atlas


Diffs (updated)
-----

  addons/hdfs-model/src/main/java/org/apache/atlas/fs/model/FSDataModelGenerator.java 555d565 
  addons/hdfs-model/src/main/scala/org/apache/atlas/fs/model/FSDataModel.scala c964f73 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java 3a802d7 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 68e32ff 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e17afb8 
  addons/storm-bridge/src/main/java/org/apache/atlas/storm/hook/StormAtlasHook.java 5665856 
  client/src/main/java/org/apache/atlas/AtlasClient.java c3b4ba9 
  repository/src/main/java/org/apache/atlas/services/ReservedTypesRegistrar.java 430bb6b 

Diff: https://reviews.apache.org/r/45784/diff/


Testing
-------

Added tests in HiveHookIT


Thanks,

Suma Shivaprasad


Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

Posted by Suma Shivaprasad <su...@gmail.com>.

> On April 6, 2016, 11:21 a.m., Shwetha GS wrote:
> > addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java, line 519
> > <https://reviews.apache.org/r/45784/diff/1/?file=1327185#file1327185line519>
> >
> >     Aren't there cases where the input/output is on the local FS, for example LOAD from a local path?

I am filtering out the LOCAL_DIR cases by checking that getType() == DFS_DIR, and there are also test cases for LOAD from a local dir and INSERT into a local dir that confirm this case is handled. You are suggesting we ignore local dirs, right?
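A minimal sketch of the DFS_DIR check described above. It uses a stand-in HookEntity record rather than Hive's org.apache.hadoop.hive.ql.hooks.Entity; only the type names DFS_DIR and LOCAL_DIR mirror Hive's Entity.Type. Entities of type LOCAL_DIR (for example from LOAD DATA LOCAL INPATH) are simply dropped from the lineage inputs/outputs.

```java
import java.util.List;
import java.util.stream.Collectors;

// Stand-in for Hive's hook entities; only the type names DFS_DIR / LOCAL_DIR / TABLE
// are borrowed from Hive's Entity.Type. This is not the HiveHook code.
public class DfsDirFilter {

    enum EntityType { TABLE, DFS_DIR, LOCAL_DIR }

    record HookEntity(EntityType type, String name) { }

    /** Keeps only HDFS directories; local dirs and non-path entities are dropped. */
    static List<String> hdfsPaths(List<HookEntity> entities) {
        return entities.stream()
                .filter(e -> e.type() == EntityType.DFS_DIR)
                .map(HookEntity::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<HookEntity> inputs = List.of(
                new HookEntity(EntityType.LOCAL_DIR, "file:/tmp/local.csv"),
                new HookEntity(EntityType.DFS_DIR, "hdfs://nn:8020/data/in"),
                new HookEntity(EntityType.TABLE, "default.t1"));
        System.out.println(hdfsPaths(inputs)); // [hdfs://nn:8020/data/in]
    }
}
```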


> On April 6, 2016, 11:21 a.m., Shwetha GS wrote:
> > addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java, line 558
> > <https://reviews.apache.org/r/45784/diff/1/?file=1327185#file1327185line558>
> >
> >     Should this be part of HiveMetaStoreBridge and also be used in import-hive?
> >     
> >     Since this lineage will also be created by import-hive, the process name for CREATE TABLE should be just the table name, so that it's created only once.

Initially this was my thought too. However, I am not sure how to get the query for the CREATE TABLE itself. I checked how SHOW CREATE TABLE constructs it: the statement is generated on the fly and is not stored in the metadata. Also, if we don't address this, this lineage will look different from the other lineages, where the process always has the query. So I did not want to address this now, until we figure out how to construct the query itself. I have created a separate issue to track this: https://issues.apache.org/jira/browse/ATLAS-642


- Suma


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/#review127310
-----------------------------------------------------------



Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

Posted by Suma Shivaprasad <su...@gmail.com>.

> On April 6, 2016, 11:21 a.m., Shwetha GS wrote:
> > addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java, line 480
> > <https://reviews.apache.org/r/45784/diff/1/?file=1327184#file1327184line480>
> >
> >     We need to fix the clusterName mess later; we can't pick up the HDFS cluster name from the Hive conf.

I have removed it for now, since we don't know the right clusterName.


- Suma


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/#review127310
-----------------------------------------------------------



Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

Posted by Shwetha GS <ss...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/#review127310
-----------------------------------------------------------




addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java (line 472)
<https://reviews.apache.org/r/45784/#comment190571>

    We need to fix the clusterName mess later; we can't pick up the HDFS cluster name from the Hive conf.



addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java (line 220)
<https://reviews.apache.org/r/45784/#comment190572>

    The earlier version was more readable. Could you use setter methods instead of this long constructor?
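To illustrate the suggestion, here is a hypothetical event holder built with chained setters instead of a long positional constructor. The class and field names are illustrative and are not the actual HiveHook fields; the point is that each value is labelled at the call site.

```java
// Hypothetical event holder; field names are illustrative, not HiveHook's actual fields.
public class HookEvent {
    private String user;
    private String operation;
    private String queryStr;
    private long queryStartTime;

    // Each setter names the value it assigns and returns this, so calls can be chained.
    public HookEvent setUser(String user) { this.user = user; return this; }
    public HookEvent setOperation(String operation) { this.operation = operation; return this; }
    public HookEvent setQueryStr(String queryStr) { this.queryStr = queryStr; return this; }
    public HookEvent setQueryStartTime(long queryStartTime) { this.queryStartTime = queryStartTime; return this; }

    public String getUser() { return user; }
    public String getOperation() { return operation; }
    public String getQueryStr() { return queryStr; }
    public long getQueryStartTime() { return queryStartTime; }

    public static void main(String[] args) {
        // With a positional constructor the meaning of each argument is not visible
        // at the call site; with setters every value is labelled.
        HookEvent event = new HookEvent()
                .setUser("hive")
                .setOperation("LOAD")
                .setQueryStr("load data inpath '/data/in' into table t1")
                .setQueryStartTime(1459900000000L);
        System.out.println(event.getOperation() + " by " + event.getUser()); // LOAD by hive
    }
}
```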



addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java (line 454)
<https://reviews.apache.org/r/45784/#comment190574>

    Aren't there cases where the input/output is on the local FS, for example LOAD from a local path?



addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java (line 493)
<https://reviews.apache.org/r/45784/#comment190573>

    Should this be part of HiveMetaStoreBridge and also be used in import-hive?
    
    Since this lineage will also be created by import-hive, the process name for CREATE TABLE should be just the table name, so that it's created only once.
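The naming suggestion can be illustrated with a hypothetical sketch (not Atlas code). If the CREATE TABLE process is keyed by the table name rather than by the query text, the hook and import-hive resolve to the same name, so a store keyed by name creates the process only once. processName, register and the in-memory map are assumptions for illustration; in the sketch import-hive passes no query text, since it has none to supply.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the naming rule suggested in the review; not Atlas code.
public class ProcessNaming {

    /**
     * CREATE TABLE processes are keyed by table name so the hook and import-hive
     * agree on the name; other operations are keyed by their query text.
     */
    static String processName(boolean isCreateTable, String tableName, String queryStr) {
        return isCreateTable ? tableName : queryStr;
    }

    // In-memory stand-in for the metadata store, keyed by process name.
    private final Map<String, String> processes = new HashMap<>();

    /** Returns true if a new process was created, false if one already existed. */
    boolean register(String name, String source) {
        return processes.putIfAbsent(name, source) == null;
    }

    int size() {
        return processes.size();
    }

    public static void main(String[] args) {
        ProcessNaming store = new ProcessNaming();
        String fromHook = processName(true, "default.ext_t", "create external table ext_t ...");
        String fromImport = processName(true, "default.ext_t", null); // no query text available
        System.out.println(store.register(fromHook, "hive hook"));     // true
        System.out.println(store.register(fromImport, "import-hive")); // false
        System.out.println(store.size());                               // 1
    }
}
```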



repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java (line 165)
<https://reviews.apache.org/r/45784/#comment190575>

    Use these in the Process type definition.
    
    Actually, shouldn't these be in AtlasClient?
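A hypothetical sketch of defining the Process attribute names in one place (for example a shared client-side holder such as AtlasClient, as suggested), so that the type definition and the code that populates entities use the same strings. The class names and the "inputs"/"outputs" values are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one holder for the Process attribute names, used by both the
// type definition and the code that builds entities. Names and values are illustrative.
public class SharedConstantsDemo {

    static final class ProcessAttributes {
        static final String INPUTS = "inputs";
        static final String OUTPUTS = "outputs";

        private ProcessAttributes() { } // constants only, no instances
    }

    // Type-definition side: the attribute names declared on the Process type.
    static List<String> processTypeAttributes() {
        return List.of(ProcessAttributes.INPUTS, ProcessAttributes.OUTPUTS);
    }

    // Hook side: an entity populated using the same names.
    static Map<String, Object> processEntity(List<String> inputs, List<String> outputs) {
        return Map.of(ProcessAttributes.INPUTS, inputs, ProcessAttributes.OUTPUTS, outputs);
    }

    public static void main(String[] args) {
        Map<String, Object> entity =
                processEntity(List.of("hdfs://nn:8020/data/in"), List.of("default.t1"));
        System.out.println(entity.keySet().containsAll(processTypeAttributes())); // true
    }
}
```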


- Shwetha GS



Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

Posted by Suma Shivaprasad <su...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/
-----------------------------------------------------------

(Updated April 5, 2016, 11:58 p.m.)


Review request for atlas.


Bugs: ATLAS-527
    https://issues.apache.org/jira/browse/ATLAS-527


Repository: atlas


Description (updated)
-------

Added support to track lineage between HDFS paths and Hive tables in the following operations:

a. LOAD (at table and partition level): the input is an HDFS path and the output is the table. Even though we don't create partition entities, we still track lineage at the table level for partitions. This could be an issue if there is a large number of partition queries; this is not addressed in this JIRA (see https://issues.apache.org/jira/browse/ATLAS-619). Refer to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
b. IMPORT and EXPORT to and from HDFS paths. Refer to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
c. CREATE EXTERNAL TABLE: the input is the HDFS path and the output is the table.
d. ALTER TABLE ... SET LOCATION for an external table: the input is the new HDFS path and the output is the table.

Also changed the ordering of model registration: models are now sorted by modifiedTime so that they are registered in the correct order.
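A minimal sketch of ordering models by modification time before registration. The ModelFile record and the file names are hypothetical stand-ins for the model files handled in ReservedTypesRegistrar (with real files the timestamp would come from something like java.io.File.lastModified()), and the sketch assumes a model's dependencies were generated, and therefore last modified, before it. The names are chosen so that alphabetical order would be wrong.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for ordering model files before registration; not the
// ReservedTypesRegistrar code.
public class ModelRegistrationOrder {

    record ModelFile(String name, long modifiedTime) { }

    /** Returns model names oldest-first (List.sort is stable, so ties keep input order). */
    static List<String> registrationOrder(List<ModelFile> files) {
        List<ModelFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong(ModelFile::modifiedTime));
        List<String> names = new ArrayList<>();
        for (ModelFile f : sorted) {
            names.add(f.name());
        }
        return names;
    }

    public static void main(String[] args) {
        // Alphabetical order would register the dependent model first.
        List<ModelFile> files = List.of(
                new ModelFile("a_dependent_model.json", 200L),
                new ModelFile("b_base_model.json", 100L));
        System.out.println(registrationOrder(files));
        // prints [b_base_model.json, a_dependent_model.json]
    }
}
```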


Diffs
-----

  addons/hdfs-model/src/main/java/org/apache/atlas/fs/model/FSDataModelGenerator.java 555d565 
  addons/hdfs-model/src/main/scala/org/apache/atlas/fs/model/FSDataModel.scala c964f73 
  addons/hive-bridge/pom.xml e125f18 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java 3a802d7 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 68e32ff 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e17afb8 
  addons/storm-bridge/src/main/java/org/apache/atlas/storm/hook/StormAtlasHook.java 5665856 
  repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 0a04c5f 
  repository/src/main/java/org/apache/atlas/services/ReservedTypesRegistrar.java 430bb6b 

Diff: https://reviews.apache.org/r/45784/diff/


Testing
-------

Added tests in HiveHookIT


Thanks,

Suma Shivaprasad


Re: Review Request 45784: Hive Hook - Support tracking lineage for External Tables (Create/Alter), Load, Import, Export

Posted by Suma Shivaprasad <su...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45784/
-----------------------------------------------------------

(Updated April 5, 2016, 11:54 p.m.)


Review request for atlas.


Bugs: ATLAS-527
    https://issues.apache.org/jira/browse/ATLAS-527


Repository: atlas


Diffs
-----

  addons/hdfs-model/src/main/java/org/apache/atlas/fs/model/FSDataModelGenerator.java 555d565 
  addons/hdfs-model/src/main/scala/org/apache/atlas/fs/model/FSDataModel.scala c964f73 
  addons/hive-bridge/pom.xml e125f18 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java 3a802d7 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 68e32ff 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e17afb8 
  addons/storm-bridge/src/main/java/org/apache/atlas/storm/hook/StormAtlasHook.java 5665856 
  repository/src/main/java/org/apache/atlas/services/DefaultMetadataService.java 0a04c5f 
  repository/src/main/java/org/apache/atlas/services/ReservedTypesRegistrar.java 430bb6b 

Diff: https://reviews.apache.org/r/45784/diff/


Testing (updated)
-------

Added tests in HiveHookIT


Thanks,

Suma Shivaprasad