Posted to dev@atlas.apache.org by Vimal Sharma <vi...@hortonworks.com> on 2017/02/01 06:23:04 UTC

Re: Using .7.1rc3 with import-hive.sh - Lineage is produced from EXTERNAL tables only - not MANAGED tables: by design or bug?

Hi Russell,
Can you please file a JIRA to document this as a limitation? The documentation will be updated in the next release.

I am not aware of the branch_intersect example you are referring to.

Thanks
Vimal




On 1/30/17, 4:52 PM, "Russell Anderson" <rg...@us.ibm.com> wrote:

>Hi Vimal
>
>This is very helpful in understanding why lineage is not created.
>
>I appreciate your explanation and will follow through by creating a bug/enhancement against import-hive.sh.
>
>I suggest at least two things be done immediately:
>
>1) document this as a limitation in the limitations section
>
>2) the example that is online for branch_intersect by Hortonworks be modified to use only external tables
>
>Regards 
>
>Russ
>
>Sent from my iPhone
>
>> On Jan 30, 2017, at 12:04 AM, Vimal Sharma <vi...@hortonworks.com> wrote:
>> 
>> Hi Russell,
>> I responded to this question on HCC at https://community.hortonworks.com/questions/66547/hivemetastorebridge-code-only-creating-lineage-for.html#answer-70360.
>> 
>> When using import-hive.sh, lineage is created only for external tables. This is indeed by design. For external tables, it makes sense to mark the source HDFS path as the “source” node in the lineage diagram.
>> 
>> For MANAGED tables, I am not sure how much value it adds to create a lineage diagram, since the source HDFS path will inherently be {HIVE_DATA_ROOT}/{TABLENAME}.
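
The rule described above can be sketched in a few lines. This is only an illustration of the stated behaviour, not the actual HiveMetaStoreBridge code: the `HiveTable` class, the `import_lineage_edges` function, and the sample paths are all hypothetical, and the only thing taken from Hive itself is the pair of table-type names `EXTERNAL_TABLE` and `MANAGED_TABLE`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HiveTable:
    """Hypothetical stand-in for the table metadata read from the metastore."""
    name: str
    table_type: str  # "EXTERNAL_TABLE" or "MANAGED_TABLE"
    location: str    # HDFS path backing the table


def import_lineage_edges(tables: List[HiveTable]) -> List[Tuple[str, str]]:
    """Return (source HDFS path, table name) edges for external tables only."""
    edges = []
    for table in tables:
        # Managed tables are skipped: their path is implied by the table name,
        # so the message above argues the edge adds little information.
        if table.table_type == "EXTERNAL_TABLE":
            edges.append((table.location, table.name))
    return edges


tables = [
    HiveTable("events_ext", "EXTERNAL_TABLE", "hdfs:///data/raw/events"),
    HiveTable("events_managed", "MANAGED_TABLE", "hdfs:///warehouse/events_managed"),
]
edges = import_lineage_edges(tables)
```

With this input only `events_ext` contributes an edge, which matches the observation that metadata appears for managed tables while lineage does not.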
>> 
>> 
>> For managed tables created using CTAS as shown below:
>> 
>> > create table dest as select * from source;
>> 
>> We don’t get the corresponding lineage after running import-hive.sh:
>> 
>> source --> CTAS Process --> dest
>> 
>> This is because we don’t process the tables present in the Hive metastore in a specific order, which is necessary to get the above lineage. Handling this would be a good improvement to the import-hive.sh utility, and you can raise a bug to track it.
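
To make the ordering requirement concrete, here is a minimal sketch of processing tables so that every source is handled before the table created from it. It is an assumption-laden illustration, not how import-hive.sh works or would necessarily be fixed: the `sources_of` mapping (target table to the tables it was created from) is hypothetical input, and the message does not say where such information would come from. The function names are also made up for this example.

```python
from collections import deque
from typing import Dict, List, Tuple


def order_tables(sources_of: Dict[str, List[str]]) -> List[str]:
    """Order tables so each one appears after every table it was created from."""
    # Collect every table mentioned, either as a target or as a source.
    tables = set(sources_of)
    for srcs in sources_of.values():
        tables.update(srcs)

    # pending[t] = number of distinct sources of t not yet processed.
    pending = {t: len(set(sources_of.get(t, []))) for t in tables}
    dependents: Dict[str, List[str]] = {t: [] for t in tables}
    for target, srcs in sources_of.items():
        for src in set(srcs):
            dependents[src].append(target)

    # Kahn's algorithm: start from tables that have no sources.
    ready = deque(sorted(t for t, count in pending.items() if count == 0))
    ordered = []
    while ready:
        table = ready.popleft()
        ordered.append(table)
        for dep in sorted(dependents[table]):
            pending[dep] -= 1
            if pending[dep] == 0:
                ready.append(dep)

    if len(ordered) != len(tables):
        raise ValueError("cyclic dependency between tables")
    return ordered


def ctas_lineage(sources_of: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Emit (source, process, target) edges in dependency order."""
    edges = []
    for table in order_tables(sources_of):
        for src in sources_of.get(table, []):
            edges.append((src, "CTAS Process", table))
    return edges


# "dest" was created from "source", and "report" from "dest".
example = {"dest": ["source"], "report": ["dest"]}
order = order_tables(example)
lineage = ctas_lineage(example)
```

Processing in this order means the entity for `source` already exists by the time the edge into `dest` is created, which is the property the unordered import lacks.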
>> 
>> Hope this helps
>> - Vimal
>> 
>> 
>> From: Russell Anderson <rg...@us.ibm.com>
>> Date: Sunday, January 29, 2017 at 11:14 PM
>> To: "dev@atlas.incubator.apache.org" <de...@atlas.incubator.apache.org>
>> Cc: Ashutosh Mestry <am...@hortonworks.com>, Barry Rosen <ro...@us.ibm.com>, David Radley <da...@uk.ibm.com>, Madhan Neethiraj <ma...@apache.org>, Apoorv Naik <na...@gmail.com>, Sarath Subramanian <sa...@gmail.com>, default <vi...@hortonworks.com>, Russell Anderson <rg...@us.ibm.com>
>> Subject: Using .7.1rc3 with import-hive.sh - Lineage is produced from EXTERNAL tables only - not MANAGED tables: by design or bug?
>> 
>> Hi,
>> 
>> Using the latest .7.1rc3 source, after building and installing on a test system, I have found that 'lineage' is only generated from EXTERNAL tables and not from MANAGED tables.
>> 
>> I repeat 'lineage' - meaning the left-to-right flow. I get metadata for the assets from MANAGED tables but not left-to-right lineage.
>> 
>> I do get lineage from External tables.
>> 
>> Is this by design or is this a P1 bug?
>> 
>> In a prior release there was a code fix in the area of the Hive Bridge that checks this, and I am wondering whether the problem has been re-introduced.
>> 
>> If no one responds I will assume it is a bug, and will create one.
>> 
>> Regards,
>> 
>> Russ.
>> 
>> 
>> From: Russell Anderson/Worcester/IBM
>> To: dev@atlas.incubator.apache.org
>> Cc: Ashutosh Mestry <am...@hortonworks.com>, Barry Rosen/Worcester/IBM@IBMUS, David Radley <da...@uk.ibm.com>, Madhan Neethiraj <ma...@apache.org>, Apoorv Naik <na...@gmail.com>, Sarath Subramanian <sa...@gmail.com>, "Vimal Sharma" <vi...@hortonworks.com>
>> Date: 01/24/2017 04:05 PM
>> Subject: Caused by: org.apache.hadoop.hive.ql.metadata.InvalidTableException - Table not found in Atlas .7.1rc3
>> 
>> 
>> Hi,
>> 
>> What used to work in .7rc2 no longer seems to work with the Hive Hook (see the stack trace below from hiveserver2.log).
>> 
>> Looking at the code it cannot find the new table 'russ88' - this simple test case worked in the .7rc2 version.
>> 
>> I have complete permission to make this happen in the HIVEVIEW but somehow the Hive Hook cannot deal with it.
>> 
>> Any ideas?
>> 
>> 
>> 
>> 
>> 
>> 2017-01-24 12:30:25,917 INFO bridge.HiveMetaStoreBridge (HiveMetaStoreBridge.java:createOrUpdateDBInstance(166)) - Importing objects from databaseName : bigsql
>> 2017-01-24 12:30:25,917 INFO metastore.HiveMetaStore (HiveMetaStore.java:logInfo(746)) - 5: get_table : db=bigsql tbl=russ88
>> 2017-01-24 12:30:25,917 INFO HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(371)) - ugi=hive ip=unknown-ip-addr cmd=get_table : db=bigsql tbl=russ88
>> 2017-01-24 12:30:25,919 ERROR metadata.Hive (Hive.java:getTable(1119)) - Table russ88 not found: bigsql.russ88 table not found
>> 2017-01-24 12:30:25,920 ERROR hook.HiveHook (HiveHook.java:run(207)) - Atlas hook failed due to error
>> java.lang.reflect.UndeclaredThrowableException
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1672)
>> at org.apache.atlas.hive.hook.HiveHook$2.run(HiveHook.java:197)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found russ88
>> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1120)
>> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1090)
>> at org.apache.atlas.hive.hook.HiveHook.createOrUpdateEntities(HiveHook.java:559)
>> at org.apache.atlas.hive.hook.HiveHook.createOrUpdateEntities(HiveHook.java:581)
>> at org.apache.atlas.hive.hook.HiveHook.processHiveEntity(HiveHook.java:669)
>> at org.apache.atlas.hive.hook.HiveHook.registerProcess(HiveHook.java:649)
>> at org.apache.atlas.hive.hook.HiveHook.collect(HiveHook.java:270)
>> at org.apache.atlas.hive.hook.HiveHook.access$200(HiveHook.java:85)
>> at org.apache.atlas.hive.hook.HiveHook$2$1.run(HiveHook.java:200)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>> ... 6 more
>> 
>> 
>> 
>> 
>> From: Hemanth Yamijala <hy...@hortonworks.com>
>> To: "dev@atlas.incubator.apache.org" <de...@atlas.incubator.apache.org>
>> Cc: Apoorv Naik <na...@gmail.com>, Madhan Neethiraj <ma...@apache.org>, Ashutosh Mestry <am...@hortonworks.com>, Sarath Subramanian <sa...@gmail.com>, David Radley <da...@uk.ibm.com>, "Vimal Sharma" <vi...@hortonworks.com>, Barry Rosen/Worcester/IBM@IBMUS
>> Date: 01/23/2017 11:06 PM
>> Subject: Re: .7.1 rc3 - change in requirements at Run Time ?
>> 
>> 
>> 
>> Hi Russell,
>> 
>> 
>> I am unable to see the exact error that you are facing - in case you attached an image or message.
>> 
>> 
>> AFAIK, there is no change in requirements for 0.7.1 from 0.7.0. Specifically, BerkeleyDB jars are not bundled with Atlas due to Apache licensing restrictions. The default profile (when we build with -Pdist) expects a setup of external HBase and Solr, which is the preferred deployment mode. If you need to build with BerkeleyDB and ElasticSearch, you should use a specific profile and also manually copy the dependent jars to the deployment. This is documented in the 0.7 documentation here: http://atlas.incubator.apache.org/0.7.0-incubating/InstallationSteps.html?. Please look up the profile "berkeley-elasticsearch" and let us know if that gives you the information required.
>> 
>> 
>> Thanks
>> 
>> Hemanth
>> 
>> ________________________________
>> From: Russell Anderson <rg...@us.ibm.com>
>> Sent: Tuesday, January 24, 2017 6:48 AM
>> To: dev@atlas.incubator.apache.org
>> Cc: Apoorv Naik; Madhan Neethiraj; Ashutosh Mestry; Sarath Subramanian; David Radley; Vimal Sharma; Russell Anderson; Barry Rosen
>> Subject: .7.1 rc3 - change in requirements at Run Time ?
>> 
>> 
>> [inline image attachment not preserved in the plain-text archive]
>> 
>> Can someone please tell me if there is a new library required at run time for Atlas .7.1 rc3 versus .7.0 rc2?
>> 
>> Atlas will not start up without this class - this appears to be a specific Berkeley DB Java class. What version of the JDK is required?
>> 
>> Regards,
>> 
>> Russ.
>> 
>> 
>> 
>> From: Sarath Subramanian <sa...@gmail.com>
>> To: Apoorv Naik <na...@gmail.com>, Madhan Neethiraj <ma...@apache.org>, Ashutosh Mestry <am...@hortonworks.com>
>> Cc: Sarath Subramanian <sa...@gmail.com>, atlas <de...@atlas.incubator.apache.org>, David Radley <da...@uk.ibm.com>, Vimal Sharma <vi...@hortonworks.com>
>> Date: 01/23/2017 07:56 PM
>> Subject: Re: Review Request 55358: [ATLAS-1312] Update QuickStart to use the new APIs for type and entities creation
>> Sent by: Sarath Subramanian <no...@reviews.apache.org>
>> 
>> ________________________________
>> 
>> 
>> 
>> 
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> https://reviews.apache.org/r/55358/
>> -----------------------------------------------------------
>> 
>> (Updated Jan. 23, 2017, 4:55 p.m.)
>> 
>> 
>> Review request for atlas, Apoorv Naik, Ashutosh Mestry, Madhan Neethiraj, and Suma Shivaprasad.
>> 
>> 
>> Bugs: ATLAS-1312
>>   https://issues.apache.org/jira/browse/ATLAS-1312
>> 
>> 
>> Repository: atlas
>> 
>> 
>> Description
>> -------
>> 
>> The quick start currently uses old APIs to create types and entities. This needs to be updated to use the v2 APIs for types and entities.
>> 
>> 
>> Diffs (updated)
>> -----
>> 
>> client/src/main/java/org/apache/atlas/AtlasBaseClient.java d055b78
>> client/src/main/java/org/apache/atlas/AtlasLineageClientV2.java PRE-CREATION
>> distro/src/bin/quick_start.py 14c8464
>> distro/src/bin/quick_start_v1.py PRE-CREATION
>> intg/src/main/java/org/apache/atlas/type/AtlasTypeUtil.java c866946
>> webapp/src/main/java/org/apache/atlas/examples/QuickStart.java 8322bc6
>> webapp/src/main/java/org/apache/atlas/examples/QuickStartV2.java PRE-CREATION
>> webapp/src/test/java/org/apache/atlas/examples/QuickStartV2IT.java PRE-CREATION
>> webapp/src/test/java/org/apache/atlas/web/resources/BaseResourceIT.java 51be64c
>> 
>> Diff: https://reviews.apache.org/r/55358/diff/
>> 
>> 
>> Testing
>> -------
>> 
>> Tested using the Postman REST client and the newly added ITs
>> 
>> 
>> Thanks,
>> 
>> Sarath Subramanian
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>