Posted to dev@atlas.apache.org by "Dharshana M Krishnamoorthy (Jira)" <ji...@apache.org> on 2022/04/28 11:53:00 UTC

[jira] [Updated] (ATLAS-4595) [Hive import v2]When using file name to import via v2 api, the entities are not reflected in atlas though the import is successful

     [ https://issues.apache.org/jira/browse/ATLAS-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dharshana M Krishnamoorthy updated ATLAS-4595:
----------------------------------------------
    Description: 
Scenario:

Use --filename together with --output in the import script so that the v2 API is invoked.

Eg:
{code:java}
export JAVA_HOME=/usr/java/default; /opt/cloudera/parcels/CDH/lib/atlas/hook-bin/import-hive.sh --filename /tmp/file_tejqc.txt --output /tmp/db_okgbi.zip{code}
Steps:
 # Create 2 databases db_1 and db_2
 # Create 2 tables under each db
 # Run the import with --filename, where the file contains the name of database db_1

The import succeeded, but the entities are not reflected in Atlas:
{code:java}
2022-04-28 10:50:52,693|INFO|MainThread|machine.py:185 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|RUNNING: ssh -l root -i /tmp/hw-qe-keypair.pem -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null quasar-jagdkt-5.quasar-jagdkt.root.hwx.site "sudo -u root sh -c 'export JAVA_HOME=/usr/java/default; /opt/cloudera/parcels/CDH/lib/atlas/hook-bin/import-hive.sh --filename /tmp/file_tejqc.txt --output /tmp/db_okgbi.zip'"
2022-04-28 10:50:52,957|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|Using Hive configuration directory [/etc/hive/conf]
2022-04-28 10:50:53,152|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|/etc/hive/conf:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop/lib/*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop/.//*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-hdfs/./:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.//*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-yarn/./:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-yarn/.//*
2022-04-28 10:50:53,152|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|Log file for import is /var/log/atlas/import-hive.log
2022-04-28 10:50:55,328|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|log4j:WARN No such property [maxFileSize] in org.apache.log4j.PatternLayout.
2022-04-28 10:50:55,329|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.PatternLayout.
2022-04-28 10:51:18,889|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: An illegal reflective access operation has occurred
2022-04-28 10:51:18,890|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: Illegal reflective access by org.apache.hadoop.hive.common.StringInternUtils (file:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/jars/hive-exec-3.1.3000.7.1.8.0-581.jar) to field java.net.URI.string
2022-04-28 10:51:18,890|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.hive.common.StringInternUtils
2022-04-28 10:51:18,890|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
2022-04-28 10:51:18,891|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: All illegal access operations will be denied in a future release
2022-04-28 10:51:20,824|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|Hive Meta Data imported successfully!
2022-04-28 10:51:20,850|INFO|MainThread|machine.py:227 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|Exit Code: 0 {code}
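
The earlier version of this issue reported a delay of a few seconds, while the current one reports that the entities never appear. One way to tell the two apart is to poll Atlas for a while after the script exits instead of searching once. The following is only a sketch: `wait_for_entity`, its parameters, and the default timeout and interval are hypothetical and not part of the import script or of Atlas. The `lookup` argument stands for whatever check is used to see whether the entity exists.

```python
import time
from typing import Callable


def wait_for_entity(lookup: Callable[[], bool],
                    timeout_s: float = 60.0,
                    interval_s: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> float:
    """Poll `lookup` until it returns True or `timeout_s` has elapsed.

    Returns the number of seconds waited before the entity appeared,
    or -1.0 if it did not appear within the timeout.
    `sleep` and `clock` are injectable so the helper can be tested offline.
    """
    start = clock()
    while True:
        if lookup():
            return clock() - start
        if clock() - start >= timeout_s:
            return -1.0
        sleep(interval_s)
```

A small positive return value would match the "delay" behaviour described earlier; -1.0 after a generous timeout would match the behaviour reported here.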
 

Additional details: contents of file_tejqc.txt
{code:java}
cat /tmp/file_tejqc.txt
db_hive_db_dumeh {code}
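
To check the result, it helps to derive from the import file which entities should exist afterwards. The sketch below assumes each non-empty line is either `database` or `database:table`, and that Hive entities in Atlas carry a qualifiedName of the form `db@namespace` or `db.table@namespace`. The function names and the namespace value `cm` are hypothetical and should be adjusted to the cluster under test.

```python
from typing import List, Optional, Tuple


def parse_import_file(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (database, table) pairs; table is None for a database-only line."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines and trailing whitespace
        db, _, table = line.partition(":")
        entries.append((db.strip(), table.strip() or None))
    return entries


def expected_qualified_name(db: str, table: Optional[str], namespace: str) -> str:
    """Build the qualifiedName the entity is assumed to have in Atlas."""
    name = db if table is None else f"{db}.{table}"
    return f"{name.lower()}@{namespace}"


# Example with the file content shown above (namespace "cm" is an assumption).
for db, table in parse_import_file("db_hive_db_dumeh \n"):
    print(expected_qualified_name(db, table, "cm"))
```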
Tables in the db:
{code:java}
0: jdbc:hive2://quasar-jagdkt-1.quasar-jagdkt> use db_hive_db_dumeh;
INFO  : Compiling command(queryId=hive_20220428115112_876a9b0b-a19c-4ee6-b827-c777e4398463): use db_hive_db_dumeh
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20220428115112_876a9b0b-a19c-4ee6-b827-c777e4398463); Time taken: 0.016 seconds
INFO  : Executing command(queryId=hive_20220428115112_876a9b0b-a19c-4ee6-b827-c777e4398463): use db_hive_db_dumeh
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20220428115112_876a9b0b-a19c-4ee6-b827-c777e4398463); Time taken: 0.007 seconds
INFO  : OK
No rows affected (0.036 seconds)
0: jdbc:hive2://quasar-jagdkt-1.quasar-jagdkt> show tables;
INFO  : Compiling command(queryId=hive_20220428115116_01b637f7-7869-49a8-95df-255eb6be7314): show tables
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20220428115116_01b637f7-7869-49a8-95df-255eb6be7314); Time taken: 0.134 seconds
INFO  : Executing command(queryId=hive_20220428115116_01b637f7-7869-49a8-95df-255eb6be7314): show tables
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20220428115116_01b637f7-7869-49a8-95df-255eb6be7314); Time taken: 0.015 seconds
INFO  : OK
+-----------+
| tab_name  |
+-----------+
| table_1   |
| table_2   |
+-----------+
2 rows selected (0.688 seconds) {code}
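
Whether table_1 and table_2 reached Atlas can be checked per entity through the Atlas v2 REST endpoint for lookup by unique attribute. The sketch below only builds the request URLs; the host, port and namespace (`atlas.example`, `31000`, `cm`) are placeholders, and `unique_attribute_url` is a hypothetical helper, not something shipped with Atlas.

```python
from urllib.parse import quote, urlencode


def unique_attribute_url(base_url: str, type_name: str, qualified_name: str) -> str:
    """Build the URL for looking up one entity by its qualifiedName."""
    path = f"/api/atlas/v2/entity/uniqueAttribute/type/{quote(type_name)}"
    # Keep ':' and '@' readable; other reserved characters are percent-encoded.
    query = urlencode({"attr:qualifiedName": qualified_name}, safe=":@")
    return f"{base_url.rstrip('/')}{path}?{query}"


# Placeholder server and namespace; the table names come from the output above.
for table in ("table_1", "table_2"):
    print(unique_attribute_url("http://atlas.example:31000",
                               "hive_table",
                               f"db_hive_db_dumeh.{table}@cm"))
```

Issuing a GET against each URL with valid credentials should return the entity if it was imported; a not-found response for every table would be consistent with the behaviour reported here.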
 

  was:
Scenario:

Use --filename together with --output in the import script so that the v2 API is invoked.

Eg:
{code:java}
/opt/cloudera/parcels/CDH/lib/atlas/hook-bin/import-hive.sh --filename /tmp/file_hqavs.txt --output /tmp/db_axmqv.zip {code}
There is a delay of a few seconds before the entities are reflected in Atlas.

Steps:
 # Create 2 databases db_1 and db_2
 # Run import using filename that has tables belonging to database1

When a search is performed immediately after the import, the data is not reflected in Atlas. If we wait 5 seconds and search again, the data is reflected.

This does not happen in the following scenarios:
 # when v1 api is used
 # when v2 api is used with database name
 # when v2 api is used with table name

*It happens only when v2 api is used along with file name*

This is not a blocker, since the data is eventually reflected in Atlas.

This issue was created to find out why this happens only when a file name is used with the v2 API.

        Summary: [Hive import v2]When using file name to import via v2 api, the entities are not reflected in atlas though the import is successful  (was: [Hive import v2] [Performance]When using file name to import via v2 api, there is some delay before the entities are reflected in atlas)

> [Hive import v2]When using file name to import via v2 api, the entities are not reflected in atlas though the import is successful
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-4595
>                 URL: https://issues.apache.org/jira/browse/ATLAS-4595
>             Project: Atlas
>          Issue Type: Bug
>          Components:  atlas-core
>            Reporter: Dharshana M Krishnamoorthy
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)