Posted to dev@atlas.apache.org by "Dharshana M Krishnamoorthy (Jira)" <ji...@apache.org> on 2022/04/28 11:53:00 UTC
[jira] [Updated] (ATLAS-4595) [Hive import v2]When using file name to import via v2 api, the entities are not reflected in atlas though the import is successful
[ https://issues.apache.org/jira/browse/ATLAS-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dharshana M Krishnamoorthy updated ATLAS-4595:
----------------------------------------------
Description:
Scenario:
Use --filename in the import script along with --output so that the v2 API is invoked.
Eg:
{code:java}
export JAVA_HOME=/usr/java/default; /opt/cloudera/parcels/CDH/lib/atlas/hook-bin/import-hive.sh --filename /tmp/file_tejqc.txt --output /tmp/db_okgbi.zip{code}
Steps:
# Create 2 databases db_1 and db_2
# Create 2 tables under each db
# Run the import with --filename, using a file that contains the name of database db_1
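The steps above can be sketched as a small shell helper. This is only an illustration: the table names, the column definitions and the temporary file locations are assumptions, not taken from the test run in this report. It prints the HiveQL for steps 1 and 2 and writes the file that would be passed to --filename in step 3.

```shell
#!/bin/sh
# Hypothetical reproduction helper; table names and columns are placeholders.

# Print HiveQL that creates two databases with two tables each (steps 1 and 2).
emit_ddl() {
    for db in db_1 db_2; do
        echo "CREATE DATABASE IF NOT EXISTS ${db};"
        for tbl in table_1 table_2; do
            echo "CREATE TABLE IF NOT EXISTS ${db}.${tbl} (id INT, name STRING);"
        done
    done
}

# Write the file passed to --filename; it holds the database name (step 3).
write_import_file() {
    printf '%s\n' "$2" > "$1"
}

workdir=$(mktemp -d)
emit_ddl > "${workdir}/create.sql"
write_import_file "${workdir}/import_list.txt" db_1

echo "DDL written to ${workdir}/create.sql"
echo "Import list written to ${workdir}/import_list.txt"
```

The generated create.sql would still have to be executed against Hive (for example through beeline) before running import-hive.sh with --filename pointing at import_list.txt and --output pointing at a zip path.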
The import succeeded, but the entities are not reflected in Atlas:
{code:java}
2022-04-28 10:50:52,693|INFO|MainThread|machine.py:185 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|RUNNING: ssh -l root -i /tmp/hw-qe-keypair.pem -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null quasar-jagdkt-5.quasar-jagdkt.root.hwx.site "sudo -u root sh -c 'export JAVA_HOME=/usr/java/default; /opt/cloudera/parcels/CDH/lib/atlas/hook-bin/import-hive.sh --filename /tmp/file_tejqc.txt --output /tmp/db_okgbi.zip'"
2022-04-28 10:50:52,957|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|Using Hive configuration directory [/etc/hive/conf]
2022-04-28 10:50:53,152|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|/etc/hive/conf:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop/lib/*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop/.//*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-hdfs/./:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.//*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-yarn/./:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/lib/hadoop/libexec/../../hadoop-yarn/.//*
2022-04-28 10:50:53,152|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|Log file for import is /var/log/atlas/import-hive.log
2022-04-28 10:50:55,328|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|log4j:WARN No such property [maxFileSize] in org.apache.log4j.PatternLayout.
2022-04-28 10:50:55,329|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.PatternLayout.
2022-04-28 10:51:18,889|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: An illegal reflective access operation has occurred
2022-04-28 10:51:18,890|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: Illegal reflective access by org.apache.hadoop.hive.common.StringInternUtils (file:/opt/cloudera/parcels/CDH-7.1.8-1.cdh7.1.8.p0.25947682/jars/hive-exec-3.1.3000.7.1.8.0-581.jar) to field java.net.URI.string
2022-04-28 10:51:18,890|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.hive.common.StringInternUtils
2022-04-28 10:51:18,890|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
2022-04-28 10:51:18,891|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|WARNING: All illegal access operations will be denied in a future release
2022-04-28 10:51:20,824|INFO|MainThread|machine.py:200 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|Hive Meta Data imported successfully!
2022-04-28 10:51:20,850|INFO|MainThread|machine.py:227 - run()||GUID=003c431b-4087-4990-9d50-d763ef06c51a|Exit Code: 0
{code}
Additional details: content of file_tejqc.txt
{code:java}
cat /tmp/file_tejqc.txt
db_hive_db_dumeh {code}
Tables in the db:
{code:java}
0: jdbc:hive2://quasar-jagdkt-1.quasar-jagdkt> use db_hive_db_dumeh;
INFO : Compiling command(queryId=hive_20220428115112_876a9b0b-a19c-4ee6-b827-c777e4398463): use db_hive_db_dumeh
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20220428115112_876a9b0b-a19c-4ee6-b827-c777e4398463); Time taken: 0.016 seconds
INFO : Executing command(queryId=hive_20220428115112_876a9b0b-a19c-4ee6-b827-c777e4398463): use db_hive_db_dumeh
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20220428115112_876a9b0b-a19c-4ee6-b827-c777e4398463); Time taken: 0.007 seconds
INFO : OK
No rows affected (0.036 seconds)
0: jdbc:hive2://quasar-jagdkt-1.quasar-jagdkt> show tables;
INFO : Compiling command(queryId=hive_20220428115116_01b637f7-7869-49a8-95df-255eb6be7314): show tables
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=hive_20220428115116_01b637f7-7869-49a8-95df-255eb6be7314); Time taken: 0.134 seconds
INFO : Executing command(queryId=hive_20220428115116_01b637f7-7869-49a8-95df-255eb6be7314): show tables
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20220428115116_01b637f7-7869-49a8-95df-255eb6be7314); Time taken: 0.015 seconds
INFO : OK
+-----------+
| tab_name |
+-----------+
| table_1 |
| table_2 |
+-----------+
2 rows selected (0.688 seconds) {code}
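One way to confirm whether the imported entities exist is to look them up by qualifiedName through the Atlas v2 REST API. The sketch below only builds the lookup URLs and wraps a curl call. The Atlas endpoint, the credentials and the metadata namespace ("cm") are assumptions and must be replaced with the values of the cluster under test. The qualifiedName patterns follow the usual Hive hook convention of db@namespace and db.table@namespace.

```shell
#!/bin/sh
# Assumed values; none of these come from the report.
ATLAS_URL="${ATLAS_URL:-http://localhost:21000}"   # hypothetical Atlas endpoint
CLUSTER="${CLUSTER:-cm}"                           # hypothetical metadata namespace

# Build the lookup URL for a hive_db entity (qualifiedName: <db>@<namespace>).
hive_db_lookup_url() {
    echo "${ATLAS_URL}/api/atlas/v2/entity/uniqueAttribute/type/hive_db?attr:qualifiedName=$1@${CLUSTER}"
}

# Build the lookup URL for a hive_table entity (qualifiedName: <db>.<table>@<namespace>).
hive_table_lookup_url() {
    echo "${ATLAS_URL}/api/atlas/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=$1.$2@${CLUSTER}"
}

# Return 0 if Atlas answers HTTP 200 for the URL, non-zero otherwise.
# ATLAS_USER and ATLAS_PASS are placeholders for real credentials.
entity_exists() {
    status=$(curl -s -o /dev/null -w '%{http_code}' \
        -u "${ATLAS_USER:-admin}:${ATLAS_PASS:-admin}" "$1")
    [ "$status" = "200" ]
}

# Print the URLs for the database and tables listed in this report.
hive_db_lookup_url db_hive_db_dumeh
hive_table_lookup_url db_hive_db_dumeh table_1
hive_table_lookup_url db_hive_db_dumeh table_2
```

Calling entity_exists with one of the printed URLs against a running Atlas instance would then show whether that entity was created by the import.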
was:
Scenario:
Use --filename in the import script along with --output so that the v2 API is invoked.
Eg:
{code:java}
/opt/cloudera/parcels/CDH/lib/atlas/hook-bin/import-hive.sh --filename /tmp/file_hqavs.txt --output /tmp/db_axmqv.zip {code}
There is a delay of a few seconds before the entities are reflected in Atlas.
Steps:
# Create 2 databases db_1 and db_2
# Run the import with --filename, using a file that lists tables belonging to db_1
When a search is performed immediately after the import, the data is not reflected in Atlas. If we wait 5 seconds and search again, the data is reflected.
This does not happen in the following scenarios:
# when v1 api is used
# when v2 api is used with database name
# when v2 api is used with table name
*It happens only when v2 api is used along with file name*
This is not a blocker, since the data is eventually reflected in Atlas.
This issue was created to find out why this happens only when a file name is used with the v2 API.
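To tell a short delay apart from entities that never appear, the search can be retried for a bounded time instead of being run once. The helper below is a generic sketch: the function name, the number of attempts and the delay are assumptions, and the check command passed to it (for instance a curl call against the Atlas REST API that exits 0 when the entity is found) is left to the caller.

```shell
#!/bin/sh
# Retry a check command until it succeeds or the attempts run out.
# usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            echo "visible after attempt $i"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "not visible after $attempts attempts"
    return 1
}
```

With 10 attempts and a 1 second delay, a check that starts succeeding after about 5 seconds would match the delay described above, while a check that never succeeds would point to entities that are not created at all.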
Summary: [Hive import v2]When using file name to import via v2 api, the entities are not reflected in atlas though the import is successful (was: [Hive import v2] [Performance]When using file name to import via v2 api, there is some delay before the entities are reflected in atlas)
> [Hive import v2]When using file name to import via v2 api, the entities are not reflected in atlas though the import is successful
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: ATLAS-4595
> URL: https://issues.apache.org/jira/browse/ATLAS-4595
> Project: Atlas
> Issue Type: Bug
> Components: atlas-core
> Reporter: Dharshana M Krishnamoorthy
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)