Posted to dev@atlas.apache.org by Madhan Neethiraj <ma...@apache.org> on 2018/12/25 09:23:54 UTC

Review Request 69632: ATLAS-3006: option to ignore/prune hive entities in hook notifications

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69632/
-----------------------------------------------------------

Review request for atlas.


Bugs: ATLAS-3006
    https://issues.apache.org/jira/browse/ATLAS-3006


Repository: atlas


Description
-------

Introduced the following configurations to specify temporary/staging Hive tables, so that the Hive hook and the Atlas server can ignore or prune these tables. For pruned tables, column and column-lineage details will be ignored.

# configurations for Hive hook
  atlas.hook.hive.ignore.hive_table.pattern=
  atlas.hook.hive.prune.hive_table.pattern=

# configurations for Atlas server
  atlas.notification.consumer.ignore.hive_table.pattern=
  atlas.notification.consumer.prune.hive_table.pattern=


Appropriate use of these configurations can avoid loading Atlas with unnecessary metadata of transient tables.
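
A minimal sketch of how comma-separated patterns like these could be turned into an ignore/prune decision. This is not the code from the patch: the class and method names are hypothetical, and it assumes that table names are matched in full against each regex, that names have the form db.table@cluster (suggested by the `.*_stg@.*` pattern in the testing section), and that ignore takes precedence over prune when both match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: names and precedence rules are assumptions, not the ATLAS-3006 patch.
class TablePatternSketch {
    enum Action { NONE, IGNORE, PRUNE }

    // Split a comma-separated property value into compiled regexes.
    // Caveat: this naive split would break a regex that itself contains a comma.
    static List<Pattern> parsePatterns(String csv) {
        List<Pattern> ret = new ArrayList<>();
        if (csv != null) {
            for (String s : csv.split(",")) {
                String trimmed = s.trim();
                if (!trimmed.isEmpty()) {
                    ret.add(Pattern.compile(trimmed));
                }
            }
        }
        return ret;
    }

    // True if the whole name matches at least one pattern.
    static boolean matchesAny(String name, List<Pattern> patterns) {
        for (Pattern p : patterns) {
            if (p.matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }

    // Assumption: ignore wins over prune when a name matches both lists.
    static Action classify(String qualifiedName, List<Pattern> ignore, List<Pattern> prune) {
        if (matchesAny(qualifiedName, ignore)) {
            return Action.IGNORE;
        }
        if (matchesAny(qualifiedName, prune)) {
            return Action.PRUNE;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        List<Pattern> ignore = parsePatterns("temp\\..*");
        List<Pattern> prune  = parsePatterns("staging\\..*");
        // "cl1" is a made-up cluster name.
        System.out.println(classify("staging.orders@cl1", ignore, prune));
    }
}
```

An empty property value yields an empty pattern list, so with nothing configured every table is classified as NONE and processed as before.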


Diffs
-----

  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/AtlasHiveHookContext.java 23cb853ca 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 0f4857856 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/AlterTableRename.java 35b058639 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/BaseHiveEvent.java e4537b461 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/CreateTable.java 442a0a0aa 
  webapp/src/main/java/org/apache/atlas/notification/NotificationHookConsumer.java 003e5b01b 
  webapp/src/main/java/org/apache/atlas/notification/preprocessor/EntityPreprocessor.java PRE-CREATION 
  webapp/src/main/java/org/apache/atlas/notification/preprocessor/HivePreprocessor.java PRE-CREATION 
  webapp/src/main/java/org/apache/atlas/notification/preprocessor/PreprocessorContext.java PRE-CREATION 


Diff: https://reviews.apache.org/r/69632/diff/1/


Testing
-------

Verified with the following script that Hive hook and Atlas server ignore/prune metadata for specified Hive tables:

# configurations for Hive hook
atlas.hook.hive.ignore.hive_table.pattern=temp\..*,test\..*
atlas.hook.hive.prune.hive_table.pattern=staging\..*,.*_stg@.*


# configurations for Atlas server
atlas.notification.consumer.ignore.hive_table.pattern=temp\..*,test\..*
atlas.notification.consumer.prune.hive_table.pattern=staging\..*,.*_stg@.*

CREATE DATABASE IF NOT EXISTS test;
CREATE DATABASE IF NOT EXISTS temp;
CREATE DATABASE IF NOT EXISTS staging;
CREATE DATABASE IF NOT EXISTS prod;

DROP VIEW  IF EXISTS test.testView;
DROP TABLE IF EXISTS test.testTable;

DROP VIEW  IF EXISTS temp.tempView;
DROP TABLE IF EXISTS temp.tempTable;

DROP VIEW  IF EXISTS staging.stagingView;
DROP TABLE IF EXISTS staging.stagingTable;

DROP VIEW  IF EXISTS prod.prodView;
DROP TABLE IF EXISTS prod.prodTable;
DROP TABLE IF EXISTS prod.prodSourceTable;

DROP VIEW  IF EXISTS prod.myView_stg;
DROP TABLE IF EXISTS prod.myTable_stg;

CREATE TABLE test.testTable(id INT, name STRING);
CREATE VIEW  test.testView AS SELECT * FROM test.testTable;

CREATE TABLE temp.tempTable(id INT, name STRING);
CREATE VIEW  temp.tempView AS SELECT * FROM temp.tempTable;

CREATE TABLE staging.stagingTable(id INT, name STRING);
CREATE VIEW  staging.stagingView AS SELECT * FROM staging.stagingTable;

CREATE TABLE prod.prodSourceTable(id INT, name STRING);
CREATE TABLE prod.prodTable(id INT, name STRING);
CREATE VIEW  prod.prodView AS SELECT * FROM prod.prodTable;

CREATE TABLE prod.myTable_stg(id INT, name STRING);
CREATE VIEW  prod.myView_stg AS SELECT * FROM prod.prodTable;

INSERT INTO TABLE prod.prodTable SELECT * FROM staging.stagingTable;
INSERT INTO TABLE prod.prodTable SELECT * FROM temp.tempTable;
INSERT INTO TABLE prod.prodTable SELECT * FROM prod.myView_stg;
INSERT INTO TABLE prod.prodTable SELECT * FROM staging.stagingView;
INSERT INTO TABLE prod.prodTable SELECT * FROM prod.prodSourceTable;
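
For reference, the sketch below walks through the tables and views created by the script and reports which of the configured patterns each name matches. It only evaluates the regexes; it says nothing about what the hook or server actually emitted. The class name, the cluster name "cl1", the lowercase db.table@cluster name format, and the ignore-before-prune ordering are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: evaluates the test patterns against the names used in the script.
class TestScriptBuckets {
    static final String[] IGNORE = { "temp\\..*", "test\\..*" };
    static final String[] PRUNE  = { "staging\\..*", ".*_stg@.*" };

    static final String[] OBJECTS = {
        "test.testTable", "test.testView",
        "temp.tempTable", "temp.tempView",
        "staging.stagingTable", "staging.stagingView",
        "prod.prodSourceTable", "prod.prodTable", "prod.prodView",
        "prod.myTable_stg", "prod.myView_stg"
    };

    // True if the whole name matches at least one regex.
    static boolean matchesAny(String name, String[] regexes) {
        for (String r : regexes) {
            if (Pattern.matches(r, name)) {
                return true;
            }
        }
        return false;
    }

    // Assumption: ignore is checked before prune.
    static String bucket(String qualifiedName) {
        if (matchesAny(qualifiedName, IGNORE)) {
            return "ignore";
        }
        if (matchesAny(qualifiedName, PRUNE)) {
            return "prune";
        }
        return "keep";
    }

    // Builds name -> bucket for every object in the script, in script order.
    static Map<String, String> bucketAll(String cluster) {
        Map<String, String> ret = new LinkedHashMap<>();
        for (String o : OBJECTS) {
            // Assumed name format: lowercase db.table followed by @cluster.
            String qn = o.toLowerCase(Locale.ROOT) + "@" + cluster;
            ret.put(qn, bucket(qn));
        }
        return ret;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : bucketAll("cl1").entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Under these assumptions, everything in the test and temp databases matches an ignore pattern, everything in staging plus the two prod objects ending in _stg matches a prune pattern, and the remaining prod objects match neither.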


Thanks,

Madhan Neethiraj


Re: Review Request 69632: ATLAS-3006: option to ignore/prune hive entities in hook notifications

Posted by Sarath Subramanian <sa...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69632/#review211693
-----------------------------------------------------------


Ship it!

- Sarath Subramanian


On Dec. 26, 2018, 1:22 p.m., Madhan Neethiraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69632/
> -----------------------------------------------------------
> 
> (Updated Dec. 26, 2018, 1:22 p.m.)
> 
> 


Re: Review Request 69632: ATLAS-3006: option to ignore/prune hive entities in hook notifications

Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69632/
-----------------------------------------------------------

(Updated Dec. 26, 2018, 9:22 p.m.)


Review request for atlas.


Changes
-------

- added caching of hive_table ignore/prune state
- updated to use the following Atlas server configurations:
  - atlas.notification.consumer.preprocess.hive_table.ignore.pattern
  - atlas.notification.consumer.preprocess.hive_table.prune.pattern
  - atlas.notification.consumer.preprocess.hive_table.cache.size
- updated to use the following Hive hook configurations:
  - atlas.hive.hook.hive_table.ignore.pattern
  - atlas.hive.hook.hive_table.prune.pattern
  - atlas.hive.hook.hive_table.cache.size
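
One way a size-bounded cache of per-table ignore/prune state could look is sketched below. The patch's actual cache is not shown in this thread, so everything here is an assumption: the class name, the use of least-recently-used eviction, and the idea that the cache.size setting acts as a maximum entry count.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: a small LRU cache in front of a (hypothetical) pattern classifier.
class TableActionCacheSketch {
    private final Map<String, String> cache;
    private final Function<String, String> classifier;
    private int misses = 0;

    // maxSize plays the role of a "cache.size" setting (assumed meaning: max entries).
    // The classifier must return a non-null value for every name.
    TableActionCacheSketch(final int maxSize, Function<String, String> classifier) {
        this.classifier = classifier;
        // accessOrder=true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxSize;
            }
        };
    }

    // Returns the cached state, evaluating the classifier only on a miss.
    synchronized String actionFor(String qualifiedName) {
        String ret = cache.get(qualifiedName);
        if (ret == null) {
            misses++;
            ret = classifier.apply(qualifiedName);
            cache.put(qualifiedName, ret);
        }
        return ret;
    }

    synchronized int cachedCount() {
        return cache.size();
    }

    synchronized int missCount() {
        return misses;
    }

    public static void main(String[] args) {
        TableActionCacheSketch c = new TableActionCacheSketch(
                2, qn -> qn.startsWith("temp.") ? "ignore" : "keep");
        System.out.println(c.actionFor("temp.a@cl1"));
        System.out.println(c.actionFor("temp.a@cl1")); // served from the cache
        System.out.println("misses: " + c.missCount());
    }
}
```

The point of caching here is that the same table name tends to recur across many notifications, so the regex list is evaluated once per name rather than once per message, while the size bound keeps memory use predictable.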


Bugs: ATLAS-3006
    https://issues.apache.org/jira/browse/ATLAS-3006


Repository: atlas


Description
-------

Introduced the following configurations to specify temporary/staging Hive tables, so that the Hive hook and the Atlas server can ignore or prune these tables. For pruned tables, column and column-lineage details will be ignored.

# configurations for Hive hook
  atlas.hook.hive.ignore.hive_table.pattern=
  atlas.hook.hive.prune.hive_table.pattern=

# configurations for Atlas server
  atlas.notification.consumer.ignore.hive_table.pattern=
  atlas.notification.consumer.prune.hive_table.pattern=


Appropriate use of these configurations can avoid loading Atlas with unnecessary metadata of transient tables.


Diffs (updated)
-----

  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/AtlasHiveHookContext.java 23cb853ca 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 0f4857856 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/AlterTableRename.java 35b058639 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/BaseHiveEvent.java e4537b461 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/CreateTable.java 442a0a0aa 
  webapp/src/main/java/org/apache/atlas/notification/NotificationHookConsumer.java 003e5b01b 
  webapp/src/main/java/org/apache/atlas/notification/preprocessor/EntityPreprocessor.java PRE-CREATION 
  webapp/src/main/java/org/apache/atlas/notification/preprocessor/HivePreprocessor.java PRE-CREATION 
  webapp/src/main/java/org/apache/atlas/notification/preprocessor/PreprocessorContext.java PRE-CREATION 


Diff: https://reviews.apache.org/r/69632/diff/2/

Changes: https://reviews.apache.org/r/69632/diff/1-2/


Testing
-------

Verified with the following script that Hive hook and Atlas server ignore/prune metadata for specified Hive tables:

# configurations for Hive hook
atlas.hook.hive.ignore.hive_table.pattern=temp\..*,test\..*
atlas.hook.hive.prune.hive_table.pattern=staging\..*,.*_stg@.*


# configurations for Atlas server
atlas.notification.consumer.ignore.hive_table.pattern=temp\..*,test\..*
atlas.notification.consumer.prune.hive_table.pattern=staging\..*,.*_stg@.*

CREATE DATABASE IF NOT EXISTS test;
CREATE DATABASE IF NOT EXISTS temp;
CREATE DATABASE IF NOT EXISTS staging;
CREATE DATABASE IF NOT EXISTS prod;

DROP VIEW  IF EXISTS test.testView;
DROP TABLE IF EXISTS test.testTable;

DROP VIEW  IF EXISTS temp.tempView;
DROP TABLE IF EXISTS temp.tempTable;

DROP VIEW  IF EXISTS staging.stagingView;
DROP TABLE IF EXISTS staging.stagingTable;

DROP VIEW  IF EXISTS prod.prodView;
DROP TABLE IF EXISTS prod.prodTable;
DROP TABLE IF EXISTS prod.prodSourceTable;

DROP VIEW  IF EXISTS prod.myView_stg;
DROP TABLE IF EXISTS prod.myTable_stg;

CREATE TABLE test.testTable(id INT, name STRING);
CREATE VIEW  test.testView AS SELECT * FROM test.testTable;

CREATE TABLE temp.tempTable(id INT, name STRING);
CREATE VIEW  temp.tempView AS SELECT * FROM temp.tempTable;

CREATE TABLE staging.stagingTable(id INT, name STRING);
CREATE VIEW  staging.stagingView AS SELECT * FROM staging.stagingTable;

CREATE TABLE prod.prodSourceTable(id INT, name STRING);
CREATE TABLE prod.prodTable(id INT, name STRING);
CREATE VIEW  prod.prodView AS SELECT * FROM prod.prodTable;

CREATE TABLE prod.myTable_stg(id INT, name STRING);
CREATE VIEW  prod.myView_stg AS SELECT * FROM prod.prodTable;

INSERT INTO TABLE prod.prodTable SELECT * FROM staging.stagingTable;
INSERT INTO TABLE prod.prodTable SELECT * FROM temp.tempTable;
INSERT INTO TABLE prod.prodTable SELECT * FROM prod.myView_stg;
INSERT INTO TABLE prod.prodTable SELECT * FROM staging.stagingView;
INSERT INTO TABLE prod.prodTable SELECT * FROM prod.prodSourceTable;


Thanks,

Madhan Neethiraj