You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@atlas.apache.org by Pinal Shah <pi...@freestoneinfotech.com> on 2020/08/28 13:33:19 UTC

Review Request 72815: ENGESC-3520: Deletion of non existing hive entities

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72815/
-----------------------------------------------------------

Review request for atlas, Jayendra Parab, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.


Repository: atlas


Description
-------

**Problem:** Whenever database or table is dropped in hive, and HiveHook is not enabled, we dont have anyway to get the database and table sync with hive.

**Workaround:** Added support to delete hive entities in Atlas which are dropped in hive.

**Usage:** ./import-hive.sh -deleteNonExisting


Diffs
-----

  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java e659ca041 
  client/client-v2/src/main/java/org/apache/atlas/AtlasClientV2.java 6968e8358 


Diff: https://reviews.apache.org/r/72815/diff/1/


Testing
-------

Manually tested


Thanks,

Pinal Shah


Re: Review Request 72815: Deletion of non existing hive entities

Posted by Pinal Shah <pi...@freestoneinfotech.com>.

> On Sept. 4, 2020, 4:04 p.m., Madhan Neethiraj wrote:
> > addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
> > Lines 959 (patched)
> > <https://reviews.apache.org/r/72815/diff/2/?file=2239187#file2239187line968>
> >
> >     Returning after deleting one entity? Please review.

thanks for pointing, it is incorrect


- Pinal


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72815/#review221806
-----------------------------------------------------------


On Sept. 5, 2020, 5:02 a.m., Pinal Shah wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72815/
> -----------------------------------------------------------
> 
> (Updated Sept. 5, 2020, 5:02 a.m.)
> 
> 
> Review request for atlas, Jayendra Parab, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> **Problem:** Whenever database or table is dropped in hive, and HiveHook is not enabled, we dont have anyway to get the database and table sync with hive.
> 
> **Workaround:** Added support to delete hive entities in Atlas which are dropped in hive.
> 
> **Usage:** ./import-hive.sh -deleteNonExisting
> 
> **NOTE:** **atlas.hook.hive.page.limit** property is added to configure the pageSize/limit while fetching entities from Atlas
> 
> 
> Diffs
> -----
> 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java e659ca041 
>   client/client-v2/src/main/java/org/apache/atlas/AtlasClientV2.java 6968e8358 
> 
> 
> Diff: https://reviews.apache.org/r/72815/diff/3/
> 
> 
> Testing
> -------
> 
> Manually tested
> 
> 
> Thanks,
> 
> Pinal Shah
> 
>


Re: Review Request 72815: Deletion of non existing hive entities

Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72815/#review221806
-----------------------------------------------------------




addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
Lines 862 (patched)
<https://reviews.apache.org/r/72815/#comment310888>

    Since there is only one criteria in the filter, there is no need for innerFc. Consider the following:
    
      SearchParameters.FilterCriteria fc = new SearchParameters.FilterCriteria();
     
      fc.setAttributeName(ATTRIBUTE_CLUSTER_NAME);
      fc.setAttributeValue(clusterName);
      fc.setOperator(SearchParameters.Operator.EQ);
    
    Also, the filter criteira is the same across all iteration of the for loop at #858. Consider setting up the criteira before once before entering the for loop.



addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
Lines 959 (patched)
<https://reviews.apache.org/r/72815/#comment310890>

    Returning after deleting one entity? Please review.



addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
Lines 968 (patched)
<https://reviews.apache.org/r/72815/#comment310892>

    deleteHiveMetadata() => deleteEntitiesForNonExistingHiveMetadata()



addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
Lines 999 (patched)
<https://reviews.apache.org/r/72815/#comment310891>

    There is only one parameter to replace, as LOG.error() will print execption details for 'e'. Please review and remove ', {}' at the end of the message.



addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
Lines 1006 (patched)
<https://reviews.apache.org/r/72815/#comment310889>

    Consider replacing:
     Queue      => List
     LinkedList => ArrayList


- Madhan Neethiraj


On Sept. 3, 2020, 12:21 p.m., Pinal Shah wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72815/
> -----------------------------------------------------------
> 
> (Updated Sept. 3, 2020, 12:21 p.m.)
> 
> 
> Review request for atlas, Jayendra Parab, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> **Problem:** Whenever database or table is dropped in hive, and HiveHook is not enabled, we dont have anyway to get the database and table sync with hive.
> 
> **Workaround:** Added support to delete hive entities in Atlas which are dropped in hive.
> 
> **Usage:** ./import-hive.sh -deleteNonExisting
> 
> **NOTE:** **atlas.hook.hive.page.limit** property is added to configure the pageSize/limit while fetching entities from Atlas
> 
> 
> Diffs
> -----
> 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java e659ca041 
>   client/client-v2/src/main/java/org/apache/atlas/AtlasClientV2.java 6968e8358 
> 
> 
> Diff: https://reviews.apache.org/r/72815/diff/2/
> 
> 
> Testing
> -------
> 
> Manually tested
> 
> 
> Thanks,
> 
> Pinal Shah
> 
>


Re: Review Request 72815: Deletion of non existing hive entities

Posted by Pinal Shah <pi...@freestoneinfotech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72815/
-----------------------------------------------------------

(Updated Sept. 11, 2020, 12:27 p.m.)


Review request for atlas, Jayendra Parab, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.


Changes
-------

updating patch for branch-0.8


Repository: atlas


Description
-------

**Problem:** Whenever database or table is dropped in hive, and HiveHook is not enabled, we dont have anyway to get the database and table sync with hive.

**Workaround:** Added support to delete hive entities in Atlas which are dropped in hive.

**Usage:** ./import-hive.sh -deleteNonExisting

**NOTE:** **atlas.hook.hive.page.limit** property is added to configure the pageSize/limit while fetching entities from Atlas


Diffs (updated)
-----

  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java f18d01b7c 
  client/client-v2/src/main/java/org/apache/atlas/AtlasClientV2.java 6968e8358 


Diff: https://reviews.apache.org/r/72815/diff/4/

Changes: https://reviews.apache.org/r/72815/diff/3-4/


Testing
-------

Manually tested


Thanks,

Pinal Shah


Re: Review Request 72815: Deletion of non existing hive entities

Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72815/#review221808
-----------------------------------------------------------


Ship it!




Ship It!

- Madhan Neethiraj


On Sept. 5, 2020, 5:02 a.m., Pinal Shah wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72815/
> -----------------------------------------------------------
> 
> (Updated Sept. 5, 2020, 5:02 a.m.)
> 
> 
> Review request for atlas, Jayendra Parab, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> **Problem:** Whenever database or table is dropped in hive, and HiveHook is not enabled, we dont have anyway to get the database and table sync with hive.
> 
> **Workaround:** Added support to delete hive entities in Atlas which are dropped in hive.
> 
> **Usage:** ./import-hive.sh -deleteNonExisting
> 
> **NOTE:** **atlas.hook.hive.page.limit** property is added to configure the pageSize/limit while fetching entities from Atlas
> 
> 
> Diffs
> -----
> 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java e659ca041 
>   client/client-v2/src/main/java/org/apache/atlas/AtlasClientV2.java 6968e8358 
> 
> 
> Diff: https://reviews.apache.org/r/72815/diff/3/
> 
> 
> Testing
> -------
> 
> Manually tested
> 
> 
> Thanks,
> 
> Pinal Shah
> 
>


Re: Review Request 72815: Deletion of non existing hive entities

Posted by Pinal Shah <pi...@freestoneinfotech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72815/
-----------------------------------------------------------

(Updated Sept. 5, 2020, 5:02 a.m.)


Review request for atlas, Jayendra Parab, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.


Changes
-------

addressed comments


Repository: atlas


Description
-------

**Problem:** Whenever database or table is dropped in hive, and HiveHook is not enabled, we dont have anyway to get the database and table sync with hive.

**Workaround:** Added support to delete hive entities in Atlas which are dropped in hive.

**Usage:** ./import-hive.sh -deleteNonExisting

**NOTE:** **atlas.hook.hive.page.limit** property is added to configure the pageSize/limit while fetching entities from Atlas


Diffs (updated)
-----

  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java e659ca041 
  client/client-v2/src/main/java/org/apache/atlas/AtlasClientV2.java 6968e8358 


Diff: https://reviews.apache.org/r/72815/diff/3/

Changes: https://reviews.apache.org/r/72815/diff/2-3/


Testing
-------

Manually tested


Thanks,

Pinal Shah


Re: Review Request 72815: Deletion of non existing hive entities

Posted by Pinal Shah <pi...@freestoneinfotech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72815/
-----------------------------------------------------------

(Updated Sept. 3, 2020, 12:21 p.m.)


Review request for atlas, Jayendra Parab, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.


Repository: atlas


Description (updated)
-------

**Problem:** Whenever database or table is dropped in hive, and HiveHook is not enabled, we dont have anyway to get the database and table sync with hive.

**Workaround:** Added support to delete hive entities in Atlas which are dropped in hive.

**Usage:** ./import-hive.sh -deleteNonExisting

**NOTE:** **atlas.hook.hive.page.limit** property is added to configure the pageSize/limit while fetching entities from Atlas


Diffs
-----

  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java e659ca041 
  client/client-v2/src/main/java/org/apache/atlas/AtlasClientV2.java 6968e8358 


Diff: https://reviews.apache.org/r/72815/diff/2/


Testing
-------

Manually tested


Thanks,

Pinal Shah


Re: Review Request 72815: Deletion of non existing hive entities

Posted by Pinal Shah <pi...@freestoneinfotech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72815/
-----------------------------------------------------------

(Updated Sept. 3, 2020, 10:26 a.m.)


Review request for atlas, Jayendra Parab, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.


Changes
-------

Addressed review comments


Repository: atlas


Description
-------

**Problem:** Whenever database or table is dropped in hive, and HiveHook is not enabled, we dont have anyway to get the database and table sync with hive.

**Workaround:** Added support to delete hive entities in Atlas which are dropped in hive.

**Usage:** ./import-hive.sh -deleteNonExisting


Diffs (updated)
-----

  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java e659ca041 
  client/client-v2/src/main/java/org/apache/atlas/AtlasClientV2.java 6968e8358 


Diff: https://reviews.apache.org/r/72815/diff/2/

Changes: https://reviews.apache.org/r/72815/diff/1-2/


Testing
-------

Manually tested


Thanks,

Pinal Shah


Re: Review Request 72815: Deletion of non existing hive entities

Posted by Pinal Shah <pi...@freestoneinfotech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72815/
-----------------------------------------------------------

(Updated Aug. 31, 2020, 8:51 a.m.)


Review request for atlas, Jayendra Parab, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.


Summary (updated)
-----------------

Deletion of non existing hive entities


Repository: atlas


Description
-------

**Problem:** Whenever database or table is dropped in hive, and HiveHook is not enabled, we dont have anyway to get the database and table sync with hive.

**Workaround:** Added support to delete hive entities in Atlas which are dropped in hive.

**Usage:** ./import-hive.sh -deleteNonExisting


Diffs
-----

  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java e659ca041 
  client/client-v2/src/main/java/org/apache/atlas/AtlasClientV2.java 6968e8358 


Diff: https://reviews.apache.org/r/72815/diff/1/


Testing
-------

Manually tested


Thanks,

Pinal Shah


Re: Review Request 72815: ENGESC-3520: Deletion of non existing hive entities

Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72815/#review221744
-----------------------------------------------------------




addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
Lines 103 (patched)
<https://reviews.apache.org/r/72815/#comment310794>

    Delete database and table entities in Atlas if not present in Hive



addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
Lines 843 (patched)
<https://reviews.apache.org/r/72815/#comment310799>

    getAllDatabaseInCluster(): consider retrieving databases where clusterName == atlas.metdata.namespace, to ensure that databases and tables that belong to another metadata namespace are not deleted.



addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
Lines 846 (patched)
<https://reviews.apache.org/r/72815/#comment310796>

    Consider reading limit (pageSize) from configuration.



addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
Lines 850 (patched)
<https://reviews.apache.org/r/72815/#comment310795>

    Consider following for better readability:
    
      final List<AtlasEntityHeader> entities = new ArrayList<>();
      final int                     pageSize = 10000;
    
      for (int i = 0; ; i++) {
        int offset = pageSize * i;
    
        LOG.info("retrieving databases: offset={}, pageSize={}", offset, pageSize);
    
        AtlasSearchResult       searchResult  = atlasClientV2.basicSearch(HIVE_TYPE_DB, null, null, true, pageSize, offset);
        List<AtlasEntityHeader> entityHeaders = searchResult == null ? null : searchResult.getEntities();
        int                     dbCount       = entityHeaders == null ? 0 : entityHeaders.size();
    
        LOG.info("retrieved {} databases", dbCount);
    
        if (dbCount > 0) {
          entities.addAll(entityHeaders);
        }
    
        if (dbCount < pageSize) { // last page
          break;
        }
      }



addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
Lines 872 (patched)
<https://reviews.apache.org/r/72815/#comment310797>

    Consider updating getAllTablesInDb() per comment above for getAllDatabaseInCluster().



addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
Lines 939 (patched)
<https://reviews.apache.org/r/72815/#comment310798>

    Move #939 inside else block at #936.



addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
Lines 966 (patched)
<https://reviews.apache.org/r/72815/#comment310800>

    db.getAttribute(ATTRIBUTE_QUALIFIED_NAME).toString() => (String) db.getAttribute(ATTRIBUTE_QUALIFIED_NAME)
      - this will handle NULL value from db.getAttribute(ATTRIBUTE_QUALIFIED_NAME)



addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
Lines 969 (patched)
<https://reviews.apache.org/r/72815/#comment310801>

    Include guid in the log message



addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
Lines 978 (patched)
<https://reviews.apache.org/r/72815/#comment310802>

    Failed to retrieve table entities for database {} from Atlas.



addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
Lines 985 (patched)
<https://reviews.apache.org/r/72815/#comment310803>

    Queue<String> guidsToDelete = new LinkedList<>();
      
      if (!hiveClient.databaseExists(hiveDbName)) {
        if (CollectionUtils.isNotEmpty(tables)) {
          for (AtlasEntityHeader table : tables) {
            guidsToDelete.add(table.getGuid());
          }
        }
        
        guidsToDelete.add(db.getGuid());
      } else {
        if (CollectionUtils.isNotEmpty(tables)) {
          for (AtlasEntityHeader table : tables) {
            String hiveTableName = getHiveTableName((String) table.getAttribute(ATTRIBUTE_QUALIFIED_NAME), true);
    
            if (StringUtils.isEmpty(hiveTableName)) {
              LOG.error("Cannot get table from qualifiedName {} ", (String) table.getAttribute(ATTRIBUTE_QUALIFIED_NAME));
    
              continue;
            }
    
            try {
              hiveClient.getTable(hiveDbName, hiveTableName, true);
            } catch (InvalidTableException e) {  // table doesn't exist
              LOG.info("Added table {}.{} to delete", hiveDbName, hiveTableName);
    
              guidsToDelete.add(table.getGuid());
            } catch (HiveException e) {
              LOG.error("Failed to get table {}.{} from Hive", hiveDbName, hiveTableName, e);
    
              if(failOnError) {
                throw e;
              }
            }
          }    
        }
      }
    
      if (CollectionUtils.isNotEmpty(guidsToDelete)) {
        try {
          deleteByGuid(guidsToDelete);
        } catch (AtlasServiceException e) {
          LOG.error("Failed to delete Atlas entities for database {} , {}", hiveDbName, e);
    
          if(failOnError) {
            throw e;
          }
        }
      }


- Madhan Neethiraj


On Aug. 28, 2020, 1:33 p.m., Pinal Shah wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72815/
> -----------------------------------------------------------
> 
> (Updated Aug. 28, 2020, 1:33 p.m.)
> 
> 
> Review request for atlas, Jayendra Parab, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> **Problem:** Whenever database or table is dropped in hive, and HiveHook is not enabled, we dont have anyway to get the database and table sync with hive.
> 
> **Workaround:** Added support to delete hive entities in Atlas which are dropped in hive.
> 
> **Usage:** ./import-hive.sh -deleteNonExisting
> 
> 
> Diffs
> -----
> 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java e659ca041 
>   client/client-v2/src/main/java/org/apache/atlas/AtlasClientV2.java 6968e8358 
> 
> 
> Diff: https://reviews.apache.org/r/72815/diff/1/
> 
> 
> Testing
> -------
> 
> Manually tested
> 
> 
> Thanks,
> 
> Pinal Shah
> 
>