Posted to dev@atlas.apache.org by "Hemanth Yamijala (JIRA)" <ji...@apache.org> on 2016/01/04 13:33:39 UTC

[jira] [Updated] (ATLAS-415) Hive import fails when importing a table that is already imported without StorageDescriptor information

     [ https://issues.apache.org/jira/browse/ATLAS-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated ATLAS-415:
-----------------------------------
    Attachment: ATLAS-415.patch

Attaching a patch for quick review.

The main fix is to call the API {{AtlasClient.updateEntity}} when we find that a table is already registered with Atlas. The rest of the changes support a unit test I wrote, {{HiveMetaStoreBridgeTest}}, plus some refactoring.

With this patch, hive-import works for the case I described in the bug and updates the created table properly.

A couple of points that I want to call out for discussion in review:

* This might add calls to the server even when there's absolutely no change to the entity. I guess this will have a performance impact, but I am unsure how we can detect on the client whether anything has changed.
* Currently, I am doing the update only for Tables. Is this needed for DB and partitions as well? (I guess yes)
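
To make the discussion concrete, here is a rough sketch of the create-or-update flow described above. It is a self-contained illustration, not the patch itself: {{EntityStore}}, {{InMemoryEntityStore}} and {{registerOrUpdate}} are hypothetical names standing in for the Atlas client and the bridge code, and the attribute maps are simplified placeholders. Note that it issues the update unconditionally, which is exactly the extra server call raised in the first point.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the metadata server client. This is NOT the real
// AtlasClient API; it only models "look up", "create" and "update".
interface EntityStore {
    String findGuid(String qualifiedName); // returns null when not registered
    String create(String qualifiedName, Map<String, Object> attributes);
    void update(String guid, Map<String, Object> attributes);
}

// In-memory implementation so the sketch runs without a server.
final class InMemoryEntityStore implements EntityStore {
    private final Map<String, String> guidByName = new HashMap<>();
    private final Map<String, Map<String, Object>> attributesByGuid = new HashMap<>();
    int createCalls = 0;
    int updateCalls = 0;

    @Override
    public String findGuid(String qualifiedName) {
        return guidByName.get(qualifiedName);
    }

    @Override
    public String create(String qualifiedName, Map<String, Object> attributes) {
        createCalls++;
        String guid = "guid-" + (guidByName.size() + 1);
        guidByName.put(qualifiedName, guid);
        attributesByGuid.put(guid, new HashMap<>(attributes));
        return guid;
    }

    @Override
    public void update(String guid, Map<String, Object> attributes) {
        updateCalls++;
        attributesByGuid.put(guid, new HashMap<>(attributes));
    }

    Map<String, Object> attributesOf(String guid) {
        return attributesByGuid.get(guid);
    }
}

final class TableImportSketch {
    // Create the entity if it is unknown; otherwise push the current
    // attributes to the existing entity. The update is unconditional, so an
    // unchanged entity still costs one call to the server.
    static String registerOrUpdate(EntityStore store, String qualifiedName,
                                   Map<String, Object> attributes) {
        String guid = store.findGuid(qualifiedName);
        if (guid == null) {
            return store.create(qualifiedName, attributes);
        }
        store.update(guid, attributes);
        return guid;
    }

    public static void main(String[] args) {
        InMemoryEntityStore store = new InMemoryEntityStore();

        // First registration carries only the required attributes.
        Map<String, Object> minimal = new HashMap<>();
        minimal.put("name", "events");
        String guid = registerOrUpdate(store, "default.events", minimal);

        // A later import supplies the optional storage descriptor as well.
        Map<String, Object> full = new HashMap<>(minimal);
        full.put("sd", "storage-descriptor-placeholder");
        registerOrUpdate(store, "default.events", full);

        System.out.println(guid + " -> " + store.attributesOf(guid));
    }
}
```

The same shape would presumably apply to DBs and partitions if we decide they need updating too.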

> Hive import fails when importing a table that is already imported without StorageDescriptor information
> -------------------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-415
>                 URL: https://issues.apache.org/jira/browse/ATLAS-415
>             Project: Atlas
>          Issue Type: Bug
>            Reporter: Hemanth Yamijala
>            Assignee: Hemanth Yamijala
>         Attachments: ATLAS-415.patch
>
>
> I found this when testing patches that integrate Storm with Atlas, but I guess this may occur in other scenarios as well.
> To reproduce:
> * Run a Storm topology with Atlas Hook enabled that has a HiveBolt (requires patches for ATLAS-181 and friends).
> * Run hive-import following the above.
> The first step creates a Hive DB and table, setting just the required attributes. Note that the StorageDescriptor is now an optional attribute as per the Hive DataModel.
> The second step fails with this exception:
> {code}
> Exception in thread "main" java.lang.NullPointerException
> 	at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.getSDForTable(HiveMetaStoreBridge.java:345)
> 	at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importTables(HiveMetaStoreBridge.java:219)
> 	at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importDatabases(HiveMetaStoreBridge.java:104)
> 	at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importHiveMetadata(HiveMetaStoreBridge.java:96)
> 	at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:503)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)