Posted to dev@atlas.apache.org by "Ashutosh Mestry (Jira)" <ji...@apache.org> on 2021/03/15 18:13:00 UTC

[jira] [Updated] (ATLAS-4204) Hive Hook: Improve HS2 Message Sending

     [ https://issues.apache.org/jira/browse/ATLAS-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Mestry updated ATLAS-4204:
-----------------------------------
    Component/s: hive-integration

> Hive Hook: Improve HS2 Message Sending
> --------------------------------------
>
>                 Key: ATLAS-4204
>                 URL: https://issues.apache.org/jira/browse/ATLAS-4204
>             Project: Atlas
>          Issue Type: Improvement
>          Components: hive-integration
>            Reporter: Ashutosh Mestry
>            Assignee: Ashutosh Mestry
>            Priority: Major
>
> *Background*
> The HiveServer2 (HS2) hook for Atlas sends notification messages for both metadata (DDL operations) and lineage (DML operations).
> The Hive Metastore (HMS) hook already sends metadata information to Atlas, and these messages all correspond to DDL operations.
> As a result, duplicate messages about object updates are sent to Atlas, which processes them like any other messages.
> This adds processing time and increases message volume. Data within Atlas may also be updated incorrectly if the order of messages from HMS and HS2 changes.
> *Solution*
> With this improvement, the HS2 hook sends only lineage messages. All DDL (schema definition) messages will continue to be sent from the HMS hook (no change here).
> This also reduces the volume of messages sent to Atlas from HiveServer2 and helps improve performance by avoiding the processing of duplicate messages.
> The improvement is enabled via a configuration parameter, so the existing behavior continues as is unless the parameter is set.
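A minimal sketch of the filtering rule described above, assuming each HS2 hook event can be classified as either DDL (metadata) or DML (lineage). The configuration key `hs2.ignore.ddl.operations`, the `HookEvent` type, and the `messages_to_send` function are hypothetical illustrations. The issue does not name the real parameter, and this is not the Atlas implementation. With the flag unset, every event is forwarded (existing behavior). With it set, DDL events are dropped because the HMS hook already reports them.

```python
from dataclasses import dataclass
from enum import Enum


class OperationKind(Enum):
    DDL = "ddl"  # schema/metadata change, e.g. CREATE TABLE, ALTER TABLE
    DML = "dml"  # data movement that produces lineage, e.g. INSERT ... SELECT


@dataclass(frozen=True)
class HookEvent:
    operation: str
    kind: OperationKind


# Hypothetical configuration key; the real parameter name is not given in the issue.
IGNORE_DDL_KEY = "hs2.ignore.ddl.operations"


def messages_to_send(events, config):
    """Return the events a (hypothetical) HS2 hook would forward to Atlas.

    Flag off (default): send everything, preserving existing behavior.
    Flag on: send only lineage (DML) events; DDL is left to the HMS hook.
    """
    if not config.get(IGNORE_DDL_KEY, False):
        return list(events)
    return [e for e in events if e.kind is OperationKind.DML]
```

Keeping the default as "send everything" mirrors the issue's goal that existing deployments see no change unless they opt in. The sketch treats each event as purely DDL or purely DML. Statements such as CREATE TABLE ... AS SELECT both create a table and produce lineage, so a real classification would need more care.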



--
This message was sent by Atlassian Jira
(v8.3.4#803005)