Posted to dev@atlas.apache.org by "Shwetha G S (JIRA)" <ji...@apache.org> on 2015/09/15 08:51:45 UTC

[jira] [Comment Edited] (ATLAS-58) Make hive hook reliable

    [ https://issues.apache.org/jira/browse/ATLAS-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744923#comment-14744923 ] 

Shwetha G S edited comment on ATLAS-58 at 9/15/15 6:51 AM:
-----------------------------------------------------------

The hive hook sends notification messages (each a list of entities). The notification consumer on the server side consumes these messages and registers the entities. The server de-dupes entities based on the unique attribute of the entity.

Big changes:
1. Concept of services that are started and stopped when atlas starts and stops
2. De-duping of entities on the server based on any unique attribute of the entity. If an entity doesn't have any unique attribute, de-duping is not done and a new entity is created
3. Changed the entity submit API to take a list of entities instead of just 1 entity (required for the hive hook)
4. Moved security tests from integration tests to unit tests, as they were causing issues with server start because jetty already starts another server for integration tests
5. Removed some duplicate tests from repository module (the same tests exist in typesystem module as well)
6. In webapp ITs, re-used the types defined
7. Hive hook now sends notifications instead of registering entities. Sending the notification is done synchronously, so this adds to hive command execution delay, but it also makes the hook reliable
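
To illustrate items 2 and 3, here is a minimal in-memory sketch of a list-based submit call that de-dupes on a unique attribute. It is not the code from the patch: the class names, the single-unique-attribute-per-type simplification, and the "hive_table"/"name" values used with it are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch; none of these names are Atlas APIs. */
final class DedupSketch {

    /** A simplified entity: a type name plus its attribute values. */
    static final class Entity {
        final String typeName;
        final Map<String, Object> attributes;

        Entity(String typeName, Map<String, Object> attributes) {
            this.typeName = typeName;
            this.attributes = attributes;
        }
    }

    /** In-memory stand-in for the server-side entity store. */
    static final class Registry {
        // Type name -> attribute treated as unique (assumption: at most one per type).
        private final Map<String, String> uniqueAttributeByType = new HashMap<>();
        // "type|attribute|value" -> id of the entity already registered under that key.
        private final Map<String, String> idByUniqueKey = new HashMap<>();

        void declareUniqueAttribute(String typeName, String attributeName) {
            uniqueAttributeByType.put(typeName, attributeName);
        }

        /** Takes a list of entities and returns one id per entity, in order. */
        List<String> submit(List<Entity> entities) {
            List<String> ids = new ArrayList<>();
            for (Entity entity : entities) {
                String attribute = uniqueAttributeByType.get(entity.typeName);
                Object value = attribute == null ? null : entity.attributes.get(attribute);
                if (value == null) {
                    // No unique attribute value: no de-duping, always create a new entity.
                    ids.add(UUID.randomUUID().toString());
                    continue;
                }
                String key = entity.typeName + "|" + attribute + "|" + value;
                // Reuse the existing id if this unique value was seen before.
                ids.add(idByUniqueKey.computeIfAbsent(key, k -> UUID.randomUUID().toString()));
            }
            return ids;
        }
    }
}
```

Submitting two "hive_table" entities with the same "name" value would return the same id twice, while an entity of a type with no declared unique attribute would get a fresh id on every submit.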

Pending:
1. Entity updates like alter table commands are not handled. Will create another jira for this
2. Webapp jetty plugin doesn't shut down embedded kafka at the end of integration tests, so hive bridge ITs fail. Hive bridge ITs pass if run on their own. Still checking on this



> Make hive hook reliable
> -----------------------
>
>                 Key: ATLAS-58
>                 URL: https://issues.apache.org/jira/browse/ATLAS-58
>             Project: Atlas
>          Issue Type: Sub-task
>            Reporter: Shwetha G S
>            Assignee: Shwetha G S
>              Labels: incompatible
>             Fix For: trunk
>
>         Attachments: ATLAS-58-v2.patch, ATLAS-58.patch
>
>
> Currently, the hive hook executes in a background thread pool and is a best-effort approach to registering entities. But this needs to be reliable for data governance to be effective.
> One way is: in the hive hook, add the entities to some messaging framework, and the atlas server can read the entities from the messages and register them in atlas. Since posting a message is faster, we can do it synchronously and hence get reliable entity registration.
> We can start with kafka for messaging, but any other messaging framework should be pluggable
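
The pluggable-messaging idea in the description can be sketched as a small interface that the hook calls synchronously. This is only an illustration under assumptions: the interface, the exception, and the in-memory implementation below are hypothetical names, not the ones used in the patch, and a kafka-backed sender would be one more implementation of the same interface.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names are Atlas APIs. */
final class NotificationSketch {

    /** Raised when the messaging layer does not accept a message. */
    static final class NotificationException extends RuntimeException {
        NotificationException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** The pluggable part: any messaging framework can sit behind this. */
    interface NotificationSender {
        /** Blocks until the messages are accepted, or throws NotificationException. */
        void send(List<String> messages);
    }

    /** In-memory implementation, useful for tests; stands in for a real broker. */
    static final class InMemorySender implements NotificationSender {
        private final List<String> queue = new ArrayList<>();

        @Override
        public synchronized void send(List<String> messages) {
            queue.addAll(messages);
        }

        /** What a server-side consumer would do: take everything queued so far. */
        synchronized List<String> drain() {
            List<String> drained = new ArrayList<>(queue);
            queue.clear();
            return drained;
        }
    }

    /** Hook side: posts on the calling thread so a failure is visible to the caller. */
    static final class Hook {
        private final NotificationSender sender;

        Hook(NotificationSender sender) {
            this.sender = sender;
        }

        void onCommandCompleted(List<String> serializedEntities) {
            // Synchronous send: returns only after the sender has accepted the messages.
            sender.send(serializedEntities);
        }
    }
}
```

Because the hook depends only on the interface, swapping the messaging framework means supplying a different sender at construction time; the cost of the synchronous call is whatever latency that sender adds to the command.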


