Posted to issues@kudu.apache.org by "Grant Henke (Jira)" <ji...@apache.org> on 2019/09/16 16:19:00 UTC

[jira] [Commented] (KUDU-2895) Native Apache Atlas Support

    [ https://issues.apache.org/jira/browse/KUDU-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930675#comment-16930675 ] 

Grant Henke commented on KUDU-2895:
-----------------------------------

Upon further investigation, it looks like we should avoid making the generic lineage file the only way to report a lineage/audit event. Instead, we could offer it as the default option but build the feature with direct Atlas client usage in mind.

To do this we should leverage a Java subprocess service. This can be used by Ranger too.

Doing that makes this integration fairly straightforward. Everywhere we do an authorization check, we can call a "logAuditEvent" function immediately afterward to forward the information to the audit functionality. Then, based on the Kudu/Server configuration, it can either log to a file or forward to an Atlas integration.
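
A minimal Java sketch of that flow, to make the shape of the idea concrete. Only the name "logAuditEvent" comes from the comment above; everything else (AuditEvent, AuditSink, FileLikeSink, AtlasLikeSink, sinkFor, the "file"/"atlas" mode strings, and the grant format) is hypothetical and is not Kudu or Atlas API. The Atlas-side sink is a stub that just collects events; a real one would presumably hand them to the Java subprocess service.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AuditSketch {

    /** Hypothetical audit record; the fields are illustrative, not Kudu's. */
    static final class AuditEvent {
        final String user;
        final String action;
        final String table;
        final boolean allowed;

        AuditEvent(String user, String action, String table, boolean allowed) {
            this.user = user;
            this.action = action;
            this.table = table;
            this.allowed = allowed;
        }

        /** One tab-separated line per event, as a file-based sink might write it. */
        String toLine() {
            return String.join("\t", user, action, table, allowed ? "ALLOWED" : "DENIED");
        }
    }

    /** Destination for audit events, selected from configuration. */
    interface AuditSink {
        void log(AuditEvent event);
    }

    /** Stand-in for the file option; keeps lines in memory so the sketch runs without touching disk. */
    static final class FileLikeSink implements AuditSink {
        final List<String> lines = new ArrayList<>();

        @Override
        public void log(AuditEvent event) {
            lines.add(event.toLine());
        }
    }

    /** Stand-in for the Atlas option; it only collects events and makes no Atlas client call. */
    static final class AtlasLikeSink implements AuditSink {
        final List<AuditEvent> forwarded = new ArrayList<>();

        @Override
        public void log(AuditEvent event) {
            forwarded.add(event);
        }
    }

    /** Maps a hypothetical configuration value to a sink. */
    static AuditSink sinkFor(String mode) {
        switch (mode) {
            case "file":
                return new FileLikeSink();
            case "atlas":
                return new AtlasLikeSink();
            default:
                throw new IllegalArgumentException("unknown audit mode: " + mode);
        }
    }

    /** Toy server: grants are "user:action:table" strings (an assumption of this sketch). */
    static final class Server {
        private final Set<String> grants;
        private final AuditSink sink;

        Server(Set<String> grants, AuditSink sink) {
            this.grants = grants;
            this.sink = sink;
        }

        /** The authorization check, followed immediately by the audit call. */
        boolean authorize(String user, String action, String table) {
            boolean allowed = grants.contains(user + ":" + action + ":" + table);
            logAuditEvent(new AuditEvent(user, action, table, allowed));
            return allowed;
        }

        private void logAuditEvent(AuditEvent event) {
            sink.log(event);
        }
    }

    public static void main(String[] args) {
        FileLikeSink sink = (FileLikeSink) sinkFor("file");
        Server server = new Server(new HashSet<>(Arrays.asList("alice:ALTER:metrics")), sink);
        server.authorize("alice", "ALTER", "metrics");
        server.authorize("bob", "DELETE", "metrics");
        sink.lines.forEach(System.out::println);
    }
}
```

The point of the sketch is that the call sites only know about logAuditEvent; swapping the file output for an Atlas integration is a configuration choice and does not touch the authorization code.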

We can use the other Atlas integrations and models to help define our integration in the Atlas plugin. This will likely be very similar to, or derived from, the Ranger model we define in the Ranger work.
* https://github.com/apache/atlas/tree/master/addons/models/1000-Hadoop
* https://github.com/apache/atlas/tree/master/addons/impala-bridge/src/main/java/org/apache/atlas/impala 


> Native Apache Atlas Support
> ---------------------------
>
>                 Key: KUDU-2895
>                 URL: https://issues.apache.org/jira/browse/KUDU-2895
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: Grant Henke
>            Priority: Major
>              Labels: roadmap-candidate
>
> This tracks adding lineage support to Kudu and Apache Atlas. 
> A few notes based on some initial research:
>  * It probably makes sense to generate a generic lineage file which can be consumed by Apache Atlas for lineage.
>  ** This avoids the need for Java interaction in the server
>  ** This is the approach Impala uses
>  ** See ATLAS-3183 and [https://impala.apache.org/docs/build3x/html/topics/impala_lineage.html#lineage]
>  * Creating lineage entries for table "DDL" initially makes sense
>  ** CREATE, ALTER, DELETE
>  ** This is what HBase seems to do: [https://atlas.apache.org/Hook-HBase.html]
>  ** "Only the namespace, table and column-family create/update/delete operations are captured by Atlas HBase hook"
>  * The need for lineage information by scans is unclear
>  ** It would be super fine grained and difficult to interpret.
>  ** Instead lineage from other tools doing the scanning would be more interpretable (Impala, Spark, etc).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)