Posted to dev@atlas.apache.org by "Vladislav Glinskiy (Jira)" <ji...@apache.org> on 2020/03/05 19:48:00 UTC

[jira] [Commented] (ATLAS-3655) Create 'spark_application' type to avoid 'spark_process' from being updated for multiple operations

    [ https://issues.apache.org/jira/browse/ATLAS-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052471#comment-17052471 ] 

Vladislav Glinskiy commented on ATLAS-3655:
-------------------------------------------

cc [~kabhwan] [~sarath] 

> Create 'spark_application' type to avoid 'spark_process' from being updated for multiple operations
> ---------------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-3655
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3655
>             Project: Atlas
>          Issue Type: Task
>            Reporter: Vladislav Glinskiy
>            Priority: Major
>             Fix For: 2.1.0, 3.0.0
>
>         Attachments: Screenshot from 2020-03-03 16-09-39.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Create a 'spark_application' type so that a single 'spark_process' entity is not updated by multiple operations. Currently, the Spark Atlas Connector uses 'spark_process' as the top-level type for a Spark session, so the same entity is updated by every operation within that session.
> The following statements:
> {code:java}
> spark.sql("create table table_1(col1 int,col2 string)");
> spark.sql("create table table_2 as select * from table_1");
> {code}
> result in the following correct lineage:
> table_1 ------> spark_process1 ------> table_2
> but executing similar statements in the same Spark session:
> {code:java}
> spark.sql("create table table_3(col1 int,col2 string)"); 
> spark.sql("create table table_4 as select * from table_3");
> {code}
> result in the same 'spark_process' being updated, and the lineage now connects all four tables (see the screenshot in the attachments).
>  
> The proposal is to create a 'spark_application' entity and associate all 'spark_process' entities (created within that session) with it.
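
The difference between the current behavior and the proposal can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class and method names (LineageSketch, SparkProcess, SparkApplication, recordOperation, downstream) are hypothetical and do not come from Atlas or the Spark Atlas Connector, and no Atlas API is called. It shows that one shared process entity links table_1 to table_4, while one process per operation, grouped under an application, keeps the two lineages separate.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model; none of these names are Atlas types or APIs.
final class LineageSketch {

    /** One process entity with the tables it reads and writes. */
    static final class SparkProcess {
        final String name;
        final Set<String> inputs = new LinkedHashSet<>();
        final Set<String> outputs = new LinkedHashSet<>();

        SparkProcess(String name) { this.name = name; }
    }

    /** Stand-in for the proposed 'spark_application': one per Spark session. */
    static final class SparkApplication {
        final String sessionId;
        final List<SparkProcess> processes = new ArrayList<>();

        SparkApplication(String sessionId) { this.sessionId = sessionId; }

        /** Creates a new process per operation instead of updating a shared one. */
        SparkProcess recordOperation(String input, String output) {
            SparkProcess p = new SparkProcess(sessionId + "#" + (processes.size() + 1));
            p.inputs.add(input);
            p.outputs.add(output);
            processes.add(p);
            return p;
        }
    }

    /** Returns every table reachable downstream of the given table. */
    static Set<String> downstream(List<SparkProcess> processes, String table) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.add(table);
        while (!todo.isEmpty()) {
            String current = todo.poll();
            for (SparkProcess p : processes) {
                if (!p.inputs.contains(current)) continue;
                for (String out : p.outputs) {
                    if (seen.add(out)) todo.add(out); // visit each table once
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Current behavior: one process entity is updated by both operations.
        SparkProcess shared = new SparkProcess("spark_process1");
        shared.inputs.add("table_1");
        shared.outputs.add("table_2");
        shared.inputs.add("table_3");
        shared.outputs.add("table_4");
        System.out.println(downstream(List.of(shared), "table_1")); // [table_2, table_4]

        // Proposal: each operation gets its own process under one application.
        SparkApplication app = new SparkApplication("session-1");
        app.recordOperation("table_1", "table_2");
        app.recordOperation("table_3", "table_4");
        System.out.println(downstream(app.processes, "table_1")); // [table_2]
    }
}
```

In this model the application only groups the processes of a session; lineage edges still live on the individual processes, which is what keeps table_1 and table_4 unconnected.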



--
This message was sent by Atlassian Jira
(v8.3.4#803005)