Posted to dev@atlas.apache.org by "Eva Xiao (Jira)" <ji...@apache.org> on 2021/10/19 02:55:00 UTC

[jira] [Commented] (ATLAS-3655) Create 'spark_application' type to avoid 'spark_process' from being updated for multiple operations

    [ https://issues.apache.org/jira/browse/ATLAS-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430291#comment-17430291 ] 

Eva Xiao commented on ATLAS-3655:
---------------------------------

Hi Vladislav, I see that this change was merged into Atlas 2.2, but I don't see a corresponding change in the Spark Atlas Connector repository. How did you get the screenshots with this new model? Did you modify the Spark Atlas Connector yourself, or is there ongoing work on that project that is not public yet?

> Create 'spark_application' type to avoid 'spark_process' from being updated for multiple operations
> ---------------------------------------------------------------------------------------------------
>
>                 Key: ATLAS-3655
>                 URL: https://issues.apache.org/jira/browse/ATLAS-3655
>             Project: Atlas
>          Issue Type: Task
>            Reporter: Vladislav Glinskiy
>            Priority: Major
>             Fix For: 2.1.0, 3.0.0
>
>         Attachments: Screenshot from 2020-03-03 16-09-39.png
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Create a 'spark_application' type to prevent 'spark_process' from being updated for multiple operations. Currently, the Spark Atlas Connector uses 'spark_process' as the top-level type for a Spark session, so the same entity is updated by every operation within that session.
> The following statements:
> {code:java}
> spark.sql("create table table_1(col1 int,col2 string)");
> spark.sql("create table table_2 as select * from table_1");
> {code}
> result in the following correct lineage:
> table_1 ------> spark_process1 -------> table_2
> but executing similar statements in the same Spark session:
> {code:java}
> spark.sql("create table table_3(col1 int,col2 string)"); 
> spark.sql("create table table_4 as select * from table_3");
> {code}
> result in the same 'spark_process' being updated, so the lineage now connects all four tables (see the screenshot in the attachments).
>  
> The proposal is to create a 'spark_application' entity and associate all 'spark_process' entities (created within that session) to it.
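
To make the difference between the two models concrete, here is a minimal in-memory sketch. It does not use the Atlas or Spark Atlas Connector APIs; the class and method names (SparkLineageSketch, ProcessEntity, ApplicationEntity, recordOperation, downstream) are hypothetical and exist only for illustration. It assumes the behavior described in the issue: a single session-level process accumulates the inputs and outputs of every operation, while the proposed model creates one process per operation and groups them under an application entity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: these are NOT Atlas or Spark Atlas Connector classes.
final class SparkLineageSketch {

    /** A lineage node with input and output tables, standing in for 'spark_process'. */
    static final class ProcessEntity {
        final String name;
        final Set<String> inputs = new LinkedHashSet<>();
        final Set<String> outputs = new LinkedHashSet<>();

        ProcessEntity(String name) { this.name = name; }
    }

    /** Session-level grouping entity, standing in for the proposed 'spark_application'. */
    static final class ApplicationEntity {
        final String name;
        final List<ProcessEntity> processes = new ArrayList<>();

        ApplicationEntity(String name) { this.name = name; }

        /** Creates a new process for each operation instead of updating a shared one. */
        ProcessEntity recordOperation(String input, String output) {
            ProcessEntity p = new ProcessEntity("spark_process" + (processes.size() + 1));
            p.inputs.add(input);
            p.outputs.add(output);
            processes.add(p);
            return p;
        }
    }

    /** Tables that appear one hop downstream of the given table. */
    static Set<String> downstream(List<ProcessEntity> processes, String table) {
        Set<String> result = new LinkedHashSet<>();
        for (ProcessEntity p : processes) {
            if (p.inputs.contains(table)) {
                result.addAll(p.outputs);
            }
        }
        return result;
    }

    /** Current model: one process per session, updated by both CTAS statements. */
    static Set<String> sharedProcessDownstream() {
        ProcessEntity session = new ProcessEntity("spark_process1");
        session.inputs.add("table_1");
        session.outputs.add("table_2");
        session.inputs.add("table_3");   // second operation updates the same entity
        session.outputs.add("table_4");
        return downstream(Arrays.asList(session), "table_1");
    }

    /** Proposed model: one process per operation, grouped under an application. */
    static Set<String> perOperationDownstream() {
        ApplicationEntity app = new ApplicationEntity("spark_application1");
        app.recordOperation("table_1", "table_2");
        app.recordOperation("table_3", "table_4");
        return downstream(app.processes, "table_1");
    }

    public static void main(String[] args) {
        System.out.println("shared process: " + sharedProcessDownstream());
        System.out.println("per operation:  " + perOperationDownstream());
    }
}
```

In this sketch the shared process reports both table_2 and table_4 as downstream of table_1, which mirrors the spurious lineage in the screenshot, while the per-operation model reports only table_2.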



--
This message was sent by Atlassian Jira
(v8.3.4#803005)