Posted to dev@atlas.apache.org by GitBox <gi...@apache.org> on 2020/03/11 21:36:42 UTC

[GitHub] [atlas] vladhlinsky opened a new pull request #93: ATLAS-3661 Create 'spark_column_lineage' type and relationship definition

vladhlinsky opened a new pull request #93: ATLAS-3661 Create 'spark_column_lineage' type and relationship definition
URL: https://github.com/apache/atlas/pull/93
 
 
   ## What changes were proposed in this pull request?
   
   Create the `spark_column_lineage` type and relationship definition to add support for column-level lineage for `CREATE TABLE AS SELECT ...` statements and views. Column-level lineage is the lineage recorded between input and output columns.
   For example:
   ```
   hive > create table employee_ctas as select id from employee;
   ```    
   For the above query, lineage is created from `employee` to `employee_ctas`, and also from `employee.id` to `employee_ctas.id`.
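
   As a rough illustration of the two lineage levels described above, here is a minimal Scala sketch that can be pasted into spark-shell or a plain Scala REPL. `LineageEdge` and `ctasLineage` are hypothetical names introduced for this example only; they are not Atlas or Spark Atlas Connector types, and the sketch assumes a simple CTAS in which every selected column is copied unchanged.

   ```scala
   // Hypothetical model for illustration; not part of Atlas or the Spark Atlas Connector.
   final case class LineageEdge(from: String, to: String)

   // Derive lineage edges for `CREATE TABLE <target> AS SELECT <columns> FROM <source>`,
   // assuming each selected column maps one-to-one onto a column of the same name.
   def ctasLineage(source: String, target: String, columns: Seq[String]): Seq[LineageEdge] = {
     // Table-level lineage: source table -> target table.
     val tableEdge = LineageEdge(source, target)
     // Column-level lineage: one edge per selected column.
     val columnEdges = columns.map(c => LineageEdge(s"$source.$c", s"$target.$c"))
     tableEdge +: columnEdges
   }

   val edges = ctasLineage("employee", "employee_ctas", Seq("id"))
   edges.foreach(e => println(s"${e.from} -> ${e.to}"))
   // prints:
   // employee -> employee_ctas
   // employee.id -> employee_ctas.id
   ```

   The first edge corresponds to the table-level lineage; the remaining edges correspond to what a `spark_column_lineage` entity is meant to capture.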
   
   ## How was this patch tested?
   
   Manually, using a modified version of the Spark Atlas Connector:
   - Installed and started Atlas.
   - Updated `1100-spark_model.json` with the proposed changes and restarted Atlas.
   - Executed the following statements in spark-shell:
   
   ```
   spark.sql("create table sparkemployee_1_2(id int,name string)");
   spark.sql("create table sparkemployee_ctas_1_2 as select id from sparkemployee_1_2");
   ```
   - Verified that each table has column entities and that a `spark_column_lineage` entity was created.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [atlas] nixonrodrigues merged pull request #93: ATLAS-3661 Create 'spark_column_lineage' type and relationship definition

Posted by GitBox <gi...@apache.org>.
nixonrodrigues merged pull request #93: ATLAS-3661 Create 'spark_column_lineage' type and relationship definition
URL: https://github.com/apache/atlas/pull/93
 
 
   


[GitHub] [atlas] vladhlinsky commented on issue #93: ATLAS-3661 Create 'spark_column_lineage' type and relationship definition

Posted by GitBox <gi...@apache.org>.
vladhlinsky commented on issue #93: ATLAS-3661 Create 'spark_column_lineage' type and relationship definition
URL: https://github.com/apache/atlas/pull/93#issuecomment-597892124
 
 
   Attaching screenshots.
   ![Screenshot from 2020-03-11 23-39-45](https://user-images.githubusercontent.com/61428392/76467148-a386a580-63f1-11ea-9df0-9507e015c7a6.png)
   ![Screenshot from 2020-03-10 21-51-16](https://user-images.githubusercontent.com/61428392/76467046-66baae80-63f1-11ea-877c-cd64b874421d.png)
   ![Screenshot from 2020-03-10 21-51-26](https://user-images.githubusercontent.com/61428392/76467051-69b59f00-63f1-11ea-8d22-f147c066a9e6.png)
   ![Screenshot from 2020-03-10 21-51-47](https://user-images.githubusercontent.com/61428392/76467056-6de1bc80-63f1-11ea-8afd-7ce905b93797.png)
   ![Screenshot from 2020-03-11 23-40-34](https://user-images.githubusercontent.com/61428392/76467202-bf8a4700-63f1-11ea-88c9-8320901a41b3.png)
   


[GitHub] [atlas] nixonrodrigues commented on issue #93: ATLAS-3661 Create 'spark_column_lineage' type and relationship definition

Posted by GitBox <gi...@apache.org>.
nixonrodrigues commented on issue #93: ATLAS-3661 Create 'spark_column_lineage' type and relationship definition
URL: https://github.com/apache/atlas/pull/93#issuecomment-599377859
 
 
   +1 for the PR. Thanks, @vladhlinsky.


[GitHub] [atlas] vladhlinsky commented on issue #93: ATLAS-3661 Create 'spark_column_lineage' type and relationship definition

Posted by GitBox <gi...@apache.org>.
vladhlinsky commented on issue #93: ATLAS-3661 Create 'spark_column_lineage' type and relationship definition
URL: https://github.com/apache/atlas/pull/93#issuecomment-597892210
 
 
   cc @HeartSaVioR @sarathsubramanian
