Posted to notifications@kyuubi.apache.org by "iodone (via GitHub)" <gi...@apache.org> on 2023/04/11 11:59:15 UTC

[GitHub] [kyuubi] iodone opened a new issue, #4693: [Improvement] Enhanced the table lineage for input tables

iodone opened a new issue, #4693:
URL: https://github.com/apache/kyuubi/issues/4693

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the [issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What would you like to be improved?
   
   ```scala
         spark.sql("CREATE TABLE t1 (a string, b string) USING hive")
         spark.sql("CREATE TABLE t2 (a string, b string) USING hive")
         val ret0 = exectractLineage("select t1.a from t1 where t1.b in (select b from t2)")
         assert(ret0 == Lineage(
           List("default.t1", "default.t2"),
           List(),
           List(("a", Set("default.t1.a")))))
   
         val ret1 = exectractLineage("select t1.a from t1 join t2")
         assert(ret1 == Lineage(
           List("default.t1", "default.t2"),
           List(),
           List(("a", Set("default.t1.a")))))
   ```
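   In real-world scenarios, all input tables need to be reported, even a table that does not contribute to any output column, such as `default.t2` in both queries above, which is referenced only by the `IN` subquery or the join.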
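   To make the request concrete, the sketch below contrasts deriving input tables only from column lineage with collecting every relation the query reads. It deliberately avoids Spark's `LogicalPlan` API: the `Plan`/`Expr` model, `InputTables`, and `Examples` are hypothetical stand-ins assumed for illustration, not Kyuubi code.

   ```scala
   // Hypothetical, simplified logical-plan model; NOT Spark's LogicalPlan API.
   sealed trait Expr
   final case class Col(table: String, name: String) extends Expr
   final case class InSubquery(value: Expr, query: Plan) extends Expr

   sealed trait Plan
   final case class Relation(table: String) extends Plan
   final case class Project(columns: List[Col], child: Plan) extends Plan
   final case class Filter(condition: Expr, child: Plan) extends Plan
   final case class Join(left: Plan, right: Plan) extends Plan

   object InputTables {
     // Only tables that feed an output column: what column lineage alone reveals.
     def fromColumnLineage(plan: Plan): List[String] = plan match {
       case Project(columns, _) => columns.map(_.table).distinct
       case _                   => Nil
     }

     // Every table the query reads, including join sides and subqueries in filters.
     def all(plan: Plan): List[String] = collect(plan).distinct

     private def collect(plan: Plan): List[String] = plan match {
       case Relation(table)          => List(table)
       case Project(_, child)        => collect(child)
       case Filter(condition, child) => collect(child) ++ collectExpr(condition)
       case Join(left, right)        => collect(left) ++ collect(right)
     }

     private def collectExpr(expr: Expr): List[String] = expr match {
       case Col(_, _)                => Nil
       case InSubquery(value, query) => collectExpr(value) ++ collect(query)
     }
   }

   object Examples {
     // select t1.a from t1 where t1.b in (select b from t2)
     val subqueryPlan: Plan = Project(
       List(Col("default.t1", "a")),
       Filter(
         InSubquery(
           Col("default.t1", "b"),
           Project(List(Col("default.t2", "b")), Relation("default.t2"))),
         Relation("default.t1")))

     // select t1.a from t1 join t2
     val joinPlan: Plan =
       Project(List(Col("default.t1", "a")), Join(Relation("default.t1"), Relation("default.t2")))
   }
   ```

   With this model, `InputTables.fromColumnLineage` returns only `default.t1` for both example queries, while `InputTables.all` returns `List("default.t1", "default.t2")`, which is what the expected `Lineage` values above ask for.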
   
   
   ### How should we improve?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to improve.
   - [ ] No. I cannot submit a PR at this time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] wsk1314zwr commented on issue #4693: [Improvement] Enhanced the table lineage for input tables

Posted by "wsk1314zwr (via GitHub)" <gi...@apache.org>.
wsk1314zwr commented on issue #4693:
URL: https://github.com/apache/kyuubi/issues/4693#issuecomment-1562992318

   
   I think it is necessary to enhance the table lineage for input tables. I have the following SQL scenario:
   ```sql
   insert overwrite table dev_datalineage.test_where_field
   select 
          a.field1,
          a.field2,
          a.field3
   from dev_datalineage.join_table_a a 
   JOIN dev_datalineage.join_table_b b on a.field1 = b.field1 
   JOIN dev_datalineage.join_table_c c on a.field1 = c.field1 
   where c.field2='2';
   ```
   The lineage information parsed by the **hive lineage hook**:
   ```json
   {
   	"version": "1.0.0-0526",
   	"sqlType": "HiveSQL",
   	"collectTime": "1685021684190",
   	"operationName": "QUERY",
   	"vertices": [{
   		"id": 0,
   		"vertexType": "COLUMN",
   		"vertexId": "dev_datalineage.test_where_field.field1"
   	}, {
   		"id": 1,
   		"vertexType": "COLUMN",
   		"vertexId": "dev_datalineage.test_where_field.field2"
   	}, {
   		"id": 2,
   		"vertexType": "COLUMN",
   		"vertexId": "dev_datalineage.test_where_field.field3"
   	}, {
   		"id": 3,
   		"vertexType": "COLUMN",
   		"vertexId": "dev_datalineage.join_table_a.field1"
   	}, {
   		"id": 4,
   		"vertexType": "COLUMN",
   		"vertexId": "dev_datalineage.join_table_a.field2"
   	}, {
   		"id": 5,
   		"vertexType": "COLUMN",
   		"vertexId": "dev_datalineage.join_table_a.field3"
   	}, {
   		"id": 6,
   		"vertexType": "COLUMN",
   		"vertexId": "dev_datalineage.join_table_b.field1"
   	}, {
   		"id": 7,
   		"vertexType": "COLUMN",
   		"vertexId": "dev_datalineage.join_table_c.field1"
   	}, {
   		"id": 8,
   		"vertexType": "COLUMN",
   		"vertexId": "dev_datalineage.join_table_c.field2"
   	}],
   	"edges": [{
   		"sources": [3],
   		"targets": [0],
   		"edgeType": "PROJECTION"
   	}, {
   		"sources": [4],
   		"targets": [1],
   		"edgeType": "PROJECTION"
   	}, {
   		"sources": [5],
   		"targets": [2],
   		"edgeType": "PROJECTION"
   	}, {
   		"sources": [3, 6, 7],
   		"targets": [0, 1, 2],
   		"expression": "(a.field1 = b.field1 AND a.field1 = c.field1)",
   		"edgeType": "PREDICATE"
   	}, {
   		"sources": [8],
   		"targets": [0, 1, 2],
   		"expression": "(c.field2 = '2')",
   		"edgeType": "PREDICATE"
   	}]
   }
   ```
   The table-level lineage from **join_table_b and join_table_c** to **test_where_field** can be obtained from the Hive hook's lineage information, even though these tables have no field-level lineage to it. The current Kyuubi lineage plugin cannot produce this, so the table-level lineage parsed by the Hive hook is more complete. More complete table-level lineage avoids wrongly concluding, during data governance, that a table has no downstream output table.
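   One way to recover that table-level lineage from the hook output is to map every edge, PREDICATE as well as PROJECTION, from the tables of its source columns to the tables of its target columns. The Scala sketch below assumes the JSON has already been parsed into the hypothetical `Vertex` and `Edge` case classes (field names follow the JSON above); `HiveTableLineage` and `HiveHookExample` are illustrative names, not part of Hive or Kyuubi.

   ```scala
   // Hypothetical in-memory form of the hook JSON above; JSON parsing is omitted.
   final case class Vertex(id: Int, vertexId: String)
   final case class Edge(sources: List[Int], targets: List[Int], edgeType: String)

   object HiveTableLineage {
     // "db.table.column" -> "db.table"
     def tableOf(columnId: String): String =
       columnId.substring(0, columnId.lastIndexOf('.'))

     // For each target table, the set of source tables reachable through the
     // selected edge types (PROJECTION and PREDICATE by default).
     def tableLevel(
         vertices: List[Vertex],
         edges: List[Edge],
         edgeTypes: Set[String] = Set("PROJECTION", "PREDICATE")): Map[String, Set[String]] = {
       val tableById = vertices.map(v => v.id -> tableOf(v.vertexId)).toMap
       val pairs = for {
         edge   <- edges if edgeTypes(edge.edgeType)
         target <- edge.targets
         source <- edge.sources
       } yield tableById(target) -> tableById(source)
       pairs.groupBy(_._1).map { case (table, ps) => table -> ps.map(_._2).toSet }
     }
   }

   object HiveHookExample {
     private val db = "dev_datalineage"
     val vertices: List[Vertex] = List(
       Vertex(0, s"$db.test_where_field.field1"),
       Vertex(1, s"$db.test_where_field.field2"),
       Vertex(2, s"$db.test_where_field.field3"),
       Vertex(3, s"$db.join_table_a.field1"),
       Vertex(4, s"$db.join_table_a.field2"),
       Vertex(5, s"$db.join_table_a.field3"),
       Vertex(6, s"$db.join_table_b.field1"),
       Vertex(7, s"$db.join_table_c.field1"),
       Vertex(8, s"$db.join_table_c.field2"))
     val edges: List[Edge] = List(
       Edge(List(3), List(0), "PROJECTION"),
       Edge(List(4), List(1), "PROJECTION"),
       Edge(List(5), List(2), "PROJECTION"),
       Edge(List(3, 6, 7), List(0, 1, 2), "PREDICATE"),
       Edge(List(8), List(0, 1, 2), "PREDICATE"))
   }
   ```

   Restricting `edgeTypes` to `Set("PROJECTION")` yields only `join_table_a` as upstream of `test_where_field`, mirroring field-level lineage; keeping PREDICATE edges adds `join_table_b` and `join_table_c`.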
   



[GitHub] [kyuubi] wsk1314zwr commented on issue #4693: [Improvement] Enhanced the table lineage for input tables

Posted by "wsk1314zwr (via GitHub)" <gi...@apache.org>.
wsk1314zwr commented on issue #4693:
URL: https://github.com/apache/kyuubi/issues/4693#issuecomment-1562912248

   I think it is necessary to enhance the table lineage for input tables. I encountered the following SQL scenario:
   ```sql
   insert overwrite table dev_datalineage.test_where_field
   select 
          a.field1,
          a.field2,
          a.field3
   from dev_datalineage.join_table_a a 
   JOIN dev_datalineage.join_table_b b on a.field1 = b.field1 
   JOIN dev_datalineage.join_table_c c on a.field1 = c.field1 
   where c.field2='2';
   ```