Posted to notifications@kyuubi.apache.org by "iodone (via GitHub)" <gi...@apache.org> on 2023/04/11 11:59:15 UTC
[GitHub] [kyuubi] iodone opened a new issue, #4693: [Improvement] Enhanced the table lineage for input tables
iodone opened a new issue, #4693:
URL: https://github.com/apache/kyuubi/issues/4693
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
### Search before asking
- [X] I have searched in the [issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no similar issues.
### What would you like to be improved?
```scala
spark.sql("CREATE TABLE t1 (a string, b string) USING hive")
spark.sql("CREATE TABLE t2 (a string, b string) USING hive")

val ret0 = exectractLineage("select t1.a from t1 where t1.b in (select b from t2)")
assert(ret0 == Lineage(
  List("default.t1", "default.t2"),
  List(),
  List(("a", Set("default.t1.a")))))

val ret1 = exectractLineage("select t1.a from t1 join t2")
assert(ret1 == Lineage(
  List("default.t1", "default.t2"),
  List(),
  List(("a", Set("default.t1.a")))))
```
```
In actual scenarios, it is necessary to display all input tables, even if a table may not contribute to the output columns.
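
To make the expectation concrete, here is a minimal sketch in plain Scala with no Spark dependency. The names `ToyQuery`, `ToyLineage`, and `InputTablesSketch` are hypothetical and are not part of the Kyuubi lineage plugin. The sketch only illustrates the idea that the input-table list should be every table the query reads, while the column lineage covers just the tables that feed the output columns.

```scala
// Hypothetical toy model, NOT the Kyuubi lineage plugin's API.
final case class ToyQuery(
    projected: List[(String, Set[String])], // output column -> source columns
    referencedTables: List[String])         // every table the query reads

final case class ToyLineage(
    inputTables: List[String],
    outputTables: List[String],
    columnLineage: List[(String, Set[String])])

object InputTablesSketch {
  // Input tables come from all referenced tables, not only from the
  // tables that appear in the column lineage.
  def extract(q: ToyQuery): ToyLineage =
    ToyLineage(q.referencedTables.distinct, Nil, q.projected)

  // Tables that appear in the column lineage: "db.table.col" -> "db.table".
  def tablesInColumnLineage(l: ToyLineage): Set[String] =
    l.columnLineage
      .flatMap(_._2)
      .map(c => c.substring(0, c.lastIndexOf('.')))
      .toSet

  // Mirrors "select t1.a from t1 where t1.b in (select b from t2)":
  // t2 is read, but contributes no output column.
  val example: ToyLineage = extract(
    ToyQuery(
      projected = List(("a", Set("default.t1.a"))),
      referencedTables = List("default.t1", "default.t2")))
}
```

In this toy model, `example.inputTables` contains both `default.t1` and `default.t2`, while `tablesInColumnLineage(example)` contains only `default.t1`.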
### How should we improve?
_No response_
### Are you willing to submit PR?
- [X] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to improve.
- [ ] No. I cannot submit a PR at this time.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org
[GitHub] [kyuubi] wsk1314zwr commented on issue #4693: [Improvement] Enhanced the table lineage for input tables
Posted by "wsk1314zwr (via GitHub)" <gi...@apache.org>.
wsk1314zwr commented on issue #4693:
URL: https://github.com/apache/kyuubi/issues/4693#issuecomment-1562992318
I think it is necessary to enhance the table lineage for input tables. I have the following SQL scenario:
```sql
insert overwrite table dev_datalineage.test_where_field
select
a.field1,
a.field2,
a.field3
from dev_datalineage.join_table_a a
JOIN dev_datalineage.join_table_b b on a.field1 = b.field1
JOIN dev_datalineage.join_table_c c on a.field1 = c.field1
where c.field2='2';
```
The lineage information parsed by the **hive lineage hook**:
```json
{
"version": "1.0.0-0526",
"sqlType": "HiveSQL",
"collectTime": "1685021684190",
"operationName": "QUERY",
"vertices": [{
"id": 0,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.test_where_field.field1"
}, {
"id": 1,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.test_where_field.field2"
}, {
"id": 2,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.test_where_field.field3"
}, {
"id": 3,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.join_table_a.field1"
}, {
"id": 4,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.join_table_a.field2"
}, {
"id": 5,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.join_table_a.field3"
}, {
"id": 6,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.join_table_b.field1"
}, {
"id": 7,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.join_table_c.field1"
}, {
"id": 8,
"vertexType": "COLUMN",
"vertexId": "dev_datalineage.join_table_c.field2"
}],
"edges": [{
"sources": [3],
"targets": [0],
"edgeType": "PROJECTION"
}, {
"sources": [4],
"targets": [1],
"edgeType": "PROJECTION"
}, {
"sources": [5],
"targets": [2],
"edgeType": "PROJECTION"
}, {
"sources": [3, 6, 7],
"targets": [0, 1, 2],
"expression": "(a.field1 = b.field1 AND a.field1 = c.field1)",
"edgeType": "PREDICATE"
}, {
"sources": [8],
"targets": [0, 1, 2],
"expression": "(c.field2 = '2')",
"edgeType": "PREDICATE"
}]
}
```
The table-level lineage from **join_table_b** and **join_table_c** to **test_where_field** can be obtained from the Hive hook's lineage information, even though these tables have no field-level (projection) lineage. The current Kyuubi lineage plugin cannot do this, so the table-level lineage parsed by the Hive hook is more complete. More complete table-level lineage avoids misjudging a table as having no downstream output table during data governance.
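
As a rough illustration of that point, the sketch below derives table-level lineage from the vertices and edges of the hook output above. It is plain Scala with the JSON values copied in by hand; the `Vertex`, `Edge`, and `TableLineageSketch` names are hypothetical and belong to neither the Hive hook nor the Kyuubi plugin. The assumption is that every edge, whatever its `edgeType`, counts toward table-level lineage.

```scala
// Hypothetical model of the hook output; field names mirror the JSON above.
final case class Vertex(id: Int, vertexId: String)
final case class Edge(sources: Set[Int], targets: Set[Int], edgeType: String)

object TableLineageSketch {
  // "db.table.column" -> "db.table"
  def tableOf(column: String): String =
    column.substring(0, column.lastIndexOf('.'))

  // (source table, target table) pairs taken from every edge, regardless
  // of edgeType, so tables used only in predicates are kept.
  def tableEdges(vertices: Seq[Vertex], edges: Seq[Edge]): Set[(String, String)] = {
    val tableById = vertices.map(v => v.id -> tableOf(v.vertexId)).toMap
    (for {
      e <- edges
      s <- e.sources
      t <- e.targets
    } yield (tableById(s), tableById(t))).toSet
  }

  private val db = "dev_datalineage."

  val vertices: Seq[Vertex] = Seq(
    Vertex(0, db + "test_where_field.field1"),
    Vertex(1, db + "test_where_field.field2"),
    Vertex(2, db + "test_where_field.field3"),
    Vertex(3, db + "join_table_a.field1"),
    Vertex(4, db + "join_table_a.field2"),
    Vertex(5, db + "join_table_a.field3"),
    Vertex(6, db + "join_table_b.field1"),
    Vertex(7, db + "join_table_c.field1"),
    Vertex(8, db + "join_table_c.field2"))

  val edges: Seq[Edge] = Seq(
    Edge(Set(3), Set(0), "PROJECTION"),
    Edge(Set(4), Set(1), "PROJECTION"),
    Edge(Set(5), Set(2), "PROJECTION"),
    Edge(Set(3, 6, 7), Set(0, 1, 2), "PREDICATE"),
    Edge(Set(8), Set(0, 1, 2), "PREDICATE"))

  // All edges versus projection edges only.
  val allTables: Set[(String, String)] = tableEdges(vertices, edges)
  val projectionOnly: Set[(String, String)] =
    tableEdges(vertices, edges.filter(_.edgeType == "PROJECTION"))
}
```

With all edges, the source tables are `join_table_a`, `join_table_b`, and `join_table_c`; with projection edges only, `join_table_a` is the sole source, which is the gap described above.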
[GitHub] [kyuubi] wsk1314zwr commented on issue #4693: [Improvement] Enhanced the table lineage for input tables
Posted by "wsk1314zwr (via GitHub)" <gi...@apache.org>.
wsk1314zwr commented on issue #4693:
URL: https://github.com/apache/kyuubi/issues/4693#issuecomment-1562912248
I think it is necessary to enhance the table lineage for input tables. I encountered the following SQL scenario:
```sql
insert overwrite table dev_datalineage.test_where_field
select
a.field1,
a.field2,
a.field3
from dev_datalineage.join_table_a a
JOIN dev_datalineage.join_table_b b on a.field1 = b.field1
JOIN dev_datalineage.join_table_c c on a.field1 = c.field1
where c.field2='2';
```