Posted to notifications@kyuubi.apache.org by GitBox <gi...@apache.org> on 2022/12/27 09:10:36 UTC

[GitHub] [kyuubi] iodone opened a new pull request, #4033: [KYUUBY #3978] add `DatasourceV2` command support

iodone opened a new pull request, #4033:
URL: https://github.com/apache/kyuubi/pull/4033

   close #3978 
   ### _Why are the changes needed?_
   
   
   ### _How was this patch tested?_
   - [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
   
   - [ ] Add screenshots for manual tests if appropriate
   
   - [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] ulysses-you commented on pull request #4033: [KYUUBY #3978] add `DatasourceV2` command support

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on PR #4033:
URL: https://github.com/apache/kyuubi/pull/4033#issuecomment-1367826109

   thanks, merging to master




[GitHub] [kyuubi] iodone commented on a diff in pull request #4033: [KYUUBY #3978] add `DatasourceV2` command support

Posted by GitBox <gi...@apache.org>.
iodone commented on code in PR #4033:
URL: https://github.com/apache/kyuubi/pull/4033#discussion_r1058893457


##########
extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParseHelper.scala:
##########
@@ -323,6 +359,10 @@ trait LineageParser {
         val tableName = p.name
         joinRelationColumnLineage(parentColumnsLineage, p.output, Seq(tableName))
 
+      case p: DataSourceV2Relation =>

Review Comment:
   This case occurs when creating a view over a v2 table: because the table is not actually read, the corresponding `relation` here is a `DataSourceV2Relation`.
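
A minimal sketch of this case, using toy types rather than Spark's Catalyst classes. Every name below (`V2Relation`, `V2ScanRelation`, `CreateView`, `RelationSketch`) is a hypothetical stand-in, and the `optimize` step only imitates the behaviour described in this thread; it is not the actual optimizer logic.

```scala
// Toy stand-ins for plan nodes; these are NOT the real Catalyst classes.
sealed trait Plan
case class V2Relation(table: String, columns: Seq[String]) extends Plan     // v2 table that is referenced but not read
case class V2ScanRelation(table: String, columns: Seq[String]) extends Plan // v2 table that will be scanned
case class CreateView(name: String, child: Plan) extends Plan

object RelationSketch {
  // Hypothetical optimizer step: only relations that are actually read become scans.
  def optimize(plan: Plan): Plan = plan match {
    case V2Relation(t, cols) => V2ScanRelation(t, cols)
    case v: CreateView       => v // assumption: the view body is stored, not executed, so it is left as-is
    case other               => other
  }

  // Lineage extraction therefore has to handle both relation node types.
  def sourceColumns(plan: Plan): Seq[String] = plan match {
    case V2ScanRelation(t, cols) => cols.map(c => s"$t.$c")
    case V2Relation(t, cols)     => cols.map(c => s"$t.$c")
    case CreateView(_, child)    => sourceColumns(child)
  }
}
```

Under these assumptions, a plain read is turned into a scan node, while the plan under `CreateView` still contains the unconverted relation, so a parser that matches only the scan node would miss the view's source columns.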





[GitHub] [kyuubi] iodone commented on a diff in pull request #4033: [KYUUBY #3978] add `DatasourceV2` command support

Posted by GitBox <gi...@apache.org>.
iodone commented on code in PR #4033:
URL: https://github.com/apache/kyuubi/pull/4033#discussion_r1058895980


##########
extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParseHelper.scala:
##########
@@ -280,6 +284,38 @@ trait LineageParser {
       case p if p.nodeName == "SaveIntoDataSourceCommand" =>
         extractColumnsLineage(getQuery(plan), parentColumnsLineage)
 
+      case p
+          if p.nodeName == "AppendData"
+            || p.nodeName == "OverwriteByExpression"
+            || p.nodeName == "OverwritePartitionsDynamic" =>
+        val table = getPlanField[NamedRelation]("table", plan).name
+        extractColumnsLineage(getQuery(plan), parentColumnsLineage).map { case (k, v) =>
+          k.withName(s"$table.${k.name}") -> v
+        }
+
+      case p if p.nodeName == "MergeIntoTable" =>

Review Comment:
   I think the lineage semantics of `DeleteFromTable` are a bit ambiguous, so I did not consider implementing it.
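
For reference, the write branch in the hunk above prefixes each output column with the target table name. Below is a toy sketch of that renaming step alone; `Attr` and `QualifySketch` are hypothetical stand-ins (not Spark's `Attribute` or the plugin's helper), and a plain `Map` is assumed for the lineage.

```scala
// Hypothetical stand-in for an attribute; only the name matters here.
case class Attr(name: String) {
  def withName(newName: String): Attr = copy(name = newName)
}

object QualifySketch {
  // Prefix every output column with the target table name, keeping its source columns unchanged.
  def qualify(table: String, lineage: Map[Attr, Set[String]]): Map[Attr, Set[String]] =
    lineage.map { case (k, v) => k.withName(s"$table.${k.name}") -> v }
}
```

The sketch relies on the written query supplying the output columns to rename, which is the input the `AppendData` and overwrite branches take from `getQuery(plan)`.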





[GitHub] [kyuubi] ulysses-you commented on a diff in pull request #4033: [KYUUBY #3978] add `DatasourceV2` command support

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on code in PR #4033:
URL: https://github.com/apache/kyuubi/pull/4033#discussion_r1058800004


##########
extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParseHelper.scala:
##########
@@ -280,6 +284,38 @@ trait LineageParser {
       case p if p.nodeName == "SaveIntoDataSourceCommand" =>
         extractColumnsLineage(getQuery(plan), parentColumnsLineage)
 
+      case p
+          if p.nodeName == "AppendData"
+            || p.nodeName == "OverwriteByExpression"
+            || p.nodeName == "OverwritePartitionsDynamic" =>
+        val table = getPlanField[NamedRelation]("table", plan).name
+        extractColumnsLineage(getQuery(plan), parentColumnsLineage).map { case (k, v) =>
+          k.withName(s"$table.${k.name}") -> v
+        }
+
+      case p if p.nodeName == "MergeIntoTable" =>

Review Comment:
   Shall we handle `DeleteFromTable`?



##########
extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParseHelper.scala:
##########
@@ -323,6 +359,10 @@ trait LineageParser {
         val tableName = p.name
         joinRelationColumnLineage(parentColumnsLineage, p.output, Seq(tableName))
 
+      case p: DataSourceV2Relation =>

Review Comment:
   We are parsing the optimized plan, so it seems we can never get `DataSourceV2Relation`? It would always be converted to `DataSourceV2ScanRelation`.



##########
extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParseHelper.scala:
##########
@@ -400,6 +440,7 @@ case class SparkSQLLineageParseHelper(sparkSession: SparkSession) extends Lineag
     Try(parse(plan)).recover {
       case e: Exception =>
         logWarning(s"Extract Statement[$executionId] columns lineage failed.", e)
+        e.printStackTrace

Review Comment:
   unnecessary change





[GitHub] [kyuubi] iodone commented on pull request #4033: [KYUUBY #3978] add `DatasourceV2` command support

Posted by GitBox <gi...@apache.org>.
iodone commented on PR #4033:
URL: https://github.com/apache/kyuubi/pull/4033#issuecomment-1367131163

   cc @ulysses-you 




[GitHub] [kyuubi] ulysses-you closed pull request #4033: [KYUUBY #3978] add `DatasourceV2` command support

Posted by GitBox <gi...@apache.org>.
ulysses-you closed pull request #4033: [KYUUBY #3978] add `DatasourceV2` command support
URL: https://github.com/apache/kyuubi/pull/4033




[GitHub] [kyuubi] iodone commented on a diff in pull request #4033: [KYUUBY #3978] add `DatasourceV2` command support

Posted by GitBox <gi...@apache.org>.
iodone commented on code in PR #4033:
URL: https://github.com/apache/kyuubi/pull/4033#discussion_r1059289778


##########
extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParseHelper.scala:
##########
@@ -280,6 +284,38 @@ trait LineageParser {
       case p if p.nodeName == "SaveIntoDataSourceCommand" =>
         extractColumnsLineage(getQuery(plan), parentColumnsLineage)
 
+      case p
+          if p.nodeName == "AppendData"
+            || p.nodeName == "OverwriteByExpression"
+            || p.nodeName == "OverwritePartitionsDynamic" =>
+        val table = getPlanField[NamedRelation]("table", plan).name
+        extractColumnsLineage(getQuery(plan), parentColumnsLineage).map { case (k, v) =>
+          k.withName(s"$table.${k.name}") -> v
+        }
+
+      case p if p.nodeName == "MergeIntoTable" =>

Review Comment:
   OK, I created #4049 to collect suggestions from real business scenarios.





[GitHub] [kyuubi] ulysses-you commented on a diff in pull request #4033: [KYUUBY #3978] add `DatasourceV2` command support

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on code in PR #4033:
URL: https://github.com/apache/kyuubi/pull/4033#discussion_r1059205439


##########
extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParseHelper.scala:
##########
@@ -280,6 +284,38 @@ trait LineageParser {
       case p if p.nodeName == "SaveIntoDataSourceCommand" =>
         extractColumnsLineage(getQuery(plan), parentColumnsLineage)
 
+      case p
+          if p.nodeName == "AppendData"
+            || p.nodeName == "OverwriteByExpression"
+            || p.nodeName == "OverwritePartitionsDynamic" =>
+        val table = getPlanField[NamedRelation]("table", plan).name
+        extractColumnsLineage(getQuery(plan), parentColumnsLineage).map { case (k, v) =>
+          k.withName(s"$table.${k.name}") -> v
+        }
+
+      case p if p.nodeName == "MergeIntoTable" =>

Review Comment:
   Maybe we can improve the semantics of delete operations together, e.g. `drop database`, `drop table`. Do you think it is valuable? We can create an issue first.





[GitHub] [kyuubi] ulysses-you commented on a diff in pull request #4033: [KYUUBY #3978] add `DatasourceV2` command support

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on code in PR #4033:
URL: https://github.com/apache/kyuubi/pull/4033#discussion_r1059204528


##########
extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParseHelper.scala:
##########
@@ -323,6 +359,10 @@ trait LineageParser {
         val tableName = p.name
         joinRelationColumnLineage(parentColumnsLineage, p.output, Seq(tableName))
 
+      case p: DataSourceV2Relation =>

Review Comment:
   I see. Can we add some comments here? It would help us recall this in the future.


