Posted to notifications@kyuubi.apache.org by "iodone (via GitHub)" <gi...@apache.org> on 2023/06/08 08:34:24 UTC

[GitHub] [kyuubi] iodone opened a new pull request, #4932: [KYUUBI #4925] Add default catalog using `spark_catalog` with the lineage result

iodone opened a new pull request, #4932:
URL: https://github.com/apache/kyuubi/pull/4932

   <!--
   Thanks for sending a pull request!
   
   Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://kyuubi.readthedocs.io/en/latest/community/CONTRIBUTING.html
     2. If the PR is related to an issue in https://github.com/apache/kyuubi/issues, add '[KYUUBI #XXXX]' in your PR title, e.g., '[KYUUBI #XXXX] Your PR title ...'.
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][KYUUBI #XXXX] Your PR title ...'.
   -->
   
   ### _Why are the changes needed?_
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you add a feature, you can talk about the use case of it.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   close #4925 
   
   ### _How was this patch tested?_
   - [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
   
   - [ ] Add screenshots for manual tests if appropriate
   
   - [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] wForget commented on a diff in pull request #4932: [KYUUBI #4925] Add default catalog using `spark_catalog` with the lineage result

Posted by "wForget (via GitHub)" <gi...@apache.org>.
wForget commented on code in PR #4932:
URL: https://github.com/apache/kyuubi/pull/4932#discussion_r1246414244


##########
extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasEntityHelper.scala:
##########
@@ -104,33 +104,38 @@ object AtlasEntityHelper {
   }
 
   def tableObjectId(tableName: String): Option[AtlasObjectId] = {
-    val dbTb = tableName.split('.')
-    if (dbTb.length == 2) {
-      val qualifiedName = tableQualifiedName(cluster, dbTb(0), dbTb(1))
-      // TODO parse datasource type
-      Some(new AtlasObjectId(HIVE_TABLE_TYPE, "qualifiedName", qualifiedName))
-    } else {
-      None
+    tableName.split('.') match {
+      case Array(catalog, db, table) =>

Review Comment:
   The entity qualifiedName will be used as a unique key to associate with the entity created by hive hook, so they need to be consistent.





[GitHub] [kyuubi] wForget commented on a diff in pull request #4932: [KYUUBI #4925] Add default catalog using `spark_catalog` with the lineage result

Posted by "wForget (via GitHub)" <gi...@apache.org>.
wForget commented on code in PR #4932:
URL: https://github.com/apache/kyuubi/pull/4932#discussion_r1246416570


##########
extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasEntityHelper.scala:
##########
@@ -104,33 +104,38 @@ object AtlasEntityHelper {
   }
 
   def tableObjectId(tableName: String): Option[AtlasObjectId] = {
-    val dbTb = tableName.split('.')
-    if (dbTb.length == 2) {
-      val qualifiedName = tableQualifiedName(cluster, dbTb(0), dbTb(1))
-      // TODO parse datasource type
-      Some(new AtlasObjectId(HIVE_TABLE_TYPE, "qualifiedName", qualifiedName))
-    } else {
-      None
+    tableName.split('.') match {
+      case Array(catalog, db, table) =>

Review Comment:
   I think the catalog will only be used to determine the data source type, and will not be part of the qualifiedName.
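
   A minimal sketch of that idea (the type constants and mapping here are hypothetical illustrations, not code from this PR): the catalog name selects an Atlas entity type, while the qualifiedName is built from db/table only.

   ```scala
   // Hypothetical sketch: use the catalog only to pick an entity type;
   // keep the qualifiedName catalog-free so it matches existing entities.
   val HiveTableType = "hive_table"       // stand-in constant
   val IcebergTableType = "iceberg_table" // stand-in constant

   def entityType(catalog: String): String = catalog match {
     case "spark_catalog" => HiveTableType
     case "iceberg"       => IcebergTableType // assumption: catalog name implies source type
     case _               => HiveTableType
   }

   def qualifiedName(cluster: String, db: String, table: String): String =
     s"${db.toLowerCase}.${table.toLowerCase}@$cluster"
   ```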





[GitHub] [kyuubi] wForget commented on a diff in pull request #4932: [KYUUBI #4925] Add default catalog using `spark_catalog` with the lineage result

Posted by "wForget (via GitHub)" <gi...@apache.org>.
wForget commented on code in PR #4932:
URL: https://github.com/apache/kyuubi/pull/4932#discussion_r1226057327


##########
extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasEntityHelper.scala:
##########
@@ -104,33 +104,38 @@ object AtlasEntityHelper {
   }
 
   def tableObjectId(tableName: String): Option[AtlasObjectId] = {
-    val dbTb = tableName.split('.')
-    if (dbTb.length == 2) {
-      val qualifiedName = tableQualifiedName(cluster, dbTb(0), dbTb(1))
-      // TODO parse datasource type
-      Some(new AtlasObjectId(HIVE_TABLE_TYPE, "qualifiedName", qualifiedName))
-    } else {
-      None
+    tableName.split('.') match {
+      case Array(catalog, db, table) =>

Review Comment:
   We can only handle the default `spark_catalog` catalog.
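
   A rough, self-contained illustration of restricting the match to the default catalog (the method name and `cluster` parameter are placeholders, not the PR's code):

   ```scala
   // Sketch: accept 3-part names only when the catalog is the default
   // `spark_catalog`; keep supporting legacy 2-part db.table names.
   def tableObjectKey(tableName: String, cluster: String): Option[String] =
     tableName.split('.') match {
       case Array("spark_catalog", db, table) =>
         Some(s"${db.toLowerCase}.${table.toLowerCase}@$cluster")
       case Array(db, table) =>
         Some(s"${db.toLowerCase}.${table.toLowerCase}@$cluster")
       case _ => None // non-default catalogs are not handled
     }
   ```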





[GitHub] [kyuubi] ulysses-you commented on a diff in pull request #4932: [KYUUBI #4925] Add default catalog using `spark_catalog` with the lineage result

Posted by "ulysses-you (via GitHub)" <gi...@apache.org>.
ulysses-you commented on code in PR #4932:
URL: https://github.com/apache/kyuubi/pull/4932#discussion_r1226014114


##########
extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParseHelper.scala:
##########
@@ -476,6 +478,27 @@ trait LineageParser {
   }
 
   private def getQuery(plan: LogicalPlan): LogicalPlan = getField[LogicalPlan](plan, "query")
+
+  private def getV2TableName(plan: NamedRelation): String = {
+    plan match {
+      case relation: DataSourceV2ScanRelation =>
+        val catalog = relation.relation.catalog.get.name()
+        val database = relation.relation.identifier.get.namespace().mkString(".")
+        val table = relation.relation.identifier.get.name()
+        s"$catalog.$database.$table"
+      case relation: DataSourceV2Relation =>
+        val catalog = relation.catalog.get.name()
+        val database = relation.identifier.get.namespace().mkString(".")
+        val table = relation.identifier.get.name()
+        s"$catalog.$database.$table"
+      case _ =>
+        plan.name
+    }
+  }
+
+  private def getV1TableName(tableName: String): String = {
+    Seq(LineageConf.DEFAULT_CATALOG, tableName).filter(_.nonEmpty).mkString(".")

Review Comment:
   ```suggestion
     private def getV1TableName(qualifiedName: String): String = {
       Seq(LineageConf.DEFAULT_CATALOG, qualifiedName).filter(_.nonEmpty).mkString(".")
   ```
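
   For illustration, the suggested helper behaves like this self-contained sketch, where `DefaultCatalog` stands in for `LineageConf.DEFAULT_CATALOG` (assumed to be `spark_catalog`):

   ```scala
   // Stand-in for LineageConf.DEFAULT_CATALOG (assumption: "spark_catalog")
   val DefaultCatalog = "spark_catalog"

   // Prefix the default catalog onto a v1 db.table qualified name;
   // an empty default catalog leaves the name unchanged.
   def getV1TableName(qualifiedName: String): String =
     Seq(DefaultCatalog, qualifiedName).filter(_.nonEmpty).mkString(".")
   ```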



##########
extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParseHelper.scala:
##########
@@ -476,6 +478,27 @@ trait LineageParser {
   }
 
   private def getQuery(plan: LogicalPlan): LogicalPlan = getField[LogicalPlan](plan, "query")
+
+  private def getV2TableName(plan: NamedRelation): String = {
+    plan match {
+      case relation: DataSourceV2ScanRelation =>
+        val catalog = relation.relation.catalog.get.name()

Review Comment:
   Can `catalog` be `None` here?
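
   Since `catalog` and `identifier` on `DataSourceV2Relation` are `Option`s, calling `.get` can throw. A hedged sketch of a safer shape, using plain `Option`s in place of the Spark types (names here are illustrative):

   ```scala
   // Sketch: avoid .get on optional catalog/table pieces by falling
   // back to the relation's own name when either piece is missing.
   def v2TableName(
       catalog: Option[String],
       namespace: Seq[String],
       table: Option[String],
       fallback: String): String =
     (catalog, table) match {
       case (Some(c), Some(t)) => (c +: namespace :+ t).mkString(".")
       case _                  => fallback // e.g. NamedRelation.name
     }
   ```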





[GitHub] [kyuubi] codecov-commenter commented on pull request #4932: [KYUUBI #4925] Add default catalog using `spark_catalog` with the lineage result

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #4932:
URL: https://github.com/apache/kyuubi/pull/4932#issuecomment-1585532750

   ## [Codecov](https://app.codecov.io/gh/apache/kyuubi/pull/4932?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#4932](https://app.codecov.io/gh/apache/kyuubi/pull/4932?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (ce24e3c) into [master](https://app.codecov.io/gh/apache/kyuubi/commit/5f98539c8263b2e413055a2e514b2faaab88fb02?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (5f98539) will **not change** coverage.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@          Coverage Diff           @@
   ##           master   #4932   +/-   ##
   ======================================
     Coverage    0.00%   0.00%           
   ======================================
     Files         563     563           
     Lines       30891   30907   +16     
     Branches     4030    4032    +2     
   ======================================
   - Misses      30891   30907   +16     
   ```
   
   
   | [Impacted Files](https://app.codecov.io/gh/apache/kyuubi/pull/4932?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [...n/lineage/dispatcher/atlas/AtlasEntityHelper.scala](https://app.codecov.io/gh/apache/kyuubi/pull/4932?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-ZXh0ZW5zaW9ucy9zcGFyay9reXV1Ymktc3BhcmstbGluZWFnZS9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9wbHVnaW4vbGluZWFnZS9kaXNwYXRjaGVyL2F0bGFzL0F0bGFzRW50aXR5SGVscGVyLnNjYWxh) | `0.00% <0.00%> (ø)` | |
   | [...in/lineage/helper/SparkSQLLineageParseHelper.scala](https://app.codecov.io/gh/apache/kyuubi/pull/4932?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-ZXh0ZW5zaW9ucy9zcGFyay9reXV1Ymktc3BhcmstbGluZWFnZS9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9wbHVnaW4vbGluZWFnZS9oZWxwZXIvU3BhcmtTUUxMaW5lYWdlUGFyc2VIZWxwZXIuc2NhbGE=) | `0.00% <0.00%> (ø)` | |
   | [.../org/apache/spark/kyuubi/lineage/LineageConf.scala](https://app.codecov.io/gh/apache/kyuubi/pull/4932?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-ZXh0ZW5zaW9ucy9zcGFyay9reXV1Ymktc3BhcmstbGluZWFnZS9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL3NwYXJrL2t5dXViaS9saW5lYWdlL0xpbmVhZ2VDb25mLnNjYWxh) | `0.00% <0.00%> (ø)` | |
   
   




[GitHub] [kyuubi] ulysses-you closed pull request #4932: [KYUUBI #4925] Add default catalog using `spark_catalog` with the lineage result

Posted by "ulysses-you (via GitHub)" <gi...@apache.org>.
ulysses-you closed pull request #4932: [KYUUBI #4925] Add default catalog using `spark_catalog` with the lineage result
URL: https://github.com/apache/kyuubi/pull/4932




[GitHub] [kyuubi] iodone commented on a diff in pull request #4932: [KYUUBI #4925] Add default catalog using `spark_catalog` with the lineage result

Posted by "iodone (via GitHub)" <gi...@apache.org>.
iodone commented on code in PR #4932:
URL: https://github.com/apache/kyuubi/pull/4932#discussion_r1246392277


##########
extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasEntityHelper.scala:
##########
@@ -104,33 +104,38 @@ object AtlasEntityHelper {
   }
 
   def tableObjectId(tableName: String): Option[AtlasObjectId] = {
-    val dbTb = tableName.split('.')
-    if (dbTb.length == 2) {
-      val qualifiedName = tableQualifiedName(cluster, dbTb(0), dbTb(1))
-      // TODO parse datasource type
-      Some(new AtlasObjectId(HIVE_TABLE_TYPE, "qualifiedName", qualifiedName))
-    } else {
-      None
+    tableName.split('.') match {
+      case Array(catalog, db, table) =>

Review Comment:
   The implementation here merges the catalog into the table or column name, so the result is a single string. Does that matter?





[GitHub] [kyuubi] ulysses-you commented on pull request #4932: [KYUUBI #4925] Add default catalog using `spark_catalog` with the lineage result

Posted by "ulysses-you (via GitHub)" <gi...@apache.org>.
ulysses-you commented on PR #4932:
URL: https://github.com/apache/kyuubi/pull/4932#issuecomment-1671069886

   thanks, merged to master




[GitHub] [kyuubi] wForget commented on a diff in pull request #4932: [KYUUBI #4925] Add default catalog using `spark_catalog` with the lineage result

Posted by "wForget (via GitHub)" <gi...@apache.org>.
wForget commented on code in PR #4932:
URL: https://github.com/apache/kyuubi/pull/4932#discussion_r1226054836


##########
extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasEntityHelper.scala:
##########
@@ -104,33 +104,38 @@ object AtlasEntityHelper {
   }
 
   def tableObjectId(tableName: String): Option[AtlasObjectId] = {
-    val dbTb = tableName.split('.')
-    if (dbTb.length == 2) {
-      val qualifiedName = tableQualifiedName(cluster, dbTb(0), dbTb(1))
-      // TODO parse datasource type
-      Some(new AtlasObjectId(HIVE_TABLE_TYPE, "qualifiedName", qualifiedName))
-    } else {
-      None
+    tableName.split('.') match {
+      case Array(catalog, db, table) =>
+        val qualifiedName = tableQualifiedName(cluster, catalog, db, table)
+        // TODO parse datasource type
+        Some(new AtlasObjectId(HIVE_TABLE_TYPE, "qualifiedName", qualifiedName))
+      case _ =>
+        None
     }
   }
 
-  def tableQualifiedName(cluster: String, db: String, table: String): String = {
-    s"${db.toLowerCase}.${table.toLowerCase}@$cluster"
+  def tableQualifiedName(cluster: String, catalog: String, db: String, table: String): String = {
+    s"${catalog.toLowerCase}.${db.toLowerCase}.${table.toLowerCase}@$cluster"
   }
 
   def columnObjectId(columnName: String): Option[AtlasObjectId] = {
-    val dbTbCol = columnName.split('.')
-    if (dbTbCol.length == 3) {
-      val qualifiedName = columnQualifiedName(cluster, dbTbCol(0), dbTbCol(1), dbTbCol(2))
-      // TODO parse datasource type
-      Some(new AtlasObjectId(HIVE_COLUMN_TYPE, "qualifiedName", qualifiedName))
-    } else {
-      None
+    columnName.split('.') match {
+      case Array(catalog, db, table, column) =>

Review Comment:
   ditto



##########
extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasEntityHelper.scala:
##########
@@ -104,33 +104,38 @@ object AtlasEntityHelper {
   }
 
   def tableObjectId(tableName: String): Option[AtlasObjectId] = {
-    val dbTb = tableName.split('.')
-    if (dbTb.length == 2) {
-      val qualifiedName = tableQualifiedName(cluster, dbTb(0), dbTb(1))
-      // TODO parse datasource type
-      Some(new AtlasObjectId(HIVE_TABLE_TYPE, "qualifiedName", qualifiedName))
-    } else {
-      None
+    tableName.split('.') match {
+      case Array(catalog, db, table) =>
+        val qualifiedName = tableQualifiedName(cluster, catalog, db, table)
+        // TODO parse datasource type
+        Some(new AtlasObjectId(HIVE_TABLE_TYPE, "qualifiedName", qualifiedName))
+      case _ =>
+        None
     }
   }
 
-  def tableQualifiedName(cluster: String, db: String, table: String): String = {
-    s"${db.toLowerCase}.${table.toLowerCase}@$cluster"
+  def tableQualifiedName(cluster: String, catalog: String, db: String, table: String): String = {
+    s"${catalog.toLowerCase}.${db.toLowerCase}.${table.toLowerCase}@$cluster"
   }
 
   def columnObjectId(columnName: String): Option[AtlasObjectId] = {
-    val dbTbCol = columnName.split('.')
-    if (dbTbCol.length == 3) {
-      val qualifiedName = columnQualifiedName(cluster, dbTbCol(0), dbTbCol(1), dbTbCol(2))
-      // TODO parse datasource type
-      Some(new AtlasObjectId(HIVE_COLUMN_TYPE, "qualifiedName", qualifiedName))
-    } else {
-      None
+    columnName.split('.') match {
+      case Array(catalog, db, table, column) =>
+        val qualifiedName = columnQualifiedName(cluster, catalog, db, table, column)
+        // TODO parse datasource type
+        Some(new AtlasObjectId(HIVE_COLUMN_TYPE, "qualifiedName", qualifiedName))
+      case _ =>
+        None
     }
   }
 
-  def columnQualifiedName(cluster: String, db: String, table: String, column: String): String = {
-    s"${db.toLowerCase}.${table.toLowerCase}.${column.toLowerCase}@$cluster"
+  def columnQualifiedName(
+      cluster: String,
+      catalog: String,
+      db: String,
+      table: String,
+      column: String): String = {
+    s"${catalog.toLowerCase}.${db.toLowerCase}.${table.toLowerCase}.${column.toLowerCase}@$cluster"

Review Comment:
   ditto



##########
extensions/spark/kyuubi-spark-lineage/src/main/scala/org/apache/kyuubi/plugin/lineage/dispatcher/atlas/AtlasEntityHelper.scala:
##########
@@ -104,33 +104,38 @@ object AtlasEntityHelper {
   }
 
   def tableObjectId(tableName: String): Option[AtlasObjectId] = {
-    val dbTb = tableName.split('.')
-    if (dbTb.length == 2) {
-      val qualifiedName = tableQualifiedName(cluster, dbTb(0), dbTb(1))
-      // TODO parse datasource type
-      Some(new AtlasObjectId(HIVE_TABLE_TYPE, "qualifiedName", qualifiedName))
-    } else {
-      None
+    tableName.split('.') match {
+      case Array(catalog, db, table) =>

Review Comment:
   We should ignore the catalog for Atlas, because the Hive Hook in Atlas also does not specify a catalog:
   
   https://github.com/apache/atlas/blob/b7849aeb78856f2f8e205c5e784cee84409cbc72/addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/BaseHiveEvent.java#L884
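   
   To stay consistent with the Hive hook's `db.table@cluster` format, one option is to drop a leading catalog segment before building the qualifiedName, roughly like this sketch (an illustration, not the merged code):

   ```scala
   // Sketch: Atlas's Hive hook builds db.table@cluster with no catalog,
   // so ignore a leading catalog segment when one is present.
   def atlasTableQualifiedName(cluster: String, name: String): Option[String] =
     name.split('.') match {
       case Array(_, db, table) => // catalog present: drop it
         Some(s"${db.toLowerCase}.${table.toLowerCase}@$cluster")
       case Array(db, table) =>
         Some(s"${db.toLowerCase}.${table.toLowerCase}@$cluster")
       case _ => None
     }
   ```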


