Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/03 12:07:20 UTC

[GitHub] [spark] chenjunbiao001 opened a new pull request #35721: Supports customization of the entire optimizer and planner

chenjunbiao001 opened a new pull request #35721:
URL: https://github.com/apache/spark/pull/35721


   Linked to issue: https://issues.apache.org/jira/browse/SPARK-38405


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] nyingping edited a comment on pull request #35721: Supports customization of the entire optimizer and planner

Posted by GitBox <gi...@apache.org>.
nyingping edited a comment on pull request #35721:
URL: https://github.com/apache/spark/pull/35721#issuecomment-1058938606


   1. The PR title should be changed to: `[SPARK-38405][SQL] Supports customization of the entire optimizer and planner`
   
   2. The PR description should follow the default template so the community can better understand your intent:
   
   
   ### What changes were proposed in this pull request?
   
   ### Why are the changes needed?
   
   ### Does this PR introduce any user-facing change?
   
   ### How was this patch tested?
   
   3. You should enable GitHub Actions (GA) in your fork repository:
   
   https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/enabling-features-for-your-repository/managing-github-actions-settings-for-a-repository




[GitHub] [spark] nyingping commented on pull request #35721: Supports customization of the entire optimizer and planner

Posted by GitBox <gi...@apache.org>.
nyingping commented on pull request #35721:
URL: https://github.com/apache/spark/pull/35721#issuecomment-1058938606


   1. The PR title should be changed to: `[SPARK-38405][SQL] Supports customization of the entire optimizer and planner`
   
   2. The PR description should follow the default template so the community can better understand your intent:
   
   ```
   ### What changes were proposed in this pull request?
   
   ### Why are the changes needed?
   
   ### Does this PR introduce any user-facing change?
   
   ### How was this patch tested?
   ```
   3. You should enable GitHub Actions (GA) in your fork repository:
   
   https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/enabling-features-for-your-repository/managing-github-actions-settings-for-a-repository




[GitHub] [spark] nyingping edited a comment on pull request #35721: Supports customization of the entire optimizer and planner

Posted by GitBox <gi...@apache.org>.
nyingping edited a comment on pull request #35721:
URL: https://github.com/apache/spark/pull/35721#issuecomment-1058938606


   It might be clearer if the PR title were changed to: `[SPARK-38405][SQL] Supports customization of the entire optimizer and planner`
   
   The PR description should follow the default template so the community can better understand your intent:
   
   
   ### What changes were proposed in this pull request?
   
   ### Why are the changes needed?
   
   ### Does this PR introduce any user-facing change?
   
   ### How was this patch tested?
   
   Please enable GitHub Actions (GA) in your fork repository:
   
   https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/enabling-features-for-your-repository/managing-github-actions-settings-for-a-repository




[GitHub] [spark] wangyum commented on a change in pull request #35721: [SPARK-38405][SQL] Supports customization of the entire optimizer and planner

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #35721:
URL: https://github.com/apache/spark/pull/35721#discussion_r825305668



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
##########
@@ -242,15 +242,24 @@ abstract class BaseSessionStateBuilder(
    * Note: this depends on `catalog` and `experimentalMethods` fields.
    */
   protected def optimizer: Optimizer = {
-    new SparkOptimizer(catalogManager, catalog, experimentalMethods) {
-      override def earlyScanPushDownRules: Seq[Rule[LogicalPlan]] =
-        super.earlyScanPushDownRules ++ customEarlyScanPushDownRules
+    val optimizerClassName = conf.getConfString("spark.sql.customized.optimizer", "")
+    if (optimizerClassName.nonEmpty && !optimizerClassName.equals("")) {
+      val catalogManagerTypeName = Class.forName("org.apache.spark.sql.connector.catalog.CatalogManager")
+      val sessionCatalogTypeName = Class.forName("org.apache.spark.sql.catalyst.catalog.SessionCatalog")
+      val experimentalMethodsTypeName = Class.forName("org.apache.spark.sql.ExperimentalMethods")
+      val optimizerInstance = Class.forName(optimizerClassName).getConstructor(catalogManagerTypeName, sessionCatalogTypeName, experimentalMethodsTypeName).newInstance(catalogManager, catalog, experimentalMethods)

Review comment:
       @chenjunbiao001 @nyingping Could we use `spark.sql.extensions` to add custom optimizer rules?
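   For context, `spark.sql.extensions` accepts a comma-separated list of classes implementing `SparkSessionExtensions => Unit`; such a class can register rules through injection points like `injectOptimizerRule`. A hypothetical configuration entry (the class name is illustrative, not part of Spark):

   ```
   # spark-defaults.conf (illustrative)
   spark.sql.extensions=com.example.MyOptimizerExtensions
   ```

   This avoids reflectively replacing the whole optimizer, since the extension is invoked during session construction.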






[GitHub] [spark] nyingping commented on a change in pull request #35721: Supports customization of the entire optimizer and planner

Posted by GitBox <gi...@apache.org>.
nyingping commented on a change in pull request #35721:
URL: https://github.com/apache/spark/pull/35721#discussion_r819344793



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
##########
@@ -242,15 +242,24 @@ abstract class BaseSessionStateBuilder(
    * Note: this depends on `catalog` and `experimentalMethods` fields.
    */
   protected def optimizer: Optimizer = {
-    new SparkOptimizer(catalogManager, catalog, experimentalMethods) {
-      override def earlyScanPushDownRules: Seq[Rule[LogicalPlan]] =
-        super.earlyScanPushDownRules ++ customEarlyScanPushDownRules
+    val optimizerClassName = conf.getConfString("spark.sql.customized.optimizer", "")
+    if (optimizerClassName.nonEmpty && !optimizerClassName.equals("")) {
+      val catalogManagerTypeName = Class.forName("org.apache.spark.sql.connector.catalog.CatalogManager")
+      val sessionCatalogTypeName = Class.forName("org.apache.spark.sql.catalyst.catalog.SessionCatalog")
+      val experimentalMethodsTypeName = Class.forName("org.apache.spark.sql.ExperimentalMethods")
+      val optimizerInstance = Class.forName(optimizerClassName).getConstructor(catalogManagerTypeName, sessionCatalogTypeName, experimentalMethodsTypeName).newInstance(catalogManager, catalog, experimentalMethods)

Review comment:
       File line length exceeds 100 characters on lines 247, 248, 250, and 304.
   
   https://github.com/databricks/scala-style-guide#linelength
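   For reference, this limit is typically enforced with Scalastyle's file-line-length check; a configuration fragment along these lines expresses the 100-character rule (the surrounding file layout is illustrative):

   ```xml
   <check level="error" class="org.scalastyle.file.FileLineLengthChecker" enabled="true">
     <parameters>
       <parameter name="maxLineLength">100</parameter>
     </parameters>
   </check>
   ```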







[GitHub] [spark] nyingping commented on a change in pull request #35721: Supports customization of the entire optimizer and planner

Posted by GitBox <gi...@apache.org>.
nyingping commented on a change in pull request #35721:
URL: https://github.com/apache/spark/pull/35721#discussion_r819348736



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
##########
@@ -242,15 +242,24 @@ abstract class BaseSessionStateBuilder(
    * Note: this depends on `catalog` and `experimentalMethods` fields.
    */
   protected def optimizer: Optimizer = {
-    new SparkOptimizer(catalogManager, catalog, experimentalMethods) {
-      override def earlyScanPushDownRules: Seq[Rule[LogicalPlan]] =
-        super.earlyScanPushDownRules ++ customEarlyScanPushDownRules
+    val optimizerClassName = conf.getConfString("spark.sql.customized.optimizer", "")
+    if (optimizerClassName.nonEmpty && !optimizerClassName.equals("")) {
+      val catalogManagerTypeName = Class.forName("org.apache.spark.sql.connector.catalog.CatalogManager")
+      val sessionCatalogTypeName = Class.forName("org.apache.spark.sql.catalyst.catalog.SessionCatalog")
+      val experimentalMethodsTypeName = Class.forName("org.apache.spark.sql.ExperimentalMethods")
+      val optimizerInstance = Class.forName(optimizerClassName).getConstructor(catalogManagerTypeName, sessionCatalogTypeName, experimentalMethodsTypeName).newInstance(catalogManager, catalog, experimentalMethods)
+      optimizerInstance.asInstanceOf[Optimizer]
+    } else {
+      new SparkOptimizer(catalogManager, catalog, experimentalMethods) {
+        override def earlyScanPushDownRules: Seq[Rule[LogicalPlan]] =
+          super.earlyScanPushDownRules ++ customEarlyScanPushDownRules
 
-      override def preCBORules: Seq[Rule[LogicalPlan]] =
-        super.preCBORules ++ customPreCBORules
+        override def preCBORules: Seq[Rule[LogicalPlan]] =
+          super.preCBORules ++ customPreCBORules

Review comment:
       Nit: redundant spaces on lines 257 and 258.
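   The reflective construction quoted in the diff above (`Class.forName` → `getConstructor` → `newInstance` → `asInstanceOf`) can be sketched in isolation. The types below are stand-ins, not Spark's; all names here are illustrative:

   ```scala
   // Stand-in for Spark's Optimizer base type.
   trait Optimizer {
     def describe: String
   }

   // A hypothetical user-supplied optimizer with a single-argument constructor.
   class CustomOptimizer(tag: String) extends Optimizer {
     override def describe: String = s"custom optimizer: $tag"
   }

   object ReflectiveLoad {
     // Look the class up by name, pick the (String) constructor, instantiate
     // it, and cast the result to the expected supertype -- mirroring the
     // pattern in the diff.
     def loadOptimizer(className: String, tag: String): Optimizer =
       Class.forName(className)
         .getConstructor(classOf[String])
         .newInstance(tag)
         .asInstanceOf[Optimizer]
   }
   ```

   Passing `classOf[CustomOptimizer].getName` as `className` avoids hard-coding a fully qualified name; if the named class is missing or lacks a matching constructor, this throws at session-construction time, which is one of the trade-offs raised against the config-driven approach.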






[GitHub] [spark] chenjunbiao001 commented on pull request #35721: Supports customization of the entire optimizer and planner

Posted by GitBox <gi...@apache.org>.
chenjunbiao001 commented on pull request #35721:
URL: https://github.com/apache/spark/pull/35721#issuecomment-1058871514


   This is an automated vacation reply from QQ Mail.
    
   Hello, your email has been received. Thank you for your message!




[GitHub] [spark] nyingping edited a comment on pull request #35721: Supports customization of the entire optimizer and planner

Posted by GitBox <gi...@apache.org>.
nyingping edited a comment on pull request #35721:
URL: https://github.com/apache/spark/pull/35721#issuecomment-1058938606


   1. The PR title should be changed to: `[SPARK-38405][SQL] Supports customization of the entire optimizer and planner`
   
   2. The PR description should follow the default template so the community can better understand your intent:
   
   
   ### What changes were proposed in this pull request?
   
   ### Why are the changes needed?
   
   ### Does this PR introduce any user-facing change?
   
   ### How was this patch tested?
   
   3. Please enable GitHub Actions (GA) in your fork repository:
   
   https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/enabling-features-for-your-repository/managing-github-actions-settings-for-a-repository




[GitHub] [spark] AmplabJenkins commented on pull request #35721: Supports customization of the entire optimizer and planner

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35721:
URL: https://github.com/apache/spark/pull/35721#issuecomment-1058871372


   Can one of the admins verify this patch?




[GitHub] [spark] nyingping edited a comment on pull request #35721: Supports customization of the entire optimizer and planner

Posted by GitBox <gi...@apache.org>.
nyingping edited a comment on pull request #35721:
URL: https://github.com/apache/spark/pull/35721#issuecomment-1058938606


   1. It might be clearer if the PR title were changed to: `[SPARK-38405][SQL] Supports customization of the entire optimizer and planner`
   
   2. The PR description should follow the default template so the community can better understand your intent:
   
   
   ### What changes were proposed in this pull request?
   
   ### Why are the changes needed?
   
   ### Does this PR introduce any user-facing change?
   
   ### How was this patch tested?
   
   3. Please enable GitHub Actions (GA) in your fork repository:
   
   https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/enabling-features-for-your-repository/managing-github-actions-settings-for-a-repository




[GitHub] [spark] chenjunbiao001 removed a comment on pull request #35721: Supports customization of the entire optimizer and planner

Posted by GitBox <gi...@apache.org>.
chenjunbiao001 removed a comment on pull request #35721:
URL: https://github.com/apache/spark/pull/35721#issuecomment-1058871514


   This is an automated vacation reply from QQ Mail.
    
   Hello, your email has been received. Thank you for your message!




[GitHub] [spark] wangyum commented on a change in pull request #35721: [SPARK-38405][SQL] Supports customization of the entire optimizer and planner

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #35721:
URL: https://github.com/apache/spark/pull/35721#discussion_r825305668



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
##########
@@ -242,15 +242,24 @@ abstract class BaseSessionStateBuilder(
    * Note: this depends on `catalog` and `experimentalMethods` fields.
    */
   protected def optimizer: Optimizer = {
-    new SparkOptimizer(catalogManager, catalog, experimentalMethods) {
-      override def earlyScanPushDownRules: Seq[Rule[LogicalPlan]] =
-        super.earlyScanPushDownRules ++ customEarlyScanPushDownRules
+    val optimizerClassName = conf.getConfString("spark.sql.customized.optimizer", "")
+    if (optimizerClassName.nonEmpty && !optimizerClassName.equals("")) {
+      val catalogManagerTypeName = Class.forName("org.apache.spark.sql.connector.catalog.CatalogManager")
+      val sessionCatalogTypeName = Class.forName("org.apache.spark.sql.catalyst.catalog.SessionCatalog")
+      val experimentalMethodsTypeName = Class.forName("org.apache.spark.sql.ExperimentalMethods")
+      val optimizerInstance = Class.forName(optimizerClassName).getConstructor(catalogManagerTypeName, sessionCatalogTypeName, experimentalMethodsTypeName).newInstance(catalogManager, catalog, experimentalMethods)

Review comment:
       @chenjunbiao001 @nyingping Could we use `spark.sql.extensions` to add custom optimizer rules?



