You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/11 22:07:30 UTC

[GitHub] [spark] viirya opened a new pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

viirya opened a new pull request #30341:
URL: https://github.com/apache/spark/pull/30341


   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   This patch proposes to add subexpression elimination for interpreted expression evaluation. Interpreted expression evaluation is used when codegen was not able to work, for example complex schema.
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   Currently we only do subexpression elimination for codegen. For some reasons, we may need to run interpreted expression valuation. For example, codegen fails to compile and fallback to interpreted mode. It is commonly seen for complex schema from expressions that is possibly caused by the query optimizer too, e.g. SPARK-32945.
   
   We should also support subexpression elimination for interpreted evaluation. That could reduce performance difference when Spark fallbacks from codegen to interpreted expression evaluation.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   No
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   
   Unit test. Benchmark manually.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523968830



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+  private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(cacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  def getEval(proxy: ExpressionProxy): Any = try {
+    cache.get(proxy).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+      throw e.getCause
+  }
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Recursively replaces expression with its proxy expression in `proxyMap`.
+   */
+  private def replaceWithProxy(
+      expr: Expression,
+      proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+    proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
+  }
+
+  /**
+   * Finds subexpressions and wraps them with `ExpressionProxy`.
+   */
+  def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+    val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+    expressions.foreach(equivalentExpressions.addExprTree(_))
+
+    val proxyMap = mutable.Map.empty[Expression, ExpressionProxy]

Review comment:
       For semantically equal exprs, we put a pair of expr -> proxy into the map and note the proxy is the same. So later we traverse down into expressions, we look at the map. We don't do semantically comparing when looking at this map.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727154090


   **[Test build #131088 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131088/testReport)** for PR 30341 at commit [`4780b65`](https://github.com/apache/spark/commit/4780b653f079c2fd4c950d43758587875d89e41a).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728698599






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725695725


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728640207






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726217652


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521861048



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRunTime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRunTime) extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {
+    child.eval(input)
+  }
+
+  override def eval(input: InternalRow = null): Any = try {
+    runtime.cache.get(this).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e@(_: UncheckedExecutionException | _: ExecutionError) =>

Review comment:
       nit. maybe spacing? `e@(` -> `e @ (`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521860331



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRunTime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRunTime {

Review comment:
       Please update the file name too.
   `EvaluationRunTime.scala` -> `EvaluationRuntime.scala`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725776270


   **[Test build #130955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130955/testReport)** for PR 30341 at commit [`6bab83c`](https://github.com/apache/spark/commit/6bab83c6f247910d421aa02eada0d5fdf4610120).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725700510


   **[Test build #130949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130949/testReport)** for PR 30341 at commit [`33ac8b4`](https://github.com/apache/spark/commit/33ac8b4d4fffb231054d16a80ab6366f30683c2d).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727529486


   **[Test build #131106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131106/testReport)** for PR 30341 at commit [`77168fe`](https://github.com/apache/spark/commit/77168fe2dd687113ae1b9a2f086982861445ecda).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727161445






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523295714



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
+
+  val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(SQLConf.get.subexpressionEliminationCacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Finds subexpressions and wraps them with `ExpressionProxy`.
+   */
+  def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+    val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+    expressions.foreach(equivalentExpressions.addExprTree(_))
+
+    var proxyMap = Map.empty[Expression, ExpressionProxy]
+
+    val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+    commonExprs.foreach { e =>
+      val expr = e.head
+      val proxy = ExpressionProxy(expr, this)
+
+      proxyMap ++= e.map(_ -> proxy).toMap
+    }
+
+    // Only adding proxy if we find subexpressions.
+    if (proxyMap.nonEmpty) {
+      expressions.map { expr =>
+        // `transform` will cause stackoverflow because it keeps transforming into
+        // `ExpressionProxy`. But we cannot use `transformUp` because we want to use
+        // subexpressions at higher level. So we `transformDown` until finding first
+        // subexpression.
+        var transformed = false
+        expr.transform {

Review comment:
       Good point! I think I missed this point. I will add a test too.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727158698


   >  I think it should be added as a separate PR and merge the test PR first? Then we can run against with it to check if the result is the same. @maropu
   
   Yea, it looks fine. I'll review this again tonight and thanks for the update.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727531750


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131106/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523977667



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+  private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(cacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  def getEval(proxy: ExpressionProxy): Any = try {
+    cache.get(proxy).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+      throw e.getCause
+  }
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Recursively replaces expression with its proxy expression in `proxyMap`.
+   */
+  private def replaceWithProxy(
+      expr: Expression,
+      proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+    proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
+  }
+
+  /**
+   * Finds subexpressions and wraps them with `ExpressionProxy`.
+   */
+  def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+    val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+    expressions.foreach(equivalentExpressions.addExprTree(_))
+
+    val proxyMap = mutable.Map.empty[Expression, ExpressionProxy]

Review comment:
       Ah I get it now. Seems we can use `IdentityHashMap` to be more explicit.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725713159






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727565640






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523801676



##########
File path: sql/core/benchmarks/SubExprEliminationBenchmark-jdk11-results.txt
##########
@@ -7,9 +7,9 @@ OpenJDK 64-Bit Server VM 11.0.9+11 on Mac OS X 10.15.6
 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
 from_json as subExpr:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 -------------------------------------------------------------------------------------------------------------------------
-subexpressionElimination off, codegen on           26809          27731         898          0.0   268094225.4       1.0X
-subexpressionElimination off, codegen off          25117          26612        1357          0.0   251166638.4       1.1X
-subexpressionElimination on, codegen on             2582           2906         282          0.0    25819408.7      10.4X
-subexpressionElimination on, codegen off           25635          26131         804          0.0   256346873.1       1.0X
+subexpressionElimination off, codegen on           25932          26908         916          0.0   259320042.3       1.0X
+subexpressionElimination off, codegen off          26085          26159          65          0.0   260848905.0       1.0X
+subexpressionElimination on, codegen on             2860           2939          72          0.0    28603312.9       9.1X
+subexpressionElimination on, codegen off            2517           2617          93          0.0    25165157.7      10.3X

Review comment:
       Thank you for this additional `10.3x`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727364039


   @dongjoon-hyun Thanks! I will update this tonight or tomorrow.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726259763






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523295714



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
+
+  val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(SQLConf.get.subexpressionEliminationCacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Finds subexpressions and wraps them with `ExpressionProxy`.
+   */
+  def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+    val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+    expressions.foreach(equivalentExpressions.addExprTree(_))
+
+    var proxyMap = Map.empty[Expression, ExpressionProxy]
+
+    val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+    commonExprs.foreach { e =>
+      val expr = e.head
+      val proxy = ExpressionProxy(expr, this)
+
+      proxyMap ++= e.map(_ -> proxy).toMap
+    }
+
+    // Only adding proxy if we find subexpressions.
+    if (proxyMap.nonEmpty) {
+      expressions.map { expr =>
+        // `transform` will cause stackoverflow because it keeps transforming into
+        // `ExpressionProxy`. But we cannot use `transformUp` because we want to use
+        // subexpressions at higher level. So we `transformDown` until finding first
+        // subexpression.
+        var transformed = false
+        expr.transform {

Review comment:
       Good point! I think I missed this point. I will add a test too. (actually there is a test)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726218542


   **[Test build #131012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131012/testReport)** for PR 30341 at commit [`e8449ec`](https://github.com/apache/spark/commit/e8449ec6a550973aeb66c6deb2bf98a547b934d5).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521860179



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRunTime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRunTime {

Review comment:
       `Runtime` instead of `RunTime`.
   For example, Java has `RuntimeException`. Spark follows it and uses `RunTime` for `time` only.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523962095



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntimeSuite.scala
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+
+class SubExprEvaluationRuntimeSuite extends SparkFunSuite {
+
+  test("Evaluate ExpressionProxy should create cached result") {
+    val runtime = new SubExprEvaluationRuntime(1)
+    val proxy = ExpressionProxy(Literal(1), runtime)
+    assert(runtime.cache.size() == 0)
+    proxy.eval()
+    assert(runtime.cache.size() == 1)
+    assert(runtime.cache.get(proxy) == ResultProxy(1))
+  }
+
+  test("SubExprEvaluationRuntime cannot exceed configured max entries") {
+    val runtime = new SubExprEvaluationRuntime(2)
+    assert(runtime.cache.size() == 0)
+
+    val proxy1 = ExpressionProxy(Literal(1), runtime)
+    proxy1.eval()
+    assert(runtime.cache.size() == 1)
+    assert(runtime.cache.get(proxy1) == ResultProxy(1))
+
+    val proxy2 = ExpressionProxy(Literal(2), runtime)
+    proxy2.eval()
+    assert(runtime.cache.size() == 2)
+    assert(runtime.cache.get(proxy2) == ResultProxy(2))
+
+    val proxy3 = ExpressionProxy(Literal(3), runtime)
+    proxy3.eval()
+    assert(runtime.cache.size() == 2)
+    assert(runtime.cache.get(proxy3) == ResultProxy(3))
+  }
+
+  test("setInput should empty cached result") {
+    val runtime = new SubExprEvaluationRuntime(2)
+    val proxy1 = ExpressionProxy(Literal(1), runtime)
+    assert(runtime.cache.size() == 0)
+    proxy1.eval()
+    assert(runtime.cache.size() == 1)
+    assert(runtime.cache.get(proxy1) == ResultProxy(1))
+
+    val proxy2 = ExpressionProxy(Literal(2), runtime)
+    proxy2.eval()
+    assert(runtime.cache.size() == 2)
+    assert(runtime.cache.get(proxy2) == ResultProxy(2))
+
+    runtime.setInput()
+    assert(runtime.cache.size() == 0)
+  }
+
+  test("Wrap ExpressionProxy on subexpressions") {
+    val runtime = new SubExprEvaluationRuntime(1)
+
+    val one = Literal(1)
+    val two = Literal(2)
+    val mul = Multiply(one, two)
+    val mul2 = Multiply(mul, mul)
+    val sqrt = Sqrt(mul2)
+    val sum = Add(mul2, sqrt)
+
+    //  ( (one * two) * (one * two) ) + sqrt( (one * two) * (one * two) )
+    val proxyExpressions = runtime.proxyExpressions(Seq(sum))
+    val proxys = proxyExpressions.flatMap(_.collect {
+      case p: ExpressionProxy => p
+    })
+    // ( (one * two) * (one * two) )
+    assert(proxys.size == 2)
+    val expected = ExpressionProxy(mul2, runtime)
+    assert(proxys.head == expected)
+  }
+
+  test("ExpressionProxy won't be on non deterministic") {
+    val runtime = new SubExprEvaluationRuntime(1)
+
+    val sum = Add(Rand(0), Rand(0))
+    val proxys = runtime.proxyExpressions(Seq(sum, sum)).flatMap(_.collect {
+      case p: ExpressionProxy => p
+    })
+    assert(proxys.isEmpty)
+  }
+}

Review comment:
       can we test attributes?
   ```
   val attr1 = AttributeReference("a", ...)
   val attr2 = attr1.withName("A")
   ```
   To make sure 2 semantically-equal attributes can be optimized.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727171747


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725799453






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522820742



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {

Review comment:
       nit: this doesn't need a default value.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727420586


   Thanks. No need to hurry~ Take your time. I just want to give you feedback swiftly from my side. This PR will be here until next week.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521860550



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRunTime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRunTime) extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {
+    child.eval(input)
+  }
+
+  override def eval(input: InternalRow = null): Any = try {
+    runtime.cache.get(this).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e@(_: UncheckedExecutionException | _: ExecutionError) =>
+      throw e.getCause
+  }
+}
+
+/**
+ * A simple wrapper for holding `Any` in the cache of `EvaluationRunTime`.

Review comment:
       ditto. `RunTime` -> `Runtime`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521867969



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -539,6 +539,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val SUBEXPRESSION_ELIMINATION_CACHE_MAX_ENTRIES =
+    buildConf("spark.sql.subexpressionElimination.cache.maxEntries")
+      .internal()
+      .doc("The maximum entries of the cache used for interpreted subexpression elimination.")
+      .version("3.1.0")
+      .intConf
+      .checkValue(maxEntries => maxEntries >= 0, "The maximum must not be negative")

Review comment:
       `maxEntries = 0` basically means no caching.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728640212


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35786/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727539284


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35710/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727158547


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35691/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725837249


   GitHub Actions were passed actually.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728640207


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523384163



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {

Review comment:
       Removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727163943


   **[Test build #131088 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131088/testReport)** for PR 30341 at commit [`4780b65`](https://github.com/apache/spark/commit/4780b653f079c2fd4c950d43758587875d89e41a).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726218542


   **[Test build #131012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131012/testReport)** for PR 30341 at commit [`e8449ec`](https://github.com/apache/spark/commit/e8449ec6a550973aeb66c6deb2bf98a547b934d5).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725688678


   **[Test build #130948 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130948/testReport)** for PR 30341 at commit [`c7fdae5`](https://github.com/apache/spark/commit/c7fdae569209374a5bfcf038dfc1995935a180da).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727534541


   **[Test build #131107 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131107/testReport)** for PR 30341 at commit [`77168fe`](https://github.com/apache/spark/commit/77168fe2dd687113ae1b9a2f086982861445ecda).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725866165


   **[Test build #130955 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130955/testReport)** for PR 30341 at commit [`6bab83c`](https://github.com/apache/spark/commit/6bab83c6f247910d421aa02eada0d5fdf4610120).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727171748


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131089/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725695725






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727164713


   **[Test build #131089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131089/testReport)** for PR 30341 at commit [`4780b65`](https://github.com/apache/spark/commit/4780b653f079c2fd4c950d43758587875d89e41a).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523384138



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {

Review comment:
       As a single file `SubExprEvaluationRuntime.scala` now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728582517


   **[Test build #131184 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131184/testReport)** for PR 30341 at commit [`db115d6`](https://github.com/apache/spark/commit/db115d6d850942eca5dc6fac80896b1038561e51).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727154090


   **[Test build #131088 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131088/testReport)** for PR 30341 at commit [`4780b65`](https://github.com/apache/spark/commit/4780b653f079c2fd4c950d43758587875d89e41a).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523962907



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+  private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(cacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  def getEval(proxy: ExpressionProxy): Any = try {
+    cache.get(proxy).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+      throw e.getCause
+  }
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Recursively replaces expression with its proxy expression in `proxyMap`.
+   */
+  private def replaceWithProxy(
+      expr: Expression,
+      proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+    proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
+  }
+
+  /**
+   * Finds subexpressions and wraps them with `ExpressionProxy`.
+   */
+  def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+    val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+    expressions.foreach(equivalentExpressions.addExprTree(_))
+
+    val proxyMap = mutable.Map.empty[Expression, ExpressionProxy]
+
+    val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+    commonExprs.foreach { e =>
+      val expr = e.head
+      val proxy = ExpressionProxy(expr, this)
+
+      proxyMap ++= e.map(_ -> proxy).toMap
+    }
+
+    // Only adding proxy if we find subexpressions.
+    if (proxyMap.nonEmpty) {
+      expressions.map(replaceWithProxy(_, proxyMap.toMap))
+    } else {
+      expressions
+    }
+  }
+}
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `SubExprEvaluationRuntime`,
+ * when this is asked to evaluate, it will load from the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(

Review comment:
       for https://github.com/apache/spark/pull/30341#discussion_r523299091
   
   We can define `ExpressionProxy` as `case class ExpressionProxy(child: Expression, id: Int, ...)` and override `equals`/`hashCode`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725836485


   **[Test build #130965 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130965/testReport)** for PR 30341 at commit [`ddd3a96`](https://github.com/apache/spark/commit/ddd3a96db1e22bd0b8c8a5b4c351479a63e1e907).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725718679


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523976909



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+  private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(cacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  def getEval(proxy: ExpressionProxy): Any = try {
+    cache.get(proxy).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+      throw e.getCause
+  }
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Recursively replaces expression with its proxy expression in `proxyMap`.
+   */
+  private def replaceWithProxy(
+      expr: Expression,
+      proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+    proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))

Review comment:
       Ah sorry I misread the code. You are right.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726242747


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35618/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522786671



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {
+    child.eval(input)
+  }
+
+  override def eval(input: InternalRow = null): Any = try {
+    runtime.cache.get(this).result

Review comment:
       No, here the semantically equal expressions are linked to same `ExpressionProxy`. The cache only uses `ExpressionProxy` as keys.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727272817


   **[Test build #131094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131094/testReport)** for PR 30341 at commit [`47bae35`](https://github.com/apache/spark/commit/47bae358690c550a0259796b7bed137d1d38f28e).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728582517


   **[Test build #131184 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131184/testReport)** for PR 30341 at commit [`db115d6`](https://github.com/apache/spark/commit/db115d6d850942eca5dc6fac80896b1038561e51).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725894073






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523956362



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+  private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(cacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  def getEval(proxy: ExpressionProxy): Any = try {
+    cache.get(proxy).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+      throw e.getCause
+  }
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Recursively replaces expression with its proxy expression in `proxyMap`.
+   */
+  private def replaceWithProxy(
+      expr: Expression,
+      proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+    proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))

Review comment:
       This is a top-down traverse, which means we will recursively replace expr with proxy even for `ExpressionProxy`. Is it expected?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725859457






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725859447


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35571/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725718679






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727331477


   #30379 is merged. Could you rebased to the master and regenerate the result, @viirya .
   
   Sure, that's the exact reason I asked their opinion once more. 😄 
   > Let's wait for @cloud-fan's comment for #30341 (comment) after this weekend.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727535880


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35709/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522821537



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {
+    child.eval(input)
+  }
+
+  override def eval(input: InternalRow = null): Any = try {
+    runtime.cache.get(this).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+      throw e.getCause
+  }
+}
+
+/**
+ * A simple wrapper for holding `Any` in the cache of `EvaluationRuntime`.

Review comment:
       why do we need this wrapper?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728697924


   **[Test build #131184 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131184/testReport)** for PR 30341 at commit [`db115d6`](https://github.com/apache/spark/commit/db115d6d850942eca5dc6fac80896b1038561e51).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727242883


   **[Test build #131094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131094/testReport)** for PR 30341 at commit [`47bae35`](https://github.com/apache/spark/commit/47bae358690c550a0259796b7bed137d1d38f28e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522786671



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {
+    child.eval(input)
+  }
+
+  override def eval(input: InternalRow = null): Any = try {
+    runtime.cache.get(this).result

Review comment:
       No, here the semantically equal expressions are linked to same `ExpressionProxy`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725688678


   **[Test build #130948 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130948/testReport)** for PR 30341 at commit [`c7fdae5`](https://github.com/apache/spark/commit/c7fdae569209374a5bfcf038dfc1995935a180da).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727173662






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-729051922


   Thank you, @viirya , @maropu , @cloud-fan !


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727273046






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728640190


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35786/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521862019



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -539,6 +539,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val SUBEXPRESSION_ELIMINATION_CACHE_MAX_ENTRIES =
+    buildConf("spark.sql.subexpressionElimination.cache.maxEntries")
+      .internal()
+      .doc("The maximum entries of the cache used for interpreted subexpression elimination.")
+      .version("3.1.0")
+      .intConf
+      .checkValue(maxEntries => maxEntries >= 0, "The maximum must not be negative")
+      .createWithDefault(100)

Review comment:
       `100` is enough?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725799453






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727164713


   **[Test build #131089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131089/testReport)** for PR 30341 at commit [`4780b65`](https://github.com/apache/spark/commit/4780b653f079c2fd4c950d43758587875d89e41a).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725876366


   **[Test build #130973 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130973/testReport)** for PR 30341 at commit [`e8449ec`](https://github.com/apache/spark/commit/e8449ec6a550973aeb66c6deb2bf98a547b934d5).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725837444


   cc @cloud-fan @dongjoon-hyun @maropu @HyukjinKwon 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725776270


   **[Test build #130955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130955/testReport)** for PR 30341 at commit [`6bab83c`](https://github.com/apache/spark/commit/6bab83c6f247910d421aa02eada0d5fdf4610120).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522817102



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
+
+  val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(SQLConf.get.subexpressionEliminationCacheMaxEntries)

Review comment:
       for low-level class, maybe it's better to have a parameter for cache capacity, instead of accessing the active conf here. It also makes the UT easier as this class don't rely on sql conf.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523295829



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {
+    child.eval(input)
+  }
+
+  override def eval(input: InternalRow = null): Any = try {
+    runtime.cache.get(this).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+      throw e.getCause
+  }
+}
+
+/**
+ * A simple wrapper for holding `Any` in the cache of `EvaluationRuntime`.

Review comment:
       Seems `CacheLoader` doesn't accept the second type parameter to be `Any`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726259734


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35618/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727164017


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r524808901



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+  private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(cacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  def getEval(proxy: ExpressionProxy): Any = try {
+    cache.get(proxy).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+      throw e.getCause
+  }
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Recursively replaces expression with its proxy expression in `proxyMap`.
+   */
+  private def replaceWithProxy(
+      expr: Expression,
+      proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+    proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
+  }
+
+  /**
+   * Finds subexpressions and wraps them with `ExpressionProxy`.
+   */
+  def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+    val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+    expressions.foreach(equivalentExpressions.addExprTree(_))
+
+    val proxyMap = mutable.Map.empty[Expression, ExpressionProxy]
+
+    val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+    commonExprs.foreach { e =>
+      val expr = e.head
+      val proxy = ExpressionProxy(expr, this)
+
+      proxyMap ++= e.map(_ -> proxy).toMap
+    }
+
+    // Only adding proxy if we find subexpressions.
+    if (proxyMap.nonEmpty) {
+      expressions.map(replaceWithProxy(_, proxyMap.toMap))
+    } else {
+      expressions
+    }
+  }
+}
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `SubExprEvaluationRuntime`,
+ * when this is asked to evaluate, it will load from the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(

Review comment:
       Ah, nvm. I figured it out. I mixed two places in thinking.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725867045






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727531747


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725718712


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130949/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521860972



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRunTime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRunTime) extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {
+    child.eval(input)
+  }
+
+  override def eval(input: InternalRow = null): Any = try {
+    runtime.cache.get(this).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL

Review comment:
       Quick question: Is this only `14.0` specific? If we change the latest Guava, is this different?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726368702


   **[Test build #131012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131012/testReport)** for PR 30341 at commit [`e8449ec`](https://github.com/apache/spark/commit/e8449ec6a550973aeb66c6deb2bf98a547b934d5).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728965012


   thanks, merging to master!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523299091



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {
+    child.eval(input)
+  }
+
+  override def eval(input: InternalRow = null): Any = try {
+    runtime.cache.get(this).result

Review comment:
       Oh, but in order to look up using unique ID of `ExpressionProxy`, we need to keep a map (id -> `ExpressionProxy`) inside `EvaluationRuntime`, so we know to call which proxy's `proxyEval` in `CacheLoader.load` function.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r524815671



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntimeSuite.scala
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+
+class SubExprEvaluationRuntimeSuite extends SparkFunSuite {
+
+  test("Evaluate ExpressionProxy should create cached result") {
+    val runtime = new SubExprEvaluationRuntime(1)
+    val proxy = ExpressionProxy(Literal(1), runtime)
+    assert(runtime.cache.size() == 0)
+    proxy.eval()
+    assert(runtime.cache.size() == 1)
+    assert(runtime.cache.get(proxy) == ResultProxy(1))
+  }
+
+  test("SubExprEvaluationRuntime cannot exceed configured max entries") {
+    val runtime = new SubExprEvaluationRuntime(2)
+    assert(runtime.cache.size() == 0)
+
+    val proxy1 = ExpressionProxy(Literal(1), runtime)
+    proxy1.eval()
+    assert(runtime.cache.size() == 1)
+    assert(runtime.cache.get(proxy1) == ResultProxy(1))
+
+    val proxy2 = ExpressionProxy(Literal(2), runtime)
+    proxy2.eval()
+    assert(runtime.cache.size() == 2)
+    assert(runtime.cache.get(proxy2) == ResultProxy(2))
+
+    val proxy3 = ExpressionProxy(Literal(3), runtime)
+    proxy3.eval()
+    assert(runtime.cache.size() == 2)
+    assert(runtime.cache.get(proxy3) == ResultProxy(3))
+  }
+
+  test("setInput should empty cached result") {
+    val runtime = new SubExprEvaluationRuntime(2)
+    val proxy1 = ExpressionProxy(Literal(1), runtime)
+    assert(runtime.cache.size() == 0)
+    proxy1.eval()
+    assert(runtime.cache.size() == 1)
+    assert(runtime.cache.get(proxy1) == ResultProxy(1))
+
+    val proxy2 = ExpressionProxy(Literal(2), runtime)
+    proxy2.eval()
+    assert(runtime.cache.size() == 2)
+    assert(runtime.cache.get(proxy2) == ResultProxy(2))
+
+    runtime.setInput()
+    assert(runtime.cache.size() == 0)
+  }
+
+  test("Wrap ExpressionProxy on subexpressions") {
+    val runtime = new SubExprEvaluationRuntime(1)
+
+    val one = Literal(1)
+    val two = Literal(2)
+    val mul = Multiply(one, two)
+    val mul2 = Multiply(mul, mul)
+    val sqrt = Sqrt(mul2)
+    val sum = Add(mul2, sqrt)
+
+    //  ( (one * two) * (one * two) ) + sqrt( (one * two) * (one * two) )
+    val proxyExpressions = runtime.proxyExpressions(Seq(sum))
+    val proxys = proxyExpressions.flatMap(_.collect {
+      case p: ExpressionProxy => p
+    })
+    // ( (one * two) * (one * two) )
+    assert(proxys.size == 2)
+    val expected = ExpressionProxy(mul2, runtime)
+    assert(proxys.head == expected)
+  }
+
+  test("ExpressionProxy won't be on non deterministic") {
+    val runtime = new SubExprEvaluationRuntime(1)
+
+    val sum = Add(Rand(0), Rand(0))
+    val proxys = runtime.proxyExpressions(Seq(sum, sum)).flatMap(_.collect {
+      case p: ExpressionProxy => p
+    })
+    assert(proxys.isEmpty)
+  }
+}

Review comment:
       `EquivalentExpressions` skips for `LeafExpression`. So attributes won't be counted for subexpression.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725729215


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35555/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726369827






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727541753


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35710/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522818745



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
+
+  val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(SQLConf.get.subexpressionEliminationCacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Finds subexpressions and wraps them with `ExpressionProxy`.
+   */
+  def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+    val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+    expressions.foreach(equivalentExpressions.addExprTree(_))
+
+    var proxyMap = Map.empty[Expression, ExpressionProxy]
+
+    val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+    commonExprs.foreach { e =>
+      val expr = e.head
+      val proxy = ExpressionProxy(expr, this)
+
+      proxyMap ++= e.map(_ -> proxy).toMap
+    }
+
+    // Only adding proxy if we find subexpressions.
+    if (proxyMap.nonEmpty) {
+      expressions.map { expr =>
+        // `transform` will cause stackoverflow because it keeps transforming into
+        // `ExpressionProxy`. But we cannot use `transformUp` because we want to use
+        // subexpressions at higher level. So we `transformDown` until finding first
+        // subexpression.
+        var transformed = false

Review comment:
       this is hacky. Can we do a manual tree traversal using `mapChildren`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727248167


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35697/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728632331


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35786/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522712902



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {
+    child.eval(input)
+  }
+
+  override def eval(input: InternalRow = null): Any = try {
+    runtime.cache.get(this).result

Review comment:
       does cache lookup include expression canonilization?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523961435



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntimeSuite.scala
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+
+class SubExprEvaluationRuntimeSuite extends SparkFunSuite {
+
+  test("Evaluate ExpressionProxy should create cached result") {
+    val runtime = new SubExprEvaluationRuntime(1)
+    val proxy = ExpressionProxy(Literal(1), runtime)
+    assert(runtime.cache.size() == 0)
+    proxy.eval()
+    assert(runtime.cache.size() == 1)
+    assert(runtime.cache.get(proxy) == ResultProxy(1))
+  }
+
+  test("SubExprEvaluationRuntime cannot exceed configured max entries") {
+    val runtime = new SubExprEvaluationRuntime(2)
+    assert(runtime.cache.size() == 0)
+
+    val proxy1 = ExpressionProxy(Literal(1), runtime)
+    proxy1.eval()
+    assert(runtime.cache.size() == 1)
+    assert(runtime.cache.get(proxy1) == ResultProxy(1))
+
+    val proxy2 = ExpressionProxy(Literal(2), runtime)
+    proxy2.eval()
+    assert(runtime.cache.size() == 2)
+    assert(runtime.cache.get(proxy2) == ResultProxy(2))
+
+    val proxy3 = ExpressionProxy(Literal(3), runtime)
+    proxy3.eval()
+    assert(runtime.cache.size() == 2)
+    assert(runtime.cache.get(proxy3) == ResultProxy(3))
+  }
+
+  test("setInput should empty cached result") {
+    val runtime = new SubExprEvaluationRuntime(2)
+    val proxy1 = ExpressionProxy(Literal(1), runtime)
+    assert(runtime.cache.size() == 0)
+    proxy1.eval()
+    assert(runtime.cache.size() == 1)
+    assert(runtime.cache.get(proxy1) == ResultProxy(1))
+
+    val proxy2 = ExpressionProxy(Literal(2), runtime)
+    proxy2.eval()
+    assert(runtime.cache.size() == 2)
+    assert(runtime.cache.get(proxy2) == ResultProxy(2))
+
+    runtime.setInput()
+    assert(runtime.cache.size() == 0)
+  }
+
+  test("Wrap ExpressionProxy on subexpressions") {
+    val runtime = new SubExprEvaluationRuntime(1)
+
+    val one = Literal(1)
+    val two = Literal(2)
+    val mul = Multiply(one, two)
+    val mul2 = Multiply(mul, mul)
+    val sqrt = Sqrt(mul2)
+    val sum = Add(mul2, sqrt)
+
+    //  ( (one * two) * (one * two) ) + sqrt( (one * two) * (one * two) )
+    val proxyExpressions = runtime.proxyExpressions(Seq(sum))
+    val proxys = proxyExpressions.flatMap(_.collect {
+      case p: ExpressionProxy => p
+    })
+    // ( (one * two) * (one * two) )
+    assert(proxys.size == 2)
+    val expected = ExpressionProxy(mul2, runtime)
+    assert(proxys.head == expected)

Review comment:
       should this be `proxys.forall(_ == expected)`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727531747






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523419617



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala
##########
@@ -33,6 +34,15 @@ import org.apache.spark.unsafe.Platform
 class InterpretedUnsafeProjection(expressions: Array[Expression]) extends UnsafeProjection {
   import InterpretedUnsafeProjection._
 
+  private[this] val subExprElimination = SQLConf.get.subexpressionEliminationEnabled

Review comment:
       nit: `subExprElimination` -> `subExprEliminationEnabled`?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+  private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(cacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  def getEval(proxy: ExpressionProxy): Any = try {
+    cache.get(proxy).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+      throw e.getCause
+  }
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Recursively replaces expression with its proxy expression in `proxyMap`.
+   */
+  private def replaceWithProxy(
+      expr: Expression,
+      proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+    expr match {
+      case e if proxyMap.contains(e) =>
+        proxyMap(e)
+      case _ =>
+        expr.mapChildren(replaceWithProxy(_, proxyMap))

Review comment:
       nit: `proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))`?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala
##########
@@ -33,6 +34,15 @@ import org.apache.spark.unsafe.Platform
 class InterpretedUnsafeProjection(expressions: Array[Expression]) extends UnsafeProjection {
   import InterpretedUnsafeProjection._
 
+  private[this] val subExprElimination = SQLConf.get.subexpressionEliminationEnabled
+  private[this] lazy val runtime =
+    new SubExprEvaluationRuntime(SQLConf.get.subexpressionEliminationCacheMaxEntries)
+  private[this] val proxyExpressions = if (subExprElimination) {

Review comment:
       nit: In case of `subExprElimination=false` or no common subexprs, this variable is not related to a "proxy", so how about just calling it `exprs`? (`proxyExpressions` -> `exprs`)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523295162



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.

Review comment:
       Yes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725851615


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35571/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523384442



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntimeSuite.scala
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+
+class EvaluationRuntimeSuite extends SparkFunSuite {

Review comment:
       Okay, let me see how to add some tests for this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523381340



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {
+    child.eval(input)
+  }
+
+  override def eval(input: InternalRow = null): Any = try {
+    runtime.cache.get(this).result

Review comment:
       I think it should be fine but want to make sure I get your point correctly.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727325324


   Thanks @dongjoon-hyun.
   
   Let's wait for @cloud-fan's comment for https://github.com/apache/spark/pull/30341#discussion_r522822589 after this weekend.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725717956


   **[Test build #130949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130949/testReport)** for PR 30341 at commit [`33ac8b4`](https://github.com/apache/spark/commit/33ac8b4d4fffb231054d16a80ab6366f30683c2d).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725894068


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727542977


   > Please let us know if you have another comments, @maropu and @cloud-fan .
   
   Yea, I don't have any more comment, so I'll leave this to @cloud-fan .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725860167


   Thank you for pinging me, @viirya .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521866385



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -539,6 +539,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val SUBEXPRESSION_ELIMINATION_CACHE_MAX_ENTRIES =
+    buildConf("spark.sql.subexpressionElimination.cache.maxEntries")
+      .internal()
+      .doc("The maximum entries of the cache used for interpreted subexpression elimination.")
+      .version("3.1.0")
+      .intConf
+      .checkValue(maxEntries => maxEntries >= 0, "The maximum must not be negative")
+      .createWithDefault(100)

Review comment:
       This default value is copied from code generator's cache setting. I guess this should be enough.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727164017






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725912384


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130965/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725867045






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523295133



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {
+    child.eval(input)
+  }
+
+  override def eval(input: InternalRow = null): Any = try {
+    runtime.cache.get(this).result

Review comment:
       Good idea.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727535887






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725912373






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r524808398



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+  private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(cacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  def getEval(proxy: ExpressionProxy): Any = try {
+    cache.get(proxy).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+      throw e.getCause
+  }
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Recursively replaces expression with its proxy expression in `proxyMap`.
+   */
+  private def replaceWithProxy(
+      expr: Expression,
+      proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+    proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
+  }
+
+  /**
+   * Finds subexpressions and wraps them with `ExpressionProxy`.
+   */
+  def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+    val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+    expressions.foreach(equivalentExpressions.addExprTree(_))
+
+    val proxyMap = mutable.Map.empty[Expression, ExpressionProxy]
+
+    val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+    commonExprs.foreach { e =>
+      val expr = e.head
+      val proxy = ExpressionProxy(expr, this)
+
+      proxyMap ++= e.map(_ -> proxy).toMap
+    }
+
+    // Only adding proxy if we find subexpressions.
+    if (proxyMap.nonEmpty) {
+      expressions.map(replaceWithProxy(_, proxyMap.toMap))
+    } else {
+      expressions
+    }
+  }
+}
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `SubExprEvaluationRuntime`,
+ * when this is asked to evaluate, it will load from the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(

Review comment:
       Okay, but in this case we cannot use `IdentityHashMap` anymore as `IdentityHashMap` doesn't use `equals` to compare keys.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727541758






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727534418


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523961911



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+  private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(cacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  def getEval(proxy: ExpressionProxy): Any = try {
+    cache.get(proxy).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+      throw e.getCause
+  }
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Recursively replaces expression with its proxy expression in `proxyMap`.
+   */
+  private def replaceWithProxy(
+      expr: Expression,
+      proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+    proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))

Review comment:
       Once we replace one expression with `ExpressionProxy`, we stop traversing down. We only traverse down to children if cannot find current expression in `proxyMap`. Is this for your question?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521865514



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -539,6 +539,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val SUBEXPRESSION_ELIMINATION_CACHE_MAX_ENTRIES =
+    buildConf("spark.sql.subexpressionElimination.cache.maxEntries")
+      .internal()
+      .doc("The maximum entries of the cache used for interpreted subexpression elimination.")
+      .version("3.1.0")
+      .intConf
+      .checkValue(maxEntries => maxEntries >= 0, "The maximum must not be negative")

Review comment:
       Shall we use a simple form like `.checkValue(_ >= 0)`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725700510


   **[Test build #130949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130949/testReport)** for PR 30341 at commit [`33ac8b4`](https://github.com/apache/spark/commit/33ac8b4d4fffb231054d16a80ab6366f30683c2d).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727250940


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35697/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727155028


   Hmm, actually for https://github.com/apache/spark/pull/30341#discussion_r522904967, that is to add some tests in `SQLQueryTestSuite` with `--CONFIG_DIM spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=NO_CODEGEN`, I think it should be added as a separate PR and merge the test PR first? Then we can run against with it to check if the result is the same. @maropu 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728698599






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523384143



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {
+    child.eval(input)
+  }
+
+  override def eval(input: InternalRow = null): Any = try {
+    runtime.cache.get(this).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e @ (_: UncheckedExecutionException | _: ExecutionError) =>

Review comment:
       Moved.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522893523



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala
##########
@@ -63,17 +72,21 @@ class InterpretedUnsafeProjection(expressions: Array[Expression]) extends Unsafe
   }
 
   override def initialize(partitionIndex: Int): Unit = {
-    expressions.foreach(_.foreach {
+    proxyExpressions.foreach(_.foreach {
       case n: Nondeterministic => n.initialize(partitionIndex)
       case _ =>
     })
   }
 
   override def apply(row: InternalRow): UnsafeRow = {
+    if (subExprElimination) {
+      runtime.setInput(row)
+    }
+
     // Put the expression results in the intermediate row.
     var i = 0
     while (i < numFields) {
-      values(i) = expressions(i).eval(row)
+      values(i) = proxyExpressions(i).eval(row)

Review comment:
       Any reason not to apply this into `InterpretedMutableProjection`, too?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {

Review comment:
       Seems like this is just a thin wrapper for eliminating common subexprs in `ExpressionRuntime`, so how about putting this class and `ExpressionRuntime` in a single file?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {

Review comment:
       This seems to be a specific class for subexpression elimination, but I feel the name looks too general. How about renaming it more concretely? e.g., `SubExprElimination[Eval]Runtime` or something?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {
+    child.eval(input)
+  }
+
+  override def eval(input: InternalRow = null): Any = try {
+    runtime.cache.get(this).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e @ (_: UncheckedExecutionException | _: ExecutionError) =>

Review comment:
       How about moving this exception handling into the `ExpressionRuntime` then making `cache` private just like `CodeGenerator.cache`? https://github.com/apache/spark/blob/f80fe213bd4c5e065d5723816c42302a532be75c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L1351-L1359




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522822589



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {
+    child.eval(input)
+  }
+
+  override def eval(input: InternalRow = null): Any = try {
+    runtime.cache.get(this).result

Review comment:
       Can we assign unique IDs(per `EvaluationRuntime`) to `ExpressionProxy`? then the lookup can be more efficient.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727250946






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523957768



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+  private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(cacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  def getEval(proxy: ExpressionProxy): Any = try {
+    cache.get(proxy).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+      throw e.getCause
+  }
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Recursively replaces expression with its proxy expression in `proxyMap`.
+   */
+  private def replaceWithProxy(
+      expr: Expression,
+      proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+    proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
+  }
+
+  /**
+   * Finds subexpressions and wraps them with `ExpressionProxy`.
+   */
+  def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+    val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+    expressions.foreach(equivalentExpressions.addExprTree(_))
+
+    val proxyMap = mutable.Map.empty[Expression, ExpressionProxy]

Review comment:
       is it OK to use a simple map here? Two expressions may not equal to each other even if they semantically equal.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727164174


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725729230






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523296101



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala
##########
@@ -63,17 +72,21 @@ class InterpretedUnsafeProjection(expressions: Array[Expression]) extends Unsafe
   }
 
   override def initialize(partitionIndex: Int): Unit = {
-    expressions.foreach(_.foreach {
+    proxyExpressions.foreach(_.foreach {
       case n: Nondeterministic => n.initialize(partitionIndex)
       case _ =>
     })
   }
 
   override def apply(row: InternalRow): UnsafeRow = {
+    if (subExprElimination) {
+      runtime.setInput(row)
+    }
+
     // Put the expression results in the intermediate row.
     var i = 0
     while (i < numFields) {
-      values(i) = expressions(i).eval(row)
+      values(i) = proxyExpressions(i).eval(row)

Review comment:
       It could be applied to `InterpretedMutableProjection` too. We can add it gradually like filter.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725792725


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35561/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726369827






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727534541


   **[Test build #131107 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131107/testReport)** for PR 30341 at commit [`77168fe`](https://github.com/apache/spark/commit/77168fe2dd687113ae1b9a2f086982861445ecda).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727565640






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727242745


   Thanks @maropu 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-729074737


   Thanks @dongjoon-hyun @maropu @cloud-fan 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725705286


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35554/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727154709


   I addressed most of comments except for https://github.com/apache/spark/pull/30341#discussion_r522822589 and https://github.com/apache/spark/pull/30341#discussion_r522904967.
   
   For https://github.com/apache/spark/pull/30341#discussion_r522822589, I will wait for confirmation.
   
   For https://github.com/apache/spark/pull/30341#discussion_r522904967, I will add some more tests later.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727173659


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35692/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726722441


   Nice feature!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727531728


   **[Test build #131106 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131106/testReport)** for PR 30341 at commit [`77168fe`](https://github.com/apache/spark/commit/77168fe2dd687113ae1b9a2f086982861445ecda).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727273046






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727242883


   **[Test build #131094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131094/testReport)** for PR 30341 at commit [`47bae35`](https://github.com/apache/spark/commit/47bae358690c550a0259796b7bed137d1d38f28e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521867300



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRunTime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRunTime) extends Expression {
+
+  final override def dataType: DataType = child.dataType
+  final override def nullable: Boolean = child.nullable
+  final override def children: Seq[Expression] = child :: Nil
+
+  // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+  final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+  def proxyEval(input: InternalRow = null): Any = {
+    child.eval(input)
+  }
+
+  override def eval(input: InternalRow = null): Any = try {
+    runtime.cache.get(this).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL

Review comment:
       Seems the same: https://guava.dev/releases/21.0/api/docs/com/google/common/cache/Cache.html#get-K-java.util.concurrent.Callable-




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727529486


   **[Test build #131106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131106/testReport)** for PR 30341 at commit [`77168fe`](https://github.com/apache/spark/commit/77168fe2dd687113ae1b9a2f086982861445ecda).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725716132


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35555/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727173662






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727250946






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727171013


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35692/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727164021


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131088/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725799445


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35561/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727255265


   > It would be great if we can have a benchmark suite (from the code in the PR description), @viirya .
   
   Yes, @dongjoon-hyun, I remember it. :) I will add a benchmark suite and the tests @maropu suggested in separate PRs.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-729342315


   Thanks @HyukjinKwon!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727161442


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35691/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725894068






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522818057



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
+
+  val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(SQLConf.get.subexpressionEliminationCacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Finds subexpressions and wraps them with `ExpressionProxy`.
+   */
+  def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+    val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+    expressions.foreach(equivalentExpressions.addExprTree(_))
+
+    var proxyMap = Map.empty[Expression, ExpressionProxy]

Review comment:
       shall we use a mutable map here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725836485






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727565352


   **[Test build #131107 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131107/testReport)** for PR 30341 at commit [`77168fe`](https://github.com/apache/spark/commit/77168fe2dd687113ae1b9a2f086982861445ecda).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725713136


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35554/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727532739


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35709/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522904967



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntimeSuite.scala
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+
+class EvaluationRuntimeSuite extends SparkFunSuite {

Review comment:
       How about adding some end-2-end tests for this feature in `SQLQueryTestSuite` with `--CONFIG_DIM spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=NO_CODEGEN` ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725859457






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727161445






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522815747



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.

Review comment:
       We will support interpreted filter in the future, right? maybe `in` -> `such as`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725695736


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130948/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725695687


   **[Test build #130948 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130948/testReport)** for PR 30341 at commit [`c7fdae5`](https://github.com/apache/spark/commit/c7fdae569209374a5bfcf038dfc1995935a180da).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class EvaluationRunTime `
     * `case class ExpressionProxy(child: Expression, runtime: EvaluationRunTime) extends Expression `
     * `case class ResultProxy(result: Any)`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727171679


   **[Test build #131089 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131089/testReport)** for PR 30341 at commit [`4780b65`](https://github.com/apache/spark/commit/4780b653f079c2fd4c950d43758587875d89e41a).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726259763






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725912063






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523384132



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {

Review comment:
       Renamed as `SubExprEvaluationRuntime` now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727420586


   No need to hurry~ Take your time. I just want to give you feedback swiftly from my side.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-729338732


   Sorry for a late comment. +1, nice.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r524808901



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+  private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(cacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  def getEval(proxy: ExpressionProxy): Any = try {
+    cache.get(proxy).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+      throw e.getCause
+  }
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Recursively replaces expression with its proxy expression in `proxyMap`.
+   */
+  private def replaceWithProxy(
+      expr: Expression,
+      proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+    proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
+  }
+
+  /**
+   * Finds subexpressions and wraps them with `ExpressionProxy`.
+   */
+  def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+    val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+    expressions.foreach(equivalentExpressions.addExprTree(_))
+
+    val proxyMap = mutable.Map.empty[Expression, ExpressionProxy]
+
+    val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+    commonExprs.foreach { e =>
+      val expr = e.head
+      val proxy = ExpressionProxy(expr, this)
+
+      proxyMap ++= e.map(_ -> proxy).toMap
+    }
+
+    // Only adding proxy if we find subexpressions.
+    if (proxyMap.nonEmpty) {
+      expressions.map(replaceWithProxy(_, proxyMap.toMap))
+    } else {
+      expressions
+    }
+  }
+}
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `SubExprEvaluationRuntime`,
+ * when this is asked to evaluate, it will load from the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(

Review comment:
       Ah, nvm. I figured it out.

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+  private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(cacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  def getEval(proxy: ExpressionProxy): Any = try {
+    cache.get(proxy).result
+  } catch {
+    // Cache.get() may wrap the original exception. See the following URL
+    // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+    //   Cache.html#get(K,%20java.util.concurrent.Callable)
+    case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+      throw e.getCause
+  }
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Recursively replaces expression with its proxy expression in `proxyMap`.
+   */
+  private def replaceWithProxy(
+      expr: Expression,
+      proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+    proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
+  }
+
+  /**
+   * Finds subexpressions and wraps them with `ExpressionProxy`.
+   */
+  def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+    val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+    expressions.foreach(equivalentExpressions.addExprTree(_))
+
+    val proxyMap = mutable.Map.empty[Expression, ExpressionProxy]
+
+    val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+    commonExprs.foreach { e =>
+      val expr = e.head
+      val proxy = ExpressionProxy(expr, this)
+
+      proxyMap ++= e.map(_ -> proxy).toMap
+    }
+
+    // Only adding proxy if we find subexpressions.
+    if (proxyMap.nonEmpty) {
+      expressions.map(replaceWithProxy(_, proxyMap.toMap))
+    } else {
+      expressions
+    }
+  }
+}
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `SubExprEvaluationRuntime`,
+ * when this is asked to evaluate, it will load from the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(

Review comment:
       <del>Okay, but in this case we cannot use `IdentityHashMap` anymore as `IdentityHashMap` doesn't use `equals` to compare keys.</del>




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #30341:
URL: https://github.com/apache/spark/pull/30341


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725911883






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727535887






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727171747






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521861854



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -539,6 +539,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val SUBEXPRESSION_ELIMINATION_CACHE_MAX_ENTRIES =
+    buildConf("spark.sql.subexpressionElimination.cache.maxEntries")
+      .internal()
+      .doc("The maximum entries of the cache used for interpreted subexpression elimination.")
+      .version("3.1.0")
+      .intConf
+      .checkValue(maxEntries => maxEntries >= 0, "The maximum must not be negative")

Review comment:
       What happens `maxEntries = 0`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522820004



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
+
+  val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(SQLConf.get.subexpressionEliminationCacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Finds subexpressions and wraps them with `ExpressionProxy`.
+   */
+  def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+    val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+    expressions.foreach(equivalentExpressions.addExprTree(_))
+
+    var proxyMap = Map.empty[Expression, ExpressionProxy]
+
+    val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+    commonExprs.foreach { e =>
+      val expr = e.head
+      val proxy = ExpressionProxy(expr, this)
+
+      proxyMap ++= e.map(_ -> proxy).toMap
+    }
+
+    // Only adding proxy if we find subexpressions.
+    if (proxyMap.nonEmpty) {
+      expressions.map { expr =>
+        // `transform` will cause stackoverflow because it keeps transforming into
+        // `ExpressionProxy`. But we cannot use `transformUp` because we want to use
+        // subexpressions at higher level. So we `transformDown` until finding first
+        // subexpression.
+        var transformed = false
+        expr.transform {

Review comment:
       what if the single expr already have repeated sub-exprs? e.g. `SELECT (a+1) + (a+1)`. The code here seems can only replace one expression.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727541758






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r524801521



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntimeSuite.scala
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+
+class SubExprEvaluationRuntimeSuite extends SparkFunSuite {
+
+  test("Evaluate ExpressionProxy should create cached result") {
+    val runtime = new SubExprEvaluationRuntime(1)
+    val proxy = ExpressionProxy(Literal(1), runtime)
+    assert(runtime.cache.size() == 0)
+    proxy.eval()
+    assert(runtime.cache.size() == 1)
+    assert(runtime.cache.get(proxy) == ResultProxy(1))
+  }
+
+  test("SubExprEvaluationRuntime cannot exceed configured max entries") {
+    val runtime = new SubExprEvaluationRuntime(2)
+    assert(runtime.cache.size() == 0)
+
+    val proxy1 = ExpressionProxy(Literal(1), runtime)
+    proxy1.eval()
+    assert(runtime.cache.size() == 1)
+    assert(runtime.cache.get(proxy1) == ResultProxy(1))
+
+    val proxy2 = ExpressionProxy(Literal(2), runtime)
+    proxy2.eval()
+    assert(runtime.cache.size() == 2)
+    assert(runtime.cache.get(proxy2) == ResultProxy(2))
+
+    val proxy3 = ExpressionProxy(Literal(3), runtime)
+    proxy3.eval()
+    assert(runtime.cache.size() == 2)
+    assert(runtime.cache.get(proxy3) == ResultProxy(3))
+  }
+
+  test("setInput should empty cached result") {
+    val runtime = new SubExprEvaluationRuntime(2)
+    val proxy1 = ExpressionProxy(Literal(1), runtime)
+    assert(runtime.cache.size() == 0)
+    proxy1.eval()
+    assert(runtime.cache.size() == 1)
+    assert(runtime.cache.get(proxy1) == ResultProxy(1))
+
+    val proxy2 = ExpressionProxy(Literal(2), runtime)
+    proxy2.eval()
+    assert(runtime.cache.size() == 2)
+    assert(runtime.cache.get(proxy2) == ResultProxy(2))
+
+    runtime.setInput()
+    assert(runtime.cache.size() == 0)
+  }
+
+  test("Wrap ExpressionProxy on subexpressions") {
+    val runtime = new SubExprEvaluationRuntime(1)
+
+    val one = Literal(1)
+    val two = Literal(2)
+    val mul = Multiply(one, two)
+    val mul2 = Multiply(mul, mul)
+    val sqrt = Sqrt(mul2)
+    val sum = Add(mul2, sqrt)
+
+    //  ( (one * two) * (one * two) ) + sqrt( (one * two) * (one * two) )
+    val proxyExpressions = runtime.proxyExpressions(Seq(sum))
+    val proxys = proxyExpressions.flatMap(_.collect {
+      case p: ExpressionProxy => p
+    })
+    // ( (one * two) * (one * two) )
+    assert(proxys.size == 2)
+    val expected = ExpressionProxy(mul2, runtime)
+    assert(proxys.head == expected)

Review comment:
       yeah, you're right.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727254222


   It would be great if we can have a benchmark suite (from the code in the PR description), @viirya .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726647711


   The idea looks good!  left several comments


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523383340



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
+
+  val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+    .maximumSize(SQLConf.get.subexpressionEliminationCacheMaxEntries)
+    .build(
+      new CacheLoader[ExpressionProxy, ResultProxy]() {
+        override def load(expr: ExpressionProxy): ResultProxy = {
+          ResultProxy(expr.proxyEval(currentInput))
+        }
+      })
+
+  private var currentInput: InternalRow = null
+
+  /**
+   * Sets given input row as current row for evaluating expressions. This cleans up the cache
+   * too as new input comes.
+   */
+  def setInput(input: InternalRow = null): Unit = {
+    currentInput = input
+    cache.invalidateAll()
+  }
+
+  /**
+   * Finds subexpressions and wraps them with `ExpressionProxy`.
+   */
+  def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+    val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+    expressions.foreach(equivalentExpressions.addExprTree(_))
+
+    var proxyMap = Map.empty[Expression, ExpressionProxy]
+
+    val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+    commonExprs.foreach { e =>
+      val expr = e.head
+      val proxy = ExpressionProxy(expr, this)
+
+      proxyMap ++= e.map(_ -> proxy).toMap
+    }
+
+    // Only adding proxy if we find subexpressions.
+    if (proxyMap.nonEmpty) {
+      expressions.map { expr =>
+        // `transform` will cause stackoverflow because it keeps transforming into
+        // `ExpressionProxy`. But we cannot use `transformUp` because we want to use
+        // subexpressions at higher level. So we `transformDown` until finding first
+        // subexpression.
+        var transformed = false

Review comment:
       Oh, good idea, I like it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725729230






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521860409



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRunTime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRunTime {

Review comment:
       Oh, good to know. I will change it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725713159






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org