You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/11 22:07:30 UTC
[GitHub] [spark] viirya opened a new pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
viirya opened a new pull request #30341:
URL: https://github.com/apache/spark/pull/30341
<!--
Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
4. Be sure to keep the PR description updated to reflect all changes.
5. Please write your PR title to summarize what this PR proposes.
6. If possible, provide a concise example to reproduce the issue for a faster review.
7. If you want to add a new configuration, please read the guideline first for naming configurations in
'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
-->
### What changes were proposed in this pull request?
<!--
Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue.
If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
2. If you fix some SQL features, you can provide some references of other DBMSes.
3. If there is design documentation, please add the link.
4. If there is a discussion in the mailing list, please add the link.
-->
This patch proposes to add subexpression elimination for interpreted expression evaluation. Interpreted expression evaluation is used when codegen was not able to work, for example complex schema.
### Why are the changes needed?
<!--
Please clarify why the changes are needed. For instance,
1. If you propose a new API, clarify the use case for a new API.
2. If you fix a bug, you can clarify why it is a bug.
-->
Currently we only do subexpression elimination for codegen. For some reasons, we may need to run interpreted expression valuation. For example, codegen fails to compile and fallback to interpreted mode. It is commonly seen for complex schema from expressions that is possibly caused by the query optimizer too, e.g. SPARK-32945.
We should also support subexpression elimination for interpreted evaluation. That could reduce performance difference when Spark fallbacks from codegen to interpreted expression evaluation.
### Does this PR introduce _any_ user-facing change?
<!--
Note that it means *any* user-facing change including all aspects such as the documentation fix.
If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
If no, write 'No'.
-->
No
### How was this patch tested?
<!--
If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
If tests were not added, please describe why they were not added and/or why it was difficult to add.
-->
Unit test. Benchmark manually.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523968830
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+ private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(cacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ def getEval(proxy: ExpressionProxy): Any = try {
+ cache.get(proxy).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+ throw e.getCause
+ }
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Recursively replaces expression with its proxy expression in `proxyMap`.
+ */
+ private def replaceWithProxy(
+ expr: Expression,
+ proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+ proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
+ }
+
+ /**
+ * Finds subexpressions and wraps them with `ExpressionProxy`.
+ */
+ def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+ val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+ expressions.foreach(equivalentExpressions.addExprTree(_))
+
+ val proxyMap = mutable.Map.empty[Expression, ExpressionProxy]
Review comment:
For semantically equal exprs, we put a pair of expr -> proxy into the map and note the proxy is the same. So later we traverse down into expressions, we look at the map. We don't do semantically comparing when looking at this map.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727154090
**[Test build #131088 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131088/testReport)** for PR 30341 at commit [`4780b65`](https://github.com/apache/spark/commit/4780b653f079c2fd4c950d43758587875d89e41a).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728698599
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725695725
Merged build finished. Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728640207
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726217652
retest this please
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521861048
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRunTime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRunTime) extends Expression {
+
+ final override def dataType: DataType = child.dataType
+ final override def nullable: Boolean = child.nullable
+ final override def children: Seq[Expression] = child :: Nil
+
+ // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+ final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+ throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+ def proxyEval(input: InternalRow = null): Any = {
+ child.eval(input)
+ }
+
+ override def eval(input: InternalRow = null): Any = try {
+ runtime.cache.get(this).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e@(_: UncheckedExecutionException | _: ExecutionError) =>
Review comment:
nit. maybe spacing? `e@(` -> `e @ (`?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521860331
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRunTime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRunTime {
Review comment:
Please update the file name too.
`EvaluationRunTime.scala` -> `EvaluationRuntime.scala`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725776270
**[Test build #130955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130955/testReport)** for PR 30341 at commit [`6bab83c`](https://github.com/apache/spark/commit/6bab83c6f247910d421aa02eada0d5fdf4610120).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725700510
**[Test build #130949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130949/testReport)** for PR 30341 at commit [`33ac8b4`](https://github.com/apache/spark/commit/33ac8b4d4fffb231054d16a80ab6366f30683c2d).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727529486
**[Test build #131106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131106/testReport)** for PR 30341 at commit [`77168fe`](https://github.com/apache/spark/commit/77168fe2dd687113ae1b9a2f086982861445ecda).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727161445
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523295714
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
+
+ val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(SQLConf.get.subexpressionEliminationCacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Finds subexpressions and wraps them with `ExpressionProxy`.
+ */
+ def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+ val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+ expressions.foreach(equivalentExpressions.addExprTree(_))
+
+ var proxyMap = Map.empty[Expression, ExpressionProxy]
+
+ val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+ commonExprs.foreach { e =>
+ val expr = e.head
+ val proxy = ExpressionProxy(expr, this)
+
+ proxyMap ++= e.map(_ -> proxy).toMap
+ }
+
+ // Only adding proxy if we find subexpressions.
+ if (proxyMap.nonEmpty) {
+ expressions.map { expr =>
+ // `transform` will cause stackoverflow because it keeps transforming into
+ // `ExpressionProxy`. But we cannot use `transformUp` because we want to use
+ // subexpressions at higher level. So we `transformDown` until finding first
+ // subexpression.
+ var transformed = false
+ expr.transform {
Review comment:
Good point! I think I missed this point. I will add a test too.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727158698
> I think it should be added as a separate PR and merge the test PR first? Then we can run against with it to check if the result is the same. @maropu
Yea, it looks fine. I'll review this again tonight and thanks for the update.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727531750
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131106/
Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523977667
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+ private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(cacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ def getEval(proxy: ExpressionProxy): Any = try {
+ cache.get(proxy).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+ throw e.getCause
+ }
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Recursively replaces expression with its proxy expression in `proxyMap`.
+ */
+ private def replaceWithProxy(
+ expr: Expression,
+ proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+ proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
+ }
+
+ /**
+ * Finds subexpressions and wraps them with `ExpressionProxy`.
+ */
+ def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+ val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+ expressions.foreach(equivalentExpressions.addExprTree(_))
+
+ val proxyMap = mutable.Map.empty[Expression, ExpressionProxy]
Review comment:
Ah I get it now. Seems we can use `IdentityHashMap` to be more explicit.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725713159
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727565640
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523801676
##########
File path: sql/core/benchmarks/SubExprEliminationBenchmark-jdk11-results.txt
##########
@@ -7,9 +7,9 @@ OpenJDK 64-Bit Server VM 11.0.9+11 on Mac OS X 10.15.6
Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
from_json as subExpr: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
-------------------------------------------------------------------------------------------------------------------------
-subexpressionElimination off, codegen on 26809 27731 898 0.0 268094225.4 1.0X
-subexpressionElimination off, codegen off 25117 26612 1357 0.0 251166638.4 1.1X
-subexpressionElimination on, codegen on 2582 2906 282 0.0 25819408.7 10.4X
-subexpressionElimination on, codegen off 25635 26131 804 0.0 256346873.1 1.0X
+subexpressionElimination off, codegen on 25932 26908 916 0.0 259320042.3 1.0X
+subexpressionElimination off, codegen off 26085 26159 65 0.0 260848905.0 1.0X
+subexpressionElimination on, codegen on 2860 2939 72 0.0 28603312.9 9.1X
+subexpressionElimination on, codegen off 2517 2617 93 0.0 25165157.7 10.3X
Review comment:
Thank you for this additional `10.3x`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727364039
@dongjoon-hyun Thanks! I will update this tonight or tomorrow.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726259763
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523295714
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
+
+ val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(SQLConf.get.subexpressionEliminationCacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Finds subexpressions and wraps them with `ExpressionProxy`.
+ */
+ def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+ val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+ expressions.foreach(equivalentExpressions.addExprTree(_))
+
+ var proxyMap = Map.empty[Expression, ExpressionProxy]
+
+ val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+ commonExprs.foreach { e =>
+ val expr = e.head
+ val proxy = ExpressionProxy(expr, this)
+
+ proxyMap ++= e.map(_ -> proxy).toMap
+ }
+
+ // Only adding proxy if we find subexpressions.
+ if (proxyMap.nonEmpty) {
+ expressions.map { expr =>
+ // `transform` will cause stackoverflow because it keeps transforming into
+ // `ExpressionProxy`. But we cannot use `transformUp` because we want to use
+ // subexpressions at higher level. So we `transformDown` until finding first
+ // subexpression.
+ var transformed = false
+ expr.transform {
Review comment:
Good point! I think I missed this point. I will add a test too. (actually there is a test)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726218542
**[Test build #131012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131012/testReport)** for PR 30341 at commit [`e8449ec`](https://github.com/apache/spark/commit/e8449ec6a550973aeb66c6deb2bf98a547b934d5).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521860179
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRunTime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRunTime {
Review comment:
`Runtime` instead of `RunTime`.
For example, Java has `RuntimeException`. Spark follows it and uses `RunTime` for `time` only.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523962095
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntimeSuite.scala
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+
+class SubExprEvaluationRuntimeSuite extends SparkFunSuite {
+
+ test("Evaluate ExpressionProxy should create cached result") {
+ val runtime = new SubExprEvaluationRuntime(1)
+ val proxy = ExpressionProxy(Literal(1), runtime)
+ assert(runtime.cache.size() == 0)
+ proxy.eval()
+ assert(runtime.cache.size() == 1)
+ assert(runtime.cache.get(proxy) == ResultProxy(1))
+ }
+
+ test("SubExprEvaluationRuntime cannot exceed configured max entries") {
+ val runtime = new SubExprEvaluationRuntime(2)
+ assert(runtime.cache.size() == 0)
+
+ val proxy1 = ExpressionProxy(Literal(1), runtime)
+ proxy1.eval()
+ assert(runtime.cache.size() == 1)
+ assert(runtime.cache.get(proxy1) == ResultProxy(1))
+
+ val proxy2 = ExpressionProxy(Literal(2), runtime)
+ proxy2.eval()
+ assert(runtime.cache.size() == 2)
+ assert(runtime.cache.get(proxy2) == ResultProxy(2))
+
+ val proxy3 = ExpressionProxy(Literal(3), runtime)
+ proxy3.eval()
+ assert(runtime.cache.size() == 2)
+ assert(runtime.cache.get(proxy3) == ResultProxy(3))
+ }
+
+ test("setInput should empty cached result") {
+ val runtime = new SubExprEvaluationRuntime(2)
+ val proxy1 = ExpressionProxy(Literal(1), runtime)
+ assert(runtime.cache.size() == 0)
+ proxy1.eval()
+ assert(runtime.cache.size() == 1)
+ assert(runtime.cache.get(proxy1) == ResultProxy(1))
+
+ val proxy2 = ExpressionProxy(Literal(2), runtime)
+ proxy2.eval()
+ assert(runtime.cache.size() == 2)
+ assert(runtime.cache.get(proxy2) == ResultProxy(2))
+
+ runtime.setInput()
+ assert(runtime.cache.size() == 0)
+ }
+
+ test("Wrap ExpressionProxy on subexpressions") {
+ val runtime = new SubExprEvaluationRuntime(1)
+
+ val one = Literal(1)
+ val two = Literal(2)
+ val mul = Multiply(one, two)
+ val mul2 = Multiply(mul, mul)
+ val sqrt = Sqrt(mul2)
+ val sum = Add(mul2, sqrt)
+
+ // ( (one * two) * (one * two) ) + sqrt( (one * two) * (one * two) )
+ val proxyExpressions = runtime.proxyExpressions(Seq(sum))
+ val proxys = proxyExpressions.flatMap(_.collect {
+ case p: ExpressionProxy => p
+ })
+ // ( (one * two) * (one * two) )
+ assert(proxys.size == 2)
+ val expected = ExpressionProxy(mul2, runtime)
+ assert(proxys.head == expected)
+ }
+
+ test("ExpressionProxy won't be on non deterministic") {
+ val runtime = new SubExprEvaluationRuntime(1)
+
+ val sum = Add(Rand(0), Rand(0))
+ val proxys = runtime.proxyExpressions(Seq(sum, sum)).flatMap(_.collect {
+ case p: ExpressionProxy => p
+ })
+ assert(proxys.isEmpty)
+ }
+}
Review comment:
can we test attributes?
```
val attr1 = AttributeReference("a", ...)
val attr2 = attr1.withName("A")
```
To make sure 2 semantically-equal attributes can be optimized.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727171747
Merged build finished. Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725799453
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522820742
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+ final override def dataType: DataType = child.dataType
+ final override def nullable: Boolean = child.nullable
+ final override def children: Seq[Expression] = child :: Nil
+
+ // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+ final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+ throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+ def proxyEval(input: InternalRow = null): Any = {
Review comment:
nit: this doesn't need a default value.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727420586
Thanks. No need to hurry~ Take your time. I just want to give you feedback swiftly from my side. This PR will be here until next week.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521860550
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRunTime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRunTime) extends Expression {
+
+ final override def dataType: DataType = child.dataType
+ final override def nullable: Boolean = child.nullable
+ final override def children: Seq[Expression] = child :: Nil
+
+ // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+ final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+ throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+ def proxyEval(input: InternalRow = null): Any = {
+ child.eval(input)
+ }
+
+ override def eval(input: InternalRow = null): Any = try {
+ runtime.cache.get(this).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e@(_: UncheckedExecutionException | _: ExecutionError) =>
+ throw e.getCause
+ }
+}
+
+/**
+ * A simple wrapper for holding `Any` in the cache of `EvaluationRunTime`.
Review comment:
ditto. `RunTime` -> `Runtime`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521867969
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -539,6 +539,15 @@ object SQLConf {
.booleanConf
.createWithDefault(true)
+ val SUBEXPRESSION_ELIMINATION_CACHE_MAX_ENTRIES =
+ buildConf("spark.sql.subexpressionElimination.cache.maxEntries")
+ .internal()
+ .doc("The maximum entries of the cache used for interpreted subexpression elimination.")
+ .version("3.1.0")
+ .intConf
+ .checkValue(maxEntries => maxEntries >= 0, "The maximum must not be negative")
Review comment:
`maxEntries = 0` basically means no caching.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728640212
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35786/
Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727539284
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35710/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727158547
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35691/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725837249
GitHub Actions were passed actually.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728640207
Merged build finished. Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523384163
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+ final override def dataType: DataType = child.dataType
+ final override def nullable: Boolean = child.nullable
+ final override def children: Seq[Expression] = child :: Nil
+
+ // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+ final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+ throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+ def proxyEval(input: InternalRow = null): Any = {
Review comment:
Removed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727163943
**[Test build #131088 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131088/testReport)** for PR 30341 at commit [`4780b65`](https://github.com/apache/spark/commit/4780b653f079c2fd4c950d43758587875d89e41a).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726218542
**[Test build #131012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131012/testReport)** for PR 30341 at commit [`e8449ec`](https://github.com/apache/spark/commit/e8449ec6a550973aeb66c6deb2bf98a547b934d5).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725688678
**[Test build #130948 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130948/testReport)** for PR 30341 at commit [`c7fdae5`](https://github.com/apache/spark/commit/c7fdae569209374a5bfcf038dfc1995935a180da).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727534541
**[Test build #131107 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131107/testReport)** for PR 30341 at commit [`77168fe`](https://github.com/apache/spark/commit/77168fe2dd687113ae1b9a2f086982861445ecda).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725866165
**[Test build #130955 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130955/testReport)** for PR 30341 at commit [`6bab83c`](https://github.com/apache/spark/commit/6bab83c6f247910d421aa02eada0d5fdf4610120).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727171748
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131089/
Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725695725
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727164713
**[Test build #131089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131089/testReport)** for PR 30341 at commit [`4780b65`](https://github.com/apache/spark/commit/4780b653f079c2fd4c950d43758587875d89e41a).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523384138
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
Review comment:
As a single file `SubExprEvaluationRuntime.scala` now.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728582517
**[Test build #131184 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131184/testReport)** for PR 30341 at commit [`db115d6`](https://github.com/apache/spark/commit/db115d6d850942eca5dc6fac80896b1038561e51).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727154090
**[Test build #131088 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131088/testReport)** for PR 30341 at commit [`4780b65`](https://github.com/apache/spark/commit/4780b653f079c2fd4c950d43758587875d89e41a).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523962907
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+ private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(cacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ def getEval(proxy: ExpressionProxy): Any = try {
+ cache.get(proxy).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+ throw e.getCause
+ }
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Recursively replaces expression with its proxy expression in `proxyMap`.
+ */
+ private def replaceWithProxy(
+ expr: Expression,
+ proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+ proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
+ }
+
+ /**
+ * Finds subexpressions and wraps them with `ExpressionProxy`.
+ */
+ def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+ val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+ expressions.foreach(equivalentExpressions.addExprTree(_))
+
+ val proxyMap = mutable.Map.empty[Expression, ExpressionProxy]
+
+ val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+ commonExprs.foreach { e =>
+ val expr = e.head
+ val proxy = ExpressionProxy(expr, this)
+
+ proxyMap ++= e.map(_ -> proxy).toMap
+ }
+
+ // Only adding proxy if we find subexpressions.
+ if (proxyMap.nonEmpty) {
+ expressions.map(replaceWithProxy(_, proxyMap.toMap))
+ } else {
+ expressions
+ }
+ }
+}
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `SubExprEvaluationRuntime`,
+ * when this is asked to evaluate, it will load from the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(
Review comment:
for https://github.com/apache/spark/pull/30341#discussion_r523299091
We can define `ExpressionProxy` as `case class ExpressionProxy(child: Expression, id: Int, ...)` and override `equals`/`hashCode`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725836485
**[Test build #130965 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130965/testReport)** for PR 30341 at commit [`ddd3a96`](https://github.com/apache/spark/commit/ddd3a96db1e22bd0b8c8a5b4c351479a63e1e907).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725718679
Merged build finished. Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523976909
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+ private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(cacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ def getEval(proxy: ExpressionProxy): Any = try {
+ cache.get(proxy).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+ throw e.getCause
+ }
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Recursively replaces expression with its proxy expression in `proxyMap`.
+ */
+ private def replaceWithProxy(
+ expr: Expression,
+ proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+ proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
Review comment:
Ah sorry I misread the code. You are right.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726242747
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35618/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522786671
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+ final override def dataType: DataType = child.dataType
+ final override def nullable: Boolean = child.nullable
+ final override def children: Seq[Expression] = child :: Nil
+
+ // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+ final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+ throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+ def proxyEval(input: InternalRow = null): Any = {
+ child.eval(input)
+ }
+
+ override def eval(input: InternalRow = null): Any = try {
+ runtime.cache.get(this).result
Review comment:
No, here the semantically equal expressions are linked to same `ExpressionProxy`. The cache only uses `ExpressionProxy` as keys.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727272817
**[Test build #131094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131094/testReport)** for PR 30341 at commit [`47bae35`](https://github.com/apache/spark/commit/47bae358690c550a0259796b7bed137d1d38f28e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728582517
**[Test build #131184 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131184/testReport)** for PR 30341 at commit [`db115d6`](https://github.com/apache/spark/commit/db115d6d850942eca5dc6fac80896b1038561e51).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725894073
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523956362
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+ private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(cacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ def getEval(proxy: ExpressionProxy): Any = try {
+ cache.get(proxy).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+ throw e.getCause
+ }
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Recursively replaces expression with its proxy expression in `proxyMap`.
+ */
+ private def replaceWithProxy(
+ expr: Expression,
+ proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+ proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
Review comment:
This is a top-down traverse, which means we will recursively replace expr with proxy even for `ExpressionProxy`. Is it expected?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725859457
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725859447
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35571/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725718679
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727331477
#30379 is merged. Could you rebased to the master and regenerate the result, @viirya .
Sure, that's the exact reason I asked their opinion once more. 😄
> Let's wait for @cloud-fan's comment for #30341 (comment) after this weekend.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727535880
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35709/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522821537
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+ final override def dataType: DataType = child.dataType
+ final override def nullable: Boolean = child.nullable
+ final override def children: Seq[Expression] = child :: Nil
+
+ // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+ final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+ throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+ def proxyEval(input: InternalRow = null): Any = {
+ child.eval(input)
+ }
+
+ override def eval(input: InternalRow = null): Any = try {
+ runtime.cache.get(this).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+ throw e.getCause
+ }
+}
+
+/**
+ * A simple wrapper for holding `Any` in the cache of `EvaluationRuntime`.
Review comment:
why do we need this wrapper?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728697924
**[Test build #131184 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131184/testReport)** for PR 30341 at commit [`db115d6`](https://github.com/apache/spark/commit/db115d6d850942eca5dc6fac80896b1038561e51).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727242883
**[Test build #131094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131094/testReport)** for PR 30341 at commit [`47bae35`](https://github.com/apache/spark/commit/47bae358690c550a0259796b7bed137d1d38f28e).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522786671
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+ final override def dataType: DataType = child.dataType
+ final override def nullable: Boolean = child.nullable
+ final override def children: Seq[Expression] = child :: Nil
+
+ // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+ final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+ throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+ def proxyEval(input: InternalRow = null): Any = {
+ child.eval(input)
+ }
+
+ override def eval(input: InternalRow = null): Any = try {
+ runtime.cache.get(this).result
Review comment:
No, here the semantically equal expressions are linked to same `ExpressionProxy`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725688678
**[Test build #130948 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130948/testReport)** for PR 30341 at commit [`c7fdae5`](https://github.com/apache/spark/commit/c7fdae569209374a5bfcf038dfc1995935a180da).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727173662
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-729051922
Thank you, @viirya , @maropu , @cloud-fan !
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727273046
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728640190
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35786/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521862019
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -539,6 +539,15 @@ object SQLConf {
.booleanConf
.createWithDefault(true)
+ val SUBEXPRESSION_ELIMINATION_CACHE_MAX_ENTRIES =
+ buildConf("spark.sql.subexpressionElimination.cache.maxEntries")
+ .internal()
+ .doc("The maximum entries of the cache used for interpreted subexpression elimination.")
+ .version("3.1.0")
+ .intConf
+ .checkValue(maxEntries => maxEntries >= 0, "The maximum must not be negative")
+ .createWithDefault(100)
Review comment:
`100` is enough?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725799453
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727164713
**[Test build #131089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131089/testReport)** for PR 30341 at commit [`4780b65`](https://github.com/apache/spark/commit/4780b653f079c2fd4c950d43758587875d89e41a).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725876366
**[Test build #130973 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130973/testReport)** for PR 30341 at commit [`e8449ec`](https://github.com/apache/spark/commit/e8449ec6a550973aeb66c6deb2bf98a547b934d5).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725837444
cc @cloud-fan @dongjoon-hyun @maropu @HyukjinKwon
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725776270
**[Test build #130955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130955/testReport)** for PR 30341 at commit [`6bab83c`](https://github.com/apache/spark/commit/6bab83c6f247910d421aa02eada0d5fdf4610120).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522817102
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
+
+ val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(SQLConf.get.subexpressionEliminationCacheMaxEntries)
Review comment:
for low-level class, maybe it's better to have a parameter for cache capacity, instead of accessing the active conf here. It also makes the UT easier as this class don't rely on sql conf.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523295829
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+ final override def dataType: DataType = child.dataType
+ final override def nullable: Boolean = child.nullable
+ final override def children: Seq[Expression] = child :: Nil
+
+ // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+ final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+ throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+ def proxyEval(input: InternalRow = null): Any = {
+ child.eval(input)
+ }
+
+ override def eval(input: InternalRow = null): Any = try {
+ runtime.cache.get(this).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+ throw e.getCause
+ }
+}
+
+/**
+ * A simple wrapper for holding `Any` in the cache of `EvaluationRuntime`.
Review comment:
Seems `CacheLoader` doesn't accept the second type parameter to be `Any`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726259734
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35618/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727164017
Merged build finished. Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r524808901
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+ private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(cacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ def getEval(proxy: ExpressionProxy): Any = try {
+ cache.get(proxy).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+ throw e.getCause
+ }
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Recursively replaces expression with its proxy expression in `proxyMap`.
+ */
+ private def replaceWithProxy(
+ expr: Expression,
+ proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+ proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
+ }
+
+ /**
+ * Finds subexpressions and wraps them with `ExpressionProxy`.
+ */
+ def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+ val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+ expressions.foreach(equivalentExpressions.addExprTree(_))
+
+ val proxyMap = mutable.Map.empty[Expression, ExpressionProxy]
+
+ val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+ commonExprs.foreach { e =>
+ val expr = e.head
+ val proxy = ExpressionProxy(expr, this)
+
+ proxyMap ++= e.map(_ -> proxy).toMap
+ }
+
+ // Only adding proxy if we find subexpressions.
+ if (proxyMap.nonEmpty) {
+ expressions.map(replaceWithProxy(_, proxyMap.toMap))
+ } else {
+ expressions
+ }
+ }
+}
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `SubExprEvaluationRuntime`,
+ * when this is asked to evaluate, it will load from the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(
Review comment:
Ah, nvm. I figured it out. I mixed two places in thinking.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725867045
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727531747
Merged build finished. Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725718712
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130949/
Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521860972
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRunTime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRunTime) extends Expression {
+
+ final override def dataType: DataType = child.dataType
+ final override def nullable: Boolean = child.nullable
+ final override def children: Seq[Expression] = child :: Nil
+
+ // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+ final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+ throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+ def proxyEval(input: InternalRow = null): Any = {
+ child.eval(input)
+ }
+
+ override def eval(input: InternalRow = null): Any = try {
+ runtime.cache.get(this).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
Review comment:
Quick question: Is this only `14.0` specific? If we change the latest Guava, is this different?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726368702
**[Test build #131012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131012/testReport)** for PR 30341 at commit [`e8449ec`](https://github.com/apache/spark/commit/e8449ec6a550973aeb66c6deb2bf98a547b934d5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728965012
thanks, merging to master!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523299091
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+ final override def dataType: DataType = child.dataType
+ final override def nullable: Boolean = child.nullable
+ final override def children: Seq[Expression] = child :: Nil
+
+ // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+ final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+ throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+ def proxyEval(input: InternalRow = null): Any = {
+ child.eval(input)
+ }
+
+ override def eval(input: InternalRow = null): Any = try {
+ runtime.cache.get(this).result
Review comment:
Oh, but in order to look up using unique ID of `ExpressionProxy`, we need to keep a map (id -> `ExpressionProxy`) inside `EvaluationRuntime`, so we know to call which proxy's `proxyEval` in `CacheLoader.load` function.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r524815671
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntimeSuite.scala
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+
+class SubExprEvaluationRuntimeSuite extends SparkFunSuite {
+
+ test("Evaluate ExpressionProxy should create cached result") {
+ val runtime = new SubExprEvaluationRuntime(1)
+ val proxy = ExpressionProxy(Literal(1), runtime)
+ assert(runtime.cache.size() == 0)
+ proxy.eval()
+ assert(runtime.cache.size() == 1)
+ assert(runtime.cache.get(proxy) == ResultProxy(1))
+ }
+
+ test("SubExprEvaluationRuntime cannot exceed configured max entries") {
+ val runtime = new SubExprEvaluationRuntime(2)
+ assert(runtime.cache.size() == 0)
+
+ val proxy1 = ExpressionProxy(Literal(1), runtime)
+ proxy1.eval()
+ assert(runtime.cache.size() == 1)
+ assert(runtime.cache.get(proxy1) == ResultProxy(1))
+
+ val proxy2 = ExpressionProxy(Literal(2), runtime)
+ proxy2.eval()
+ assert(runtime.cache.size() == 2)
+ assert(runtime.cache.get(proxy2) == ResultProxy(2))
+
+ val proxy3 = ExpressionProxy(Literal(3), runtime)
+ proxy3.eval()
+ assert(runtime.cache.size() == 2)
+ assert(runtime.cache.get(proxy3) == ResultProxy(3))
+ }
+
+ test("setInput should empty cached result") {
+ val runtime = new SubExprEvaluationRuntime(2)
+ val proxy1 = ExpressionProxy(Literal(1), runtime)
+ assert(runtime.cache.size() == 0)
+ proxy1.eval()
+ assert(runtime.cache.size() == 1)
+ assert(runtime.cache.get(proxy1) == ResultProxy(1))
+
+ val proxy2 = ExpressionProxy(Literal(2), runtime)
+ proxy2.eval()
+ assert(runtime.cache.size() == 2)
+ assert(runtime.cache.get(proxy2) == ResultProxy(2))
+
+ runtime.setInput()
+ assert(runtime.cache.size() == 0)
+ }
+
+ test("Wrap ExpressionProxy on subexpressions") {
+ val runtime = new SubExprEvaluationRuntime(1)
+
+ val one = Literal(1)
+ val two = Literal(2)
+ val mul = Multiply(one, two)
+ val mul2 = Multiply(mul, mul)
+ val sqrt = Sqrt(mul2)
+ val sum = Add(mul2, sqrt)
+
+ // ( (one * two) * (one * two) ) + sqrt( (one * two) * (one * two) )
+ val proxyExpressions = runtime.proxyExpressions(Seq(sum))
+ val proxys = proxyExpressions.flatMap(_.collect {
+ case p: ExpressionProxy => p
+ })
+ // ( (one * two) * (one * two) )
+ assert(proxys.size == 2)
+ val expected = ExpressionProxy(mul2, runtime)
+ assert(proxys.head == expected)
+ }
+
+ test("ExpressionProxy won't be on non deterministic") {
+ val runtime = new SubExprEvaluationRuntime(1)
+
+ val sum = Add(Rand(0), Rand(0))
+ val proxys = runtime.proxyExpressions(Seq(sum, sum)).flatMap(_.collect {
+ case p: ExpressionProxy => p
+ })
+ assert(proxys.isEmpty)
+ }
+}
Review comment:
`EquivalentExpressions` skips for `LeafExpression`. So attributes won't be counted for subexpression.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725729215
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35555/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726369827
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727541753
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35710/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522818745
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
+
+ val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(SQLConf.get.subexpressionEliminationCacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Finds subexpressions and wraps them with `ExpressionProxy`.
+ */
+ def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+ val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+ expressions.foreach(equivalentExpressions.addExprTree(_))
+
+ var proxyMap = Map.empty[Expression, ExpressionProxy]
+
+ val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+ commonExprs.foreach { e =>
+ val expr = e.head
+ val proxy = ExpressionProxy(expr, this)
+
+ proxyMap ++= e.map(_ -> proxy).toMap
+ }
+
+ // Only adding proxy if we find subexpressions.
+ if (proxyMap.nonEmpty) {
+ expressions.map { expr =>
+ // `transform` will cause stackoverflow because it keeps transforming into
+ // `ExpressionProxy`. But we cannot use `transformUp` because we want to use
+ // subexpressions at higher level. So we `transformDown` until finding first
+ // subexpression.
+ var transformed = false
Review comment:
this is hacky. Can we do a manual tree traversal using `mapChildren`?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727248167
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35697/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728632331
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35786/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522712902
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+ final override def dataType: DataType = child.dataType
+ final override def nullable: Boolean = child.nullable
+ final override def children: Seq[Expression] = child :: Nil
+
+ // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+ final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+ throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+ def proxyEval(input: InternalRow = null): Any = {
+ child.eval(input)
+ }
+
+ override def eval(input: InternalRow = null): Any = try {
+ runtime.cache.get(this).result
Review comment:
does cache lookup include expression canonilization?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523961435
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntimeSuite.scala
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+
+class SubExprEvaluationRuntimeSuite extends SparkFunSuite {
+
+ test("Evaluate ExpressionProxy should create cached result") {
+ val runtime = new SubExprEvaluationRuntime(1)
+ val proxy = ExpressionProxy(Literal(1), runtime)
+ assert(runtime.cache.size() == 0)
+ proxy.eval()
+ assert(runtime.cache.size() == 1)
+ assert(runtime.cache.get(proxy) == ResultProxy(1))
+ }
+
+ test("SubExprEvaluationRuntime cannot exceed configured max entries") {
+ val runtime = new SubExprEvaluationRuntime(2)
+ assert(runtime.cache.size() == 0)
+
+ val proxy1 = ExpressionProxy(Literal(1), runtime)
+ proxy1.eval()
+ assert(runtime.cache.size() == 1)
+ assert(runtime.cache.get(proxy1) == ResultProxy(1))
+
+ val proxy2 = ExpressionProxy(Literal(2), runtime)
+ proxy2.eval()
+ assert(runtime.cache.size() == 2)
+ assert(runtime.cache.get(proxy2) == ResultProxy(2))
+
+ val proxy3 = ExpressionProxy(Literal(3), runtime)
+ proxy3.eval()
+ assert(runtime.cache.size() == 2)
+ assert(runtime.cache.get(proxy3) == ResultProxy(3))
+ }
+
+ test("setInput should empty cached result") {
+ val runtime = new SubExprEvaluationRuntime(2)
+ val proxy1 = ExpressionProxy(Literal(1), runtime)
+ assert(runtime.cache.size() == 0)
+ proxy1.eval()
+ assert(runtime.cache.size() == 1)
+ assert(runtime.cache.get(proxy1) == ResultProxy(1))
+
+ val proxy2 = ExpressionProxy(Literal(2), runtime)
+ proxy2.eval()
+ assert(runtime.cache.size() == 2)
+ assert(runtime.cache.get(proxy2) == ResultProxy(2))
+
+ runtime.setInput()
+ assert(runtime.cache.size() == 0)
+ }
+
+ test("Wrap ExpressionProxy on subexpressions") {
+ val runtime = new SubExprEvaluationRuntime(1)
+
+ val one = Literal(1)
+ val two = Literal(2)
+ val mul = Multiply(one, two)
+ val mul2 = Multiply(mul, mul)
+ val sqrt = Sqrt(mul2)
+ val sum = Add(mul2, sqrt)
+
+ // ( (one * two) * (one * two) ) + sqrt( (one * two) * (one * two) )
+ val proxyExpressions = runtime.proxyExpressions(Seq(sum))
+ val proxys = proxyExpressions.flatMap(_.collect {
+ case p: ExpressionProxy => p
+ })
+ // ( (one * two) * (one * two) )
+ assert(proxys.size == 2)
+ val expected = ExpressionProxy(mul2, runtime)
+ assert(proxys.head == expected)
Review comment:
should this be `proxys.forall(_ == expected)`?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727531747
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523419617
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala
##########
@@ -33,6 +34,15 @@ import org.apache.spark.unsafe.Platform
class InterpretedUnsafeProjection(expressions: Array[Expression]) extends UnsafeProjection {
import InterpretedUnsafeProjection._
+ private[this] val subExprElimination = SQLConf.get.subexpressionEliminationEnabled
Review comment:
nit: `subExprElimination` -> `subExprEliminationEnabled`?
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+ private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(cacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ def getEval(proxy: ExpressionProxy): Any = try {
+ cache.get(proxy).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+ throw e.getCause
+ }
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Recursively replaces expression with its proxy expression in `proxyMap`.
+ */
+ private def replaceWithProxy(
+ expr: Expression,
+ proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+ expr match {
+ case e if proxyMap.contains(e) =>
+ proxyMap(e)
+ case _ =>
+ expr.mapChildren(replaceWithProxy(_, proxyMap))
Review comment:
nit: `proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))`?
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala
##########
@@ -33,6 +34,15 @@ import org.apache.spark.unsafe.Platform
class InterpretedUnsafeProjection(expressions: Array[Expression]) extends UnsafeProjection {
import InterpretedUnsafeProjection._
+ private[this] val subExprElimination = SQLConf.get.subexpressionEliminationEnabled
+ private[this] lazy val runtime =
+ new SubExprEvaluationRuntime(SQLConf.get.subexpressionEliminationCacheMaxEntries)
+ private[this] val proxyExpressions = if (subExprElimination) {
Review comment:
nit: In case of `subExprElimination=false` or no common subexprs, this variable is not related to a "proxy", so how about just calling it `exprs`? (`proxyExpressions` -> `exprs`)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523295162
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
Review comment:
Yes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725851615
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35571/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523384442
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntimeSuite.scala
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+
+class EvaluationRuntimeSuite extends SparkFunSuite {
Review comment:
Okay, let me see how to add some tests for this.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523381340
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+ final override def dataType: DataType = child.dataType
+ final override def nullable: Boolean = child.nullable
+ final override def children: Seq[Expression] = child :: Nil
+
+ // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+ final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+ throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+ def proxyEval(input: InternalRow = null): Any = {
+ child.eval(input)
+ }
+
+ override def eval(input: InternalRow = null): Any = try {
+ runtime.cache.get(this).result
Review comment:
I think it should be fine but want to make sure I get your point correctly.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727325324
Thanks @dongjoon-hyun.
Let's wait for @cloud-fan's comment for https://github.com/apache/spark/pull/30341#discussion_r522822589 after this weekend.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725717956
**[Test build #130949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130949/testReport)** for PR 30341 at commit [`33ac8b4`](https://github.com/apache/spark/commit/33ac8b4d4fffb231054d16a80ab6366f30683c2d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725894068
Merged build finished. Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727542977
> Please let us know if you have another comments, @maropu and @cloud-fan .
Yea, I don't have any more comment, so I'll leave this to @cloud-fan .
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725860167
Thank you for pinging me, @viirya .
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521866385
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -539,6 +539,15 @@ object SQLConf {
.booleanConf
.createWithDefault(true)
+ val SUBEXPRESSION_ELIMINATION_CACHE_MAX_ENTRIES =
+ buildConf("spark.sql.subexpressionElimination.cache.maxEntries")
+ .internal()
+ .doc("The maximum entries of the cache used for interpreted subexpression elimination.")
+ .version("3.1.0")
+ .intConf
+ .checkValue(maxEntries => maxEntries >= 0, "The maximum must not be negative")
+ .createWithDefault(100)
Review comment:
This default value is copied from code generator's cache setting. I guess this should be enough.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727164017
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725912384
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130965/
Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725867045
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523295133
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+ final override def dataType: DataType = child.dataType
+ final override def nullable: Boolean = child.nullable
+ final override def children: Seq[Expression] = child :: Nil
+
+ // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+ final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+ throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+ def proxyEval(input: InternalRow = null): Any = {
+ child.eval(input)
+ }
+
+ override def eval(input: InternalRow = null): Any = try {
+ runtime.cache.get(this).result
Review comment:
Good idea.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727535887
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725912373
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r524808398
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+ private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(cacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ def getEval(proxy: ExpressionProxy): Any = try {
+ cache.get(proxy).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+ throw e.getCause
+ }
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Recursively replaces expression with its proxy expression in `proxyMap`.
+ */
+ private def replaceWithProxy(
+ expr: Expression,
+ proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+ proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
+ }
+
+ /**
+ * Finds subexpressions and wraps them with `ExpressionProxy`.
+ */
+ def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+ val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+ expressions.foreach(equivalentExpressions.addExprTree(_))
+
+ val proxyMap = mutable.Map.empty[Expression, ExpressionProxy]
+
+ val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+ commonExprs.foreach { e =>
+ val expr = e.head
+ val proxy = ExpressionProxy(expr, this)
+
+ proxyMap ++= e.map(_ -> proxy).toMap
+ }
+
+ // Only adding proxy if we find subexpressions.
+ if (proxyMap.nonEmpty) {
+ expressions.map(replaceWithProxy(_, proxyMap.toMap))
+ } else {
+ expressions
+ }
+ }
+}
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `SubExprEvaluationRuntime`,
+ * when this is asked to evaluate, it will load from the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(
Review comment:
Okay, but in this case we cannot use `IdentityHashMap` anymore as `IdentityHashMap` doesn't use `equals` to compare keys.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727541758
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727534418
retest this please
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523961911
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+ private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(cacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ def getEval(proxy: ExpressionProxy): Any = try {
+ cache.get(proxy).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+ throw e.getCause
+ }
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Recursively replaces expression with its proxy expression in `proxyMap`.
+ */
+ private def replaceWithProxy(
+ expr: Expression,
+ proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+ proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
Review comment:
Once we replace one expression with `ExpressionProxy`, we stop traversing down. We only traverse down to children if cannot find current expression in `proxyMap`. Is this for your question?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521865514
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -539,6 +539,15 @@ object SQLConf {
.booleanConf
.createWithDefault(true)
+ val SUBEXPRESSION_ELIMINATION_CACHE_MAX_ENTRIES =
+ buildConf("spark.sql.subexpressionElimination.cache.maxEntries")
+ .internal()
+ .doc("The maximum entries of the cache used for interpreted subexpression elimination.")
+ .version("3.1.0")
+ .intConf
+ .checkValue(maxEntries => maxEntries >= 0, "The maximum must not be negative")
Review comment:
Shall we use a simple form like `.checkValue(_ >= 0)`?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725700510
**[Test build #130949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130949/testReport)** for PR 30341 at commit [`33ac8b4`](https://github.com/apache/spark/commit/33ac8b4d4fffb231054d16a80ab6366f30683c2d).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727250940
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35697/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727155028
Hmm, actually for https://github.com/apache/spark/pull/30341#discussion_r522904967, that is to add some tests in `SQLQueryTestSuite` with `--CONFIG_DIM spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=NO_CODEGEN`, I think it should be added as a separate PR and merge the test PR first? Then we can run against with it to check if the result is the same. @maropu
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-728698599
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523384143
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+ final override def dataType: DataType = child.dataType
+ final override def nullable: Boolean = child.nullable
+ final override def children: Seq[Expression] = child :: Nil
+
+ // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+ final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+ throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+ def proxyEval(input: InternalRow = null): Any = {
+ child.eval(input)
+ }
+
+ override def eval(input: InternalRow = null): Any = try {
+ runtime.cache.get(this).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
Review comment:
Moved.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522893523
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala
##########
@@ -63,17 +72,21 @@ class InterpretedUnsafeProjection(expressions: Array[Expression]) extends Unsafe
}
override def initialize(partitionIndex: Int): Unit = {
- expressions.foreach(_.foreach {
+ proxyExpressions.foreach(_.foreach {
case n: Nondeterministic => n.initialize(partitionIndex)
case _ =>
})
}
override def apply(row: InternalRow): UnsafeRow = {
+ if (subExprElimination) {
+ runtime.setInput(row)
+ }
+
// Put the expression results in the intermediate row.
var i = 0
while (i < numFields) {
- values(i) = expressions(i).eval(row)
+ values(i) = proxyExpressions(i).eval(row)
Review comment:
Any reason not to apply this into `InterpretedMutableProjection`, too?
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
Review comment:
Seems like this is just a thin wrapper for eliminating common subexprs in `ExpressionRuntime`, so how about putting this class and `ExpressionRuntime` in a single file?
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
Review comment:
This seems to be a specific class for subexpression elimination, but I feel the name looks too general. How about renaming it more concretely? e.g., `SubExprElimination[Eval]Runtime` or something?
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+ final override def dataType: DataType = child.dataType
+ final override def nullable: Boolean = child.nullable
+ final override def children: Seq[Expression] = child :: Nil
+
+ // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+ final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+ throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+ def proxyEval(input: InternalRow = null): Any = {
+ child.eval(input)
+ }
+
+ override def eval(input: InternalRow = null): Any = try {
+ runtime.cache.get(this).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
Review comment:
How about moving this exception handling into the `ExpressionRuntime` then making `cache` private just like `CodeGenerator.cache`? https://github.com/apache/spark/blob/f80fe213bd4c5e065d5723816c42302a532be75c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L1351-L1359
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522822589
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRuntime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRuntime) extends Expression {
+
+ final override def dataType: DataType = child.dataType
+ final override def nullable: Boolean = child.nullable
+ final override def children: Seq[Expression] = child :: Nil
+
+ // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+ final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+ throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+ def proxyEval(input: InternalRow = null): Any = {
+ child.eval(input)
+ }
+
+ override def eval(input: InternalRow = null): Any = try {
+ runtime.cache.get(this).result
Review comment:
Can we assign unique IDs(per `EvaluationRuntime`) to `ExpressionProxy`? then the lookup can be more efficient.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727250946
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523957768
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+ private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(cacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ def getEval(proxy: ExpressionProxy): Any = try {
+ cache.get(proxy).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+ throw e.getCause
+ }
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Recursively replaces expression with its proxy expression in `proxyMap`.
+ */
+ private def replaceWithProxy(
+ expr: Expression,
+ proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+ proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
+ }
+
+ /**
+ * Finds subexpressions and wraps them with `ExpressionProxy`.
+ */
+ def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+ val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+ expressions.foreach(equivalentExpressions.addExprTree(_))
+
+ val proxyMap = mutable.Map.empty[Expression, ExpressionProxy]
Review comment:
is it OK to use a simple map here? Two expressions may not equal to each other even if they semantically equal.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727164174
retest this please
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725729230
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523296101
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala
##########
@@ -63,17 +72,21 @@ class InterpretedUnsafeProjection(expressions: Array[Expression]) extends Unsafe
}
override def initialize(partitionIndex: Int): Unit = {
- expressions.foreach(_.foreach {
+ proxyExpressions.foreach(_.foreach {
case n: Nondeterministic => n.initialize(partitionIndex)
case _ =>
})
}
override def apply(row: InternalRow): UnsafeRow = {
+ if (subExprElimination) {
+ runtime.setInput(row)
+ }
+
// Put the expression results in the intermediate row.
var i = 0
while (i < numFields) {
- values(i) = expressions(i).eval(row)
+ values(i) = proxyExpressions(i).eval(row)
Review comment:
It could be applied to `InterpretedMutableProjection` too. We can add it gradually like filter.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725792725
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35561/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726369827
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727534541
**[Test build #131107 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131107/testReport)** for PR 30341 at commit [`77168fe`](https://github.com/apache/spark/commit/77168fe2dd687113ae1b9a2f086982861445ecda).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727565640
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727242745
Thanks @maropu
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-729074737
Thanks @dongjoon-hyun @maropu @cloud-fan
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725705286
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35554/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727154709
I addressed most of comments except for https://github.com/apache/spark/pull/30341#discussion_r522822589 and https://github.com/apache/spark/pull/30341#discussion_r522904967.
For https://github.com/apache/spark/pull/30341#discussion_r522822589, I will wait for confirmation.
For https://github.com/apache/spark/pull/30341#discussion_r522904967, I will add some more tests later.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727173659
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35692/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726722441
Nice feature!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727531728
**[Test build #131106 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131106/testReport)** for PR 30341 at commit [`77168fe`](https://github.com/apache/spark/commit/77168fe2dd687113ae1b9a2f086982861445ecda).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727273046
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727242883
**[Test build #131094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131094/testReport)** for PR 30341 at commit [`47bae35`](https://github.com/apache/spark/commit/47bae358690c550a0259796b7bed137d1d38f28e).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521867300
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionProxy.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `EvaluationRunTime`, when this
+ * is asked to evaluate, it will load the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(child: Expression, runtime: EvaluationRunTime) extends Expression {
+
+ final override def dataType: DataType = child.dataType
+ final override def nullable: Boolean = child.nullable
+ final override def children: Seq[Expression] = child :: Nil
+
+ // `ExpressionProxy` is for interpreted expression evaluation only. So cannot `doGenCode`.
+ final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
+ throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
+
+ def proxyEval(input: InternalRow = null): Any = {
+ child.eval(input)
+ }
+
+ override def eval(input: InternalRow = null): Any = try {
+ runtime.cache.get(this).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
Review comment:
Seems the same: https://guava.dev/releases/21.0/api/docs/com/google/common/cache/Cache.html#get-K-java.util.concurrent.Callable-
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727529486
**[Test build #131106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131106/testReport)** for PR 30341 at commit [`77168fe`](https://github.com/apache/spark/commit/77168fe2dd687113ae1b9a2f086982861445ecda).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725716132
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35555/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727173662
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727250946
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727171013
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35692/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727164021
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131088/
Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725799445
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35561/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727255265
> It would be great if we can have a benchmark suite (from the code in the PR description), @viirya .
Yes, @dongjoon-hyun, I remember it. :) I will add a benchmark suite and the tests @maropu suggested in separate PRs.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-729342315
Thanks @HyukjinKwon!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727161442
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35691/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725894068
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522818057
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
+
+ val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(SQLConf.get.subexpressionEliminationCacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Finds subexpressions and wraps them with `ExpressionProxy`.
+ */
+ def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+ val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+ expressions.foreach(equivalentExpressions.addExprTree(_))
+
+ var proxyMap = Map.empty[Expression, ExpressionProxy]
Review comment:
shall we use a mutable map here?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725836485
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727565352
**[Test build #131107 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131107/testReport)** for PR 30341 at commit [`77168fe`](https://github.com/apache/spark/commit/77168fe2dd687113ae1b9a2f086982861445ecda).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725713136
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35554/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727532739
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35709/
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522904967
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntimeSuite.scala
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+
+class EvaluationRuntimeSuite extends SparkFunSuite {
Review comment:
How about adding some end-2-end tests for this feature in `SQLQueryTestSuite` with `--CONFIG_DIM spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=NO_CODEGEN` ?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725859457
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727161445
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522815747
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
Review comment:
We will support interpreted filter in the future, right? maybe `in` -> `such as`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725695736
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130948/
Test FAILed.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725695687
**[Test build #130948 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130948/testReport)** for PR 30341 at commit [`c7fdae5`](https://github.com/apache/spark/commit/c7fdae569209374a5bfcf038dfc1995935a180da).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `class EvaluationRunTime `
* `case class ExpressionProxy(child: Expression, runtime: EvaluationRunTime) extends Expression `
* `case class ResultProxy(result: Any)`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727171679
**[Test build #131089 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131089/testReport)** for PR 30341 at commit [`4780b65`](https://github.com/apache/spark/commit/4780b653f079c2fd4c950d43758587875d89e41a).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726259763
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725912063
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523384132
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
Review comment:
Renamed as `SubExprEvaluationRuntime` now.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727420586
No need to hurry~ Take your time. I just want to give you feedback swiftly from my side.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-729338732
Sorry for a late comment. +1, nice.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r524808901
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+ private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(cacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ def getEval(proxy: ExpressionProxy): Any = try {
+ cache.get(proxy).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+ throw e.getCause
+ }
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Recursively replaces expression with its proxy expression in `proxyMap`.
+ */
+ private def replaceWithProxy(
+ expr: Expression,
+ proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+ proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
+ }
+
+ /**
+ * Finds subexpressions and wraps them with `ExpressionProxy`.
+ */
+ def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+ val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+ expressions.foreach(equivalentExpressions.addExprTree(_))
+
+ val proxyMap = mutable.Map.empty[Expression, ExpressionProxy]
+
+ val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+ commonExprs.foreach { e =>
+ val expr = e.head
+ val proxy = ExpressionProxy(expr, this)
+
+ proxyMap ++= e.map(_ -> proxy).toMap
+ }
+
+ // Only adding proxy if we find subexpressions.
+ if (proxyMap.nonEmpty) {
+ expressions.map(replaceWithProxy(_, proxyMap.toMap))
+ } else {
+ expressions
+ }
+ }
+}
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `SubExprEvaluationRuntime`,
+ * when this is asked to evaluate, it will load from the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(
Review comment:
Ah, nvm. I figured it out.
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala
##########
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.DataType
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * such as `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class SubExprEvaluationRuntime(cacheMaxEntries: Int) {
+
+ private[sql] val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(cacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ def getEval(proxy: ExpressionProxy): Any = try {
+ cache.get(proxy).result
+ } catch {
+ // Cache.get() may wrap the original exception. See the following URL
+ // http://google.github.io/guava/releases/14.0/api/docs/com/google/common/cache/
+ // Cache.html#get(K,%20java.util.concurrent.Callable)
+ case e @ (_: UncheckedExecutionException | _: ExecutionError) =>
+ throw e.getCause
+ }
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Recursively replaces expression with its proxy expression in `proxyMap`.
+ */
+ private def replaceWithProxy(
+ expr: Expression,
+ proxyMap: Map[Expression, ExpressionProxy]): Expression = {
+ proxyMap.getOrElse(expr, expr.mapChildren(replaceWithProxy(_, proxyMap)))
+ }
+
+ /**
+ * Finds subexpressions and wraps them with `ExpressionProxy`.
+ */
+ def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+ val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+ expressions.foreach(equivalentExpressions.addExprTree(_))
+
+ val proxyMap = mutable.Map.empty[Expression, ExpressionProxy]
+
+ val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+ commonExprs.foreach { e =>
+ val expr = e.head
+ val proxy = ExpressionProxy(expr, this)
+
+ proxyMap ++= e.map(_ -> proxy).toMap
+ }
+
+ // Only adding proxy if we find subexpressions.
+ if (proxyMap.nonEmpty) {
+ expressions.map(replaceWithProxy(_, proxyMap.toMap))
+ } else {
+ expressions
+ }
+ }
+}
+
+/**
+ * A proxy for an catalyst `Expression`. Given a runtime object `SubExprEvaluationRuntime`,
+ * when this is asked to evaluate, it will load from the evaluation cache in the runtime first.
+ */
+case class ExpressionProxy(
Review comment:
<del>Okay, but in this case we cannot use `IdentityHashMap` anymore as `IdentityHashMap` doesn't use `equals` to compare keys.</del>
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan closed pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #30341:
URL: https://github.com/apache/spark/pull/30341
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725911883
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727535887
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727171747
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521861854
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -539,6 +539,15 @@ object SQLConf {
.booleanConf
.createWithDefault(true)
+ val SUBEXPRESSION_ELIMINATION_CACHE_MAX_ENTRIES =
+ buildConf("spark.sql.subexpressionElimination.cache.maxEntries")
+ .internal()
+ .doc("The maximum entries of the cache used for interpreted subexpression elimination.")
+ .version("3.1.0")
+ .intConf
+ .checkValue(maxEntries => maxEntries >= 0, "The maximum must not be negative")
Review comment:
What happens `maxEntries = 0`?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r522820004
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
+
+ val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(SQLConf.get.subexpressionEliminationCacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Finds subexpressions and wraps them with `ExpressionProxy`.
+ */
+ def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+ val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+ expressions.foreach(equivalentExpressions.addExprTree(_))
+
+ var proxyMap = Map.empty[Expression, ExpressionProxy]
+
+ val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+ commonExprs.foreach { e =>
+ val expr = e.head
+ val proxy = ExpressionProxy(expr, this)
+
+ proxyMap ++= e.map(_ -> proxy).toMap
+ }
+
+ // Only adding proxy if we find subexpressions.
+ if (proxyMap.nonEmpty) {
+ expressions.map { expr =>
+ // `transform` will cause stackoverflow because it keeps transforming into
+ // `ExpressionProxy`. But we cannot use `transformUp` because we want to use
+ // subexpressions at higher level. So we `transformDown` until finding first
+ // subexpression.
+ var transformed = false
+ expr.transform {
Review comment:
what if the single expr already have repeated sub-exprs? e.g. `SELECT (a+1) + (a+1)`. The code here seems can only replace one expression.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727541758
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r524801521
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntimeSuite.scala
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+
+class SubExprEvaluationRuntimeSuite extends SparkFunSuite {
+
+ test("Evaluate ExpressionProxy should create cached result") {
+ val runtime = new SubExprEvaluationRuntime(1)
+ val proxy = ExpressionProxy(Literal(1), runtime)
+ assert(runtime.cache.size() == 0)
+ proxy.eval()
+ assert(runtime.cache.size() == 1)
+ assert(runtime.cache.get(proxy) == ResultProxy(1))
+ }
+
+ test("SubExprEvaluationRuntime cannot exceed configured max entries") {
+ val runtime = new SubExprEvaluationRuntime(2)
+ assert(runtime.cache.size() == 0)
+
+ val proxy1 = ExpressionProxy(Literal(1), runtime)
+ proxy1.eval()
+ assert(runtime.cache.size() == 1)
+ assert(runtime.cache.get(proxy1) == ResultProxy(1))
+
+ val proxy2 = ExpressionProxy(Literal(2), runtime)
+ proxy2.eval()
+ assert(runtime.cache.size() == 2)
+ assert(runtime.cache.get(proxy2) == ResultProxy(2))
+
+ val proxy3 = ExpressionProxy(Literal(3), runtime)
+ proxy3.eval()
+ assert(runtime.cache.size() == 2)
+ assert(runtime.cache.get(proxy3) == ResultProxy(3))
+ }
+
+ test("setInput should empty cached result") {
+ val runtime = new SubExprEvaluationRuntime(2)
+ val proxy1 = ExpressionProxy(Literal(1), runtime)
+ assert(runtime.cache.size() == 0)
+ proxy1.eval()
+ assert(runtime.cache.size() == 1)
+ assert(runtime.cache.get(proxy1) == ResultProxy(1))
+
+ val proxy2 = ExpressionProxy(Literal(2), runtime)
+ proxy2.eval()
+ assert(runtime.cache.size() == 2)
+ assert(runtime.cache.get(proxy2) == ResultProxy(2))
+
+ runtime.setInput()
+ assert(runtime.cache.size() == 0)
+ }
+
+ test("Wrap ExpressionProxy on subexpressions") {
+ val runtime = new SubExprEvaluationRuntime(1)
+
+ val one = Literal(1)
+ val two = Literal(2)
+ val mul = Multiply(one, two)
+ val mul2 = Multiply(mul, mul)
+ val sqrt = Sqrt(mul2)
+ val sum = Add(mul2, sqrt)
+
+ // ( (one * two) * (one * two) ) + sqrt( (one * two) * (one * two) )
+ val proxyExpressions = runtime.proxyExpressions(Seq(sum))
+ val proxys = proxyExpressions.flatMap(_.collect {
+ case p: ExpressionProxy => p
+ })
+ // ( (one * two) * (one * two) )
+ assert(proxys.size == 2)
+ val expected = ExpressionProxy(mul2, runtime)
+ assert(proxys.head == expected)
Review comment:
yeah, you're right.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-727254222
It would be great if we can have a benchmark suite (from the code in the PR description), @viirya .
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-726647711
The idea looks good! left several comments
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r523383340
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRuntime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRuntime {
+
+ val cache: LoadingCache[ExpressionProxy, ResultProxy] = CacheBuilder.newBuilder()
+ .maximumSize(SQLConf.get.subexpressionEliminationCacheMaxEntries)
+ .build(
+ new CacheLoader[ExpressionProxy, ResultProxy]() {
+ override def load(expr: ExpressionProxy): ResultProxy = {
+ ResultProxy(expr.proxyEval(currentInput))
+ }
+ })
+
+ private var currentInput: InternalRow = null
+
+ /**
+ * Sets given input row as current row for evaluating expressions. This cleans up the cache
+ * too as new input comes.
+ */
+ def setInput(input: InternalRow = null): Unit = {
+ currentInput = input
+ cache.invalidateAll()
+ }
+
+ /**
+ * Finds subexpressions and wraps them with `ExpressionProxy`.
+ */
+ def proxyExpressions(expressions: Seq[Expression]): Seq[Expression] = {
+ val equivalentExpressions: EquivalentExpressions = new EquivalentExpressions
+
+ expressions.foreach(equivalentExpressions.addExprTree(_))
+
+ var proxyMap = Map.empty[Expression, ExpressionProxy]
+
+ val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
+ commonExprs.foreach { e =>
+ val expr = e.head
+ val proxy = ExpressionProxy(expr, this)
+
+ proxyMap ++= e.map(_ -> proxy).toMap
+ }
+
+ // Only adding proxy if we find subexpressions.
+ if (proxyMap.nonEmpty) {
+ expressions.map { expr =>
+ // `transform` will cause stackoverflow because it keeps transforming into
+ // `ExpressionProxy`. But we cannot use `transformUp` because we want to use
+ // subexpressions at higher level. So we `transformDown` until finding first
+ // subexpression.
+ var transformed = false
Review comment:
Oh, good idea, I like it.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725729230
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #30341: [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30341:
URL: https://github.com/apache/spark/pull/30341#discussion_r521860409
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EvaluationRunTime.scala
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions
+
+import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This class helps subexpression elimination for interpreted evaluation
+ * in `InterpretedUnsafeProjection`. It maintains an evaluation cache.
+ * This class wraps `ExpressionProxy` around given expressions. The `ExpressionProxy`
+ * intercepts expression evaluation and loads from the cache first.
+ */
+class EvaluationRunTime {
Review comment:
Oh, good to know. I will change it.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30341: [WIP][SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30341:
URL: https://github.com/apache/spark/pull/30341#issuecomment-725713159
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org