Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/17 15:49:00 UTC
[GitHub] [spark] wangyum opened a new pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
wangyum opened a new pull request #29790:
URL: https://github.com/apache/spark/pull/29790
### What changes were proposed in this pull request?
Some expressions' data types are not static values; they must be computed on every call, e.g. `CaseWhen`.
We should avoid calling `dataType` multiple times for each expression.
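A minimal sketch of the idea, not the actual patch: the types below (`DataType`, `Expression`, `Literal`, `RecomputedCaseWhen`, `CachedCaseWhen`) and the `merge` helper are simplified, hypothetical stand-ins for the Catalyst classes, and the `computations` counter exists only to make the difference visible. It contrasts a `dataType` that is derived on every access with one backed by a `lazy val`.

```scala
object DataTypeCachingSketch {
  // Simplified stand-ins for Catalyst types; these are not Spark's classes.
  sealed trait DataType
  case object IntegerType extends DataType
  case object LongType extends DataType

  trait Expression {
    def dataType: DataType
  }

  final case class Literal(dataType: DataType) extends Expression

  // Toy merge rule (an assumption for illustration): identical types are kept,
  // anything else widens to LongType. Expects a non-empty sequence.
  private def merge(types: Seq[DataType]): DataType =
    types.reduce((a, b) => if (a == b) a else LongType)

  // Derives the merged type on every access, as a plain `def` does.
  final case class RecomputedCaseWhen(branches: Seq[Expression]) extends Expression {
    var computations = 0 // counts derivations, for illustration only
    override def dataType: DataType = {
      computations += 1
      merge(branches.map(_.dataType))
    }
  }

  // Derives the merged type once and serves the cached value afterwards.
  final case class CachedCaseWhen(branches: Seq[Expression]) extends Expression {
    var computations = 0 // counts derivations, for illustration only
    private lazy val internalDataType: DataType = {
      computations += 1
      merge(branches.map(_.dataType))
    }
    override def dataType: DataType = internalDataType
  }

  def main(args: Array[String]): Unit = {
    val branches = Seq(Literal(LongType), Literal(IntegerType))
    val recomputed = RecomputedCaseWhen(branches)
    val cached = CachedCaseWhen(branches)
    // Simulate a per-row caller that asks for the type on every row.
    for (_ <- 1 to 1000) {
      recomputed.dataType
      cached.dataType
    }
    println(s"recomputed: ${recomputed.computations}, cached: ${cached.computations}")
    // prints "recomputed: 1000, cached: 1"
  }
}
```

The cached variant trades one extra field per expression instance for a single derivation, which matters most when a caller asks for the type once per row.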
### Why are the changes needed?
Improve query performance. For example:
```scala
spark.range(100000000L).selectExpr("approx_count_distinct(case when id % 400 > 20 then id else 0 end)").show
```
Profiling result:
```
--- Execution profile ---
Total samples : 18365
Frame buffer usage : 2.6688%
--- 58443254327 ns (31.82%), 5844 samples
[ 0] GenericTaskQueueSet<OverflowTaskQueue<StarTask, (MemoryType)1, 131072u>, (MemoryType)1>::steal_best_of_2(unsigned int, int*, StarTask&)
[ 1] StealTask::do_it(GCTaskManager*, unsigned int)
[ 2] GCTaskThread::run()
[ 3] java_start(Thread*)
[ 4] start_thread
--- 6140668667 ns (3.34%), 614 samples
[ 0] GenericTaskQueueSet<OverflowTaskQueue<StarTask, (MemoryType)1, 131072u>, (MemoryType)1>::peek()
[ 1] ParallelTaskTerminator::offer_termination(TerminatorTerminator*)
[ 2] StealTask::do_it(GCTaskManager*, unsigned int)
[ 3] GCTaskThread::run()
[ 4] java_start(Thread*)
[ 5] start_thread
--- 5679994036 ns (3.09%), 568 samples
[ 0] scala.collection.generic.Growable.$plus$plus$eq
[ 1] scala.collection.generic.Growable.$plus$plus$eq$
[ 2] scala.collection.mutable.ListBuffer.$plus$plus$eq
[ 3] scala.collection.mutable.ListBuffer.$plus$plus$eq
[ 4] scala.collection.generic.GenericTraversableTemplate.$anonfun$flatten$1
[ 5] scala.collection.generic.GenericTraversableTemplate$$Lambda$107.411506101.apply
[ 6] scala.collection.immutable.List.foreach
[ 7] scala.collection.generic.GenericTraversableTemplate.flatten
[ 8] scala.collection.generic.GenericTraversableTemplate.flatten$
[ 9] scala.collection.AbstractTraversable.flatten
[10] org.apache.spark.internal.config.ConfigEntry.readString
[11] org.apache.spark.internal.config.ConfigEntryWithDefault.readFrom
[12] org.apache.spark.sql.internal.SQLConf.getConf
[13] org.apache.spark.sql.internal.SQLConf.caseSensitiveAnalysis
[14] org.apache.spark.sql.types.DataType.sameType
[15] org.apache.spark.sql.catalyst.analysis.TypeCoercion$.$anonfun$haveSameType$1
[16] org.apache.spark.sql.catalyst.analysis.TypeCoercion$.$anonfun$haveSameType$1$adapted
[17] org.apache.spark.sql.catalyst.analysis.TypeCoercion$$$Lambda$1527.1975399904.apply
[18] scala.collection.IndexedSeqOptimized.prefixLengthImpl
[19] scala.collection.IndexedSeqOptimized.forall
[20] scala.collection.IndexedSeqOptimized.forall$
[21] scala.collection.mutable.ArrayBuffer.forall
[22] org.apache.spark.sql.catalyst.analysis.TypeCoercion$.haveSameType
[23] org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataTypeCheck
[24] org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataTypeCheck$
[25] org.apache.spark.sql.catalyst.expressions.CaseWhen.dataTypeCheck
[26] org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataType
[27] org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataType$
[28] org.apache.spark.sql.catalyst.expressions.CaseWhen.dataType
[29] org.apache.spark.sql.catalyst.expressions.aggregate.HyperLogLogPlusPlus.update
[30] org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1.$anonfun$applyOrElse$2
[31] org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1.$anonfun$applyOrElse$2$adapted
[32] org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1$$Lambda$1534.1383512673.apply
[33] org.apache.spark.sql.execution.aggregate.AggregationIterator.$anonfun$generateProcessRow$7
[34] org.apache.spark.sql.execution.aggregate.AggregationIterator.$anonfun$generateProcessRow$7$adapted
[35] org.apache.spark.sql.execution.aggregate.AggregationIterator$$Lambda$1555.725788712.apply
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual test
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696802100
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128983/
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696668112
**[Test build #128970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128970/testReport)** for PR 29790 at commit [`5755273`](https://github.com/apache/spark/commit/5755273fdc7886a69843458fce7c8e0a7ca6bbcc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478805
[GitHub] [spark] cloud-fan commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r496663486
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
##########
@@ -1048,6 +1048,11 @@ trait ComplexTypeMergingExpression extends Expression {
@transient
lazy val inputTypesForMerging: Seq[DataType] = children.map(_.dataType)
+ private lazy val internalDataType: DataType = {
Review comment:
can we put it right before the line of `override def dataType: DataType`?
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696761600
**[Test build #128983 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128983/testReport)** for PR 29790 at commit [`649d3c2`](https://github.com/apache/spark/commit/649d3c2bee57ed851a031ee01ceba0fe75dd6ef7).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696762294
[GitHub] [spark] wangyum commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r494810795
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3498,13 +3500,15 @@ object ArrayUnion {
since = "2.4.0")
case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBinaryLike
with ComplexTypeMergingExpression {
- override def dataType: DataType = {
- dataTypeCheck
Review comment:
Do you mean add it back?
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696960524
**[Test build #128985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128985/testReport)** for PR 29790 at commit [`f2dc664`](https://github.com/apache/spark/commit/f2dc664df7ccde629d9df8dcad46cf1657a52ff9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696547533
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128957/
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696780076
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478805
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478805
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-699894033
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478599
**[Test build #128957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128957/testReport)** for PR 29790 at commit [`5755273`](https://github.com/apache/spark/commit/5755273fdc7886a69843458fce7c8e0a7ca6bbcc).
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478599
[GitHub] [spark] cloud-fan commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r494807719
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3498,13 +3500,15 @@ object ArrayUnion {
since = "2.4.0")
case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBinaryLike
with ComplexTypeMergingExpression {
- override def dataType: DataType = {
- dataTypeCheck
Review comment:
After checking the code, it seems `dataTypeCheck` is not called in `checkInputDataTypes`.
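If the check has to stay with the type computation, one option is to keep it inside the cached value so that it still runs only once per instance. The following is a minimal sketch under that assumption, not the code in this PR: `MergingExpression`, its `inputTypes` parameter, and the `checks` counter are hypothetical, and the `DataType` objects are simplified stand-ins for the Catalyst types.

```scala
object CheckOnceSketch {
  // Simplified stand-ins for Catalyst types; these are not Spark's classes.
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType

  // Hypothetical expression whose result type is merged from its inputs.
  final class MergingExpression(inputTypes: Seq[DataType]) {
    var checks = 0 // counts how often the check runs, for illustration only

    // Toy version of the check: all inputs must share one type.
    private def dataTypeCheck(): Unit = {
      checks += 1
      require(
        inputTypes.nonEmpty && inputTypes.forall(_ == inputTypes.head),
        s"All input types must be the same, got: ${inputTypes.mkString(", ")}")
    }

    // The check sits inside the lazy val, so a successful check runs once.
    private lazy val internalDataType: DataType = {
      dataTypeCheck()
      inputTypes.head
    }

    def dataType: DataType = internalDataType
  }

  def main(args: Array[String]): Unit = {
    val ok = new MergingExpression(Seq(IntegerType, IntegerType))
    (1 to 5).foreach(_ => ok.dataType)
    println(ok.checks) // prints 1

    val bad = new MergingExpression(Seq(IntegerType, StringType))
    println(scala.util.Try(bad.dataType).isFailure) // prints true
  }
}
```

Note that a Scala `lazy val` whose initializer throws stays uninitialized, so a failing check is retried on the next access rather than cached.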
[GitHub] [spark] cloud-fan commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492551773
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
##########
@@ -225,10 +225,13 @@ case class ScalarSubquery(
children: Seq[Expression] = Seq.empty,
exprId: ExprId = NamedExpression.newExprId)
extends SubqueryExpression(plan, children, exprId) with Unevaluable {
- override def dataType: DataType = {
+
+ private lazy val internalDataType: DataType = {
Review comment:
does this need to be a lazy val? seems a very cheap method.
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3741,11 +3746,13 @@ case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBina
case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryLike
with ComplexTypeMergingExpression {
- override def dataType: DataType = {
+ private lazy val internalDataType: DataType = {
dataTypeCheck
Review comment:
shall we just remove this line? input data type check is part of the resolution procedure, and we don't need to do it again when accessing the data type.
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3498,13 +3500,16 @@ object ArrayUnion {
since = "2.4.0")
case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBinaryLike
with ComplexTypeMergingExpression {
- override def dataType: DataType = {
+
+ private lazy val internalDataType: DataType = {
dataTypeCheck
Review comment:
ditto
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -368,7 +368,7 @@ case class MapEntries(child: Expression)
@transient private lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
- override def dataType: DataType = {
+ private lazy val internalDataType: DataType = {
Review comment:
is it expensive? it just creates a few objects.
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694524598
**[Test build #128836 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128836/testReport)** for PR 29790 at commit [`6a9c01f`](https://github.com/apache/spark/commit/6a9c01fa2243059f4441f38574fe0e437a78c7a0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478599
**[Test build #128957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128957/testReport)** for PR 29790 at commit [`5755273`](https://github.com/apache/spark/commit/5755273fdc7886a69843458fce7c8e0a7ca6bbcc).
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694338060
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694841883
**[Test build #128864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128864/testReport)** for PR 29790 at commit [`f5f3af5`](https://github.com/apache/spark/commit/f5f3af50fa4ca71e421c8580a9868623924d177e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] wangyum commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696550307
retest this please
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700709523
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33855/
[GitHub] [spark] wangyum commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492812835
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3741,11 +3746,13 @@ case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBina
case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryLike
with ComplexTypeMergingExpression {
- override def dataType: DataType = {
+ private lazy val internalDataType: DataType = {
dataTypeCheck
Review comment:
+1
[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696551094
**[Test build #128970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128970/testReport)** for PR 29790 at commit [`5755273`](https://github.com/apache/spark/commit/5755273fdc7886a69843458fce7c8e0a7ca6bbcc).
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694329270
[GitHub] [spark] cloud-fan commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492553171
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -368,7 +368,7 @@ case class MapEntries(child: Expression)
@transient private lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
- override def dataType: DataType = {
+ private lazy val internalDataType: DataType = {
Review comment:
Is it expensive? It just creates a few objects.
[GitHub] [spark] cloud-fan commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492551773
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
##########
@@ -225,10 +225,13 @@ case class ScalarSubquery(
children: Seq[Expression] = Seq.empty,
exprId: ExprId = NamedExpression.newExprId)
extends SubqueryExpression(plan, children, exprId) with Unevaluable {
- override def dataType: DataType = {
+
+ private lazy val internalDataType: DataType = {
Review comment:
Does this need to be a lazy val? It seems to be a very cheap method.
[GitHub] [spark] wangyum commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r494810795
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3498,13 +3500,15 @@ object ArrayUnion {
since = "2.4.0")
case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBinaryLike
with ComplexTypeMergingExpression {
- override def dataType: DataType = {
- dataTypeCheck
Review comment:
Do you mean I should add it back?
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694328507
**[Test build #128834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128834/testReport)** for PR 29790 at commit [`906d2e0`](https://github.com/apache/spark/commit/906d2e064ca01e66ba4490ae5dc0e9c3608dd23f).
[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478599
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478805
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694702274
**[Test build #128864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128864/testReport)** for PR 29790 at commit [`f5f3af5`](https://github.com/apache/spark/commit/f5f3af50fa4ca71e421c8580a9868623924d177e).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696547519
Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694702274
**[Test build #128864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128864/testReport)** for PR 29790 at commit [`f5f3af5`](https://github.com/apache/spark/commit/f5f3af50fa4ca71e421c8580a9868623924d177e).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694338060
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694341541
**[Test build #128836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128836/testReport)** for PR 29790 at commit [`6a9c01f`](https://github.com/apache/spark/commit/6a9c01fa2243059f4441f38574fe0e437a78c7a0).
[GitHub] [spark] wangyum commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696550307
retest this please
[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696783561
**[Test build #128985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128985/testReport)** for PR 29790 at commit [`f2dc664`](https://github.com/apache/spark/commit/f2dc664df7ccde629d9df8dcad46cf1657a52ff9).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694329930
Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696551094
**[Test build #128970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128970/testReport)** for PR 29790 at commit [`5755273`](https://github.com/apache/spark/commit/5755273fdc7886a69843458fce7c8e0a7ca6bbcc).
[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696761600
**[Test build #128983 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128983/testReport)** for PR 29790 at commit [`649d3c2`](https://github.com/apache/spark/commit/649d3c2bee57ed851a031ee01ceba0fe75dd6ef7).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694842890
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-699894015
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33789/
[GitHub] [spark] wangyum commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492473081
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
##########
@@ -1048,6 +1048,11 @@ trait ComplexTypeMergingExpression extends Expression {
@transient
lazy val inputTypesForMerging: Seq[DataType] = children.map(_.dataType)
+ private lazy val internalDataType: DataType = {
Review comment:
If we only change:
```scala
override def dataType: DataType = {
  dataTypeCheck
  inputTypesForMerging.reduceLeft(TypeCoercion.findCommonTypeDifferentOnlyInNullFlags(_, _).get)
}
```
to
```scala
lazy val dataType: DataType = {
  dataTypeCheck
  inputTypesForMerging.reduceLeft(TypeCoercion.findCommonTypeDifferentOnlyInNullFlags(_, _).get)
}
```
compilation fails with:
```
[error] /Users/yumwang/opensource/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:558: super may not be used on lazy value dataType
[error] super.dataType.asInstanceOf[MapType]
[error] ^
[error] /Users/yumwang/opensource/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:2077: super may not be used on lazy value dataType
[error] super.dataType
[error] ^
[error] two errors found
```
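A minimal sketch of the workaround used in the diff, under simplified assumptions: `DataType`, `Expression`, `MergingExpression`, and `WrapInArray` below are hypothetical stand-ins, not the real Catalyst classes. The cached value lives in a private `lazy val`, while `dataType` stays a `def`, so subclasses can still call `super.dataType`.

```scala
// Hypothetical stand-ins for Spark's DataType and Expression hierarchy.
sealed trait DataType
case object LongType extends DataType
final case class ArrayType(elementType: DataType) extends DataType

trait Expression {
  def dataType: DataType
}

trait MergingExpression extends Expression {
  def inputTypes: Seq[DataType]

  // Computed once per instance, on first access.
  private lazy val internalDataType: DataType = {
    require(inputTypes.distinct.size == 1, "input types must match")
    inputTypes.head
  }

  // Remains a def, so `super.dataType` is still legal in subclasses.
  override def dataType: DataType = internalDataType
}

final case class WrapInArray(inputTypes: Seq[DataType]) extends MergingExpression {
  // This call is what the compiler rejects when the parent declares `lazy val dataType`.
  override def dataType: DataType = ArrayType(super.dataType)
}
```

With these stand-ins, `WrapInArray(Seq(LongType, LongType)).dataType` evaluates to `ArrayType(LongType)`.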
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -368,7 +368,7 @@ case class MapEntries(child: Expression)
@transient private lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
- override def dataType: DataType = {
+ private lazy val internalDataType: DataType = {
Review comment:
This is to improve this case:
![image](https://user-images.githubusercontent.com/5399861/93886687-55474d00-fd18-11ea-9948-95f6e0072d1c.png)
Benchmark code | Before this PR(Milliseconds) | After this PR(Milliseconds)
-- | -- | --
spark.range(100000000L).selectExpr("approx_count_distinct(map_entries(map(1, id)))").collect() | 21787 | 15551
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3741,11 +3746,13 @@ case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBina
case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryLike
with ComplexTypeMergingExpression {
- override def dataType: DataType = {
+ private lazy val internalDataType: DataType = {
dataTypeCheck
Review comment:
+1
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
##########
@@ -225,10 +225,13 @@ case class ScalarSubquery(
children: Seq[Expression] = Seq.empty,
exprId: ExprId = NamedExpression.newExprId)
extends SubqueryExpression(plan, children, exprId) with Unevaluable {
- override def dataType: DataType = {
+
+ private lazy val internalDataType: DataType = {
Review comment:
I reverted this change because I could not find an expression that calls this method many times.
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-699863097
**[Test build #129174 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129174/testReport)** for PR 29790 at commit [`6a8877d`](https://github.com/apache/spark/commit/6a8877df876f0afd3c3b9c4248c61beceddcbb11).
[GitHub] [spark] wangyum commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492722255
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -368,7 +368,7 @@ case class MapEntries(child: Expression)
@transient private lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
- override def dataType: DataType = {
+ private lazy val internalDataType: DataType = {
Review comment:
This is to improve this case:
![image](https://user-images.githubusercontent.com/5399861/93886687-55474d00-fd18-11ea-9948-95f6e0072d1c.png)
Benchmark code | Before this PR(Milliseconds) | After this PR(Milliseconds)
-- | -- | --
spark.range(100000000L).selectExpr("approx_count_distinct(map_entries(map(1, id)))").collect() | 21787 | 15551
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696802081
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700725596
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700885340
**[Test build #129238 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129238/testReport)** for PR 29790 at commit [`37d0786`](https://github.com/apache/spark/commit/37d0786201ec9109b661e19a5152fa53b1d1cad4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] wangyum commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492722255
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -368,7 +368,7 @@ case class MapEntries(child: Expression)
@transient private lazy val childDataType: MapType = child.dataType.asInstanceOf[MapType]
- override def dataType: DataType = {
+ private lazy val internalDataType: DataType = {
Review comment:
This is to improve this case:
![image](https://user-images.githubusercontent.com/5399861/93886687-55474d00-fd18-11ea-9948-95f6e0072d1c.png)
Benchmark code | Before this PR(Milliseconds) | After this PR(Milliseconds)
-- | -- | --
spark.range(100000000L).selectExpr("approx_count_distinct(map_entries(map(1, id)))").collect() | 21787 | 15551
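
To illustrate why caching can matter when `dataType` is called once per row, here is a rough sketch that counts how often a type object is built. Everything in it (`DataTypeCachingSketch`, `StructLikeType`, `Recomputing`, `Cached`, `countConstructions`) is a hypothetical stand-in for illustration, not Spark code, and it measures allocations rather than wall-clock time.

```scala
import java.util.concurrent.atomic.AtomicInteger

object DataTypeCachingSketch {
  // Counts how many type objects have been constructed (illustration only).
  val constructions = new AtomicInteger(0)

  final case class StructLikeType(fieldNames: Seq[String]) {
    constructions.incrementAndGet()
  }

  // Rebuilds the type on every call, like `override def dataType = { ... }`.
  class Recomputing {
    def dataType: StructLikeType = StructLikeType(Seq("key", "value"))
  }

  // Builds the type on first access and reuses it afterwards.
  class Cached {
    private lazy val internalDataType: StructLikeType = StructLikeType(Seq("key", "value"))
    def dataType: StructLikeType = internalDataType
  }

  // Evaluates `body` the given number of times and returns how many types were built.
  def countConstructions(calls: Int)(body: => Any): Int = {
    val before = constructions.get()
    (0 until calls).foreach(_ => body)
    constructions.get() - before
  }
}
```

Calling `countConstructions(1000)(new DataTypeCachingSketch.Recomputing().dataType)` builds a new object on each call, whereas a single `Cached` instance builds its type only once no matter how often `dataType` is read.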
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696547519
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700002852
**[Test build #129174 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129174/testReport)** for PR 29790 at commit [`6a8877d`](https://github.com/apache/spark/commit/6a8877df876f0afd3c3b9c4248c61beceddcbb11).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-699894033
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696783561
**[Test build #128985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128985/testReport)** for PR 29790 at commit [`f2dc664`](https://github.com/apache/spark/commit/f2dc664df7ccde629d9df8dcad46cf1657a52ff9).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696961631
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478805
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694329270
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700005928
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696668904
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478805
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694526069
[GitHub] [spark] cloud-fan commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694828526
Do we really need such an invasive change? If a specific expression calls `dataType` many times, let's fix only that expression. Or, if this brings a significant end-to-end performance speedup, we can consider accepting it.
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696961631
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696551577
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694702875
[GitHub] [spark] HyukjinKwon closed pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #29790:
URL: https://github.com/apache/spark/pull/29790
[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700683847
**[Test build #129238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129238/testReport)** for PR 29790 at commit [`37d0786`](https://github.com/apache/spark/commit/37d0786201ec9109b661e19a5152fa53b1d1cad4).
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700886576
[GitHub] [spark] wangyum commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492814578
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
##########
@@ -225,10 +225,13 @@ case class ScalarSubquery(
children: Seq[Expression] = Seq.empty,
exprId: ExprId = NamedExpression.newExprId)
extends SubqueryExpression(plan, children, exprId) with Unevaluable {
- override def dataType: DataType = {
+
+ private lazy val internalDataType: DataType = {
Review comment:
I reverted this change because I could not find an expression that calls this method many times.
[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694341541
**[Test build #128836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128836/testReport)** for PR 29790 at commit [`6a9c01f`](https://github.com/apache/spark/commit/6a9c01fa2243059f4441f38574fe0e437a78c7a0).
[GitHub] [spark] wangyum commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492473081
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
##########
@@ -1048,6 +1048,11 @@ trait ComplexTypeMergingExpression extends Expression {
@transient
lazy val inputTypesForMerging: Seq[DataType] = children.map(_.dataType)
+ private lazy val internalDataType: DataType = {
Review comment:
If we only change:
```scala
override def dataType: DataType = {
  dataTypeCheck
  inputTypesForMerging.reduceLeft(TypeCoercion.findCommonTypeDifferentOnlyInNullFlags(_, _).get)
}
```
to
```scala
lazy val dataType: DataType = {
  dataTypeCheck
  inputTypesForMerging.reduceLeft(TypeCoercion.findCommonTypeDifferentOnlyInNullFlags(_, _).get)
}
```
the build fails with these compile errors:
```java
[error] /Users/yumwang/opensource/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:558: super may not be used on lazy value dataType
[error] super.dataType.asInstanceOf[MapType]
[error] ^
[error] /Users/yumwang/opensource/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:2077: super may not be used on lazy value dataType
[error] super.dataType
[error] ^
[error] two errors found
```
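For readers following along, here is a minimal, self-contained sketch of the workaround pattern that the diff above appears to use: keep `dataType` a plain `def` and cache the computed value in a `private lazy val`, so that subclasses can still call `super.dataType`. The types `DataType`, `IntType`, `ArrayType`, `MergingExpr`, `ConcatLike` and the `Counter` object are hypothetical stand-ins written for this illustration; they are not Spark's classes.

```scala
// Hypothetical stand-ins for Spark's DataType hierarchy (assumption, not Spark code).
sealed trait DataType
case object IntType extends DataType
final case class ArrayType(element: DataType, containsNull: Boolean) extends DataType

// Counts how often the (pretend) expensive type merge runs.
object Counter { var merges = 0 }

trait MergingExpr {
  def inputTypes: Seq[DataType]

  // Computed at most once per instance. It is private, so no subclass overrides it.
  private lazy val internalDataType: DataType = {
    Counter.merges += 1
    inputTypes.head
  }

  // Still a plain `def`, so `super.dataType` in a subclass compiles.
  def dataType: DataType = internalDataType
}

final case class ConcatLike(inputTypes: Seq[DataType]) extends MergingExpr {
  // Narrowing override that relies on `super.dataType`, as in the error output above.
  override def dataType: ArrayType = super.dataType.asInstanceOf[ArrayType]
}

object Demo {
  def main(args: Array[String]): Unit = {
    val e = ConcatLike(Seq(ArrayType(IntType, containsNull = false)))
    (1 to 3).foreach(_ => e.dataType)
    println(Counter.merges) // prints 1
  }
}
```

Had `dataType` in the trait been declared as a `lazy val` instead, the `super.dataType` call in `ConcatLike` would be rejected by the Scala 2 compiler with the "super may not be used on lazy value" error shown above.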
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696478599
**[Test build #128957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128957/testReport)** for PR 29790 at commit [`5755273`](https://github.com/apache/spark/commit/5755273fdc7886a69843458fce7c8e0a7ca6bbcc).
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694526069
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700886576
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694329930
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694702875
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696668904
[GitHub] [spark] cloud-fan commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492552898
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3498,13 +3500,16 @@ object ArrayUnion {
since = "2.4.0")
case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBinaryLike
with ComplexTypeMergingExpression {
- override def dataType: DataType = {
+
+ private lazy val internalDataType: DataType = {
dataTypeCheck
Review comment:
ditto
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700725596
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696546740
**[Test build #128957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128957/testReport)** for PR 29790 at commit [`5755273`](https://github.com/apache/spark/commit/5755273fdc7886a69843458fce7c8e0a7ca6bbcc).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700725568
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33855/
[GitHub] [spark] cloud-fan commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r492552680
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3741,11 +3746,13 @@ case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBina
case class ArrayExcept(left: Expression, right: Expression) extends ArrayBinaryLike
with ComplexTypeMergingExpression {
- override def dataType: DataType = {
+ private lazy val internalDataType: DataType = {
dataTypeCheck
Review comment:
Shall we just remove this line? The input data type check is part of the resolution procedure, so we don't need to run it again when accessing the data type.
[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-699863097
**[Test build #129174 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129174/testReport)** for PR 29790 at commit [`6a8877d`](https://github.com/apache/spark/commit/6a8877df876f0afd3c3b9c4248c61beceddcbb11).
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696762294
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696780076
[GitHub] [spark] cloud-fan commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694829038
Or if an expression has a very complicated `def dataType`, can we change it to `lazy val dataType`?
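As a rough illustration of the difference this suggestion relies on, the sketch below counts how often the body runs for a `def` versus a `lazy val`. The classes `WithDef` and `WithLazyVal` and the string-valued "types" are hypothetical and exist only for this example; they do not model any real Spark expression.

```scala
object DefVsLazyVal {
  // Hypothetical expression whose type is re-derived from its branches on every call.
  final class WithDef(branches: Seq[String]) {
    var computations = 0
    def dataType: String = { computations += 1; branches.head }
  }

  // Same derivation, but the result is cached after the first access.
  final class WithLazyVal(branches: Seq[String]) {
    var computations = 0
    lazy val dataType: String = { computations += 1; branches.head }
  }

  def main(args: Array[String]): Unit = {
    val a = new WithDef(Seq("bigint", "int"))
    val b = new WithLazyVal(Seq("bigint", "int"))
    // Access both members 1000 times, as a per-row code path might.
    (1 to 1000).foreach(_ => (a.dataType, b.dataType))
    println(s"def: ${a.computations}, lazy val: ${b.computations}") // prints "def: 1000, lazy val: 1"
  }
}
```

The trade-off is that a `lazy val` adds a field and an initialization check to each instance, so it presumably pays off only where the computation is non-trivial and the member is read repeatedly.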
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700005941
[GitHub] [spark] AmplabJenkins commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694842890
[GitHub] [spark] cloud-fan commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r494807719
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3498,13 +3500,15 @@ object ArrayUnion {
since = "2.4.0")
case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBinaryLike
with ComplexTypeMergingExpression {
- override def dataType: DataType = {
- dataTypeCheck
Review comment:
After checking the code, it seems `dataTypeCheck` is not put in `checkInputDataTypes`.
[GitHub] [spark] HyukjinKwon commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-703615807
Merged to master.
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694329911
**[Test build #128834 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128834/testReport)** for PR 29790 at commit [`906d2e0`](https://github.com/apache/spark/commit/906d2e064ca01e66ba4490ae5dc0e9c3608dd23f).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696802081
Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-700683847
**[Test build #129238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129238/testReport)** for PR 29790 at commit [`37d0786`](https://github.com/apache/spark/commit/37d0786201ec9109b661e19a5152fa53b1d1cad4).
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid constructing dataType multiple times
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-699884789
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33789/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694329946
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128834/
[GitHub] [spark] maropu commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29790:
URL: https://github.com/apache/spark/pull/29790#discussion_r490646094
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
##########
@@ -400,7 +403,10 @@ case class Expm1(child: Expression) extends UnaryMathExpression(StrictMath.expm1
""",
since = "1.4.0")
case class Floor(child: Expression) extends UnaryMathExpression(math.floor, "FLOOR") {
- override def dataType: DataType = child.dataType match {
+
+ private lazy val childDataType = child.dataType
Review comment:
This looks like a pretty common code pattern, so could we move it into the base classes, e.g., as a lazy val in `UnaryExpression`?
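A rough sketch of what hoisting the cached child type into a base class could look like. `UnaryExpr`, `FloorLike` and the simplified `DataType` hierarchy are hypothetical stand-ins, and the decimal rule is illustrative only, not Spark's exact `Floor` logic:

```scala
object ChildTypeCacheSketch {
  // Hypothetical stand-ins for Catalyst's data types.
  sealed trait DataType
  case object LongType extends DataType
  case object DoubleType extends DataType
  final case class DecimalType(precision: Int, scale: Int) extends DataType

  trait Expr { def dataType: DataType }
  final case class Leaf(dataType: DataType) extends Expr

  // Hypothetical base class: the child's type is computed once and
  // shared by every subclass, instead of each one declaring its own lazy val.
  abstract class UnaryExpr extends Expr {
    def child: Expr
    protected lazy val childDataType: DataType = child.dataType
  }

  final case class FloorLike(child: Expr) extends UnaryExpr {
    // Illustrative mapping: drop the fractional digits of a decimal,
    // return LongType for everything else.
    override def dataType: DataType = childDataType match {
      case DecimalType(p, s) if s > 0 => DecimalType(p - s + 1, 0)
      case dt: DecimalType            => dt
      case _                          => LongType
    }
  }
}
```

Because `childDataType` is a `lazy val`, it is not evaluated during construction, so it does not run into initialization-order problems with the subclass's `child` parameter.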
[GitHub] [spark] SparkQA removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-694328507
**[Test build #128834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128834/testReport)** for PR 29790 at commit [`906d2e0`](https://github.com/apache/spark/commit/906d2e064ca01e66ba4490ae5dc0e9c3608dd23f).
[GitHub] [spark] SparkQA commented on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696801765
**[Test build #128983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128983/testReport)** for PR 29790 at commit [`649d3c2`](https://github.com/apache/spark/commit/649d3c2bee57ed851a031ee01ceba0fe75dd6ef7).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29790:
URL: https://github.com/apache/spark/pull/29790#issuecomment-696551577