Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2016/06/16 08:13:32 UTC

[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/13704

    [SPARK-15985][SQL] Reduce runtime overhead of a program that reads a primitive array in Dataset

    ## What changes were proposed in this pull request?
    
    This PR reduces the runtime overhead of a program that reads a primitive array in a Dataset. The generated code copies array elements from the Dataset into a temporary array. If we know that the source and destination types are primitive arrays, we apply one of the following optimizations (both cases are sketched in Scala right after this list):
    
    1. Eliminate the allocation of ```Object[]``` and call the ```ArrayData.to<Type>Array()``` method if the source and destination types are the same
    2. Eliminate the pair of an ```isNullAt()``` check and a ```null``` assignment, and allocate a primitive array instead of ```Object[]```, if the source and destination types are different
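    
    To make the two cases concrete, here is a minimal, hedged Scala sketch of the copies involved at the ```ArrayData``` level. It is illustrative only and is not the PR's code; the actual generated Java code is shown further below.
    ```scala
    import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
    
    val src: ArrayData = new GenericArrayData(Array[Any](1.0, 2.0, 3.0))
    
    // Today: an Object[] is allocated, isNullAt() is checked per element,
    // and every double is boxed before being stored.
    val boxedCopy: Array[Any] = Array.tabulate[Any](src.numElements()) { i =>
      if (src.isNullAt(i)) null else src.getDouble(i)
    }
    
    // Case 1 (same element type): a single bulk call, no Object[] and no boxing.
    val sameType: Array[Double] = src.toDoubleArray()
    
    // Case 2 (different element types, e.g. int to double): allocate a primitive
    // destination array and skip the isNullAt()/null-assignment pair.
    val intSrc: ArrayData = new GenericArrayData(Array[Any](1, 2, 3))
    val widened: Array[Double] = Array.tabulate(intSrc.numElements()) { i =>
      intSrc.getInt(i).toDouble
    }
    ```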
    
    
    An example program
    ```
    val ds = Seq(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0)).toDS()
    ds.map(p => {
         var s = 0.0
         for (i <- 0 to 2) { s += p(i) }
         s
       }).show
    ```
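    
    For reference, the generated code shown below can be reproduced with Spark's codegen debugging helper. This is a hedged usage sketch, assuming a `SparkSession` named `spark` is in scope (e.g. in `spark-shell`):
    ```scala
    import spark.implicits._
    import org.apache.spark.sql.execution.debug._  // adds debugCodegen() to Dataset
    
    val ds = Seq(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0)).toDS()
    ds.map { p =>
      var s = 0.0
      for (i <- 0 to 2) { s += p(i) }
      s
    }.debugCodegen()  // prints the whole-stage generated Java source
    ```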
    
    Generated code before applying this PR
    ```
    /* 036 */   protected void processNext() throws java.io.IOException {
    /* 037 */     while (inputadapter_input.hasNext()) {
    /* 038 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
    /* 039 */       boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
    /* 040 */       ArrayData inputadapter_value = inputadapter_isNull ? null : (inputadapter_row.getArray(0));
    /* 041 */
    /* 042 */       boolean deserializetoobject_isNull1 = inputadapter_isNull;
    /* 043 */       ArrayData deserializetoobject_value1 = null;
    /* 044 */       if (!inputadapter_isNull) {
    /* 045 */         final int deserializetoobject_n = inputadapter_value.numElements();
    /* 046 */         final Object[] deserializetoobject_values = new Object[deserializetoobject_n];
    /* 047 */         for (int deserializetoobject_j = 0; deserializetoobject_j < deserializetoobject_n; deserializetoobject_j ++) {
    /* 048 */           if (inputadapter_value.isNullAt(deserializetoobject_j)) {
    /* 049 */             deserializetoobject_values[deserializetoobject_j] = null;
    /* 050 */           } else {
    /* 051 */             boolean deserializetoobject_feNull = false;
    /* 052 */             double deserializetoobject_fePrim =
    /* 053 */             inputadapter_value.getDouble(deserializetoobject_j);
    /* 054 */
    /* 055 */             boolean deserializetoobject_teNull = deserializetoobject_feNull;
    /* 056 */             double deserializetoobject_tePrim = -1.0;
    /* 057 */             if (!deserializetoobject_feNull) {
    /* 058 */               deserializetoobject_tePrim = deserializetoobject_fePrim;
    /* 059 */             }
    /* 060 */
    /* 061 */             if (deserializetoobject_teNull) {
    /* 062 */               deserializetoobject_values[deserializetoobject_j] = null;
    /* 063 */             } else {
    /* 064 */               deserializetoobject_values[deserializetoobject_j] = deserializetoobject_tePrim;
    /* 065 */             }
    /* 066 */           }
    /* 067 */         }
    /* 068 */         deserializetoobject_value1 = new org.apache.spark.sql.catalyst.util.GenericArrayData(deserializetoobject_values);
    /* 069 */
    /* 070 */       }
    /* 071 */
    /* 072 */       boolean deserializetoobject_isNull = deserializetoobject_isNull1;
    /* 073 */       final double[] deserializetoobject_value = deserializetoobject_isNull ? null : (double[]) deserializetoobject_value1.toDoubleArray();
    /* 074 */       deserializetoobject_isNull = deserializetoobject_value == null;
    /* 075 */
    /* 076 */       Object mapelements_obj = ((Expression) references[0]).eval(null);
    /* 077 */       scala.Function1 mapelements_value1 = (scala.Function1) mapelements_obj;
    /* 078 */
    /* 079 */       boolean mapelements_isNull = false || deserializetoobject_isNull;
    /* 080 */       final double mapelements_value = mapelements_isNull ? -1.0 : (Double) mapelements_value1.apply(deserializetoobject_value);
    /* 081 */
    /* 082 */       serializefromobject_rowWriter.zeroOutNullBytes();
    /* 083 */
    /* 084 */       if (mapelements_isNull) {
    /* 085 */         serializefromobject_rowWriter.setNullAt(0);
    /* 086 */       } else {
    /* 087 */         serializefromobject_rowWriter.write(0, mapelements_value);
    /* 088 */       }
    /* 089 */       append(serializefromobject_result);
    /* 090 */       if (shouldStop()) return;
    /* 091 */     }
    /* 092 */   }
    ```
    
    Generated code after applying this PR
    ```
    /* 036 */   protected void processNext() throws java.io.IOException {
    /* 037 */     while (inputadapter_input.hasNext()) {
    /* 038 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
    /* 039 */       boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
    /* 040 */       ArrayData inputadapter_value = inputadapter_isNull ? null : (inputadapter_row.getArray(0));
    /* 041 */
    /* 042 */       boolean deserializetoobject_isNull1 = inputadapter_isNull;
    /* 043 */       ArrayData deserializetoobject_value1 = null;
    /* 044 */       if (!inputadapter_isNull) {
    /* 045 */         final double[] deserializetoobject_values = inputadapter_value.toDoubleArray();
    /* 046 */         deserializetoobject_value1 = new org.apache.spark.sql.catalyst.util.GenericArrayData(deserializetoobject_values);
    /* 047 */
    /* 048 */       }
    /* 049 */
    /* 050 */       boolean deserializetoobject_isNull = deserializetoobject_isNull1;
    /* 051 */       final double[] deserializetoobject_value = deserializetoobject_isNull ? null : (double[]) deserializetoobject_value1.toDoubleArray();
    /* 052 */       deserializetoobject_isNull = deserializetoobject_value == null;
    /* 053 */
    /* 054 */       Object mapelements_obj = ((Expression) references[0]).eval(null);
    /* 055 */       scala.Function1 mapelements_value1 = (scala.Function1) mapelements_obj;
    /* 056 */
    /* 057 */       boolean mapelements_isNull = false || deserializetoobject_isNull;
    /* 058 */       final double mapelements_value = mapelements_isNull ? -1.0 : (Double) mapelements_value1.apply(deserializetoobject_value);
    /* 059 */
    /* 060 */       serializefromobject_rowWriter.zeroOutNullBytes();
    /* 061 */
    /* 062 */       if (mapelements_isNull) {
    /* 063 */         serializefromobject_rowWriter.setNullAt(0);
    /* 064 */       } else {
    /* 065 */         serializefromobject_rowWriter.write(0, mapelements_value);
    /* 066 */       }
    /* 067 */       append(serializefromobject_result);
    /* 068 */       if (shouldStop()) return;
    /* 069 */     }
    /* 070 */   }
    ```
    
    
    ## How was this patch tested?
    
    Tested with the existing Dataset unit tests
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-15985

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13704.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13704
    
----
commit 17f17d60794c1f0ab81e21ec2484742a7610f06d
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2016-06-16T07:46:27Z

    optimize to read primitive array elements in Dataset

----




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r72915873
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1441,6 +1441,12 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
     object SimplifyCasts extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
         case Cast(e, dataType) if e.dataType == dataType => e
    +    case c @ Cast(e, dataType) => (e.dataType, dataType) match {
    --- End diff --
    
    ping @liancheng 




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61677/
    Test PASSed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r68548441
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -837,8 +837,36 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
         val j = ctx.freshName("j")
         val values = ctx.freshName("values")
     
    +    val isPrimitiveFrom = ctx.isPrimitiveType(fromType)
    +    val isPrimitiveTo = ctx.isPrimitiveType(toType)
    +
         (c, evPrim, evNull) =>
    -      s"""
    +      if (isPrimitiveFrom && isPrimitiveTo) {
    +        // ensure no null in input and output arrays
    --- End diff --
    
    I assume this is the part that generates code for Java primitive arrays for ```from``` and ```to```. Since a Java primitive array (e.g. ```int[]```) cannot hold ```null``` values, unlike a SQL array, I wrote "ensure no null in input and output arrays."
    What do you think?




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62097/consoleFull)** for PR 13704 at commit [`b7477de`](https://github.com/apache/spark/commit/b7477de4c79dac42b42d745e419654dcf831bdba).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class SimplifyCastsSuite extends PlanTest `




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Can you also put the plan tree of the example program in the PR description?




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70364606
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,78 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = Literal.create(
    +      Seq(1, 2, 3, 4, 5), ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    +      .cast(ArrayType(IntegerType, false)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select(array_intPrimitive.as('a)).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to nullable array cast") {
    --- End diff --
    
    we should also test `nullable to non-nullable`, to make sure we don't optimize it.
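    
    A minimal sketch of such a test, in the style of the suite above (names follow the existing tests; the exact assertion may differ in the final patch, but the point is that the cast must be kept):
    ```scala
    test("nullable to non-nullable array cast is not simplified") {
      val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
      val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
      val optimized = Optimize.execute(plan)
      // dropping the cast would wrongly promise non-null elements, so the plan is unchanged
      comparePlans(optimized, plan)
    }
    ```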




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62152 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62152/consoleFull)** for PR 13704 at commit [`8dd829a`](https://github.com/apache/spark/commit/8dd829a4922441cc09dee08b532e6b3c90780535).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #60636 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60636/consoleFull)** for PR 13704 at commit [`17f17d6`](https://github.com/apache/spark/commit/17f17d60794c1f0ab81e21ec2484742a7610f06d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62504 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62504/consoleFull)** for PR 13704 at commit [`cbcfd56`](https://github.com/apache/spark/commit/cbcfd561d92c02395d685c46cb09cce802b22727).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    @liancheng would it be possible to review this?




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62570 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62570/consoleFull)** for PR 13704 at commit [`e4cd571`](https://github.com/apache/spark/commit/e4cd571bc07a2b8c45580d9ba60f66d5b40b7422).




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    ping @liancheng




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62050 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62050/consoleFull)** for PR 13704 at commit [`43ced15`](https://github.com/apache/spark/commit/43ced1576f31a202c1c514c4bfe28e0ad0d4c964).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class PrimitiveArrayBenchmark extends BenchmarkBase `




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    I'm wondering why we have this `Cast` in `DeserializeToObject`. Isn't the `value#63` already a double array?




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62239/consoleFull)** for PR 13704 at commit [`355f5a5`](https://github.com/apache/spark/commit/355f5a54c1c2fe2f087e6c5d30fbd40aa1d0b6d2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62071 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62071/consoleFull)** for PR 13704 at commit [`66800fa`](https://github.com/apache/spark/commit/66800faaebf72e492ee7693d81f8dba980f1dab2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62239/
    Test PASSed.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70562239
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,121 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  def array(arrayType: ArrayType): AttributeReference =
    --- End diff --
    
    Can we put it into https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L242?
    
    Then in the test we can just write `val input = LocalRelation('a.array(ArrayType(IntegerType, nullable = false)))`, and there is no need for `array_intPrimitive`.
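    
    For illustration, a hedged sketch of what such a DSL method could look like inside the `ImplicitAttribute` helpers in `dsl/package.scala` (the exact signature in the final patch may differ):
    ```scala
    // Lives next to the existing helpers such as `def int`; `s` is the
    // attribute name provided by the enclosing ImplicitAttribute.
    def array(arrayType: ArrayType): AttributeReference =
      AttributeReference(s, arrayType, nullable = true)()
    ```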




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #64594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64594/consoleFull)** for PR 13704 at commit [`c8f87a1`](https://github.com/apache/spark/commit/c8f87a1a22dfa9c05aa2f514a7af43c046188b38).




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    @cloud-fan I added test cases. Could you review them again?




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70364232
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,78 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = Literal.create(
    +      Seq(1, 2, 3, 4, 5), ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    --- End diff --
    
    `input.select('a.cast(ArrayType(IntegerType, false)).as('casted))`, then we don't need to create the literal




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62102/
    Test PASSed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62155/consoleFull)** for PR 13704 at commit [`466ba88`](https://github.com/apache/spark/commit/466ba882d1630a6ade3a94c381be20551ec19111).




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    From the plan tree you provided, the `Cast` casts `value#63` to an array type, not to `ObjectType(classOf[Array[Double]])`. There must be some other reason that we need to find out (maybe the nullability of the array elements is different).




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70194231
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2018,6 +2018,8 @@ class Analyzer(
                 fail(child, DateType, walkedTypePath)
               case (StringType, to: NumericType) =>
                 fail(child, to, walkedTypePath)
    +          case (from: ArrayType, to: ArrayType) if !from.containsNull =>
    --- End diff --
    
    How about improving the `SimplifyCasts` rule to eliminate casts from arrays with non-nullable elements to arrays with nullable elements? We can handle maps the same way.
    
    Then we can add unit tests for it instead of adding a benchmark. I think a benchmark is unnecessary for this case; removing a `Cast` is definitely faster.
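    
    A minimal sketch of the suggested rule extension, assuming the existing `SimplifyCasts` structure (illustrative only; the exact cases in the final patch may differ):
    ```scala
    import org.apache.spark.sql.catalyst.expressions.Cast
    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    import org.apache.spark.sql.catalyst.rules.Rule
    import org.apache.spark.sql.types.{ArrayType, MapType}
    
    object SimplifyCastsSketch extends Rule[LogicalPlan] {
      def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
        // identical types: the cast is a no-op
        case Cast(e, dataType) if e.dataType == dataType => e
        // the cast only relaxes nullability (non-nullable elements/values cast
        // to nullable ones), so it can be removed as well
        case c @ Cast(e, dataType) => (e.dataType, dataType) match {
          case (ArrayType(from, false), ArrayType(to, true)) if from == to => e
          case (MapType(fk, fv, false), MapType(tk, tv, true))
            if fk == tk && fv == tv => e
          case _ => c
        }
      }
    }
    ```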




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70397586
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,120 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = Literal.create(
    +      Seq(1, 2, 3, 4, 5), ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    +      .cast(ArrayType(IntegerType, false)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select(array_intPrimitive.as('a)).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = Literal.create(
    +      Seq(1, 2, 3, 4, 5), ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    +      .cast(ArrayType(IntegerType, true)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select(array_intPrimitive.as('a)).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val array_intNull = Literal.create(
    +      Seq(1, 2, null, 4, 5), ArrayType(IntegerType, true))
    +    val plan = input.select(array_intNull
    +      .cast(ArrayType(IntegerType, false)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    assert(optimized.resolved === false)
    --- End diff --
    
    How could this be? If an optimizer rule turns a resolved plan into an unresolved one, then there must be something wrong with that rule.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    @cloud-fan, regarding the ```cast```, I think you are correct. When we insert a ```cast```, I now pass the ```ArrayType.containsNull``` information to ```Cast```; in this case it is ```false```. As a result, after the logical optimizations we get a plan tree in which the ```Cast``` has been removed, as shown below. I updated the PR description.
    
    ```
    == Analyzed Logical Plan ==
    value: double
    SerializeFromObject [input[0, double, true] AS value#6]
    +- MapElements <function1>, obj#5: double
       +- DeserializeToObject cast(value#1 as array<double>).toDoubleArray, obj#4: [D
          +- LocalRelation [value#1]
    
    == Optimized Logical Plan ==
    SerializeFromObject [input[0, double, true] AS value#6]
    +- MapElements <function1>, obj#5: double
       +- DeserializeToObject value#1.toDoubleArray, obj#4: [D
          +- LocalRelation [value#1]
    ```




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #61630 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61630/consoleFull)** for PR 13704 at commit [`256c861`](https://github.com/apache/spark/commit/256c8616add059299765a7c7f99e5fa389fd65f0).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r69391625
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -837,8 +837,36 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
         val j = ctx.freshName("j")
         val values = ctx.freshName("values")
     
    +    val isPrimitiveFrom = ctx.isPrimitiveType(fromType)
    --- End diff --
    
    Yes, you are right. The latest code also checks ```ArrayType.containsNull``` for both ```from``` and ```to```.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62102 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62102/consoleFull)** for PR 13704 at commit [`d99bd20`](https://github.com/apache/spark/commit/d99bd20465a352f0b01434039bdf50ac252b27ad).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62159/
    Test PASSed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62178/
    Test PASSed.


[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r71168135
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,119 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl._
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = 'a.array(ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    --- End diff --
    
    can we just write `input.select('a.cast(ArrayType(IntegerType, false)).as("casted"))`? then we don't need `array_intPrimitive`


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64594/
    Test PASSed.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61629/
    Test FAILed.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62155 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62155/consoleFull)** for PR 13704 at commit [`466ba88`](https://github.com/apache/spark/commit/466ba882d1630a6ade3a94c381be20551ec19111).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    left some comments; let's go ahead and merge it after they are addressed :)


[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r71278812
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,112 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl._
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = 'a.array(ArrayType(IntegerType, false))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, true)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    comparePlans(optimized, plan)
    +  }
    +
    +  test("nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, true)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  def map(keyType: DataType, valueType: DataType, nullable: Boolean): AttributeReference =
    --- End diff --
    
    This is because [current map()](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L246) cannot pass information on `nullable`.
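
    As an illustration, a hedged sketch of such a local helper: the signature comes from the diff above, while the body and the attribute name `m` are assumptions:

    ```
    import org.apache.spark.sql.catalyst.expressions.AttributeReference
    import org.apache.spark.sql.types.{DataType, MapType}

    // Illustrative: build an attribute whose MapType carries an explicit
    // valueContainsNull flag, which the dsl's map() helper does not expose.
    def map(keyType: DataType, valueType: DataType, nullable: Boolean): AttributeReference =
      AttributeReference("m", MapType(keyType, valueType, valueContainsNull = nullable))()
    ```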


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Jenkins, retest this please


[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    I agree that ```value#63``` is of double array type; it is stored as an ```UnsafeArrayData```.
    ```<function1>``` in ```MapElements <function1>, obj#67: double``` is represented by Java bytecode rather than by ```Expressions```, so we have to pass it a Java primitive array ```double[]``` instead of a double array held in an ```UnsafeArrayData```. I think ```Cast``` performs this conversion from ```UnsafeArrayData``` to ```double[]```. This is an issue only for Dataset.
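
    For illustration, a hedged Scala rendering of that conversion (the PR emits the equivalent as generated Java; the wrapper name `toJavaDoubleArray` is an assumption, while `ArrayData.toDoubleArray()` is an existing API):

    ```
    import org.apache.spark.sql.catalyst.util.ArrayData

    // When the element type is a non-nullable double, ArrayData can bulk-copy its
    // contents into a Java primitive array, so no Object[] buffer and no
    // per-element isNullAt() check is needed before calling the user's lambda.
    def toJavaDoubleArray(arr: ArrayData): Array[Double] = arr.toDoubleArray()
    ```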
    
    What do you think?



[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62102 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62102/consoleFull)** for PR 13704 at commit [`d99bd20`](https://github.com/apache/spark/commit/d99bd20465a352f0b01434039bdf50ac252b27ad).


[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r71275696
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,112 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl._
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = 'a.array(ArrayType(IntegerType, false))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, true)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    comparePlans(optimized, plan)
    +  }
    +
    +  test("nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, true)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  def map(keyType: DataType, valueType: DataType, nullable: Boolean): AttributeReference =
    --- End diff --
    
    why need this?


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62159 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62159/consoleFull)** for PR 13704 at commit [`e6a5772`](https://github.com/apache/spark/commit/e6a5772f4a957cef353d493763cec84321d365fb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62097 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62097/consoleFull)** for PR 13704 at commit [`b7477de`](https://github.com/apache/spark/commit/b7477de4c79dac42b42d745e419654dcf831bdba).


[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62087 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62087/consoleFull)** for PR 13704 at commit [`1bbe859`](https://github.com/apache/spark/commit/1bbe859804d999e30b8ed7f51b13121e30118d5a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62050/
    Test FAILed.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #64658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64658/consoleFull)** for PR 13704 at commit [`40ac2bc`](https://github.com/apache/spark/commit/40ac2bcd2b3a631ca2121ed24137cbb197d1b278).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62057/
    Test PASSed.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62570/
    Test PASSed.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62239/consoleFull)** for PR 13704 at commit [`355f5a5`](https://github.com/apache/spark/commit/355f5a54c1c2fe2f087e6c5d30fbd40aa1d0b6d2).


[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13704


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62538/
    Test FAILed.


[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r68544526
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -837,8 +837,36 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
         val j = ctx.freshName("j")
         val values = ctx.freshName("values")
     
    +    val isPrimitiveFrom = ctx.isPrimitiveType(fromType)
    +    val isPrimitiveTo = ctx.isPrimitiveType(toType)
    +
         (c, evPrim, evNull) =>
    -      s"""
    +      if (isPrimitiveFrom && isPrimitiveTo) {
    +        // ensure no null in input and output arrays
    --- End diff --
    
    where do we ensure it?


[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Regarding the plan tree printout (I removed my debug information), the ```Cast``` performs a conversion from ```UnsafeArrayData``` to ```GenericArrayData``` because the target type of the ```Cast``` is ```ArrayType<DoubleType>```. Since ```Cast``` prints the ```dataType.simpleString``` of the target type, it is shown as ```array<double>```.
    
    Regarding the generated code, we seem to be on the same page: what you describe is not done in ```Cast``` today, and it is what this PR does. IIUC, the current ```Cast``` [code generation for arrays](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L828) is conservative. Its goal is to create a ```GenericArrayData``` object: it always allocates an ```Object[]```, assigns a value to each ```Object[]``` element, and then passes the ```Object[]``` to the ```GenericArrayData``` constructor. As you pointed out, if code generation took nullability into account, it could avoid creating the ```Object[]```; unfortunately, [the current code](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L828) (lines 828-862) does not do that.
    The code generated by this PR can also use the specialized ```GenericArrayData``` implemented in https://github.com/apache/spark/pull/13758.
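
    To make the contrast concrete, a hedged Scala rendering of the two code shapes (the real versions are emitted as Java source by codegen; the names `castDoubleArrayGeneric` and `castDoubleArrayPrimitive` are assumptions and this only illustrates the data flow):

    ```
    import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}

    // Current shape: copy every element through an Object[] with null checks.
    def castDoubleArrayGeneric(in: ArrayData): GenericArrayData = {
      val n = in.numElements()
      val values = new Array[Any](n)
      var i = 0
      while (i < n) {
        if (!in.isNullAt(i)) values(i) = in.getDouble(i) // null slots stay null
        i += 1
      }
      new GenericArrayData(values)
    }

    // Shape possible when both sides are known non-null primitive arrays:
    // one bulk copy, no Object[] allocation, no per-element branch.
    def castDoubleArrayPrimitive(in: ArrayData): GenericArrayData =
      new GenericArrayData(in.toDoubleArray())
    ```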


[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r76721154
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,101 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl._
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to nullable array cast") {
    --- End diff --
    
    `non-nullable element array to nullable element array cast`


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62178/consoleFull)** for PR 13704 at commit [`a236e5e`](https://github.com/apache/spark/commit/a236e5e94935ae9453282c5457589b3609e8b7c6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r72080255
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1441,6 +1441,12 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
     object SimplifyCasts extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
         case Cast(e, dataType) if e.dataType == dataType => e
    +    case c @ Cast(e, dataType) => (e.dataType, dataType) match {
    --- End diff --
    
    It would be nice if somehow we can narrow the scope of this optimization to the exact case described in this PR. In general, making a nullable type non-nullable is not safe.
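
    As a hedged sketch of that narrowing (illustrative; not the rule as finally committed, and `simplifyContainerCast` is an assumed name), the match could drop the cast only when the element or value type is unchanged and nullability can only widen:

    ```
    import org.apache.spark.sql.catalyst.expressions.{Cast, Expression}
    import org.apache.spark.sql.types.{ArrayType, MapType}

    // Illustrative: keep the Cast whenever removing it would turn a nullable
    // element (or map value) type into a non-nullable one.
    def simplifyContainerCast(c: Cast): Expression = (c.child.dataType, c.dataType) match {
      case (ArrayType(fe, fn), ArrayType(te, tn)) if fe == te && (!fn || tn) => c.child
      case (MapType(fk, fv, fn), MapType(tk, tv, tn))
          if fk == tk && fv == tv && (!fn || tn) => c.child
      case _ => c
    }
    ```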


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62504 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62504/consoleFull)** for PR 13704 at commit [`cbcfd56`](https://github.com/apache/spark/commit/cbcfd561d92c02395d685c46cb09cce802b22727).


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62538/consoleFull)** for PR 13704 at commit [`e4cd571`](https://github.com/apache/spark/commit/e4cd571bc07a2b8c45580d9ba60f66d5b40b7422).


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Jenkins, retest this please


[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70402730
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,120 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = Literal.create(
    +      Seq(1, 2, 3, 4, 5), ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    +      .cast(ArrayType(IntegerType, false)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select(array_intPrimitive.as('a)).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = Literal.create(
    +      Seq(1, 2, 3, 4, 5), ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    +      .cast(ArrayType(IntegerType, true)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select(array_intPrimitive.as('a)).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val array_intNull = Literal.create(
    +      Seq(1, 2, null, 4, 5), ArrayType(IntegerType, true))
    +    val plan = input.select(array_intNull
    +      .cast(ArrayType(IntegerType, false)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    assert(optimized.resolved === false)
    --- End diff --
    
    > This is because `plan.resolved == true` while `optimized.resolved == false`.
    
    I'm confused about it: if the cast should produce an "unresolved" plan, why is `plan.resolved == true`?


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62570/consoleFull)** for PR 13704 at commit [`e4cd571`](https://github.com/apache/spark/commit/e4cd571bc07a2b8c45580d9ba60f66d5b40b7422).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    @liancheng Could you please review this?
    cc: @cloud-fan, we have been waiting for @liancheng's review for a long time.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    @cloud-fan could you please review this? As you pointed out, I also changed the code related to `cast`. I added benchmark results, too.


[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70381491
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,78 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = Literal.create(
    +      Seq(1, 2, 3, 4, 5), ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    --- End diff --
    
    ah, I see. In `'a.array(dt)`, the `dt` is the element type, so you are creating an array of arrays. However, the `array` method doesn't take `nullable`; maybe we should fix it.
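
    For illustration, one hedged way such a dsl extension could look (the object and class names are assumptions, not an actual change):

    ```
    import org.apache.spark.sql.catalyst.expressions.AttributeReference
    import org.apache.spark.sql.types.{ArrayType, DataType}

    object DslNullableArray {
      // Illustrative: an array() helper whose element nullability is explicit,
      // so a test could write 'a.array(IntegerType, containsNull = false).
      implicit class SymbolWithNullableArray(val s: Symbol) extends AnyVal {
        def array(elementType: DataType, containsNull: Boolean): AttributeReference =
          AttributeReference(s.name, ArrayType(elementType, containsNull))()
      }
    }
    ```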


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Sorry, I should have said it more explicitly: `value#63` is already of double array type. `UnsafeArrayData` is an internal data representation of the array type, so it seems weird to have the `Cast` there.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    @cloud-fan, I checked the following
    >Let me check which code portion inserts Cast in this tree.
    
    IIUC, [this code](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L2021) inserts the corresponding ```Cast``` into the ```Analyzed Logical Plan``` for the ```upcast``` that appears in the ```Parsed Logical Plan```.
    
    
    





[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70386711
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,120 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = Literal.create(
    +      Seq(1, 2, 3, 4, 5), ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    +      .cast(ArrayType(IntegerType, false)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select(array_intPrimitive.as('a)).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = Literal.create(
    +      Seq(1, 2, 3, 4, 5), ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    +      .cast(ArrayType(IntegerType, true)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select(array_intPrimitive.as('a)).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val array_intNull = Literal.create(
    +      Seq(1, 2, null, 4, 5), ArrayType(IntegerType, true))
    +    val plan = input.select(array_intNull
    +      .cast(ArrayType(IntegerType, false)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    assert(optimized.resolved === false)
    --- End diff --
    
    we can check `comparePlans(optimized, plan)`




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #64594 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64594/consoleFull)** for PR 13704 at commit [`c8f87a1`](https://github.com/apache/spark/commit/c8f87a1a22dfa9c05aa2f514a7af43c046188b38).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r71333538
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,112 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl._
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = 'a.array(ArrayType(IntegerType, false))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, true)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    comparePlans(optimized, plan)
    +  }
    +
    +  test("nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, true)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  def map(keyType: DataType, valueType: DataType, nullable: Boolean): AttributeReference =
    --- End diff --
    
    but this [map()](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L249) can, can't it?




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    @liancheng Could you please review this, since I have resolved the conflict?




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #61677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61677/consoleFull)** for PR 13704 at commit [`77859cf`](https://github.com/apache/spark/commit/77859cf4397b8a5022b93ffa4996203b36dfef1b).




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r71275743
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1441,6 +1441,12 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
     object SimplifyCasts extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
         case Cast(e, dataType) if e.dataType == dataType => e
    +    case c @ Cast(e, dataType) => (e.dataType, dataType) match {
    --- End diff --
    
    cc @yhuai  @liancheng, is it always safe to do it?
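    
    For context, a minimal sketch of the rule shape under discussion (the diff above is truncated at the `match`, so the arms below are my assumptions rather than the PR's exact code): the cast is dropped only when the element/value types are equal and nullability is merely widened, never narrowed.
    
    ```scala
    import org.apache.spark.sql.catalyst.expressions.Cast
    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    import org.apache.spark.sql.catalyst.rules.Rule
    import org.apache.spark.sql.types.{ArrayType, MapType}
    
    object SimplifyCastsSketch extends Rule[LogicalPlan] {
      def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
        // identical types: the cast is always redundant
        case Cast(e, dataType) if e.dataType == dataType => e
        // same element/value types, nullability only widened: the data is unchanged, drop the cast
        case c @ Cast(e, dataType) => (e.dataType, dataType) match {
          case (ArrayType(from, false), ArrayType(to, true)) if from == to => e
          case (MapType(fk, fv, false), MapType(tk, tv, true)) if fk == tk && fv == tv => e
          case _ => c
        }
      }
    }
    ```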




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70199583
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2018,6 +2018,8 @@ class Analyzer(
                 fail(child, DateType, walkedTypePath)
               case (StringType, to: NumericType) =>
                 fail(child, to, walkedTypePath)
    +          case (from: ArrayType, to: ArrayType) if !from.containsNull =>
    --- End diff --
    
    I will try improving the `SimplifyCasts` rule so that it eliminates the cast from a non-element-nullable array to an element-nullable one. I do not understand the following: will it be handled automatically by improving `SimplifyCasts`, or do we need to improve another rule?
    > "we can handle map too"
    
    I will add unit tests for it. I also think it would be good to add a benchmark to show the degree of improvement.
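    
    As a rough sketch of such a benchmark (this assumes the `org.apache.spark.util.Benchmark` helper used by other SQL benchmarks and a `spark` session in scope; names and sizes are illustrative only):
    
    ```scala
    import org.apache.spark.util.Benchmark
    import spark.implicits._
    
    val n = 1 << 16
    val ds = Seq.fill(n)(Array(1.0, 2.0, 3.0)).toDS().cache()
    ds.count()  // materialize the cache so that only the read path is measured
    
    val benchmark = new Benchmark("read a primitive array in Dataset", n)
    benchmark.addCase("map over Array[Double]") { _ =>
      ds.map(a => a(0) + a(1) + a(2)).foreach((_: Double) => ())
    }
    benchmark.run()
    ```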




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r72077898
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1441,6 +1441,12 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
     object SimplifyCasts extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
         case Cast(e, dataType) if e.dataType == dataType => e
    +    case c @ Cast(e, dataType) => (e.dataType, dataType) match {
    --- End diff --
    
    As long as we only do this kind of casting in the case mentioned above, it should be safe. But currently I'm not sure whether that is true.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    LGTM, one last comment: https://github.com/apache/spark/pull/13704/files#r70381491
    
    BTW, can you also update the PR description? e.g. we don't need to show the difference in generated code, as there are no codegen-related changes.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70418445
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,120 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType)))
    --- End diff --
    
    for map, we have
    ```
    def map(keyType: DataType, valueType: DataType): AttributeReference =
      map(MapType(keyType, valueType))
    
    def map(mapType: MapType): AttributeReference =
      AttributeReference(s, mapType, nullable = true)()
    ```
    we can also add a similar one for array
    ```
    def array(arrayType: ArrayType): AttributeReference
    ```
    Then here we can create an array-type attribute with the expected nullability, and avoid using a literal in the optimizer test.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70404870
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,120 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = Literal.create(
    +      Seq(1, 2, 3, 4, 5), ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    +      .cast(ArrayType(IntegerType, false)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select(array_intPrimitive.as('a)).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = Literal.create(
    +      Seq(1, 2, 3, 4, 5), ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    +      .cast(ArrayType(IntegerType, true)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select(array_intPrimitive.as('a)).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val array_intNull = Literal.create(
    +      Seq(1, 2, null, 4, 5), ArrayType(IntegerType, true))
    +    val plan = input.select(array_intNull
    +      .cast(ArrayType(IntegerType, false)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    assert(optimized.resolved === false)
    --- End diff --
    
    I am very sorry; I made some mistakes before. Now `comparePlans(optimized, plan)` works.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70202042
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2018,6 +2018,8 @@ class Analyzer(
                 fail(child, DateType, walkedTypePath)
               case (StringType, to: NumericType) =>
                 fail(child, to, walkedTypePath)
    +          case (from: ArrayType, to: ArrayType) if !from.containsNull =>
    --- End diff --
    
    I mean MapType. It's similar to ArrayType: its values can be nullable or non-nullable.
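    
    For example (element nullability in `ArrayType` and value nullability in `MapType` play the same role):
    
    ```scala
    import org.apache.spark.sql.types.{DoubleType, IntegerType, MapType}
    
    val nonNullValues  = MapType(IntegerType, DoubleType, valueContainsNull = false)
    val nullableValues = MapType(IntegerType, DoubleType, valueContainsNull = true)
    ```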




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r71346126
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,112 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl._
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = 'a.array(ArrayType(IntegerType, false))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, true)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    comparePlans(optimized, plan)
    +  }
    +
    +  test("nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, true)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  def map(keyType: DataType, valueType: DataType, nullable: Boolean): AttributeReference =
    --- End diff --
    
    I eliminated them by using an approach similar to what we did for `Array`.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70398569
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,120 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = Literal.create(
    +      Seq(1, 2, 3, 4, 5), ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    +      .cast(ArrayType(IntegerType, false)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select(array_intPrimitive.as('a)).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = Literal.create(
    +      Seq(1, 2, 3, 4, 5), ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    +      .cast(ArrayType(IntegerType, true)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select(array_intPrimitive.as('a)).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val array_intNull = Literal.create(
    +      Seq(1, 2, null, 4, 5), ArrayType(IntegerType, true))
    +    val plan = input.select(array_intNull
    +      .cast(ArrayType(IntegerType, false)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    assert(optimized.resolved === false)
    --- End diff --
    
    I thought that `optimized.resolved == false` is the expected result. This is because a cast from `ArrayType(IntegerType, true)` to `ArrayType(IntegerType, false)` produces `resolved == false` even without the optimization, as in [this test case](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala#L521). I would appreciate it if you could correct my understanding.
    
    In addition, I will check my test case. 
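    
    A small standalone illustration of that expectation (my own sketch, assumed to mirror the referenced `CastSuite` case rather than copied from it):
    
    ```scala
    import org.apache.spark.sql.catalyst.dsl.expressions._
    import org.apache.spark.sql.catalyst.expressions.Literal
    import org.apache.spark.sql.types.{ArrayType, IntegerType}
    
    val narrowed = Literal.create(Seq(1, 2, null, 4, 5), ArrayType(IntegerType, containsNull = true))
      .cast(ArrayType(IntegerType, containsNull = false))
    // narrowing element nullability is not a resolvable cast
    assert(narrowed.resolved == false)
    ```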




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r71215193
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,119 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl._
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = 'a.array(ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    --- End diff --
    
    The following code caused an exception. Could you please let me know what my mistake is?
    
    ```scala
      test("non-nullable to non-nullable array cast") {
        val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
        val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted"))
        val optimized = Optimize.execute(plan)
        comparePlans(optimized, plan)
      }
    ```
    
    ```java
    Invalid call to dataType on unresolved object, tree: 'a
    org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'a
    	at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:61)
    	at org.apache.spark.sql.catalyst.optimizer.SimplifyCasts$$anonfun$apply$33.applyOrElse(Optimizer.scala:1443)
    	at org.apache.spark.sql.catalyst.optimizer.SimplifyCasts$$anonfun$apply$33.applyOrElse(Optimizer.scala:1442)
    	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279)
    	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279)
    	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:278)
    	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284)
    	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:284)
    	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:322)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:320)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:284)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionDown$1(QueryPlan.scala:156)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:166)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:170)
    	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:170)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$4.apply(QueryPlan.scala:175)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:175)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressions(QueryPlan.scala:144)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformAllExpressions$1.applyOrElse(QueryPlan.scala:220)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformAllExpressions$1.applyOrElse(QueryPlan.scala:219)
    	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279)
    	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:279)
    	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:278)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:268)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformAllExpressions(QueryPlan.scala:219)
    	at org.apache.spark.sql.catalyst.optimizer.SimplifyCasts$.apply(Optimizer.scala:1442)
    	at org.apache.spark.sql.catalyst.optimizer.SimplifyCasts$.apply(Optimizer.scala:1441)
    	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
    	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
    	at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
    	at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
    	at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:35)
    	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
    	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
    	at scala.collection.immutable.List.foreach(List.scala:381)
    	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
    	at org.apache.spark.sql.catalyst.optimizer.SimplifyCastsSuite$$anonfun$1.apply$mcV$sp(SimplifyCastsSuite.scala:38)
    	at org.apache.spark.sql.catalyst.optimizer.SimplifyCastsSuite$$anonfun$1.apply(SimplifyCastsSuite.scala:35)
    	at org.apache.spark.sql.catalyst.optimizer.SimplifyCastsSuite$$anonfun$1.apply(SimplifyCastsSuite.scala:35)
    	at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
    	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
    	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
    	at org.scalatest.Transformer.apply(Transformer.scala:22)
    	at org.scalatest.Transformer.apply(Transformer.scala:20)
    	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
    	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
    	at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
    	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
    	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
    	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
    	at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
    	at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
    	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
    	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
    	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
    	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
    	at scala.collection.immutable.List.foreach(List.scala:381)
    	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
    	at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
    	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
    	at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
    	at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
    	at org.scalatest.Suite$class.run(Suite.scala:1424)
    	at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
    	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
    	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
    	at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
    	at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
    	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:29)
    	at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
    	at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
    	at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:29)
    	at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
    	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
    	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
    	at scala.collection.immutable.List.foreach(List.scala:381)
    	at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
    	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
    	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
    	at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
    	at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
    	at org.scalatest.tools.Runner$.run(Runner.scala:883)
    	at org.scalatest.tools.Runner.run(Runner.scala)
    	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:138)
    	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
    ```




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Thanks again for your kind reviews. I addressed both comments.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62084 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62084/consoleFull)** for PR 13704 at commit [`c31729f`](https://github.com/apache/spark/commit/c31729f361b5774f36834daeddb338f49377130e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62538/consoleFull)** for PR 13704 at commit [`e4cd571`](https://github.com/apache/spark/commit/e4cd571bc07a2b8c45580d9ba60f66d5b40b7422).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70395547
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,120 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = Literal.create(
    +      Seq(1, 2, 3, 4, 5), ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    +      .cast(ArrayType(IntegerType, false)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select(array_intPrimitive.as('a)).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = Literal.create(
    +      Seq(1, 2, 3, 4, 5), ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    +      .cast(ArrayType(IntegerType, true)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select(array_intPrimitive.as('a)).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val array_intNull = Literal.create(
    +      Seq(1, 2, null, 4, 5), ArrayType(IntegerType, true))
    +    val plan = input.select(array_intNull
    +      .cast(ArrayType(IntegerType, false)).as('a)).analyze
    +    val optimized = Optimize.execute(plan)
    +    assert(optimized.resolved === false)
    --- End diff --
    
    When I added `comparePlans(optimized, plan)` after this assert, the comparison failed. This is because `plan.resolved == true` while `optimized.resolved == false`.
    Should we create an `expected` plan that is similar to `plan` but with `resolved == false`?




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #60636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60636/consoleFull)** for PR 13704 at commit [`17f17d6`](https://github.com/apache/spark/commit/17f17d60794c1f0ab81e21ec2484742a7610f06d).




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    > the `Cast` performs conversion from `UnsafeArrayData` to `GenericArrayData`.
    
    `Cast` is used to cast one type to another, not cast one kind of data representation to another, so I don't quite understand why the `Cast` is there.
    
    > This code generation always creates `Object[]` and assign values into each `Object[]` element.
    
    This is reasonable, as it needs to take care of null elements. And we do have a chance to optimize it: if the target array type's element type is primitive and the input array type's element nullability is false, we can avoid using `Object[]`.
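    
    A minimal Scala sketch of that condition (illustration only; the helper name below is hypothetical and not part of the PR):
    ```
    import org.apache.spark.sql.types._
    
    // The Object[] buffer is only needed when null elements are possible or when the
    // target element type is not backed by a Java primitive.
    val javaPrimitives: Set[DataType] =
      Set(BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType)
    
    def canUsePrimitiveArray(from: ArrayType, to: ArrayType): Boolean =
      javaPrimitives.contains(to.elementType) && !from.containsNull
    ```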
    





[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62057 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62057/consoleFull)** for PR 13704 at commit [`677d81e`](https://github.com/apache/spark/commit/677d81e8d066cf74f7a86ae61c17dbbd2d74dde6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70230473
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2018,6 +2018,8 @@ class Analyzer(
                 fail(child, DateType, walkedTypePath)
               case (StringType, to: NumericType) =>
                 fail(child, to, walkedTypePath)
    +          case (from: ArrayType, to: ArrayType) if !from.containsNull =>
    --- End diff --
    
    Got it. I will add unit tests later. They require a combination of `Cast` and `SimplifyCasts`.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62084/consoleFull)** for PR 13704 at commit [`c31729f`](https://github.com/apache/spark/commit/c31729f361b5774f36834daeddb338f49377130e).




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    @liancheng Could you please review this?




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    @cloud-fan I misunderstood your comment. Now I think I understand it correctly and have addressed it.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #61629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61629/consoleFull)** for PR 13704 at commit [`2d3d34a`](https://github.com/apache/spark/commit/2d3d34abf0e7f6eb85125968c987642887b9acad).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70364539
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/PrimitiveArrayBenchmark.scala ---
    @@ -0,0 +1,84 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.util.{Arrays, Comparator, Random}
    +
    +import org.apache.spark.sql.SQLContext
    +import org.apache.spark.sql.internal.SQLConf
    +import org.apache.spark.sql.test.SharedSQLContext
    +import org.apache.spark.unsafe.array.LongArray
    +import org.apache.spark.unsafe.memory.MemoryBlock
    +import org.apache.spark.util.Benchmark
    +import org.apache.spark.util.collection.Sorter
    +import org.apache.spark.util.collection.unsafe.sort._
    +
    +/**
    + * Benchmark to measure performance for accessing primitive arrays
    + * To run this:
    + *  1. Replace ignore(...) with test(...)
    + *  2. build/sbt "sql/test-only *benchmark.PrimitiveArrayBenchmark"
    + *
    + * Benchmarks in this file are skipped in normal builds.
    + */
    +class PrimitiveArrayBenchmark extends BenchmarkBase {
    --- End diff --
    
    I'm still a little against creating a benchmark for this case. It would get really messy if everyone who sends a patch for the optimizer also created a benchmark for it.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #61630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61630/consoleFull)** for PR 13704 at commit [`256c861`](https://github.com/apache/spark/commit/256c8616add059299765a7c7f99e5fa389fd65f0).




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62057/consoleFull)** for PR 13704 at commit [`677d81e`](https://github.com/apache/spark/commit/677d81e8d066cf74f7a86ae61c17dbbd2d74dde6).




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70193764
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -479,7 +479,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
         case DoubleType => castToDoubleCode(from)
     
         case array: ArrayType =>
    -      castArrayCode(from.asInstanceOf[ArrayType].elementType, array.elementType, ctx)
    +      castArrayCode(from.asInstanceOf[ArrayType], array, ctx)
    --- End diff --
    
    Sure, I will create another PR and update the description.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r72116259
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1441,6 +1441,12 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
     object SimplifyCasts extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
         case Cast(e, dataType) if e.dataType == dataType => e
    +    case c @ Cast(e, dataType) => (e.dataType, dataType) match {
    --- End diff --
    
    These cases handle a cast only from a non-nullable type. If I am correct, a cast from a nullable type is handled as before (see the sketch below). cc @cloud-fan
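    
    A concrete illustration (a sketch, not code from the PR; `Literal.create` as in the tests in this thread):
    ```
    import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
    import org.apache.spark.sql.types._
    
    // Source element nullability is false, so this Cast only widens nullability
    // and is a candidate for removal.
    val nonNullArray = Literal.create(Seq(1, 2, 3), ArrayType(IntegerType, containsNull = false))
    val removable = Cast(nonNullArray, ArrayType(IntegerType, containsNull = true))
    
    // A source whose element nullability is true falls through and keeps its Cast, as before.
    val nullableArray = Literal.create(Seq(1, 2, 3), ArrayType(IntegerType, containsNull = true))
    val kept = Cast(nullableArray, ArrayType(IntegerType, containsNull = false))
    ```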




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62530/consoleFull)** for PR 13704 at commit [`e4cd571`](https://github.com/apache/spark/commit/e4cd571bc07a2b8c45580d9ba60f66d5b40b7422).




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62087 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62087/consoleFull)** for PR 13704 at commit [`1bbe859`](https://github.com/apache/spark/commit/1bbe859804d999e30b8ed7f51b13121e30118d5a).




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62152/
    Test PASSed.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70367470
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/PrimitiveArrayBenchmark.scala ---
    @@ -0,0 +1,84 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.util.{Arrays, Comparator, Random}
    +
    +import org.apache.spark.sql.SQLContext
    +import org.apache.spark.sql.internal.SQLConf
    +import org.apache.spark.sql.test.SharedSQLContext
    +import org.apache.spark.unsafe.array.LongArray
    +import org.apache.spark.unsafe.memory.MemoryBlock
    +import org.apache.spark.util.Benchmark
    +import org.apache.spark.util.collection.Sorter
    +import org.apache.spark.util.collection.unsafe.sort._
    +
    +/**
    + * Benchmark to measure performance for accessing primitive arrays
    + * To run this:
    + *  1. Replace ignore(...) with test(...)
    + *  2. build/sbt "sql/test-only *benchmark.PrimitiveArrayBenchmark"
    + *
    + * Benchmarks in this file are skipped in normal builds.
    + */
    +class PrimitiveArrayBenchmark extends BenchmarkBase {
    --- End diff --
    
    Got it. In the future, it would be good to document criteria in the wiki for when a PR for the optimizer should include benchmark results.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62159 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62159/consoleFull)** for PR 13704 at commit [`e6a5772`](https://github.com/apache/spark/commit/e6a5772f4a957cef353d493763cec84321d365fb).




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    thanks, merging to master!




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62071/
    Test PASSed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62087/
    Test PASSed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62155/
    Test PASSed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62530/consoleFull)** for PR 13704 at commit [`e4cd571`](https://github.com/apache/spark/commit/e4cd571bc07a2b8c45580d9ba60f66d5b40b7422).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60636/
    Test PASSed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    @cloud-fan Could you please take a look?




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62530/
    Test FAILed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62504/
    Test PASSed.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70234587
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1441,6 +1441,26 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
     object SimplifyCasts extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
         case Cast(e, dataType) if e.dataType == dataType => e
    +    case Cast(e, dataType) =>
    +      (e.dataType, dataType) match {
    +        case (fromDt: ArrayType, toDt: ArrayType) =>
    --- End diff --
    
    thanks, I like this simple one




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62084/
    Test PASSed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61630/
    Test FAILed.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70193070
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -479,7 +479,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
         case DoubleType => castToDoubleCode(from)
     
         case array: ArrayType =>
    -      castArrayCode(from.asInstanceOf[ArrayType].elementType, array.elementType, ctx)
    +      castArrayCode(from.asInstanceOf[ArrayType], array, ctx)
    --- End diff --
    
    how about we move this into another PR? I think the main purpose of this PR is to eliminate the unnecessary Cast




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62097/
    Test FAILed.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #61629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61629/consoleFull)** for PR 13704 at commit [`2d3d34a`](https://github.com/apache/spark/commit/2d3d34abf0e7f6eb85125968c987642887b9acad).




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70368410
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,78 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = Literal.create(
    +      Seq(1, 2, 3, 4, 5), ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    --- End diff --
    
    @cloud-fan This statement probably cannot be optimized, due to missing information on `a`. As a result, it causes an assertion error. Did I make a mistake?
    ```
      test("non-nullable to non-nullable array cast") {
        val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
        val plan = input.select(
          'a.cast(ArrayType(IntegerType, false)).as('casted)).analyze
        val optimized = Optimize.execute(plan)
        val expected = input.select('a.as('casted)).analyze
        print(s"optimized: $plan")
        print(s"optimized: $optimized")
        comparePlans(optimized, expected)
      }
    
    plan: 'Project [cast(a#0 as array<int>) AS casted#1]
    +- LocalRelation <empty>, [a#0]
    optimized: 'Project [cast(a#0 as array<int>) AS casted#1]
    +- LocalRelation <empty>, [a#0]
    
    
    == FAIL: Plans do not match ===
    !'Project [cast(a#0 as array<int>) AS casted#0]   Project [a#0 AS casted#0]
     +- LocalRelation <empty>, [a#0]                  +- LocalRelation <empty>, [a#0]
    ```
    
    The original one
    ```
      test("non-nullable to non-nullable array cast") {
        val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
        val array_intPrimitive = Literal.create(
          Seq(1, 2, 3, 4, 5), ArrayType(IntegerType, false))
        val plan = input.select(array_intPrimitive
          .cast(ArrayType(IntegerType, false)).as('a)).analyze
        val optimized = Optimize.execute(plan)
        val expected = input.select(array_intPrimitive.as('a)).analyze
        print(s"optimized: $plan")
        print(s"optimized: $optimized")
        comparePlans(optimized, expected)
      }
    
    plan: Project [cast([1,2,3,4,5] as array<int>) AS a#1]
    +- LocalRelation <empty>, [a#0]
    optimized: Project [[1,2,3,4,5] AS a#1]
    +- LocalRelation <empty>, [a#0]
    ```




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r70231367
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1441,6 +1441,26 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
     object SimplifyCasts extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
         case Cast(e, dataType) if e.dataType == dataType => e
    +    case Cast(e, dataType) =>
    +      (e.dataType, dataType) match {
    +        case (fromDt: ArrayType, toDt: ArrayType) =>
    --- End diff --
    
    how about
    ```
    case c @ Cast(e, dataType) => (e.dataType, dataType) match {
      case (ArrayType(from, false), ArrayType(to, true)) if from == to => e
      case (MapType(fromKey, fromValue, false), MapType(toKey, toValue, true)) if fromKey == toKey && fromValue == toValue => e
      case _ => c
    }
    ```
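    
    As a quick sketch (not from the PR), the map case is the same idea with `valueContainsNull`:
    ```
    import org.apache.spark.sql.types._
    
    // valueContainsNull = false on the source side: casting to the nullable
    // variant only widens nullability, so the Cast can be dropped.
    val src = MapType(StringType, StringType, valueContainsNull = false)
    val dst = MapType(StringType, StringType, valueContainsNull = true)
    ```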




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62071 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62071/consoleFull)** for PR 13704 at commit [`66800fa`](https://github.com/apache/spark/commit/66800faaebf72e492ee7693d81f8dba980f1dab2).




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Let me check which code portion inserts ```Cast``` in this tree.
    
    > And we do have a chance to optimize it: if the target array type's element type is primitive and the input array type's element nullability is false, we can avoid using ```Object[]```.
    
    This PR already generates code without using ```Object[]``` by checking the above condition.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    @cloud-fan I updated the PR description by adding plan trees.




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    @cloud-fan I think ```value#63``` is ```UnsafeArrayData```.
    When I ran a DataFrame program, I got the following plan trees. Since DataFrame operations access data in UnsafeArrayData, I think that ```LocalRelation``` and ```LocalTableScan``` keep the array as ```UnsafeArrayData```.
    
    ```
    val df = Seq(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0)).toDF()
    val df2 = df.selectExpr("value[0] + value[1] + value[2]")
    df2.show
    df2.explain(true)
    
    == Analyzed Logical Plan ==
    ((value[0] + value[1]) + value[2]): double
    Project [((value#63[0] + value#63[1]) + value#63[2]) AS ((value[0] + value[1]) + value[2])#67]
    +- LocalRelation [value#63]
    
    == Optimized Logical Plan ==
    LocalRelation [((value[0] + value[1]) + value[2])#67]
    
    == Physical Plan ==
    LocalTableScan [((value[0] + value[1]) + value[2])#67]
    ```




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r68885567
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -837,8 +837,36 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w
         val j = ctx.freshName("j")
         val values = ctx.freshName("values")
     
    +    val isPrimitiveFrom = ctx.isPrimitiveType(fromType)
    --- End diff --
    
    we need to make sure the input array's element nullability is false, but primitive type array doesn't guarantee it. e.g. we can have `ArrayType(ByteType, true)`
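    
    For example (a tiny sketch, not from the PR), the element type alone says nothing about nullability:
    ```
    import org.apache.spark.sql.types._
    
    // Primitive element type, but elements may still be null:
    val maybeNullBytes = ArrayType(ByteType, containsNull = true)
    // Only this declaration allows codegen to skip the per-element null check:
    val denseBytes = ArrayType(ByteType, containsNull = false)
    ```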




[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r76721258
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,101 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl._
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, true)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    comparePlans(optimized, plan)
    +  }
    +
    +  test("nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, true)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to non-nullable map cast") {
    +    val input = LocalRelation('m.map(MapType(StringType, StringType, false)))
    +    val plan = input.select('m.cast(MapType(StringType, StringType, false))
    +      .as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('m.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to nullable map cast") {
    +    val input = LocalRelation('m.map(MapType(StringType, StringType, false)))
    +    val plan = input.select('m.cast(MapType(StringType, StringType, true))
    +      .as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('m.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("nullable to non-nullable map cast") {
    +    val input = LocalRelation('m.map(MapType(StringType, StringType, true)))
    +    val plan = input.select('m.cast(MapType(StringType, StringType, false))
    +      .as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    comparePlans(optimized, plan)
    +  }
    +
    +  test("nullable to nullable map cast") {
    +    val input = LocalRelation('m.map(MapType(StringType, StringType, true)))
    +    val plan = input.select('m.cast(MapType(StringType, StringType, true))
    +      .as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('m.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +}
    --- End diff --
    
    Actually, I think we only need 4 tests: non-nullable to nullable and nullable to non-nullable, for both array and map (see the sketch below).
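A minimal sketch of those four tests, reusing the `Optimize` executor, DSL helpers, and imports from the quoted suite (the test names here are only suggestions):

```scala
test("non-nullable element array to nullable element array cast") {
  val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
  val plan = input.select('a.cast(ArrayType(IntegerType, true)).as("casted")).analyze
  // The cast only relaxes element nullability, so it can be dropped.
  comparePlans(Optimize.execute(plan), input.select('a.as("casted")).analyze)
}

test("nullable element array to non-nullable element array cast") {
  val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
  val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
  // Tightening nullability cannot be removed; the plan should stay unchanged.
  comparePlans(Optimize.execute(plan), plan)
}

test("non-nullable value map to nullable value map cast") {
  val input = LocalRelation('m.map(MapType(StringType, StringType, false)))
  val plan = input.select('m.cast(MapType(StringType, StringType, true)).as("casted")).analyze
  comparePlans(Optimize.execute(plan), input.select('m.as("casted")).analyze)
}

test("nullable value map to non-nullable value map cast") {
  val input = LocalRelation('m.map(MapType(StringType, StringType, true)))
  val plan = input.select('m.cast(MapType(StringType, StringType, false)).as("casted")).analyze
  comparePlans(Optimize.execute(plan), plan)
}
```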


[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r76721180
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,101 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl._
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, true)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    comparePlans(optimized, plan)
    +  }
    +
    +  test("nullable to nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
    +    val plan = input.select('a.cast(ArrayType(IntegerType, true)).as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('a.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to non-nullable map cast") {
    +    val input = LocalRelation('m.map(MapType(StringType, StringType, false)))
    +    val plan = input.select('m.cast(MapType(StringType, StringType, false))
    +      .as("casted")).analyze
    +    val optimized = Optimize.execute(plan)
    +    val expected = input.select('m.as("casted")).analyze
    +    comparePlans(optimized, expected)
    +  }
    +
    +  test("non-nullable to nullable map cast") {
    --- End diff --
    
    non-nullable value map to nullable value map cast


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Addressed the comment. Since the following helper cannot pass the `valueContainsNull = false` information through, I keep the current code.
    ```
    def map(mapType: MapType): AttributeReference =
      AttributeReference(s, mapType, nullable = false)()
    ```
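The exact DSL change being declined is not shown here, so the following is only a guess at the distinction being drawn: whether map values may be null is carried by `MapType.valueContainsNull`, not by the attribute's own `nullable` flag. The `map` helper below is hypothetical and only illustrates that point:

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.types.{MapType, StringType}

// Hypothetical helper in the style of the snippet above: it can only control
// whether the column itself is nullable.
def map(name: String, mapType: MapType): AttributeReference =
  AttributeReference(name, mapType, nullable = false)()

// Value nullability has to be encoded in the MapType that is passed in.
val nonNullValues = map("m", MapType(StringType, StringType, valueContainsNull = false))
val nullableValues = map("m", MapType(StringType, StringType, valueContainsNull = true))
```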


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    LGTM, cc @liancheng to take another look.


[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r71260242
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyCastsSuite.scala ---
    @@ -0,0 +1,119 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl._
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types._
    +
    +class SimplifyCastsSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("SimplifyCasts", FixedPoint(50), SimplifyCasts) :: Nil
    +  }
    +
    +  test("non-nullable to non-nullable array cast") {
    +    val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
    +    val array_intPrimitive = 'a.array(ArrayType(IntegerType, false))
    +    val plan = input.select(array_intPrimitive
    --- End diff --
    
    `val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze`
    you need to resolve the plan before optimizing it
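In other words (a sketch, assuming the suite's `Optimize` executor and the DSL imports from the quoted file): call `.analyze` to resolve the plan first, then run the optimizer, so that both sides of the comparison are analyzed plans.

```scala
val input = LocalRelation('a.array(ArrayType(IntegerType, false)))
// Resolve first ...
val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
// ... then optimize, and compare against an analyzed expected plan.
val optimized = Optimize.execute(plan)
val expected = input.select('a.as("casted")).analyze
comparePlans(optimized, expected)
```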


[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62050/consoleFull)** for PR 13704 at commit [`43ced15`](https://github.com/apache/spark/commit/43ced1576f31a202c1c514c4bfe28e0ad0d4c964).


[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13704#discussion_r73824598
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1441,6 +1441,12 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
     object SimplifyCasts extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
         case Cast(e, dataType) if e.dataType == dataType => e
    +    case c @ Cast(e, dataType) => (e.dataType, dataType) match {
    --- End diff --
    
    @liancheng Could you take a look?
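For context, a hedged sketch of how the full pattern in `SimplifyCasts` might read once the nested match is expanded; only the `case c @ Cast(e, dataType)` line above is quoted from the PR, so the guard conditions here are an assumption:

```scala
import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.{ArrayType, MapType}

object SimplifyCasts extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    // Identity cast: target type equals source type exactly.
    case Cast(e, dataType) if e.dataType == dataType => e
    // Cast that only relaxes element/value nullability: no data conversion is
    // needed, so the cast itself is redundant and can be dropped.
    case c @ Cast(e, dataType) => (e.dataType, dataType) match {
      case (ArrayType(from, false), ArrayType(to, true)) if from == to => e
      case (MapType(fromKey, fromValue, false), MapType(toKey, toValue, true))
          if fromKey == toKey && fromValue == toValue => e
      case _ => c
    }
  }
}
```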


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #64658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64658/consoleFull)** for PR 13704 at commit [`40ac2bc`](https://github.com/apache/spark/commit/40ac2bcd2b3a631ca2121ed24137cbb197d1b278).


[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #61677 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61677/consoleFull)** for PR 13704 at commit [`77859cf`](https://github.com/apache/spark/commit/77859cf4397b8a5022b93ffa4996203b36dfef1b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62152/consoleFull)** for PR 13704 at commit [`8dd829a`](https://github.com/apache/spark/commit/8dd829a4922441cc09dee08b532e6b3c90780535).


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64658/
    Test PASSed.


[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13704
  
    **[Test build #62178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62178/consoleFull)** for PR 13704 at commit [`a236e5e`](https://github.com/apache/spark/commit/a236e5e94935ae9453282c5457589b3609e8b7c6).

