Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2017/03/06 06:46:34 UTC

[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/17172

    [SPARK-19008][SQL] Improve performance of Dataset.map by eliminating boxing/unboxing

    ## What changes were proposed in this pull request?
    
    This PR improves the performance of `Dataset.map()` for primitive types by removing boxing and unboxing operations.
    
    Currently, Catalyst generates a call to the `apply()` method of an anonymous function written in Scala. The argument and return types of this method are `java.lang.Object`. As a result, each call for a primitive value boxes the argument and unboxes it again inside `apply()`, and then boxes the return value and unboxes it again in the caller.
    
    This PR calls a specialized version of the `apply()` method directly, without boxing or unboxing. For example, if the argument and return types are both `int`, this PR generates a call to `apply$mcII$sp`. This PR supports any combination of `Int`, `Long`, `Float`, and `Double`.
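    
    To make the difference concrete, here is a minimal Java sketch of the two call paths. It is only an illustration, not the code generated by Catalyst: `IntFunction1` is a hypothetical hand-written stand-in for what `scala.Function1[Int, Int]` looks like from Java, and `TIMES_SEVEN` plays the role of the user's lambda `i => i * 7`.
    
    ```java
    class SpecializedCallSketch {
        /** Hypothetical stand-in for scala.Function1[Int, Int] as seen from Java. */
        interface IntFunction1 {
            // Generic entry point: argument and result travel as java.lang.Object.
            Object apply(Object v);
            // Specialized entry point, named like the method scalac emits for Int => Int.
            int apply$mcII$sp(int v);
        }
    
        /** Plays the role of the user function i => i * 7. */
        static final IntFunction1 TIMES_SEVEN = new IntFunction1() {
            @Override public Object apply(Object v) {
                // Unbox the argument, run the primitive body, box the result.
                return apply$mcII$sp((Integer) v);
            }
            @Override public int apply$mcII$sp(int v) {
                return v * 7;
            }
        };
    
        /** Call path without the PR: box the argument, then unbox the result. */
        static int viaGenericApply(int arg) {
            Object result = TIMES_SEVEN.apply(arg); // arg is autoboxed to Integer here
            return (Integer) result;                // result is unboxed here
        }
    
        /** Call path with the PR: primitives only, no Integer objects involved. */
        static int viaSpecializedApply(int arg) {
            return TIMES_SEVEN.apply$mcII$sp(arg);
        }
    
        public static void main(String[] args) {
            System.out.println(viaGenericApply(3));
            System.out.println(viaSpecializedApply(3));
        }
    }
    ```
    
    Both paths compute the same value; the second one simply avoids allocating and unwrapping an `Integer` for every row.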
    
    
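    The following benchmark results, measured with [this program](https://github.com/apache/spark/pull/16391/files), show a 4.7x speedup for the Dataset case (best time 3094 ms without this PR versus 657 ms with it). The Dataset part of the program is listed after the results.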
    
    Without this PR
    ```
    OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14 on Linux 4.4.0-47-generic
    Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
    back-to-back map:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    RDD                                           1923 / 1952         52.0          19.2       1.0X
    DataFrame                                      526 /  548        190.2           5.3       3.7X
    Dataset                                       3094 / 3154         32.3          30.9       0.6X
    ```
    
    With this PR
    ```
    OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14 on Linux 4.4.0-47-generic
    Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
    back-to-back map:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    RDD                                           1883 / 1892         53.1          18.8       1.0X
    DataFrame                                      502 /  642        199.1           5.0       3.7X
    Dataset                                        657 /  784        152.2           6.6       2.9X
    ```
    
    ```scala
      def backToBackMap(spark: SparkSession, numRows: Long, numChains: Int): Benchmark = {
        import spark.implicits._
        val rdd = spark.sparkContext.range(0, numRows)
        val ds = spark.range(0, numRows)
        val func = (l: Long) => l + 1
        val benchmark = new Benchmark("back-to-back map", numRows)
    ...
        benchmark.addCase("Dataset") { iter =>
          var res = ds.as[Long]
          var i = 0
          while (i < numChains) {
            res = res.map(func)
            i += 1
          }
          res.queryExecution.toRdd.foreach(_ => Unit)
        }
        benchmark
      }
    ```
    
    
    A motivating example
    ```scala
    Seq(1, 2, 3).toDS.map(i => i * 7).show
    ```
    
    Generated code without this PR
    ```java
    /* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
    /* 006 */   private Object[] references;
    /* 007 */   private scala.collection.Iterator[] inputs;
    /* 008 */   private scala.collection.Iterator inputadapter_input;
    /* 009 */   private UnsafeRow deserializetoobject_result;
    /* 010 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder deserializetoobject_holder;
    /* 011 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter deserializetoobject_rowWriter;
    /* 012 */   private int mapelements_argValue;
    /* 013 */   private UnsafeRow mapelements_result;
    /* 014 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder mapelements_holder;
    /* 015 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter mapelements_rowWriter;
    /* 016 */   private UnsafeRow serializefromobject_result;
    /* 017 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder serializefromobject_holder;
    /* 018 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter serializefromobject_rowWriter;
    /* 019 */
    /* 020 */   public GeneratedIterator(Object[] references) {
    /* 021 */     this.references = references;
    /* 022 */   }
    /* 023 */
    /* 024 */   public void init(int index, scala.collection.Iterator[] inputs) {
    /* 025 */     partitionIndex = index;
    /* 026 */     this.inputs = inputs;
    /* 027 */     inputadapter_input = inputs[0];
    /* 028 */     deserializetoobject_result = new UnsafeRow(1);
    /* 029 */     this.deserializetoobject_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(deserializetoobject_result, 0);
    /* 030 */     this.deserializetoobject_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(deserializetoobject_holder, 1);
    /* 031 */
    /* 032 */     mapelements_result = new UnsafeRow(1);
    /* 033 */     this.mapelements_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(mapelements_result, 0);
    /* 034 */     this.mapelements_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(mapelements_holder, 1);
    /* 035 */     serializefromobject_result = new UnsafeRow(1);
    /* 036 */     this.serializefromobject_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(serializefromobject_result, 0);
    /* 037 */     this.serializefromobject_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(serializefromobject_holder, 1);
    /* 038 */
    /* 039 */   }
    /* 040 */
    /* 041 */   protected void processNext() throws java.io.IOException {
    /* 042 */     while (inputadapter_input.hasNext() && !stopEarly()) {
    /* 043 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
    /* 044 */       int inputadapter_value = inputadapter_row.getInt(0);
    /* 045 */
    /* 046 */       boolean mapelements_isNull = true;
    /* 047 */       int mapelements_value = -1;
    /* 048 */       if (!false) {
    /* 049 */         mapelements_argValue = inputadapter_value;
    /* 050 */
    /* 051 */         mapelements_isNull = false;
    /* 052 */         if (!mapelements_isNull) {
    /* 053 */           Object mapelements_funcResult = null;
    /* 054 */           mapelements_funcResult = ((scala.Function1) references[0]).apply(mapelements_argValue);
    /* 055 */           if (mapelements_funcResult == null) {
    /* 056 */             mapelements_isNull = true;
    /* 057 */           } else {
    /* 058 */             mapelements_value = (Integer) mapelements_funcResult;
    /* 059 */           }
    /* 060 */
    /* 061 */         }
    /* 062 */
    /* 063 */       }
    /* 064 */
    /* 065 */       serializefromobject_rowWriter.zeroOutNullBytes();
    /* 066 */
    /* 067 */       if (mapelements_isNull) {
    /* 068 */         serializefromobject_rowWriter.setNullAt(0);
    /* 069 */       } else {
    /* 070 */         serializefromobject_rowWriter.write(0, mapelements_value);
    /* 071 */       }
    /* 072 */       append(serializefromobject_result);
    /* 073 */       if (shouldStop()) return;
    /* 074 */     }
    /* 075 */   }
    /* 076 */ }
    ```
    
    Generated code with this PR (lines 053-059 of the code above are replaced by the single specialized call on line 053)
    ```java
    /* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
    /* 006 */   private Object[] references;
    /* 007 */   private scala.collection.Iterator[] inputs;
    /* 008 */   private scala.collection.Iterator inputadapter_input;
    /* 009 */   private UnsafeRow deserializetoobject_result;
    /* 010 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder deserializetoobject_holder;
    /* 011 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter deserializetoobject_rowWriter;
    /* 012 */   private int mapelements_argValue;
    /* 013 */   private UnsafeRow mapelements_result;
    /* 014 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder mapelements_holder;
    /* 015 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter mapelements_rowWriter;
    /* 016 */   private UnsafeRow serializefromobject_result;
    /* 017 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder serializefromobject_holder;
    /* 018 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter serializefromobject_rowWriter;
    /* 019 */
    /* 020 */   public GeneratedIterator(Object[] references) {
    /* 021 */     this.references = references;
    /* 022 */   }
    /* 023 */
    /* 024 */   public void init(int index, scala.collection.Iterator[] inputs) {
    /* 025 */     partitionIndex = index;
    /* 026 */     this.inputs = inputs;
    /* 027 */     inputadapter_input = inputs[0];
    /* 028 */     deserializetoobject_result = new UnsafeRow(1);
    /* 029 */     this.deserializetoobject_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(deserializetoobject_result, 0);
    /* 030 */     this.deserializetoobject_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(deserializetoobject_holder, 1);
    /* 031 */
    /* 032 */     mapelements_result = new UnsafeRow(1);
    /* 033 */     this.mapelements_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(mapelements_result, 0);
    /* 034 */     this.mapelements_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(mapelements_holder, 1);
    /* 035 */     serializefromobject_result = new UnsafeRow(1);
    /* 036 */     this.serializefromobject_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(serializefromobject_result, 0);
    /* 037 */     this.serializefromobject_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(serializefromobject_holder, 1);
    /* 038 */
    /* 039 */   }
    /* 040 */
    /* 041 */   protected void processNext() throws java.io.IOException {
    /* 042 */     while (inputadapter_input.hasNext() && !stopEarly()) {
    /* 043 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
    /* 044 */       int inputadapter_value = inputadapter_row.getInt(0);
    /* 045 */
    /* 046 */       boolean mapelements_isNull = true;
    /* 047 */       int mapelements_value = -1;
    /* 048 */       if (!false) {
    /* 049 */         mapelements_argValue = inputadapter_value;
    /* 050 */
    /* 051 */         mapelements_isNull = false;
    /* 052 */         if (!mapelements_isNull) {
    /* 053 */           mapelements_value = ((scala.Function1) references[0]).apply$mcII$sp(mapelements_argValue);
    /* 054 */         }
    /* 055 */
    /* 056 */       }
    /* 057 */
    /* 058 */       serializefromobject_rowWriter.zeroOutNullBytes();
    /* 059 */
    /* 060 */       if (mapelements_isNull) {
    /* 061 */         serializefromobject_rowWriter.setNullAt(0);
    /* 062 */       } else {
    /* 063 */         serializefromobject_rowWriter.write(0, mapelements_value);
    /* 064 */       }
    /* 065 */       append(serializefromobject_result);
    /* 066 */       if (shouldStop()) return;
    /* 067 */     }
    /* 068 */   }
    /* 069 */ }
    ```
    
    ## How was this patch tested?
    
    Added new test suites to `DatasetPrimitiveSuite`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-19008

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17172.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17172
    
----
commit d8b5f8d839d5c3388244cf2a6dcf4494d927145f
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-03-06T06:42:10Z

    Initial commit

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r105246532
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -216,10 +217,39 @@ case class MapElementsExec(
         child.asInstanceOf[CodegenSupport].produce(ctx, this)
       }
     
    +  private def getMethodType(dt: DataType, isOutput: Boolean): String = {
    +    dt match {
    +      case BooleanType if isOutput => "Z"
    --- End diff --
    
    IIUC, [this code](https://github.com/scala/scala/blob/v2.12.0/src/library/scala/Function1.scala#L32) specializes the boolean type only as a return type.
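    
    As an illustration of this rule, the following Java sketch (not the code in this PR) derives a specialized method name from an argument type and a result type. The enum `PrimitiveKind` and both helper names are hypothetical. The letters are the standard JVM type descriptors, and putting the result letter before the argument letter reflects my understanding of scalac's naming (for example, `apply$mcZI$sp` for `Int => Boolean`).
    
    ```java
    import java.util.Optional;
    
    class SpecializedNameSketch {
        /** Hypothetical stand-in for the Catalyst data types discussed here. */
        enum PrimitiveKind { INT, LONG, FLOAT, DOUBLE, BOOLEAN, OTHER }
    
        /** JVM type letter for the specialized name, or empty if no variant is assumed to exist. */
        static Optional<String> methodType(PrimitiveKind kind, boolean isOutput) {
            switch (kind) {
                case INT:    return Optional.of("I");
                case LONG:   return Optional.of("J");
                case FLOAT:  return Optional.of("F");
                case DOUBLE: return Optional.of("D");
                case BOOLEAN:
                    // Function1 is specialized on Boolean only as a result type.
                    return isOutput ? Optional.of("Z") : Optional.empty();
                default:     return Optional.empty();
            }
        }
    
        /** Builds names such as apply$mcII$sp: result letter first, then argument letter. */
        static String applyMethodName(PrimitiveKind arg, PrimitiveKind result) {
            Optional<String> argLetter = methodType(arg, false);
            Optional<String> resultLetter = methodType(result, true);
            if (argLetter.isPresent() && resultLetter.isPresent()) {
                return "apply$mc" + resultLetter.get() + argLetter.get() + "$sp";
            }
            return "apply"; // no specialized variant: fall back to the generic, boxing call
        }
    }
    ```
    
    With this sketch, `applyMethodName(INT, INT)` yields `apply$mcII$sp`, while a boolean argument falls back to plain `apply`.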



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73990/
    Test PASSed.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    This is cool! Can you also update the benchmark result in `DatasetBenchmark`?



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r105236895
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -216,10 +217,39 @@ case class MapElementsExec(
         child.asInstanceOf[CodegenSupport].produce(ctx, this)
       }
     
    +  private def getMethodType(dt: DataType, isOutput: Boolean): String = {
    --- End diff --
    
    let's return an `Option[String]`



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74296/
    Test PASSed.



[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r104774301
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -219,7 +219,30 @@ case class MapElementsExec(
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         val (funcClass, methodName) = func match {
           case m: MapFunction[_, _] => classOf[MapFunction[_, _]] -> "call"
    -      case _ => classOf[Any => Any] -> "apply"
    +      case _ =>
    +        (if (child.output.length == 1) child.output(0).dataType else NullType,
    --- End diff --
    
    the `if` is not needed, see the `assert` in `ObjectConsumerExec`



[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r104366373
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -217,9 +219,33 @@ case class MapElementsExec(
       }
     
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
    +    val inType = if (child.output.length == 1) child.output(0).dataType else NullType
    --- End diff --
    
    These two are only needed inside the `case _` block, right?



[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r105237069
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
    @@ -165,11 +208,23 @@ object DatasetBenchmark {
         val numRows = 100000000
         val numChains = 10
     
    -    val benchmark = backToBackMap(spark, numRows, numChains)
    +    val benchmark0 = backToBackMapLong(spark, numRows, numChains)
    +    val benchmark1 = backToBackMap(spark, numRows, numChains)
         val benchmark2 = backToBackFilter(spark, numRows, numChains)
    --- End diff --
    
    we can also add a new case for `backToBackFilterLong`, as we handle boolean type now.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    **[Test build #74296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74296/testReport)** for PR 17172 at commit [`200cec7`](https://github.com/apache/spark/commit/200cec783f33de21d9895f90161a9d11877d0877).



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    **[Test build #74249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74249/testReport)** for PR 17172 at commit [`8ee91af`](https://github.com/apache/spark/commit/8ee91af93c1d6f439cbef0e3aa47154b6881946d).



[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17172



[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r105236823
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -216,10 +217,39 @@ case class MapElementsExec(
         child.asInstanceOf[CodegenSupport].produce(ctx, this)
       }
     
    +  private def getMethodType(dt: DataType, isOutput: Boolean): String = {
    +    dt match {
    +      case BooleanType if isOutput => "Z"
    --- End diff --
    
    so boolean type can't be a parameter?



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74201/
    Test PASSed.



[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r104366216
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -17,6 +17,8 @@
     
     package org.apache.spark.sql.execution
     
    +import com.sun.org.apache.xalan.internal.xsltc.compiler.util.VoidType
    --- End diff --
    
    Is this a mistaken import? I don't see it used in the change and can't imagine we'd be invoking Xalan here



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    **[Test build #73971 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73971/testReport)** for PR 17172 at commit [`d8b5f8d`](https://github.com/apache/spark/commit/d8b5f8d839d5c3388244cf2a6dcf4494d927145f).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    cc @cloud-fan 



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    **[Test build #74249 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74249/testReport)** for PR 17172 at commit [`8ee91af`](https://github.com/apache/spark/commit/8ee91af93c1d6f439cbef0e3aa47154b6881946d).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r104990128
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -219,7 +219,33 @@ case class MapElementsExec(
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         val (funcClass, methodName) = func match {
           case m: MapFunction[_, _] => classOf[MapFunction[_, _]] -> "call"
    -      case _ => classOf[Any => Any] -> "apply"
    +      case _ => (child.output(0).dataType, outputObjAttr.dataType) match {
    +        // if a pair of an argument and return types is one of specific types
    +        // whose specialized method (apply$mc..$sp) is generated by scalac,
    +        // Catalyst generated a direct method call to the specialized method.
    +        // The followings are references for this specialization:
    +        //   https://github.com/scala/scala/blob/2.11.x/src/compiler/scala/tools/nsc/transform/
    +        //     SpecializeTypes.scala
    +        //   http://www.cakesolutions.net/teamblogs/scala-dissection-functions
    +        //   http://axel22.github.io/2013/11/03/specialization-quirks.html
    +        case (IntegerType, IntegerType) => classOf[Int => Int] -> "apply$mcII$sp"
    --- End diff --
    
    what about boolean type?



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    **[Test build #73987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73987/testReport)** for PR 17172 at commit [`a885907`](https://github.com/apache/spark/commit/a8859078da4257ab6580889b74f463847d3dbb00).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    yea let's add a new case in the benchmark



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    **[Test build #74201 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74201/testReport)** for PR 17172 at commit [`dfbce2a`](https://github.com/apache/spark/commit/dfbce2a484c1e7ea333677e2a6d61913ad9df846).



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    **[Test build #74201 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74201/testReport)** for PR 17172 at commit [`dfbce2a`](https://github.com/apache/spark/commit/dfbce2a484c1e7ea333677e2a6d61913ad9df846).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    cool! merging to master!



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r105252628
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -216,10 +217,39 @@ case class MapElementsExec(
         child.asInstanceOf[CodegenSupport].produce(ctx, this)
       }
     
    +  private def getMethodType(dt: DataType, isOutput: Boolean): String = {
    --- End diff --
    
    Sure, I will do this



[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r105103907
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -219,7 +219,33 @@ case class MapElementsExec(
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         val (funcClass, methodName) = func match {
           case m: MapFunction[_, _] => classOf[MapFunction[_, _]] -> "call"
    -      case _ => classOf[Any => Any] -> "apply"
    +      case _ => (child.output(0).dataType, outputObjAttr.dataType) match {
    +        // if a pair of an argument and return types is one of specific types
    +        // whose specialized method (apply$mc..$sp) is generated by scalac,
    +        // Catalyst generated a direct method call to the specialized method.
    +        // The followings are references for this specialization:
    +        //   https://github.com/scala/scala/blob/2.11.x/src/compiler/scala/tools/nsc/transform/
    +        //     SpecializeTypes.scala
    +        //   http://www.cakesolutions.net/teamblogs/scala-dissection-functions
    +        //   http://axel22.github.io/2013/11/03/specialization-quirks.html
    +        case (IntegerType, IntegerType) => classOf[Int => Int] -> "apply$mcII$sp"
    --- End diff --
    
    Good catch. I overlooked the Boolean type as a return type.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    **[Test build #74257 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74257/testReport)** for PR 17172 at commit [`1fb2933`](https://github.com/apache/spark/commit/1fb2933c60d32d9652f50d30aeefd4dbe52643e9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74297/
    Test PASSed.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    **[Test build #74257 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74257/testReport)** for PR 17172 at commit [`1fb2933`](https://github.com/apache/spark/commit/1fb2933c60d32d9652f50d30aeefd4dbe52643e9).



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    **[Test build #74296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74296/testReport)** for PR 17172 at commit [`200cec7`](https://github.com/apache/spark/commit/200cec783f33de21d9895f90161a9d11877d0877).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74249/
    Test FAILed.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    **[Test build #73971 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73971/testReport)** for PR 17172 at commit [`d8b5f8d`](https://github.com/apache/spark/commit/d8b5f8d839d5c3388244cf2a6dcf4494d927145f).



[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r105309201
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
    @@ -165,11 +208,23 @@ object DatasetBenchmark {
         val numRows = 100000000
         val numChains = 10
     
    -    val benchmark = backToBackMap(spark, numRows, numChains)
    +    val benchmark0 = backToBackMapLong(spark, numRows, numChains)
    +    val benchmark1 = backToBackMap(spark, numRows, numChains)
         val benchmark2 = backToBackFilter(spark, numRows, numChains)
    --- End diff --
    
    Correction: `FilterExec()` generates the code. `TypedFilter` generates the code fragment for the method invocation.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73971/
    Test FAILed.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    **[Test build #73990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73990/testReport)** for PR 17172 at commit [`65fa05a`](https://github.com/apache/spark/commit/65fa05a72be841219fd1a0ba65d88223ad7b79cb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73987/
    Test PASSed.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r105320575
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
    @@ -165,11 +208,23 @@ object DatasetBenchmark {
         val numRows = 100000000
         val numChains = 10
     
    -    val benchmark = backToBackMap(spark, numRows, numChains)
    +    val benchmark0 = backToBackMapLong(spark, numRows, numChains)
    +    val benchmark1 = backToBackMap(spark, numRows, numChains)
         val benchmark2 = backToBackFilter(spark, numRows, numChains)
    --- End diff --
    
    Added a new case `backToBackFilterLong`



[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r105266620
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -216,10 +217,39 @@ case class MapElementsExec(
         child.asInstanceOf[CodegenSupport].produce(ctx, this)
       }
     
    +  private def getMethodType(dt: DataType, isOutput: Boolean): String = {
    +    dt match {
    +      case BooleanType if isOutput => "Z"
    +      case IntegerType => "I"
    +      case LongType => "J"
    +      case FloatType => "F"
    +      case DoubleType => "D"
    +      case _ => null
    +    }
    +  }
    +
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         val (funcClass, methodName) = func match {
    --- End diff --
    
    let's put this thing in a util so that `FilterExec` can also use it



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    **[Test build #73990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73990/testReport)** for PR 17172 at commit [`65fa05a`](https://github.com/apache/spark/commit/65fa05a72be841219fd1a0ba65d88223ad7b79cb).



[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r104991089
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -219,7 +219,33 @@ case class MapElementsExec(
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         val (funcClass, methodName) = func match {
           case m: MapFunction[_, _] => classOf[MapFunction[_, _]] -> "call"
    -      case _ => classOf[Any => Any] -> "apply"
    +      case _ => (child.output(0).dataType, outputObjAttr.dataType) match {
    +        // if a pair of an argument and return types is one of specific types
    +        // whose specialized method (apply$mc..$sp) is generated by scalac,
    +        // Catalyst generated a direct method call to the specialized method.
    +        // The followings are references for this specialization:
    +        //   https://github.com/scala/scala/blob/2.11.x/src/compiler/scala/tools/nsc/transform/
    +        //     SpecializeTypes.scala
    +        //   http://www.cakesolutions.net/teamblogs/scala-dissection-functions
    +        //   http://axel22.github.io/2013/11/03/specialization-quirks.html
    +        case (IntegerType, IntegerType) => classOf[Int => Int] -> "apply$mcII$sp"
    +        case (IntegerType, LongType) => classOf[Int => Long] -> "apply$mcJI$sp"
    --- End diff --
    
    is it possible to do it in a composable way instead of enumerating all combinations?
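
One way to read the "composable" suggestion is to derive the method name from one descriptor letter per type instead of listing every (argument, return) pair. The following is only a minimal sketch under stated assumptions: `PrimType` and its case objects are hypothetical stand-ins for Catalyst's `DataType` objects so that the snippet runs without Spark, and `descriptor` and `applyMethodName` are illustrative names, not the PR's final code. It follows the naming visible in the diff (`apply$mc<return><argument>$sp`) and treats Boolean as specialized only in the return position.

```scala
object SpecializedNames {
  // Hypothetical stand-ins for Catalyst's DataType objects (assumption, not Spark API).
  sealed trait PrimType
  case object BooleanT extends PrimType
  case object IntT extends PrimType
  case object LongT extends PrimType
  case object FloatT extends PrimType
  case object DoubleT extends PrimType
  case object OtherT extends PrimType

  // JVM descriptor letter for a type, if a specialized variant is assumed
  // to exist for that position (Boolean only as a return type).
  def descriptor(t: PrimType, isOutput: Boolean): Option[String] = t match {
    case BooleanT if isOutput => Some("Z")
    case IntT                 => Some("I")
    case LongT                => Some("J")
    case FloatT               => Some("F")
    case DoubleT              => Some("D")
    case _                    => None
  }

  // Compose the specialized name as apply$mc<return><argument>$sp,
  // falling back to the generic "apply" when either side is unsupported.
  def applyMethodName(arg: PrimType, ret: PrimType): String = {
    val specialized = for {
      r <- descriptor(ret, isOutput = true)
      a <- descriptor(arg, isOutput = false)
    } yield s"apply$$mc${r}${a}$$sp"
    specialized.getOrElse("apply")
  }
}
```

With this shape, `applyMethodName(IntT, LongT)` yields `apply$mcJI$sp`, matching the enumerated case in the diff, and any unsupported pair falls back to the boxed `apply`.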


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r104370969
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -17,6 +17,8 @@
     
     package org.apache.spark.sql.execution
     
    +import com.sun.org.apache.xalan.internal.xsltc.compiler.util.VoidType
    --- End diff --
    
    Yes, that is why the first try failed. It was unintentionally imported during my debugging.



[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    **[Test build #74297 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74297/testReport)** for PR 17172 at commit [`b25b191`](https://github.com/apache/spark/commit/b25b191687259303df5ab2fad0c64687a88de5bd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    [The latest `DatasetBenchmark`](https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala) has a function `val func = (d: Data) => Data(d.l + 1, d.s)` that this PR cannot be applied to.
    
    Should we add a new benchmark with a function `val func = (d: Data) => Data(d.l + 1)`, based on [this suggestion](https://github.com/apache/spark/pull/16391#issuecomment-269361025)?


[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r104371116
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -217,9 +219,33 @@ case class MapElementsExec(
       }
     
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
    +    val inType = if (child.output.length == 1) child.output(0).dataType else NullType
    --- End diff --
    
    Good catch. I simplified their scope.


[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r104774584
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -219,7 +219,30 @@ case class MapElementsExec(
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         val (funcClass, methodName) = func match {
           case m: MapFunction[_, _] => classOf[MapFunction[_, _]] -> "call"
    -      case _ => classOf[Any => Any] -> "apply"
    +      case _ =>
    +        (if (child.output.length == 1) child.output(0).dataType else NullType,
    +         outputObjAttr.dataType) match {
    +          // if a pair of an argument and return types is one of specific types
    +          // whose specialized method (apply$mc..$sp) is generated by scalac,
    +          // Catalyst generated a direct method call to the specialized method.
    --- End diff --
    
    Can you link to some official documentation or a blog post?


[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r105103962
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -219,7 +219,33 @@ case class MapElementsExec(
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         val (funcClass, methodName) = func match {
           case m: MapFunction[_, _] => classOf[MapFunction[_, _]] -> "call"
    -      case _ => classOf[Any => Any] -> "apply"
    +      case _ => (child.output(0).dataType, outputObjAttr.dataType) match {
    +        // if a pair of an argument and return types is one of specific types
    +        // whose specialized method (apply$mc..$sp) is generated by scalac,
    +        // Catalyst generated a direct method call to the specialized method.
    +        // The followings are references for this specialization:
    +        //   https://github.com/scala/scala/blob/2.11.x/src/compiler/scala/tools/nsc/transform/
    +        //     SpecializeTypes.scala
    +        //   http://www.cakesolutions.net/teamblogs/scala-dissection-functions
    +        //   http://axel22.github.io/2013/11/03/specialization-quirks.html
    +        case (IntegerType, IntegerType) => classOf[Int => Int] -> "apply$mcII$sp"
    +        case (IntegerType, LongType) => classOf[Int => Long] -> "apply$mcJI$sp"
    --- End diff --
    
    Yes, I found a composable way.
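    
    A minimal sketch of what such a composable mapping could look like, assuming the naming scheme visible in the diff above (`apply$mc` + return-type code + argument-type code + `$sp`). `SpecializedName` and `methodName` are hypothetical names used for illustration only, not the PR's code, and the types are passed as plain strings rather than Catalyst `DataType`s to keep the example self-contained.
    
    ```scala
    // Hypothetical helper (not from the PR): composes the name of a
    // scalac-specialized Function1 method from JVM type codes.
    object SpecializedName {
      // Argument types for which scala.Function1 is specialized.
      private val argCodes =
        Map("Int" -> "I", "Long" -> "J", "Float" -> "F", "Double" -> "D")
    
      // Return types additionally allow Boolean and Unit.
      private val retCodes = argCodes ++ Map("Boolean" -> "Z", "Unit" -> "V")
    
      // Returns the specialized method name, or the generic "apply"
      // when either type has no specialized variant.
      def methodName(argType: String, retType: String): String = {
        val composed = for {
          a <- argCodes.get(argType)
          r <- retCodes.get(retType)
        } yield "apply$mc" + r + a + "$sp" // return code first, then argument code
        composed.getOrElse("apply")
      }
    }
    ```
    
    With this shape, `SpecializedName.methodName("Int", "Long")` yields `apply$mcJI$sp`, matching the enumerated case in the diff, and any unsupported pair falls back to `apply`.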


[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r105252486
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
    @@ -165,11 +208,23 @@ object DatasetBenchmark {
         val numRows = 100000000
         val numChains = 10
     
    -    val benchmark = backToBackMap(spark, numRows, numChains)
    +    val benchmark0 = backToBackMapLong(spark, numRows, numChains)
    +    val benchmark1 = backToBackMap(spark, numRows, numChains)
         val benchmark2 = backToBackFilter(spark, numRows, numChains)
    --- End diff --
    
    `filter()` is handled by `FilterExec()`. Should this PR handle `filter()`, too? Or should I open another PR for `filter()`?


[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17172#discussion_r105321757
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -216,10 +217,39 @@ case class MapElementsExec(
         child.asInstanceOf[CodegenSupport].produce(ctx, this)
       }
     
    +  private def getMethodType(dt: DataType, isOutput: Boolean): String = {
    +    dt match {
    +      case BooleanType if isOutput => "Z"
    +      case IntegerType => "I"
    +      case LongType => "J"
    +      case FloatType => "F"
    +      case DoubleType => "D"
    +      case _ => null
    +    }
    +  }
    +
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         val (funcClass, methodName) = func match {
    --- End diff --
    
    Sure. This PR can now also generate a call to a specialized method for `Dataset.filter()`.
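    
    For illustration only: `scala.Function1` is also specialized for `Boolean` results, so a predicate of type `Int => Boolean` exposes `apply$mcZI$sp`. The sketch below checks this by reflection. `SpecializedFilterCheck` and `isPositive` are hypothetical names, and the reflective call itself boxes its argument and result, so it only shows that the method exists, not the performance gain.
    
    ```scala
    object SpecializedFilterCheck {
      // A predicate of the kind passed to Dataset.filter (hypothetical example).
      val isPositive: Int => Boolean = (i: Int) => i > 0
    
      // Function1 declares the specialized variant: Z = Boolean result, I = Int argument.
      private val specialized =
        classOf[Function1[_, _]].getMethod("apply$mcZI$sp", classOf[Int])
    
      // The specialized method returns a primitive boolean rather than Object.
      def returnsPrimitiveBoolean: Boolean =
        specialized.getReturnType == classOf[Boolean]
    
      // Reflection boxes here; generated code would call the method directly.
      def callSpecialized(i: Int): Boolean =
        specialized.invoke(isPositive, Int.box(i)).asInstanceOf[Boolean]
    }
    ```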


[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    **[Test build #74297 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74297/testReport)** for PR 17172 at commit [`b25b191`](https://github.com/apache/spark/commit/b25b191687259303df5ab2fad0c64687a88de5bd).


[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    **[Test build #73987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73987/testReport)** for PR 17172 at commit [`a885907`](https://github.com/apache/spark/commit/a8859078da4257ab6580889b74f463847d3dbb00).


[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17172
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74257/
    Test PASSed.

