You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by maropu <gi...@git.apache.org> on 2017/03/12 06:24:59 UTC

[GitHub] spark pull request #17264: [SPARK-19923][SQL] Remove unnecessary type conver...

GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/17264

    [SPARK-19923][SQL] Remove unnecessary type conversions per call in Hive

    ## What changes were proposed in this pull request?
    This pr removed unnecessary type conversions per call in Hive: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala#L116
    
    ## How was this patch tested?
    Existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-19923

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17264.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17264
    
----
commit 6468fde7a9b726843e505b029cb5f7ac865690fe
Author: Takeshi Yamamuro <ya...@apache.org>
Date:   2017-03-12T06:19:20Z

    Remove unnecessary type conversions per call

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17264: [SPARK-19923][SQL] Remove unnecessary type conversions p...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/17264
  
    In the future can we put the perf result in PR descriptions?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17264: [SPARK-19923][SQL] Remove unnecessary type conversions p...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/17264
  
    Thanks for valuable comments! I'll update soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17264: [SPARK-19923][SQL] Remove unnecessary type conversions p...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/17264
  
    @hvanhovell I checked all the related code and then I think this pr includes all the place we could replace `wrap` with `wrapperFor`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17264: [SPARK-19923][SQL] Remove unnecessary type conversions p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17264
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74537/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17264: [SPARK-19923][SQL] Remove unnecessary type conver...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17264#discussion_r105555670
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
    @@ -196,6 +198,11 @@ private[orc] class OrcSerializer(dataSchema: StructType, conf: Configuration)
     
       private[this] val cachedOrcStruct = structOI.create().asInstanceOf[OrcStruct]
     
    +  private[this] val wrappers = dataSchema.zip(structOI.getAllStructFieldRefs().asScala.toSeq)
    +      .map { case (f, i) =>
    +    wrapperFor(i.getFieldObjectInspector, f.dataType)
    +  }
    --- End diff --
    
    (my personal taste that might be worth considering.. 
    
    ```scala
    private[this] val wrappers = dataSchema
      .zip(structOI.getAllStructFieldRefs.asScala.toSeq)
      .map { case (f, i) => wrapperFor(i.getFieldObjectInspector, f.dataType) }
    ```
    
    or maybe
    
    ```scala
    private[this] val wrappers = 
      structOI.getAllStructFieldRefs.asScala.toSeq.zip(dataSchema).map { case (ref, field) =>
        wrapperFor(ref.getFieldObjectInspector, field.dataType)
      }
    ```
    
    which resembles https://github.com/apache/spark/blob/226d38840c8d3f40639715d755df6fb4fee2715f/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala#L364-L366
    )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17264: [SPARK-19923][SQL] Remove unnecessary type conversions p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17264
  
    **[Test build #74537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74537/testReport)** for PR 17264 at commit [`efd3628`](https://github.com/apache/spark/commit/efd3628f66d37cb9c5dfca34694f0206907e08f9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17264: [SPARK-19923][SQL] Remove unnecessary type conversions p...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/17264
  
    I'm not sure this makes sense, so could you check? cc: @hvanhovell 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17264: [SPARK-19923][SQL] Remove unnecessary type conversions p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17264
  
    **[Test build #74396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74396/testReport)** for PR 17264 at commit [`6468fde`](https://github.com/apache/spark/commit/6468fde7a9b726843e505b029cb5f7ac865690fe).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17264: [SPARK-19923][SQL] Remove unnecessary type conversions p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17264
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17264: [SPARK-19923][SQL] Remove unnecessary type conver...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17264


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17264: [SPARK-19923][SQL] Remove unnecessary type conversions p...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/17264
  
    @maropu this looks like a nice improvement. It is an idea to check all uses of `wrap`, and see if we can replace them by `wrapperFor`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17264: [SPARK-19923][SQL] Remove unnecessary type conversions p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17264
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17264: [SPARK-19923][SQL] Remove unnecessary type conversions p...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/17264
  
    I quickly did check performance changes;
    ```
    public class TestGenericUDF extends GenericUDF {
      @Override
      public ObjectInspector initialize(ObjectInspector[] objectInspectors) throws UDFArgumentException {
          return PrimitiveObjectInspectorFactory.javaLongObjectInspector;
      }
    
      @Override
      public Object evaluate(DeferredObject[] args) throws HiveException {
          final long a1 = PrimitiveObjectInspectorFactory.javaLongObjectInspector.get(args[0].get());
          final long a2 = PrimitiveObjectInspectorFactory.javaLongObjectInspector.get(args[1].get());
          final long a3 = PrimitiveObjectInspectorFactory.javaLongObjectInspector.get(args[2].get());
          final long a4 = PrimitiveObjectInspectorFactory.javaLongObjectInspector.get(args[3].get());
          return a1 + a2 + a3 + a4;
      }
    }
    ```
    
    ```
    $./bin/spark-shell --master local[1]  --conf spark.sql.shuffle.partitions=1 -v
    
    scala> sql("CREATE TEMPORARY FUNCTION testUdf AS 'hivemall.ftvec.TestGenericUDF'")
    scala> :paste
    def timer[R](block: => R): R = {
      val t0 = System.nanoTime()
      val result = block
      val t1 = System.nanoTime()
      println("Elapsed time: " + ((t1 - t0 + 0.0) / 1000000000.0)+ "s")
      result
    }
    
    scala> spark.range(3000000).createOrReplaceTempView("t")
    scala> timer { sql("SELECT testUdf(id, id, id, id) FROM t").queryExecution.executedPlan.execute().foreach(x => {}) }
    ```
    
    ```
    # performance w/ this patch
    Elapsed time: 1.901269167s
    
    # performance w/o this patch
    Elapsed time: 0.492860666s
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17264: [SPARK-19923][SQL] Remove unnecessary type conver...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17264#discussion_r105554355
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
    @@ -208,10 +215,8 @@ private[orc] class OrcSerializer(dataSchema: StructType, conf: Configuration)
           oi.setStructFieldData(
             struct,
             fieldRefs.get(i),
    -        wrap(
    -          row.get(i, dataSchema(i).dataType),
    -          fieldRefs.get(i).getFieldObjectInspector,
    -          dataSchema(i).dataType))
    +        wrappers(i)(row.get(i, dataSchema(i).dataType))
    --- End diff --
    
    Hi, @maropu .
    This seems to be improvement, too. Could you add some description about this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17264: [SPARK-19923][SQL] Remove unnecessary type conversions p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17264
  
    **[Test build #74537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74537/testReport)** for PR 17264 at commit [`efd3628`](https://github.com/apache/spark/commit/efd3628f66d37cb9c5dfca34694f0206907e08f9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17264: [SPARK-19923][SQL] Remove unnecessary type conversions p...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/17264
  
    LGTM merging to master. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17264: [SPARK-19923][SQL] Remove unnecessary type conversions p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17264
  
    **[Test build #74396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74396/testReport)** for PR 17264 at commit [`6468fde`](https://github.com/apache/spark/commit/6468fde7a9b726843e505b029cb5f7ac865690fe).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17264: [SPARK-19923][SQL] Remove unnecessary type conversions p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17264
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74396/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17264: [SPARK-19923][SQL] Remove unnecessary type conver...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17264#discussion_r105953187
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
    @@ -208,10 +215,8 @@ private[orc] class OrcSerializer(dataSchema: StructType, conf: Configuration)
           oi.setStructFieldData(
             struct,
             fieldRefs.get(i),
    -        wrap(
    -          row.get(i, dataSchema(i).dataType),
    -          fieldRefs.get(i).getFieldObjectInspector,
    -          dataSchema(i).dataType))
    +        wrappers(i)(row.get(i, dataSchema(i).dataType))
    --- End diff --
    
    Added the description to explain `wrappers`: https://github.com/apache/spark/pull/17264/files#diff-01999ccbf13e95a0ea2d223f69d8ae23R201


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17264: [SPARK-19923][SQL] Remove unnecessary type conversions p...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/17264
  
    @rxin okay!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org