Posted to reviews@spark.apache.org by bogdanrdc <gi...@git.apache.org> on 2018/08/23 12:14:14 UTC

[GitHub] spark pull request #22201: [SPARK-25209][SQL] Avoid deserializer check in Da...

GitHub user bogdanrdc opened a pull request:

    https://github.com/apache/spark/pull/22201

    [SPARK-25209][SQL] Avoid deserializer check in Dataset.apply when Dataset is actually DataFrame

    ## What changes were proposed in this pull request?
    Dataset.apply calls dataset.deserializer (to provide an early error), which ends up running the full Analyzer on the deserializer. This can take tens of milliseconds, depending on how big the plan is.
    Because Dataset.apply is called by many Dataset operations, such as Dataset.where, this can be a significant overhead for short queries.
    According to a comment in the PR that introduced this check, we can at least remove the check for DataFrames: https://github.com/apache/spark/pull/20402#discussion_r164338267
    
    ## How was this patch tested?
    Existing tests + manual benchmark
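
The change described in the pull request above can be illustrated with a small toy model. This is a sketch in Python, not Spark's Scala code: `ToyDataset`, `Row`, `Person`, `create`, and `check_deserializer` are hypothetical names, and the counter only stands in for the cost of an analyzer pass. It assumes, as the description says, that the eager check can be skipped when the dataset is a DataFrame, modelled here as a dataset whose element type is the generic row type.

```python
class Row:
    """Hypothetical stand-in for the generic, untyped row type."""


class Person:
    """Hypothetical stand-in for a user-defined element type."""


class ToyDataset:
    # Class-wide count of simulated analyzer passes (assumption: one pass
    # per eager deserializer check).
    analyzer_passes = 0

    def __init__(self, element_type, plan):
        self.element_type = element_type
        self.plan = plan

    def check_deserializer(self):
        # Placeholder for the costly step: resolving the deserializer
        # against the plan. In this model it only bumps a counter.
        ToyDataset.analyzer_passes += 1

    @classmethod
    def create(cls, element_type, plan):
        ds = cls(element_type, plan)
        # Run the eager check only for typed datasets; skip it when the
        # element type is the generic Row, i.e. for a DataFrame.
        if element_type is not Row:
            ds.check_deserializer()
        return ds

    def where(self, condition):
        # Every transformation builds a new dataset through the factory,
        # so the factory's cost is paid once per operation.
        return ToyDataset.create(self.element_type, self.plan + [condition])


# Untyped chain: no simulated analyzer pass is triggered.
df = ToyDataset.create(Row, ["scan"]).where("a > 1").where("b < 2")
print(ToyDataset.analyzer_passes)  # 0

# Typed chain: one pass for create() and one for where().
typed = ToyDataset.create(Person, ["scan"]).where("a > 1")
print(ToyDataset.analyzer_passes)  # 2
```

Because every transformation goes through the factory, a chain of N operations on a typed dataset pays for N + 1 checks in this model, while the untyped chain pays for none. That is why the saving matters most for short queries built from many small operations.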

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bogdanrdc/spark deserializer-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22201.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22201
    
----
commit 7089e035253c80bd143f3af4d12f39643e9eaf84
Author: Bogdan Raducanu <bo...@...>
Date:   2018-08-23T12:11:34Z

    optimization

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22201: [SPARK-25209][SQL] Avoid deserializer check in Da...

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22201


---



[GitHub] spark issue #22201: [SPARK-25209][SQL] Avoid deserializer check in Dataset.a...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22201
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95153/


---



[GitHub] spark issue #22201: [SPARK-25209][SQL] Avoid deserializer check in Dataset.a...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22201
  
    **[Test build #95153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95153/testReport)** for PR 22201 at commit [`7089e03`](https://github.com/apache/spark/commit/7089e035253c80bd143f3af4d12f39643e9eaf84).


---



[GitHub] spark issue #22201: [SPARK-25209][SQL] Avoid deserializer check in Dataset.a...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22201
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22201: [SPARK-25209][SQL] Avoid deserializer check in Dataset.a...

Posted by bogdanrdc <gi...@git.apache.org>.
GitHub user bogdanrdc commented on the issue:

    https://github.com/apache/spark/pull/22201
  
    cc @gatorsmile @hvanhovell 


---



[GitHub] spark issue #22201: [SPARK-25209][SQL] Avoid deserializer check in Dataset.a...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22201
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22201: [SPARK-25209][SQL] Avoid deserializer check in Dataset.a...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22201
  
    **[Test build #95153 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95153/testReport)** for PR 22201 at commit [`7089e03`](https://github.com/apache/spark/commit/7089e035253c80bd143f3af4d12f39643e9eaf84).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22201: [SPARK-25209][SQL] Avoid deserializer check in Dataset.a...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22201
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2485/


---
