Posted to reviews@spark.apache.org by hvanhovell <gi...@git.apache.org> on 2017/02/22 14:06:12 UTC

[GitHub] spark pull request #17027: [SPARK-19650] Runnable commands should not trigge...

GitHub user hvanhovell opened a pull request:

    https://github.com/apache/spark/pull/17027

    [SPARK-19650] Runnable commands should not trigger a Spark job [WIP]

    ## What changes were proposed in this pull request?
    Spark executes SQL commands eagerly. It does this by materializing an RDD (which triggers execution of the actual command) with the command's results. The downside of this approach is that it also triggers a Spark job, which is quite expensive and unnecessary.
    
    This PR fixes this by avoiding the materialization of an `RDD` for `RunnableCommands`; it just calls `executedPlan.collectToIterate` to trigger the execution and wraps the `executedPlan` with a `MaterializedPlan` to avoid another execution of the plan.
    
    ## How was this patch tested?
    *TODO*
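
The idea in the description above (run the command once, eagerly, then hand out the cached result instead of running it again) can be sketched roughly as follows. This is a minimal illustration and not the code in this pull request: `Plan`, `CountingCommand` and `MaterializedResult` are hypothetical stand-ins that do not depend on Spark, and rows are modelled as plain strings.

```scala
// Hypothetical stand-in for an executable plan; not a Spark class.
trait Plan {
  def execute(): Seq[String]
}

// A command with a side effect. The counter records how often it really ran.
final class CountingCommand(name: String) extends Plan {
  var runs: Int = 0
  def execute(): Seq[String] = {
    runs += 1
    Seq(s"$name executed")
  }
}

// Runs the wrapped plan exactly once, at construction time, and replays
// the cached rows on every later call instead of executing the plan again.
final class MaterializedResult(plan: Plan) extends Plan {
  private val rows: Seq[String] = plan.execute()
  def execute(): Seq[String] = rows
}
```

Under these assumptions, wrapping a `CountingCommand` in a `MaterializedResult` runs the command immediately, and any number of later `execute()` calls on the wrapper leave `runs` at 1.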

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hvanhovell/spark no-job-command

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17027.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17027
    
----
commit 4eea40baf569fe989ac6ec0d723259f1ab886ed3
Author: Herman van Hovell <hv...@databricks.com>
Date:   2017-02-17T22:09:02Z

    Do not trigger a job for runnable commands unless we have to.

commit bd379340d16ac1f75b4b94cb739fb2db2a18dbb8
Author: Herman van Hovell <hv...@databricks.com>
Date:   2017-02-22T13:51:46Z

    Introduce materialized plan

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17027: [SPARK-19650] Runnable commands should not trigger a Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    **[Test build #73279 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73279/testReport)** for PR 17027 at commit [`bd37934`](https://github.com/apache/spark/commit/bd379340d16ac1f75b4b94cb739fb2db2a18dbb8).




[GitHub] spark issue #17027: [SPARK-19650] Runnable commands should not trigger a Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17027: [SPARK-19650] Commands should not trigger a Spark...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17027#discussion_r103050183
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -175,19 +175,14 @@ class Dataset[T] private[sql](
       }
     
       @transient private[sql] val logicalPlan: LogicalPlan = {
    -    def hasSideEffects(plan: LogicalPlan): Boolean = plan match {
    -      case _: Command |
    -           _: InsertIntoTable => true
    -      case _ => false
    -    }
    -
    +    // For various commands (like DDL) and queries with side effects, we force query execution
    +    // to happen right away to let these side effects take place eagerly.
         queryExecution.analyzed match {
           // For various commands (like DDL) and queries with side effects, we force query execution
    --- End diff --
    
    remove this line




[GitHub] spark pull request #17027: [SPARK-19650] Commands should not trigger a Spark...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17027#discussion_r103073282
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -175,19 +175,14 @@ class Dataset[T] private[sql](
       }
     
       @transient private[sql] val logicalPlan: LogicalPlan = {
    -    def hasSideEffects(plan: LogicalPlan): Boolean = plan match {
    -      case _: Command |
    -           _: InsertIntoTable => true
    -      case _ => false
    -    }
    -
    +    // For various commands (like DDL) and queries with side effects, we force query execution
    +    // to happen right away to let these side effects take place eagerly.
         queryExecution.analyzed match {
           // For various commands (like DDL) and queries with side effects, we force query execution
    --- End diff --
    
    actually let me remove it while merging




[GitHub] spark pull request #17027: [SPARK-19650] Commands should not trigger a Spark...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17027




[GitHub] spark issue #17027: [SPARK-19650] Runnable commands should not trigger a Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73345/
    Test FAILed.




[GitHub] spark issue #17027: [SPARK-19650] Runnable commands should not trigger a Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73350/
    Test PASSed.




[GitHub] spark issue #17027: [SPARK-19650] Commands should not trigger a Spark job

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73425/
    Test PASSed.




[GitHub] spark issue #17027: [SPARK-19650] Runnable commands should not trigger a Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    **[Test build #73350 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73350/testReport)** for PR 17027 at commit [`e8acd98`](https://github.com/apache/spark/commit/e8acd98f933b58782850cf91eda1ac31123e342b).




[GitHub] spark pull request #17027: [SPARK-19650] Commands should not trigger a Spark...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17027#discussion_r103050359
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
    @@ -125,8 +125,6 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
         // SHOW TABLES in Hive only output table names, while ours outputs database, table name, isTemp.
         case command: ExecutedCommandExec if command.cmd.isInstanceOf[ShowTablesCommand] =>
           command.executeCollect().map(_.getString(1))
    -    case command: ExecutedCommandExec =>
    --- End diff --
    
    why remove this case?




[GitHub] spark issue #17027: [SPARK-19650] Runnable commands should not trigger a Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73279/
    Test FAILed.




[GitHub] spark issue #17027: [SPARK-19650] Runnable commands should not trigger a Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    **[Test build #73345 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73345/testReport)** for PR 17027 at commit [`fdfe7fe`](https://github.com/apache/spark/commit/fdfe7fed6cfffa810744871dfa3b1ce35f9ca8bd).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17027: [SPARK-19650] Commands should not trigger a Spark job

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    **[Test build #73425 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73425/testReport)** for PR 17027 at commit [`dad6b13`](https://github.com/apache/spark/commit/dad6b13c3940133743f0821f80da1bffc7327b67).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17027: [SPARK-19650] Commands should not trigger a Spark job

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17027: [SPARK-19650] Commands should not trigger a Spark job

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    LGTM




[GitHub] spark issue #17027: [SPARK-19650] Runnable commands should not trigger a Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    **[Test build #73350 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73350/testReport)** for PR 17027 at commit [`e8acd98`](https://github.com/apache/spark/commit/e8acd98f933b58782850cf91eda1ac31123e342b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17027: [SPARK-19650] Commands should not trigger a Spark job

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    **[Test build #73425 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73425/testReport)** for PR 17027 at commit [`dad6b13`](https://github.com/apache/spark/commit/dad6b13c3940133743f0821f80da1bffc7327b67).




[GitHub] spark issue #17027: [SPARK-19650] Runnable commands should not trigger a Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17027: [SPARK-19650] Commands should not trigger a Spark job

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    merging to master!




[GitHub] spark issue #17027: [SPARK-19650] Runnable commands should not trigger a Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    **[Test build #73345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73345/testReport)** for PR 17027 at commit [`fdfe7fe`](https://github.com/apache/spark/commit/fdfe7fed6cfffa810744871dfa3b1ce35f9ca8bd).




[GitHub] spark issue #17027: [SPARK-19650] Runnable commands should not trigger a Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17027: [SPARK-19650] Runnable commands should not trigger a Spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17027
  
    **[Test build #73279 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73279/testReport)** for PR 17027 at commit [`bd37934`](https://github.com/apache/spark/commit/bd379340d16ac1f75b4b94cb739fb2db2a18dbb8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class MaterializedPlan(plan: SparkPlan) extends LeafNode`

