Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2014/08/08 10:57:11 UTC

[jira] [Commented] (SPARK-2590) Add config property to disable incremental collection used in Thrift server

    [ https://issues.apache.org/jira/browse/SPARK-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090492#comment-14090492 ] 

Apache Spark commented on SPARK-2590:
-------------------------------------

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/1853

> Add config property to disable incremental collection used in Thrift server
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-2590
>                 URL: https://issues.apache.org/jira/browse/SPARK-2590
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Cheng Lian
>            Assignee: Cheng Lian
>            Priority: Blocker
>
> {{SparkSQLOperationManager}} uses {{RDD.toLocalIterator}} to collect the result set one partition at a time. This is useful to avoid OOM when the result is large, but introduces extra job scheduling costs as each partition is collected with a separate job. Users may want to disable this when the result set is expected to be small.
> *UPDATE* Incremental collection hurts performance because tasks of the last stage of the RDD DAG generated from the SQL query plan are executed sequentially. Thus we decided to disable it by default.
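As a rough illustration of the trade-off described above, here is a minimal Spark-free Scala simulation, assuming that each fetch from the driver costs one submitted job. FakeRDD, runJob and the incrementalCollect flag are hypothetical names, not Spark APIs. The collect() and toLocalIterator methods below only mimic how the real RDD methods differ in the number of jobs they submit.

```scala
// Spark-free simulation of the SPARK-2590 trade-off.
// FakeRDD, runJob and incrementalCollect are hypothetical stand-ins, not Spark APIs.
final class FakeRDD[T](val partitions: Vector[Vector[T]]) {
  // Number of "jobs" submitted so far; each job models scheduling overhead.
  var jobsRun: Int = 0

  // Runs one job over the given partition indices and returns their rows.
  private def runJob(ids: Seq[Int]): Vector[Vector[T]] = {
    jobsRun += 1
    ids.map(partitions).toVector
  }

  // Mimics RDD.collect(): a single job fetches every partition at once,
  // so the whole result has to fit in driver memory.
  def collect(): Vector[T] = runJob(partitions.indices).flatten

  // Mimics RDD.toLocalIterator: partitions are fetched lazily, one job each,
  // so the driver only materializes one partition at a time.
  def toLocalIterator: Iterator[T] =
    partitions.indices.iterator.flatMap(i => runJob(Seq(i)).head)
}

object IncrementalCollectDemo {
  // Models the choice a config property would control in the Thrift server.
  def resultIterator[T](rdd: FakeRDD[T], incrementalCollect: Boolean): Iterator[T] =
    if (incrementalCollect) rdd.toLocalIterator else rdd.collect().iterator
}
```

With three partitions, incrementalCollect = false submits one job, while incrementalCollect = true submits three jobs that run one after another as the iterator is consumed. That sequential, per-partition execution is the performance cost the *UPDATE* note refers to; the benefit is that the driver never holds the full result set at once.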
