Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/08/28 19:26:20 UTC

[jira] [Commented] (SPARK-17283) Cancel job in RDD.take() as soon as enough output is received

    [ https://issues.apache.org/jira/browse/SPARK-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15443975#comment-15443975 ] 

Apache Spark commented on SPARK-17283:
--------------------------------------

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/14854

> Cancel job in RDD.take() as soon as enough output is received
> -------------------------------------------------------------
>
>                 Key: SPARK-17283
>                 URL: https://issues.apache.org/jira/browse/SPARK-17283
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>
> The current implementation of RDD.take() waits until all partitions of each job have been computed before checking whether enough rows have been received. If take() instead performed this check on the fly, as individual partitions complete, it could stop early, offering large speedups for certain interactive queries.
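
To illustrate the idea only, here is a minimal Python sketch, not Spark's implementation and not the code in the linked pull request. It assumes a thread pool standing in for Spark's scheduler, and the names take_early and partitions are hypothetical. Results are checked as each partition finishes, and outstanding work is cancelled once the leading partitions already hold n rows.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def take_early(partitions, n, max_workers=4):
    """Return the first n rows in partition order, stopping once they are known.

    `partitions` is a list of zero-argument callables, each returning a list
    of rows (a stand-in for computing one partition).
    """
    if n <= 0:
        return []
    results = {}  # partition index -> rows returned by that partition
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(p): i for i, p in enumerate(partitions)}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
            # Only a contiguous run of finished partitions starting at
            # index 0 can be trusted to contain the first n rows.
            prefix = []
            i = 0
            while i in results and len(prefix) < n:
                prefix.extend(results[i])
                i += 1
            if len(prefix) >= n:
                # cancel() only stops tasks that have not started yet;
                # tasks already running are left to finish.
                for other in futures:
                    other.cancel()
                return prefix[:n]
    # Every partition finished and fewer than n rows exist in total.
    out = []
    for i in range(len(partitions)):
        out.extend(results[i])
    return out[:n]
```

For example, with ten partitions of three rows each, take_early(parts, 5) can return after the first two partitions have completed instead of waiting for all ten. Cancellation in this sketch is best effort, so how much work is actually saved depends on how many tasks were still queued.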


