You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Robert Kruszewski (JIRA)" <ji...@apache.org> on 2016/08/10 01:28:20 UTC

[jira] [Created] (SPARK-16984) executeTake tries all partitions if first parition is empty

Robert Kruszewski created SPARK-16984:
-----------------------------------------

             Summary: executeTake tries all partitions if first parition is empty
                 Key: SPARK-16984
                 URL: https://issues.apache.org/jira/browse/SPARK-16984
             Project: Spark
          Issue Type: Bug
    Affects Versions: 2.0.0
            Reporter: Robert Kruszewski


in executeTake if the number of rows returned by first partition is 0 we try all partitions next time. This can lead to pathological cases where your first partition is empty and rest have data. This unfortunately can happen with skewed data. Empirically observed it's better to make few roundtrips instead of potentially killing driver with big collect



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org