
Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/06/25 08:35:41 UTC

[jira] [Updated] (SPARK-16207) order guarantees for DataFrames

     [ https://issues.apache.org/jira/browse/SPARK-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-16207:
------------------------------
    Priority: Minor  (was: Major)

Generally, things like RDD and DataFrame don't guarantee any order at all unless they are the product of an ordering operation such as sort. I don't think blogs/SO are as relevant as the Spark docs, and the docs do cover this in places. If you have a specific suggestion, make a PR; but if this is a question, then it should be closed.
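To illustrate the point above, here is a minimal sketch in plain Python. It is a toy model, not Spark code and not how Spark is implemented. A "dataset" is assumed to be a list of partitions, and the hypothetical helpers `gather_unordered` and `gather_sorted` show that any order the API does not promise can vary between runs, while an explicit sort always yields the same sequence. In Spark itself, the corresponding habit is to call `sort`/`orderBy` before an action such as `take()` whenever the order matters.

```python
import random
from typing import List

# Toy model only (hypothetical, NOT Spark): a dataset is a list of
# partitions, and each partition is a list of rows.
Partition = List[int]


def gather_unordered(partitions: List[Partition], seed: int) -> List[int]:
    """Concatenate partitions in an arbitrary order chosen by `seed`.

    The random permutation stands in for any ordering the API does not
    promise to keep stable, such as the row order produced by a shuffle.
    """
    order = list(range(len(partitions)))
    random.Random(seed).shuffle(order)
    return [row for i in order for row in partitions[i]]


def gather_sorted(partitions: List[Partition], seed: int) -> List[int]:
    """Gather the rows the same way, then sort them explicitly.

    Because of the final sort, the result no longer depends on `seed`.
    """
    return sorted(gather_unordered(partitions, seed))


if __name__ == "__main__":
    parts = [[5, 1], [9, 3], [7, 2]]
    for seed in range(3):
        print(gather_unordered(parts, seed), gather_sorted(parts, seed))
```

`gather_sorted` returns `[1, 2, 3, 5, 7, 9]` for every seed. `gather_unordered` always returns the same rows, but their order may change from one seed to the next.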

> order guarantees for DataFrames
> -------------------------------
>
>                 Key: SPARK-16207
>                 URL: https://issues.apache.org/jira/browse/SPARK-16207
>             Project: Spark
>          Issue Type: Documentation
>          Components: Spark Core
>    Affects Versions: 1.6.1
>            Reporter: Max Moroz
>            Priority: Minor
>
> There's no clear explanation in the documentation about what guarantees are available for the preservation of order in DataFrames. Different blogs, SO answers, and posts on course websites suggest different things. It would be good to provide clarity on this.
> Examples of questions on which I could not find clarification:
> 1) Does groupby() preserve order?
> 2) Does take() preserve order?
> 3) Is DataFrame guaranteed to have the same order of lines as the text file it was read from? (Or as the json file, etc.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
