Posted to user@spark.apache.org by Cesar Flores <ce...@gmail.com> on 2015/07/02 16:44:46 UTC
Dataframe in single partition after sorting?
I am sorting a data frame using something like:
val sortedDF = df.orderBy(df("score").desc)
The sorting is really fast. The issue I have is that after sorting, the
resulting data frame sortedDF appears to be in a single partition. This is
a problem because when I try to execute another operation on this new data
frame (e.g. sortedDF.limit(1000000)) I get an error like the following:
Job aborted due to stage failure: Total size of serialized results of 194
tasks (5.0 GB) is bigger than spark.driver.maxResultSize (5.0 GB)
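A minimal model of what the error message suggests. It assumes, without confirming against the 1.3.0 source, that a `limit` directly after `orderBy` is planned as a take-ordered operation: each task keeps its local top n rows and returns them to the driver, which merges them. The object and function names are hypothetical. Plain Scala collections stand in for partitions so the sketch runs without a cluster. A real implementation would use a bounded priority queue rather than a full sort.

```scala
// Hypothetical model of a take-ordered limit: every task ships its local
// top n rows to the driver, and the driver merges them into the final top n.
object TakeOrderedModel {

  // One task's contribution: the n highest scores in its partition.
  def localTopN(partition: Seq[Double], n: Int): Seq[Double] =
    partition.sortWith(_ > _).take(n)

  // Returns (rows received by the driver from all tasks, merged top n).
  def takeOrdered(partitions: Seq[Seq[Double]], n: Int): (Int, Seq[Double]) = {
    val perTask = partitions.map(p => localTopN(p, n)) // each task ships up to n rows
    val rowsAtDriver = perTask.map(_.size).sum
    val merged = perTask.flatten.sortWith(_ > _).take(n) // driver-side merge
    (rowsAtDriver, merged)
  }

  def main(args: Array[String]): Unit = {
    // Three toy partitions standing in for the tasks in the error message.
    val partitions = Seq(Seq(9.0, 1.0, 5.0), Seq(8.0, 7.0, 2.0), Seq(6.0, 3.0, 4.0))
    val (rowsAtDriver, top) = takeOrdered(partitions, 2)
    println(s"rows shipped to driver: $rowsAtDriver, top 2: $top")
  }
}
```

In this toy run the driver receives 6 rows in order to produce a top 2. Under the same assumption, 194 tasks each returning up to 1,000,000 rows could send up to 194,000,000 rows to the driver. That would be consistent with the 5.0 GB total reported against spark.driver.maxResultSize.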
I have already tried to repartition the resulting sortedDF before doing any
operation on it, but the same error appears.
*Is there a smarter way to use DataFrame orderBy in Spark so that I do
not have this problem?*
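One way to apply a limit without pulling the selected rows through the driver is to count rows per partition first, then filter each partition by global position. The RDD method `zipWithIndex`, available in Spark 1.3, works along these lines: it runs a job to compute partition sizes before assigning indices. Something like `sortedDF.rdd.zipWithIndex().filter(_._2 < 1000000)` may therefore be worth trying, though this is an untested suggestion. The sketch below models the idea with plain Scala collections. `CountBasedLimitModel` and `limitByCounts` are hypothetical names, and the sketch assumes the partitions are already globally sorted, as they would be after a range-partitioned sort.

```scala
// Hypothetical model of a count-based limit: only per-partition row counts
// travel to the driver; the rows themselves stay in their partitions.
object CountBasedLimitModel {

  // Assumes the partitions are already globally sorted: every row of
  // partition i ranks ahead of every row of partition i + 1.
  def limitByCounts[T](sortedPartitions: Seq[Seq[T]], n: Long): Seq[Seq[T]] = {
    // Step 1: count rows per partition; only these numbers reach the driver.
    val counts = sortedPartitions.map(_.size.toLong)
    // Step 2: global index of the first row of each partition (prefix sums).
    val offsets = counts.scanLeft(0L)(_ + _)
    // Step 3: each partition independently keeps rows whose global index is < n.
    sortedPartitions.zip(offsets).map { case (part, start) =>
      val keep = math.max(0L, math.min(n - start, part.size.toLong))
      part.take(keep.toInt)
    }
  }

  def main(args: Array[String]): Unit = {
    val sorted = Seq(Seq(9.0, 8.0, 7.0), Seq(6.0, 5.0), Seq(4.0, 3.0))
    val limited = limitByCounts(sorted, 4L)
    println(s"partitions: ${limited.size}, rows kept: ${limited.map(_.size)}")
  }
}
```

In this toy run only the three counts reach the "driver". The rows stay where they are, and the result keeps three partitions (the last one empty) instead of collapsing into one.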
The version of Spark I am currently using is 1.3.0, and due to company
policy I cannot try a newer version.
Thanks!!!
--
Cesar Flores