Posted to issues@spark.apache.org by "drow blonde messi (JIRA)" <ji...@apache.org> on 2016/07/28 08:53:20 UTC

[jira] [Created] (SPARK-16766) TakeOrderedAndProjectExec can easily cause OOM

drow blonde messi created SPARK-16766:
-----------------------------------------

             Summary: TakeOrderedAndProjectExec can easily cause OOM
                 Key: SPARK-16766
                 URL: https://issues.apache.org/jira/browse/SPARK-16766
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0, 1.6.2
            Reporter: drow blonde messi
            Priority: Critical


I found that a very simple SQL statement can easily cause an OOM. For example:

"insert into xyz2 select * from xyz order by x limit 900000000;"



The problem is obvious: TakeOrderedAndProjectExec always allocates a huge Object array (its size equals the limit count) whenever executeCollect or doExecute is called.
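
To put numbers on it (simple arithmetic, assuming a 64-bit JVM): with limit = 900,000,000 the result array alone holds 900 million object references, i.e. roughly 3.6 GB with compressed oops (4 bytes per reference) or 7.2 GB without (8 bytes), before counting the rows those references point to.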


In Spark 1.6, terminal and non-terminal TakeOrderedAndProject work the same way: they call RDD.takeOrdered(limit), which builds a huge BoundedPriorityQueue for every partition.
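
For illustration, a condensed sketch of the per-partition pattern (the method name takeOrderedPerPartition is made up, and scala.collection.mutable.PriorityQueue stands in for Spark's private org.apache.spark.util.BoundedPriorityQueue; this paraphrases the idea, it is not the Spark source):

    import scala.collection.mutable

    // Each partition keeps a bounded max-heap of its `num` smallest elements.
    // The heap is sized by `num`, so "limit 900000000" makes every partition
    // pay for a queue of up to 900 million entries, however few rows it holds.
    def takeOrderedPerPartition[T](items: Iterator[T], num: Int)
                                  (implicit ord: Ordering[T]): Iterator[T] = {
      if (num <= 0) return Iterator.empty
      val queue = mutable.PriorityQueue.empty[T](ord) // max-heap under ord
      items.foreach { item =>
        if (queue.size < num) {
          queue.enqueue(item)
        } else if (ord.lt(item, queue.head)) {
          queue.dequeue() // evict the current largest
          queue.enqueue(item)
        }
      }
      queue.dequeueAll.reverseIterator // the num smallest, in ascending order
    }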

In Spark 2.0, the non-terminal TakeOrderedAndProject switched to org.apache.spark.util.collection.Utils.takeOrdered, but the problem still exists: the expression ordering.leastOf(input.asJava, num).iterator.asScala calls the leastOf method of com.google.common.collect.Ordering, which allocates a large Object array:

    int bufferCap = k * 2;
    @SuppressWarnings("unchecked") // we'll only put E's in
    E[] buffer = (E[]) new Object[bufferCap];
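
A tiny standalone sketch (in Scala, against the Guava Ordering API as it behaved in the versions Spark depended on at the time; later Guava releases may differ in detail) shows that the allocation is driven by k, not by the input size:

    import com.google.common.collect.Ordering

    object LeastOfDemo {
      def main(args: Array[String]): Unit = {
        // A three-element input...
        val tiny = java.util.Arrays.asList(Integer.valueOf(3),
          Integer.valueOf(1), Integer.valueOf(2)).iterator()
        // ...but k = 900,000,000 makes leastOf allocate an
        // Object[1,800,000,000] (k * 2) up front -- likely an
        // OutOfMemoryError on any ordinary heap.
        val top = Ordering.natural[Integer]().leastOf(tiny, 900000000)
        println(top)
      }
    }

With limit = 900,000,000, bufferCap works out to 1,800,000,000 slots: about 7.2 GB of references with compressed oops or 14.4 GB without, requested per partition before a single row is compared.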

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org