Posted to issues@spark.apache.org by "Davies Liu (JIRA)" <ji...@apache.org> on 2016/08/09 22:18:20 UTC
[jira] [Updated] (SPARK-16766) TakeOrderedAndProjectExec easily
cause OOM
[ https://issues.apache.org/jira/browse/SPARK-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Davies Liu updated SPARK-16766:
-------------------------------
Priority: Minor (was: Critical)
> TakeOrderedAndProjectExec easily cause OOM
> ------------------------------------------
>
> Key: SPARK-16766
> URL: https://issues.apache.org/jira/browse/SPARK-16766
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.2, 2.0.0
> Reporter: drow blonde messi
> Priority: Minor
>
> I found that a very simple SQL statement can easily cause an OOM.
> Like this:
> "insert into xyz2 select * from xyz order by x limit 900000000;"
> The problem is obvious: TakeOrderedAndProjectExec always allocates a huge Object array (its size equals the limit count) whenever executeCollect or doExecute is called.
> In Spark 1.6, terminal and non-terminal TakeOrderedAndProject work the same way: both call RDD.takeOrdered(limit), which produces a huge BoundedPriorityQueue for every partition.
> In Spark 2.0, non-terminal TakeOrderedAndProject switched to org.apache.spark.util.collection.Utils.takeOrdered, but the problem still exists: the expression ordering.leastOf(input.asJava, num).iterator.asScala calls the leastOf method of com.google.common.collect.Ordering, which allocates a large Object array:
> int bufferCap = k * 2;
> @SuppressWarnings("unchecked") // we'll only put E's in
> E[] buffer = (E[]) new Object[bufferCap];
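To make the allocation concrete, here is a minimal Java sketch (illustrative only, not the actual Spark or Guava source) of a leastOf-style top-k selection. The point is that the working buffer is sized by k alone, before any input is read, so a limit of 900,000,000 forces a roughly 14 GB Object array (1.8 billion 8-byte references) even if the input itself is tiny:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class LeastOfSketch {
    // Simplified top-k in the style of Guava's Ordering.leastOf:
    // the buffer capacity is 2*k up front, independent of input size.
    // (Guava additionally trims/partitions when the buffer fills; omitted here.)
    static <E> List<E> leastOf(Iterator<E> input, int k, Comparator<? super E> cmp) {
        int bufferCap = k * 2;
        @SuppressWarnings("unchecked")
        E[] buffer = (E[]) new Object[bufferCap]; // allocated even for tiny inputs
        int size = 0;
        while (input.hasNext() && size < bufferCap) {
            buffer[size++] = input.next();
        }
        List<E> result = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            result.add(buffer[i]);
        }
        result.sort(cmp);
        return result.subList(0, Math.min(k, result.size()));
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(5, 1, 4, 2, 3);
        // Only 5 elements, but a limit of k still allocates a 2*k slot buffer.
        System.out.println(leastOf(data.iterator(), 3, Comparator.naturalOrder()));
    }
}
```

A bounded structure that grows with min(k, inputSize), such as the per-partition BoundedPriorityQueue mentioned above but lazily sized, would avoid the up-front allocation; the eager new Object[k * 2] is what makes a large limit fatal regardless of actual data volume.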
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org