Posted to issues@spark.apache.org by "Davies Liu (JIRA)" <ji...@apache.org> on 2016/08/09 22:18:20 UTC
[jira] [Updated] (SPARK-16766) TakeOrderedAndProjectExec easily
cause OOM
[ https://issues.apache.org/jira/browse/SPARK-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Davies Liu updated SPARK-16766:
-------------------------------
Priority: Minor (was: Critical)
> TakeOrderedAndProjectExec easily cause OOM
> ------------------------------------------
>
> Key: SPARK-16766
> URL: https://issues.apache.org/jira/browse/SPARK-16766
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.2, 2.0.0
> Reporter: drow blonde messi
> Priority: Minor
>
> I found that a very simple SQL statement can easily cause an OOM.
> Like this:
> "insert into xyz2 select * from xyz order by x limit 900000000;"
> The problem is obvious: TakeOrderedAndProjectExec always allocates a huge Object array (its size equals the limit count) whenever executeCollect or doExecute is called.
> In Spark 1.6, terminal and non-terminal TakeOrderedAndProject work the same way: both call RDD.takeOrdered(limit), which produces a huge BoundedPriorityQueue for every partition.
> In Spark 2.0, non-terminal TakeOrderedAndProject switched to org.apache.spark.util.collection.Utils.takeOrdered, but the problem still exists: the expression ordering.leastOf(input.asJava, num).iterator.asScala calls the leastOf method of com.google.common.collect.Ordering, which allocates a large Object array:
> int bufferCap = k * 2;
> @SuppressWarnings("unchecked") // we'll only put E's in
> E[] buffer = (E[]) new Object[bufferCap];
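To make the allocation concrete, here is a minimal Java sketch (illustrative only, not the actual Spark or Guava source) of a leastOf-style top-k selection. The point is that the working buffer is sized by k alone, before any input is read, so a limit of 900,000,000 forces a roughly 14 GB Object array (1.8 billion 8-byte references) even if the input itself is tiny:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class LeastOfSketch {
    // Simplified top-k in the style of Guava's Ordering.leastOf:
    // the buffer capacity is 2*k up front, independent of input size.
    // (Guava additionally trims/partitions when the buffer fills; omitted here.)
    static <E> List<E> leastOf(Iterator<E> input, int k, Comparator<? super E> cmp) {
        int bufferCap = k * 2;
        @SuppressWarnings("unchecked")
        E[] buffer = (E[]) new Object[bufferCap]; // allocated even for tiny inputs
        int size = 0;
        while (input.hasNext() && size < bufferCap) {
            buffer[size++] = input.next();
        }
        List<E> result = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            result.add(buffer[i]);
        }
        result.sort(cmp);
        return result.subList(0, Math.min(k, result.size()));
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(5, 1, 4, 2, 3);
        // Only 5 elements, but a limit of k still allocates a 2*k slot buffer.
        System.out.println(leastOf(data.iterator(), 3, Comparator.naturalOrder()));
    }
}
```

A bounded structure that grows with min(k, inputSize), such as the per-partition BoundedPriorityQueue mentioned above but lazily sized, would avoid the up-front allocation; the eager new Object[k * 2] is what makes a large limit fatal regardless of actual data volume.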
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org