Posted to issues@spark.apache.org by "drow blonde messi (JIRA)" <ji...@apache.org> on 2016/07/28 08:53:20 UTC
[jira] [Created] (SPARK-16766) TakeOrderedAndProjectExec can easily cause OOM
drow blonde messi created SPARK-16766:
-----------------------------------------
Summary: TakeOrderedAndProjectExec can easily cause OOM
Key: SPARK-16766
URL: https://issues.apache.org/jira/browse/SPARK-16766
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.0, 1.6.2
Reporter: drow blonde messi
Priority: Critical
I found that a very simple SQL statement can easily cause an OOM. For example:
"insert into xyz2 select * from xyz order by x limit 900000000;"
The problem is obvious: TakeOrderedAndProjectExec always allocates a huge Object array (its size equals the limit count) whenever executeCollect or doExecute is called.
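A back-of-envelope sketch (my own illustration, not from the Spark source) of why a limit of 900,000,000 blows up: even the array of references alone, before counting any row data, needs limit * 8 bytes, assuming a 64-bit JVM without compressed oops:

```java
public class LimitArrayCost {
    public static void main(String[] args) {
        // Assumed reference size: 8 bytes (64-bit JVM, compressed oops disabled).
        long limit = 900_000_000L;
        long refBytes = 8L;
        long arrayBytes = limit * refBytes; // 7,200,000,000 bytes
        // Roughly 6.7 GiB just for the pointer array, before any row objects.
        System.out.printf("~%.1f GiB for the array of references%n",
                arrayBytes / (1024.0 * 1024.0 * 1024.0));
    }
}
```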
In Spark 1.6, terminal and non-terminal TakeOrderedAndProject work the same way: both call RDD.takeOrdered(limit), which builds a huge BoundedPriorityQueue for every partition.
In Spark 2.0, non-terminal TakeOrderedAndProject switched to org.apache.spark.util.collection.Utils.takeOrdered, but the problem still exists: the expression ordering.leastOf(input.asJava, num).iterator.asScala calls the leastOf method of com.google.common.collect.Ordering, which allocates a large Object array:
int bufferCap = k * 2;
@SuppressWarnings("unchecked") // we'll only put E's in
E[] buffer = (E[]) new Object[bufferCap];
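For contrast, here is a minimal bounded-heap top-k sketch (hypothetical illustration, not Spark or Guava code). Like the approaches above, its memory is proportional to k, which is exactly the issue: with LIMIT 900000000, k itself is the problem, regardless of input size:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopK {
    // Keep the k smallest elements seen so far in a max-heap of size k.
    // Memory is O(k) independent of input length -- fine for small limits,
    // but a limit of 900,000,000 still means ~900M heap entries.
    static List<Integer> leastK(Iterable<Integer> input, int k) {
        PriorityQueue<Integer> heap =
                new PriorityQueue<>(k, Comparator.reverseOrder());
        for (int x : input) {
            if (heap.size() < k) {
                heap.add(x);
            } else if (x < heap.peek()) {
                heap.poll(); // evict the largest of the current k smallest
                heap.add(x);
            }
        }
        List<Integer> out = new ArrayList<>(heap);
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(leastK(Arrays.asList(5, 1, 9, 3, 7), 3)); // [1, 3, 5]
    }
}
```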
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org