Posted to issues@spark.apache.org by "Sean Zhong (JIRA)" <ji...@apache.org> on 2016/09/07 04:44:20 UTC
[jira] [Created] (SPARK-17426) Current TreeNode.toJSON may trigger OOM under some corner cases
Sean Zhong created SPARK-17426:
----------------------------------
Summary: Current TreeNode.toJSON may trigger OOM under some corner cases
Key: SPARK-17426
URL: https://issues.apache.org/jira/browse/SPARK-17426
Project: Spark
Issue Type: Bug
Reporter: Sean Zhong
In SPARK-17356, we fixed the OOM issue when {{Metadata}} is very big. There are other cases that may also trigger OOM. The current implementation of {{TreeNode.toJSON}} recursively searches and prints all fields of the current TreeNode, even if a field is of type Seq or Map.
This is not safe because:
1. The Seq or Map can be very big. Converting it to JSON may take a huge amount of memory, which may trigger an out-of-memory error.
2. Some user-space input may also be propagated into the Plan. This input can be of arbitrary type and may be self-referencing. Trying to convert user-space input to JSON is very risky.
The following example triggers a StackOverflowError when calling toJSON on a plan containing a user-defined function.
{code}
case class SelfReferenceUDF(
    var config: Map[String, Any] = Map.empty[String, Any]) extends Function1[String, Boolean] {
  config += "self" -> this
  def apply(key: String): Boolean = config.contains(key)
}

test("toJSON should not throw java.lang.StackOverflowError") {
  val udf = ScalaUDF(SelfReferenceUDF(), BooleanType, Seq("col1".attr))
  // triggers java.lang.StackOverflowError
  udf.toJSON
}
{code}
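One defensive approach is to whitelist the field types that are known to be safe before converting them to JSON, and skip everything else. The sketch below is illustrative only (it is not the actual Spark fix, and the {{JsonSafety}} object and its size limit are made up for this example): primitive types pass, small collections are checked recursively, and arbitrary user objects, which may be self-referencing, are rejected outright, so the recursion always terminates.

{code}
// Hypothetical guard, for illustration only: decide whether a field value
// is safe to render to JSON without risking OOM or infinite recursion.
object JsonSafety {
  def isJsonSafe(value: Any): Boolean = value match {
    // Primitive leaf values are always safe.
    case _: String | _: Int | _: Long | _: Double | _: Boolean => true
    // Small collections are safe if every element is safe.
    case s: Seq[_] if s.length <= 100 => s.forall(isJsonSafe)
    case m: Map[_, _] if m.size <= 100 =>
      m.forall { case (k, v) => isJsonSafe(k) && isJsonSafe(v) }
    // Anything else (huge collections, arbitrary user objects that may
    // reference themselves) is rejected, which stops the recursion.
    case _ => false
  }
}
{code}

With this guard, the self-referencing {{config}} map above would be rejected: the map itself is small, but the value stored under {{"self"}} is a user object, so the whole map fails the check instead of recursing forever.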
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org