Posted to issues@spark.apache.org by "Shixiong Zhu (JIRA)" <ji...@apache.org> on 2015/08/21 11:37:45 UTC

[jira] [Created] (SPARK-10155) Memory leak in SQL parsers

Shixiong Zhu created SPARK-10155:
------------------------------------

             Summary: Memory leak in SQL parsers
                 Key: SPARK-10155
                 URL: https://issues.apache.org/jira/browse/SPARK-10155
             Project: Spark
          Issue Type: Bug
            Reporter: Shixiong Zhu
            Priority: Critical


I saw a lot of `ThreadLocal` objects in the following app:
{code}
import org.apache.spark._
import org.apache.spark.sql._

object SparkApp {

  // Each call parses the filter expression "length(_1) > 0" with the SQL
  // parser, touching the parser's per-thread state.
  def foo(sqlContext: SQLContext): Unit = {
    import sqlContext.implicits._
    sqlContext.sparkContext.parallelize(Seq("aaa", "bbb", "ccc")).toDF().filter("length(_1) > 0").count()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sql-memory-leak")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Keep issuing queries; memory usage grows until the process OOMs.
    while (true) {
      foo(sqlContext)
    }
  }
}
{code}
Running the above code for a long time eventually leads to an OOM.

These `ThreadLocal`s come from `scala.util.parsing.combinator.Parsers.lastNoSuccessVar`, which stores `Failure("end of input", ...)`.
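
For reference, here is a minimal sketch of the leak mechanism (the names are illustrative, not Spark code): `DynamicVariable` is backed by an `InheritableThreadLocal`, so every parser instance owns its own `ThreadLocal`, and constructing a fresh parser per query installs a fresh entry in the calling thread's ThreadLocalMap each time:
{code}
import scala.util.DynamicVariable

// Illustrative sketch (not Spark code): each parser instance carries its
// own DynamicVariable, which is backed by an InheritableThreadLocal.
class LeakyParser {
  // Plays the role of Parsers.lastNoSuccessVar.
  val lastNoSuccessVar = new DynamicVariable[Option[String]](None)

  def parse(input: String): Unit = {
    // Writing the variable installs an entry in the current thread's
    // ThreadLocalMap. Entries are only expunged opportunistically, so
    // stale Failure values pile up.
    lastNoSuccessVar.value = Some("Failure: end of input")
  }
}

object LeakDemo extends App {
  // One fresh parser per "query", as the repro above effectively does,
  // leaves one more ThreadLocal entry behind on this thread per iteration.
  while (true) new LeakyParser().parse("SELECT 1")
}
{code}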

There is a Scala issue for this: https://issues.scala-lang.org/browse/SI-9010
and some discussion here: https://issues.scala-lang.org/browse/SI-4929

I tried to fix it using reflection but failed because of the complicated bytecode generated by Scala trait mixins.
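
For the record, the attempted workaround would look roughly like the sketch below (a hypothetical helper, not working code): trait fields are name-mangled into the concrete class and the exact layout varies by Scala version, which is what makes this approach fragile:
{code}
// Hypothetical sketch of the reflection workaround; the mangled field name
// (roughly "scala$util$parsing$combinator$Parsers$$lastNoSuccessVar") and
// the lazy-val bitmap fields differ across Scala versions.
def tryClearLastNoSuccess(parser: AnyRef): Unit = {
  parser.getClass.getDeclaredFields
    .find(_.getName.endsWith("lastNoSuccessVar"))
    .foreach { field =>
      field.setAccessible(true)
      val dynVar = field.get(parser) // the DynamicVariable, if we found it
      // Even with the DynamicVariable in hand, there is no public API to
      // remove its underlying ThreadLocal entry from the current thread.
    }
}
{code}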

Perhaps the best solution is to reuse the parser instances?
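
A minimal sketch of that direction (a made-up grammar, not Spark's actual SQL parser): keep a single long-lived parser instance, so its `DynamicVariable`, and hence the per-thread ThreadLocal entry, is created only once rather than once per query:
{code}
import scala.util.parsing.combinator.RegexParsers

// Illustrative sketch of reusing one parser instance (names are made up).
object ReusedParser extends RegexParsers {
  def number: Parser[Int] = """\d+""".r ^^ (_.toInt)

  def parseNumber(input: String): Int = parseAll(number, input) match {
    case Success(n, _)      => n
    case failure: NoSuccess => sys.error(failure.msg)
  }
}

// Callers share the singleton instead of constructing a parser per query:
//   ReusedParser.parseNumber("42")
{code}
This assumes a shared parser is safe to call from multiple threads; the ThreadLocal-backed `lastNoSuccessVar` is what keeps that particular piece of state per-thread.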



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org