Posted to issues@spark.apache.org by "Liang-Chi Hsieh (JIRA)" <ji...@apache.org> on 2016/11/17 08:44:58 UTC

[jira] [Created] (SPARK-18487) Consume all elements for Dataset.show/take to avoid memory leak

Liang-Chi Hsieh created SPARK-18487:
---------------------------------------

             Summary: Consume all elements for Dataset.show/take to avoid memory leak
                 Key: SPARK-18487
                 URL: https://issues.apache.org/jira/browse/SPARK-18487
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Liang-Chi Hsieh


Methods such as Dataset.show and Dataset.take use Limit (CollectLimitExec), which leverages SparkPlan.executeTake to efficiently collect the required number of elements back to the driver.

However, under whole-stage codegen, operators such as HashAggregate usually release their resources only after all elements are consumed. Because executeTake stops pulling rows once it has enough, those resources are never released, so Dataset.show, for example, causes a memory leak.
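
A minimal sketch of the scenario (illustrative only, not from the ticket; the app name, master, and data sizes are made up): the aggregation below plans a whole-stage-codegen HashAggregate, and show() pulls only its first 20 rows through CollectLimitExec, so the aggregate's buffers are never freed.

{code:scala}
import org.apache.spark.sql.SparkSession

object ShowLeakRepro {
  def main(args: Array[String]): Unit = {
    // Local session; master and app name are illustrative.
    val spark = SparkSession.builder()
      .appName("SPARK-18487-repro")
      .master("local[2]")
      .getOrCreate()

    // groupBy/count plans a whole-stage-codegen HashAggregate, which
    // frees its aggregation buffer only once its output iterator is
    // fully consumed.
    val df = spark.range(0L, 1000000L)
      .selectExpr("id % 100 AS key")
      .groupBy("key")
      .count()

    // show() fetches only the first 20 rows via executeTake, so the
    // aggregate's iterator is abandoned early and its memory leaks.
    df.show()

    spark.stop()
  }
}
{code}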

We should consume all elements in the iterator to avoid the memory leak.
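
A minimal sketch of that idea (a hypothetical helper, not the actual patch or Spark API): after taking the rows we need, keep pulling from the iterator so the producing operator reaches its end-of-stream cleanup path.

{code:scala}
import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag

// Hypothetical helper, not Spark API: take up to n elements but still
// drain the iterator so the operator behind it can release resources.
def takeAndDrain[T: ClassTag](iter: Iterator[T], n: Int): Array[T] = {
  val taken = new ArrayBuffer[T](n)
  while (iter.hasNext && taken.length < n) {
    taken += iter.next()
  }
  // Consume the remaining elements; a codegen'd operator such as
  // HashAggregate releases its buffers only when the iterator is
  // exhausted.
  while (iter.hasNext) {
    iter.next()
  }
  taken.toArray
}
{code}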



