Posted to issues@spark.apache.org by "chillon_m (JIRA)" <ji...@apache.org> on 2016/03/03 03:15:18 UTC

[jira] [Comment Edited] (SPARK-13614) show() trigger memory leak,why?

    [ https://issues.apache.org/jira/browse/SPARK-13614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175516#comment-15175516 ] 

chillon_m edited comment on SPARK-13614 at 3/3/16 2:14 AM:
-----------------------------------------------------------

[~srowen]
With the same dataset (hot.count()=599147, ghot.size=21844), collect() does not trigger the memory leak (first image), but show() does. Why? In general, collect() is the one that should trigger it easily ("Keep in mind that your entire dataset must fit in memory on a single machine to use collect() on it, so collect() shouldn’t be used on large datasets." in Learning Spark), yet collect() does not trigger it.
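One possible explanation, not confirmed in this thread: collect() runs every partition's iterator to the end, so an operator that releases its memory once its output is exhausted gets the chance to do so. As I understand it, show() is built on take(), which stops reading after the first few rows. Memory still held at that point would then be freed by the executor's end-of-task cleanup, which is where the "Managed memory leak detected" message is logged. If that reading is right, the message reports memory released late rather than memory lost. The toy model below is plain Scala, not Spark code: ToyTaskMemoryManager, ToyAggregationIterator and LeakModel are hypothetical, and the 4 MiB page size is chosen only to match the 4194304 bytes in the log.

```scala
// Toy model in plain Scala, NOT Spark internals. All names are hypothetical.

// Tracks how many bytes the current "task" holds and can report
// whatever is still allocated when the task ends.
class ToyTaskMemoryManager {
  private var allocated = 0L
  def allocatePage(bytes: Long): Unit = allocated += bytes
  def freePage(bytes: Long): Unit = allocated -= bytes
  // End-of-task cleanup: frees everything left over and returns its size.
  def cleanUpAllAllocatedMemory(): Long = {
    val leftover = allocated
    allocated = 0L
    leftover
  }
}

// Output iterator of a hypothetical aggregation: it grabs one page up front
// and releases it only when a caller observes that no rows remain.
class ToyAggregationIterator(rows: Seq[String], mm: ToyTaskMemoryManager, pageBytes: Long)
    extends Iterator[String] {
  mm.allocatePage(pageBytes)
  private val underlying = rows.iterator
  private var freed = false

  override def hasNext: Boolean = {
    val more = underlying.hasNext
    if (!more && !freed) { // exhausted: release the page exactly once
      mm.freePage(pageBytes)
      freed = true
    }
    more
  }

  override def next(): String = underlying.next()
}

object LeakModel {
  val PageBytes: Long = 4L * 1024 * 1024 // 4194304, matching the log line

  // Runs one toy task; `consume` decides how much of the output is read.
  // Returns the bytes the end-of-task cleanup had to free.
  def runTask(rows: Seq[String], consume: Iterator[String] => Array[String]): Long = {
    val mm = new ToyTaskMemoryManager
    consume(new ToyAggregationIterator(rows, mm, PageBytes))
    val leftover = mm.cleanUpAllAllocatedMemory()
    if (leftover > 0) println(s"Managed memory leak detected; size = $leftover bytes")
    leftover
  }

  def main(args: Array[String]): Unit = {
    val rows = (1 to 100).map(i => s"group-$i")
    val collectLike = runTask(rows, it => it.toArray)       // reads every row
    val showLike = runTask(rows, it => it.take(21).toArray) // stops after 21 rows
    println(s"collect-like leftover: $collectLike, show-like leftover: $showLike")
  }
}
```

In this model, draining the iterator (as collect() does) leaves nothing for the cleanup to free, while stopping after 21 rows leaves the whole 4194304-byte page behind and prints the leak message.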



was (Author: chillon_m):
With the same dataset, collect() does not trigger the memory leak (first image), but show() does. Why?

> show() trigger memory leak,why?
> -------------------------------
>
>                 Key: SPARK-13614
>                 URL: https://issues.apache.org/jira/browse/SPARK-13614
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: chillon_m
>         Attachments: memory leak.png, memory.png
>
>
> hot.count()=599147
> ghot.size=21844
> [bigdata@namenode spark-1.5.2-bin-hadoop2.4]$ bin/spark-shell --driver-class-path /home/bigdata/mysql-connector-java-5.1.38-bin.jar 
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
>       /_/
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
> Type in expressions to have them evaluated.
> Type :help for more information.
> Spark context available as sc.
> SQL context available as sqlContext.
> scala> val hot=sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:mysql://:/?user=&password=","dbtable" -> "")).load()
> Wed Mar 02 14:22:37 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
> hot: org.apache.spark.sql.DataFrame = []
> scala> val ghot=hot.groupBy("Num","pNum").count().collect()
> Wed Mar 02 14:22:59 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
> ghot: Array[org.apache.spark.sql.Row] = Array([[],[],[], [,42310...
> scala> ghot.take(20)
> res0: Array[org.apache.spark.sql.Row] = Array([],[],[],[],[],[],[],[]....)
> scala> hot.groupBy("Num","pNum").count().show()
> Wed Mar 02 14:26:05 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
> 16/03/02 14:26:33 ERROR Executor: Managed memory leak detected; size = 4194304 bytes, TID = 202
> +----------+---------+-----+
> |     QQNum| TroopNum|count|
> +----------+---------+-----+
> |1XXXXXXXXX|38XXXXXXX|    1|
> |1XXXXXXXXX| 5XXXXXXX|    2|
> |1XXXXXXXXX|26XXXXXXX|    6|
> |1XXXXXXXXX|14XXXXXXX|    3|
> |1XXXXXXXXX|41XXXXXXX|   14|
> |1XXXXXXXXX|48XXXXXXX|   18|
> |1XXXXXXXXX|23XXXXXXX|    2|
> |1XXXXXXXXX|  XXXXXXX|   34|
> |1XXXXXXXXX|52XXXXXXX|    1|
> |1XXXXXXXXX|52XXXXXXX|    2|
> |1XXXXXXXXX|49XXXXXXX|    3|
> |1XXXXXXXXX|42XXXXXXX|    3|
> |1XXXXXXXXX|17XXXXXXX|   11|
> |1XXXXXXXXX|25XXXXXXX|  129|
> |1XXXXXXXXX|13XXXXXXX|    2|
> |1XXXXXXXXX|19XXXXXXX|    1|
> |1XXXXXXXXX|32XXXXXXX|    9|
> |1XXXXXXXXX|38XXXXXXX|    6|
> |1XXXXXXXXX|38XXXXXXX|   13|
> |1XXXXXXXXX|30XXXXXXX|    4|
> +----------+---------+-----+
> only showing top 20 rows
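The "only showing top 20 rows" footer hints at how show() bounds its work. As I understand DataFrame.show in 1.5.x (not verified against the 1.5.2 source), it fetches numRows + 1 rows via take(), prints numRows of them, and prints the footer only if the extra row exists; show() with no arguments uses numRows = 20. A plain-Scala sketch of that decision, where ShowSketch.render is a hypothetical helper and rows are plain strings rather than org.apache.spark.sql.Row:

```scala
// Hypothetical sketch of how show(numRows) could decide on its footer.
// Not Spark code; rows are plain strings here.
object ShowSketch {
  def render(rows: Iterator[String], numRows: Int = 20): String = {
    // Read at most numRows + 1 rows; the extra row only signals "more data".
    val fetched = rows.take(numRows + 1).toVector
    val hasMoreData = fetched.length > numRows
    val body = fetched.take(numRows).mkString("\n")
    if (hasMoreData) s"$body\nonly showing top $numRows rows" else body
  }

  def main(args: Array[String]): Unit =
    println(render((1 to 100).map(i => s"row-$i").iterator))
}
```

Because the sketch reads at most 21 rows from the iterator, the rest of the result is never consumed, unlike collect(), which reads everything.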



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
