Posted to issues@spark.apache.org by "Kannu Gupta (JIRA)" <ji...@apache.org> on 2017/07/14 07:30:00 UTC

[jira] [Commented] (SPARK-15423) why it is very slow to clean resources in Spark-2.0.0-preview

    [ https://issues.apache.org/jira/browse/SPARK-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086973#comment-16086973 ] 

Kannu Gupta commented on SPARK-15423:
-------------------------------------

[~srowen] I am facing the same issue with spark-2.1.0. Is the issue solved for spark-2.1.0?

> why it is very slow to clean resources in Spark-2.0.0-preview
> -------------------------------------------------------------
>
>                 Key: SPARK-15423
>                 URL: https://issues.apache.org/jira/browse/SPARK-15423
>             Project: Spark
>          Issue Type: Question
>          Components: Block Manager, MLlib
>    Affects Versions: 2.0.0
>         Environment: RedHat 6.5 (64 bit), JDK 1.8, Standalone mode
>            Reporter: zszhong
>              Labels: newbie, starter
>
> Hi, everyone! I'm new to Spark. I originally posted this question at [http://stackoverflow.com/questions/37331226/why-it-is-very-slow-to-clean-resources-in-spark], but it was considered off-topic there, so I'm asking for help here instead. If this post doesn't belong here, please feel free to delete it. I've copied the content over as-is; I don't know how to format the code to be more readable, so please refer to the Stack Overflow link.
> I've submitted a very simple task to a standalone Spark environment (`spark-2.0.0-preview`, `jdk 1.8`, `48 CPU cores`, `250 GB memory`) with the following command:
> bin/spark-submit.sh --master spark://hostname.domain:7077 --conf "spark.executor.memory=8G" ../SimpleApp.py ../data/train/ ../data/val/
> where the `SimpleApp.py` is:
>         from __future__ import print_function
>         import sys
>         import os
>         import numpy as np
>         from pyspark import SparkContext 
>         from pyspark.mllib.tree import RandomForest, RandomForestModel
>         from pyspark.mllib.util import MLUtils 
>         trainDataPath = sys.argv[1]
>         valDataPath = sys.argv[2]
>         sc = SparkContext(appName="Classification using Spark Random Forest")
>         trainData = MLUtils.loadLibSVMFile(sc, trainDataPath)
>         valData = MLUtils.loadLibSVMFile(sc, valDataPath) 
>         model = RandomForest.trainClassifier(trainData, numClasses=6, categoricalFeaturesInfo={}, numTrees=3, featureSubsetStrategy="auto", impurity='gini', maxDepth=4, maxBins=32)
>         predictions = model.predict(valData.map(lambda x: x.features))
>         labelsAndPredictions = valData.map(lambda lp: lp.label).zip(predictions)
>         testErr = labelsAndPredictions.filter(lambda (v, p): v != p).count() / float(valData.count())
>         print('Test Error = ' + str(testErr))
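> (A side note on portability: the `lambda (v, p): v != p` in the filter above uses tuple-parameter unpacking, which is Python 2 only syntax and was removed in Python 3 by PEP 3113. A Python 3 compatible sketch of the same error computation, with plain lists standing in for the label and prediction RDDs, would be:)

```python
# Python 3 compatible version of the error computation above.
# The lists below are hypothetical stand-ins for valData's labels
# and the model's predictions; in the real script these are RDDs.
labels = [0.0, 1.0, 2.0, 1.0]        # stand-in true labels
predictions = [0.0, 2.0, 2.0, 1.0]   # stand-in model predictions

labels_and_predictions = list(zip(labels, predictions))
# Unpack the (label, prediction) pair inside the loop instead of in
# the lambda's parameter list.
test_err = sum(1 for v, p in labels_and_predictions if v != p) / float(len(labels))
print('Test Error = ' + str(test_err))  # Test Error = 0.25
```

> (On an RDD, the equivalent is `labelsAndPredictions.filter(lambda vp: vp[0] != vp[1]).count()`.)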
> And the task is running OK and can output the `Test Error` as follows:
> Test Error = 0.380580779161
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_21_piece0 on 127.0.0.1:59714 in memory (size: 12.1 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_21_piece0 on 127.0.0.1:37978 in memory (size: 12.1 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_19_piece0 on 127.0.0.1:37978 in memory (size: 10.9 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_19_piece0 on 127.0.0.1:59714 in memory (size: 10.9 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_20_piece0 on 127.0.0.1:59714 in memory (size: 4.6 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_20_piece0 on 127.0.0.1:37978 in memory (size: 4.6 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_17_piece0 on 127.0.0.1:59714 in memory (size: 4.0 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_17_piece0 on 127.0.0.1:37978 in memory (size: 4.0 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_18_piece0 on 127.0.0.1:59714 in memory (size: 455.0 B, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_18_piece0 on 127.0.0.1:37978 in memory (size: 455.0 B, free: 4.5 GB)
> 16/05/20 01:04:52 INFO ContextCleaner: Cleaned shuffle 4
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_16_piece0 on 127.0.0.1:59714 in memory (size: 9.2 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_16_piece0 on 127.0.0.1:37978 in memory (size: 9.2 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_14_piece0 on 127.0.0.1:59714 in memory (size: 3.6 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_14_piece0 on 127.0.0.1:37978 in memory (size: 3.6 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_15_piece0 on 127.0.0.1:59714 in memory (size: 389.0 B, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_15_piece0 on 127.0.0.1:37978 in memory (size: 389.0 B, free: 4.5 GB)
> 16/05/20 01:04:52 INFO ContextCleaner: Cleaned shuffle 3
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_12_piece0 on 127.0.0.1:59714 in memory (size: 345.0 B, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_12_piece0 on 127.0.0.1:37978 in memory (size: 345.0 B, free: 4.5 GB)
> 16/05/20 01:04:52 INFO ContextCleaner: Cleaned shuffle 2
> 16/05/20 01:04:52 INFO BlockManager: Removing RDD 19
> 16/05/20 01:04:52 INFO ContextCleaner: Cleaned RDD 19
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_22_piece0 on 127.0.0.1:59714 in memory (size: 4.5 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_22_piece0 on 127.0.0.1:37978 in memory (size: 4.5 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManager: Removing RDD 10
> 16/05/20 01:04:52 INFO ContextCleaner: Cleaned RDD 10
> 16/05/20 01:20:01 INFO BlockManager: Removing RDD 2
> 16/05/20 01:20:01 INFO ContextCleaner: Cleaned RDD 2
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 127.0.0.1:59714 in memory (size: 14.3 KB, free: 511.1 MB)
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 127.0.0.1:37978 on disk (size: 14.3 KB)
> 16/05/20 01:20:01 INFO ContextCleaner: Cleaned accumulator 0
> 16/05/20 01:20:01 INFO BlockManager: Removing RDD 6
> 16/05/20 01:20:01 INFO ContextCleaner: Cleaned RDD 6
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 127.0.0.1:59714 in memory (size: 14.3 KB, free: 511.1 MB)
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 127.0.0.1:37978 on disk (size: 14.3 KB)
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 127.0.0.1:59714 in memory (size: 4.1 KB, free: 511.1 MB)
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 127.0.0.1:37978 on disk (size: 4.1 KB)
> But after that, the task keeps running and shows no sign of exiting. The log above shows that the task printed the `Test Error` at `01:04:52`, yet more than an hour after submission (I submitted the task at `00:50:00`) the job was still running. The job should exit within a reasonable time.
> The job was still running when I wrote this post, without any failure information. The Spark Master UI shows the job has been running for 6.8 hours since submission (from 00:50:00 until now).
> Why is the cleaning procedure so slow? Is there any related configuration that I missed?
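> (One thing that may be worth checking: `SimpleApp.py` never calls `sc.stop()`, so after printing the result the driver may simply be idling on background cleaner threads rather than shutting down. The sketch below uses a stand-in context object, not the real `pyspark.SparkContext`, purely to show the intended lifecycle:)

```python
# Stand-in for pyspark.SparkContext, used only to illustrate the
# start -> work -> stop lifecycle; it is NOT the real class.
class FakeSparkContext:
    def __init__(self, appName):
        self.appName = appName
        self.stopped = False

    def stop(self):
        # The real SparkContext.stop() shuts down the executors and
        # lets the driver process exit instead of lingering.
        self.stopped = True

sc = FakeSparkContext(appName="Classification using Spark Random Forest")
# ... load data, train the model, print the test error ...
sc.stop()
print(sc.stopped)  # True
```

> (In the real script this would just be `sc.stop()` as the last line; `stop()` is a genuine `SparkContext` method in PySpark.)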



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org