Posted to issues@spark.apache.org by "Kannu Gupta (JIRA)" <ji...@apache.org> on 2017/07/14 07:30:00 UTC
[jira] [Commented] (SPARK-15423) why it is very slow to clean resources in Spark-2.0.0-preview
[ https://issues.apache.org/jira/browse/SPARK-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086973#comment-16086973 ]
Kannu Gupta commented on SPARK-15423:
-------------------------------------
[~srowen] I am facing the same issue with spark-2.1.0. Has this been fixed in spark-2.1.0?
> why it is very slow to clean resources in Spark-2.0.0-preview
> -------------------------------------------------------------
>
> Key: SPARK-15423
> URL: https://issues.apache.org/jira/browse/SPARK-15423
> Project: Spark
> Issue Type: Question
> Components: Block Manager, MLlib
> Affects Versions: 2.0.0
> Environment: RedHat 6.5 (64 bit), JDK 1.8, Standalone mode
> Reporter: zszhong
> Labels: newbie, starter
>
> Hi, everyone! I'm new to Spark. I originally posted this at [http://stackoverflow.com/questions/37331226/why-it-is-very-slow-to-clean-resources-in-spark], but some people considered it off-topic there, so I am posting here to ask for your help. If this post does not belong here, please feel free to delete it. I have copied the content below; I don't know how to format the code to be more readable, so please refer to the Stack Overflow link.
> I've submitted a very simple task to a standalone Spark cluster (`spark-2.0.0-preview`, `jdk 1.8`, `48 CPU cores`, `250 GB memory`) with the following command:
> bin/spark-submit.sh --master spark://hostname.domain:7077 --conf "spark.executor.memory=8G" ../SimpleApp.py ../data/train/ ../data/val/
> where the `SimpleApp.py` is:
> from __future__ import print_function
> import sys
> from pyspark import SparkContext
> from pyspark.mllib.tree import RandomForest
> from pyspark.mllib.util import MLUtils
> trainDataPath = sys.argv[1]
> valDataPath = sys.argv[2]
> sc = SparkContext(appName="Classification using Spark Random Forest")
> # Load LibSVM-format training and validation data as RDDs of LabeledPoint
> trainData = MLUtils.loadLibSVMFile(sc, trainDataPath)
> valData = MLUtils.loadLibSVMFile(sc, valDataPath)
> model = RandomForest.trainClassifier(trainData, numClasses=6, categoricalFeaturesInfo={}, numTrees=3, featureSubsetStrategy="auto", impurity='gini', maxDepth=4, maxBins=32)
> predictions = model.predict(valData.map(lambda x: x.features))
> labelsAndPredictions = valData.map(lambda lp: lp.label).zip(predictions)
> # Note: tuple-parameter unpacking (lambda (v, p): ...) is Python 2 only; index the pair instead
> testErr = labelsAndPredictions.filter(lambda vp: vp[0] != vp[1]).count() / float(valData.count())
> print('Test Error = ' + str(testErr))
> The task runs fine and prints the `Test Error` as follows:
> Test Error = 0.380580779161
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_21_piece0 on 127.0.0.1:59714 in memory (size: 12.1 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_21_piece0 on 127.0.0.1:37978 in memory (size: 12.1 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_19_piece0 on 127.0.0.1:37978 in memory (size: 10.9 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_19_piece0 on 127.0.0.1:59714 in memory (size: 10.9 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_20_piece0 on 127.0.0.1:59714 in memory (size: 4.6 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_20_piece0 on 127.0.0.1:37978 in memory (size: 4.6 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_17_piece0 on 127.0.0.1:59714 in memory (size: 4.0 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_17_piece0 on 127.0.0.1:37978 in memory (size: 4.0 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_18_piece0 on 127.0.0.1:59714 in memory (size: 455.0 B, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_18_piece0 on 127.0.0.1:37978 in memory (size: 455.0 B, free: 4.5 GB)
> 16/05/20 01:04:52 INFO ContextCleaner: Cleaned shuffle 4
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_16_piece0 on 127.0.0.1:59714 in memory (size: 9.2 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_16_piece0 on 127.0.0.1:37978 in memory (size: 9.2 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_14_piece0 on 127.0.0.1:59714 in memory (size: 3.6 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_14_piece0 on 127.0.0.1:37978 in memory (size: 3.6 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_15_piece0 on 127.0.0.1:59714 in memory (size: 389.0 B, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_15_piece0 on 127.0.0.1:37978 in memory (size: 389.0 B, free: 4.5 GB)
> 16/05/20 01:04:52 INFO ContextCleaner: Cleaned shuffle 3
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_12_piece0 on 127.0.0.1:59714 in memory (size: 345.0 B, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_12_piece0 on 127.0.0.1:37978 in memory (size: 345.0 B, free: 4.5 GB)
> 16/05/20 01:04:52 INFO ContextCleaner: Cleaned shuffle 2
> 16/05/20 01:04:52 INFO BlockManager: Removing RDD 19
> 16/05/20 01:04:52 INFO ContextCleaner: Cleaned RDD 19
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_22_piece0 on 127.0.0.1:59714 in memory (size: 4.5 KB, free: 511.1 MB)
> 16/05/20 01:04:52 INFO BlockManagerInfo: Removed broadcast_22_piece0 on 127.0.0.1:37978 in memory (size: 4.5 KB, free: 4.5 GB)
> 16/05/20 01:04:52 INFO BlockManager: Removing RDD 10
> 16/05/20 01:04:52 INFO ContextCleaner: Cleaned RDD 10
> 16/05/20 01:20:01 INFO BlockManager: Removing RDD 2
> 16/05/20 01:20:01 INFO ContextCleaner: Cleaned RDD 2
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 127.0.0.1:59714 in memory (size: 14.3 KB, free: 511.1 MB)
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 127.0.0.1:37978 on disk (size: 14.3 KB)
> 16/05/20 01:20:01 INFO ContextCleaner: Cleaned accumulator 0
> 16/05/20 01:20:01 INFO BlockManager: Removing RDD 6
> 16/05/20 01:20:01 INFO ContextCleaner: Cleaned RDD 6
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 127.0.0.1:59714 in memory (size: 14.3 KB, free: 511.1 MB)
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 127.0.0.1:37978 on disk (size: 14.3 KB)
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 127.0.0.1:59714 in memory (size: 4.1 KB, free: 511.1 MB)
> 16/05/20 01:20:01 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 127.0.0.1:37978 on disk (size: 4.1 KB)
> But after that, the task keeps running and shows no sign of exiting. As the log above shows, the `Test Error` is printed at `01:04:52`, yet more than an hour later (I submitted the task at `00:50:00`) the job is still running. I would expect the job to exit within a reasonable time.
> The job was still running when I wrote this post (it is still running now, with no failure reported). The Spark Master UI shows the job has been running for 6.8 hours since submission (from 00:50:00 until now).
> Why is the cleaning procedure so slow? Is there a related configuration option that I have missed?
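One detail worth noting about the `SimpleApp.py` script above: it never calls `sc.stop()`, so the driver relies on interpreter shutdown to tear down the SparkContext. Whether that explains the long hang reported here is only a guess, but explicitly stopping the context in a `finally` block is the usual pattern for letting the driver exit promptly. A minimal stdlib-only sketch of that pattern, using a hypothetical stand-in class in place of the real `SparkContext`:

```python
class StandInContext:
    """Hypothetical stand-in for pyspark.SparkContext, used only to
    illustrate the shutdown pattern; the real class exposes a stop()
    method that plays the same role."""
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


sc = StandInContext()
try:
    # Load data, train the model, and print the test error here.
    pass
finally:
    # Stop the context explicitly so shutdown does not depend on
    # interpreter teardown or the cleaner's background work.
    sc.stop()
```

With the real `SparkContext`, the same `try`/`finally` shape around the training and prediction code ensures `sc.stop()` runs even if the job body raises an exception.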
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)