Posted to user@spark.apache.org by crater <cq...@ucmerced.edu> on 2014/07/12 20:10:15 UTC

Putting block rdd failed when running example svm on large data

Hi,

I am trying to run the example BinaryClassification
(org.apache.spark.examples.mllib.BinaryClassification) on a 202G file. I am
constantly getting messages like the ones below. Is this normal, or am I
missing something?

14/07/12 09:49:04 WARN BlockManager: Block rdd_4_196 could not be dropped
from memory as it does not exist
14/07/12 09:49:04 WARN BlockManager: Putting block rdd_4_196 failed
14/07/12 09:49:05 WARN BlockManager: Block rdd_4_201 could not be dropped
from memory as it does not exist
14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_201 failed
14/07/12 09:49:05 WARN BlockManager: Block rdd_4_202 could not be dropped
from memory as it does not exist
14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_202 failed
14/07/12 09:49:05 WARN BlockManager: Block rdd_4_198 could not be dropped
from memory as it does not exist
14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_198 failed
14/07/12 09:49:05 WARN BlockManager: Block rdd_4_199 could not be dropped
from memory as it does not exist
14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_199 failed
14/07/12 09:49:05 WARN BlockManager: Block rdd_4_204 could not be dropped
from memory as it does not exist
14/07/12 09:49:05 WARN BlockManager: Putting block rdd_4_204 failed
14/07/12 09:49:06 WARN BlockManager: Block rdd_4_203 could not be dropped
from memory as it does not exist
14/07/12 09:49:06 WARN BlockManager: Putting block rdd_4_203 failed
14/07/12 09:49:07 WARN BlockManager: Block rdd_4_205 could not be dropped
from memory as it does not exist
14/07/12 09:49:07 WARN BlockManager: Putting block rdd_4_205 failed

Some info:
8-node cluster with 28G RAM per node; I configured 25G of memory for Spark.
(So the data does not seem to fit in memory.)




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Putting-block-rdd-failed-when-running-example-svm-on-large-data-tp9515.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Putting block rdd failed when running example svm on large data

Posted by Aaron Davidson <il...@gmail.com>.
Also check the web ui for that. Each iteration will have one or more stages
associated with it in the driver web ui.


On Sat, Jul 12, 2014 at 6:47 PM, crater <cq...@ucmerced.edu> wrote:

> Hi Xiangrui,
>
> Thanks for the information. Also, is it possible to figure out the
> execution time per iteration for SVM?
>
>
>
>

Re: Putting block rdd failed when running example svm on large data

Posted by crater <cq...@ucmerced.edu>.
Hi Xiangrui, 

Thanks for the information. Also, is it possible to figure out the execution
time per iteration for SVM?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Putting-block-rdd-failed-when-running-example-svm-on-large-data-tp9515p9535.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Putting block rdd failed when running example svm on large data

Posted by Xiangrui Meng <me...@gmail.com>.
By default, Spark uses half of the memory for caching RDDs
(configurable by spark.storage.memoryFraction). That is about 25 * 8 /
2 = 100G for your setup, which is smaller than the 202G data size. So
you don't have enough memory to fully cache the RDD. You can confirm
it in the storage tab of the WebUI. SVM is still able to run, but
slower. -Xiangrui
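
Xiangrui's estimate can be verified with quick back-of-envelope arithmetic. This
is a sketch assuming the Spark 1.x default of spark.storage.memoryFraction = 0.5,
as described above:

```python
# Back-of-envelope check of the RDD cache capacity for this cluster,
# assuming the Spark 1.x default spark.storage.memoryFraction of 0.5
# (half of the executor memory is reserved for cached RDD blocks).
nodes = 8
spark_mem_per_node_gb = 25      # memory configured for Spark on each node
storage_memory_fraction = 0.5   # default spark.storage.memoryFraction
data_size_gb = 202

cache_capacity_gb = nodes * spark_mem_per_node_gb * storage_memory_fraction
print(cache_capacity_gb)                 # 100.0
print(cache_capacity_gb < data_size_gb)  # True: the RDD cannot be fully cached
```

Since 100G of cache capacity is well under the 202G input, blocks that don't fit
are dropped (or never stored), which produces exactly the "Putting block ...
failed" warnings seen in the log; Spark recomputes those partitions instead.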

On Sat, Jul 12, 2014 at 11:10 AM, crater <cq...@ucmerced.edu> wrote:
> Hi,
>
> I am trying to run the example BinaryClassification
> (org.apache.spark.examples.mllib.BinaryClassification) on a 202G file. I am
> constantly getting messages like the ones below. Is this normal, or am I
> missing something?
>
> [BlockManager warnings snipped; identical to the log in the original message]
>
> Some info:
> 8-node cluster with 28G RAM per node; I configured 25G of memory for Spark.
> (So the data does not seem to fit in memory.)
>
>
>
>