Posted to user@spark.apache.org by Xi Shen <da...@gmail.com> on 2015/03/16 05:25:12 UTC

k-means hang without error/warning

Hi,

I am running k-means using Spark in local mode. My data set is about 30k
records, and I set k = 1000.

The algorithm starts and finishes 13 jobs according to the UI monitor, then
it stops working.

The last log I saw was:

[Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned
broadcast *16*

There are many similar log lines, but it always stops at the 16th.

If I lower the *k* value, the algorithm terminates. So I just want to know
what's wrong with *k=1000*.


Thanks,
David
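
[Editor's note: for reference, the setup described above can be sketched with
the MLlib RDD API of that era. The input path, feature format, and iteration
count below are hypothetical, not from the original report.]

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansRepro {
  def main(args: Array[String]): Unit = {
    // Local mode, as in the report.
    val conf = new SparkConf().setAppName("kmeans-repro").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // ~30k rows of space-separated numeric features ("data.txt" is a
    // hypothetical path), parsed into dense vectors and cached.
    val data = sc.textFile("data.txt")
      .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
      .cache()

    // k = 1000 as in the report; maxIterations is an arbitrary example value.
    val model = KMeans.train(data, k = 1000, maxIterations = 20)
    println(s"Within-set sum of squared errors: ${model.computeCost(data)}")

    sc.stop()
  }
}
```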

Re: k-means hang without error/warning

Posted by Xi Shen <da...@gmail.com>.
I used "local[*]". The CPU hits about 80% when there are active jobs, then
it drops to about 13% and hangs for a very long time.

Thanks,
David

On Mon, 16 Mar 2015 17:46 Akhil Das <ak...@sigmoidanalytics.com> wrote:

> How many threads are you allocating when creating the SparkContext? For
> example, local[4] will allocate 4 threads. You can try increasing it to a
> higher number, and also try setting the level of parallelism to a higher
> number.
>
> Thanks
> Best Regards
>
> On Mon, Mar 16, 2015 at 9:55 AM, Xi Shen <da...@gmail.com> wrote:
>
>> Hi,
>>
>> I am running k-means using Spark in local mode. My data set is about 30k
>> records, and I set k = 1000.
>>
>> The algorithm starts and finishes 13 jobs according to the UI monitor,
>> then it stops working.
>>
>> The last log I saw was:
>>
>> [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned
>> broadcast *16*
>>
>> There are many similar log lines, but it always stops at the
>> 16th.
>>
>> If I lower the *k* value, the algorithm terminates. So I just
>> want to know what's wrong with *k=1000*.
>>
>>
>> Thanks,
>> David
>>
>>
>

Re: k-means hang without error/warning

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
How many threads are you allocating when creating the SparkContext? For
example, local[4] will allocate 4 threads. You can try increasing it to a
higher number, and also try setting the level of parallelism to a higher
number.
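
[Editor's note: a minimal sketch of this suggestion, assuming the standard
SparkConf API; the thread count and parallelism values below are arbitrary
examples, not recommendations from the thread.]

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Allocate 8 local worker threads instead of relying on the default, and
// raise the default number of partitions used for shuffles and reduces.
val conf = new SparkConf()
  .setAppName("kmeans-tuning")
  .setMaster("local[8]")                   // local[N]: N worker threads
  .set("spark.default.parallelism", "16")  // level of parallelism
val sc = new SparkContext(conf)
```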

Thanks
Best Regards

On Mon, Mar 16, 2015 at 9:55 AM, Xi Shen <da...@gmail.com> wrote:

> Hi,
>
> I am running k-means using Spark in local mode. My data set is about 30k
> records, and I set k = 1000.
>
> The algorithm starts and finishes 13 jobs according to the UI monitor,
> then it stops working.
>
> The last log I saw was:
>
> [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned
> broadcast *16*
>
> There are many similar log lines, but it always stops at the
> 16th.
>
> If I lower the *k* value, the algorithm terminates. So I just
> want to know what's wrong with *k=1000*.
>
>
> Thanks,
> David
>
>

Re: k-means hang without error/warning

Posted by Sean Owen <so...@cloudera.com>.
I think you'd have to say more about "stopped working". Is the GC
thrashing? Does the UI respond? Is the CPU busy or not?

On Mon, Mar 16, 2015 at 4:25 AM, Xi Shen <da...@gmail.com> wrote:
> Hi,
>
> I am running k-means using Spark in local mode. My data set is about 30k
> records, and I set k = 1000.
>
> The algorithm starts and finishes 13 jobs according to the UI monitor, then
> it stops working.
>
> The last log I saw was:
>
> [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned
> broadcast 16
>
> There are many similar log lines, but it always stops at the 16th.
>
> If I lower the k value, the algorithm terminates. So I just want to know
> what's wrong with k=1000.
>
>
> Thanks,
> David
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org