Posted to issues@spark.apache.org by "Ilya Matiach (JIRA)" <ji...@apache.org> on 2017/10/05 21:05:00 UTC

[jira] [Commented] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found

    [ https://issues.apache.org/jira/browse/SPARK-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193675#comment-16193675 ] 

Ilya Matiach commented on SPARK-16473:
--------------------------------------

[~podongfeng] Interesting. It looks like the dataset representation somehow changes when it is cached. My guess is that either the row order or the numeric values are changing. If the test fails only because the number of clusters equals k, that is not a problem in itself (that outcome is perfectly fine for the algorithm); it just means the dataset was not generated correctly to hit the very special edge case I was looking for, where one cluster is empty after a split in bisecting k-means. I can't find the test failure error message in your PR; could you run another build and post it here? We may need to add debugging/print statements to determine how the data changes when you cache it. This doesn't mean there is a bug in the algorithm; it just means the test needs to be changed so that the test data, even after caching, is the same as the original.
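
To tell apart the two guesses above (row order changing versus numeric values changing), something like the following could be used in a debugging session. This is only a sketch in plain Scala: CacheDiff and its members are hypothetical names, not part of Spark, and it assumes both versions of the (small) test dataset have already been brought to the driver as in-memory sequences of rows.

```scala
// Hypothetical debugging helper, not part of Spark.
object CacheDiff {
  type Row = Seq[Double]

  sealed trait Diff
  case object Identical extends Diff     // same rows, same order
  case object OrderChanged extends Diff  // same rows, different order
  case object ValuesChanged extends Diff // at least one row differs

  // Compare the original rows with the rows read back after caching.
  // Values are compared exactly; no floating-point tolerance is applied.
  def compare(original: Seq[Row], cached: Seq[Row]): Diff = {
    if (original == cached) Identical
    else if (multiset(original) == multiset(cached)) OrderChanged
    else ValuesChanged
  }

  // Count occurrences of each row so that duplicate rows are respected.
  private def multiset(rows: Seq[Row]): Map[Row, Int] =
    rows.groupBy(identity).map { case (row, group) => row -> group.size }
}
```

If the result is OrderChanged, the test data generation probably depends on row order; if it is ValuesChanged, printing the differing rows would be the next step.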

> BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-16473
>                 URL: https://issues.apache.org/jira/browse/SPARK-16473
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib
>    Affects Versions: 1.6.1, 2.0.0
>         Environment: AWS EC2 linux instance. 
>            Reporter: Alok Bhandari
>            Assignee: Ilya Matiach
>             Fix For: 2.1.1, 2.2.0
>
>
> Hello,
> I am using Apache Spark 1.6.1.
> I am running the bisecting k-means algorithm on a specific dataset.
> Dataset details:
> K=100,
> input vector = 100K*100k
> Memory assigned: 16GB per node,
> number of nodes = 2.
> Up to K=75 it works fine, but when I set K=100, it fails with java.util.NoSuchElementException: key not found.
> *I suspect it is failing because of a lack of some resource, but the exception does not convey why this Spark job failed.*
> Could someone please point me to the root cause of this exception and why it is failing?
> This is the exception stack trace:
> {code}
> java.util.NoSuchElementException: key not found: 166 
>         at scala.collection.MapLike$class.default(MapLike.scala:228) 
>         at scala.collection.AbstractMap.default(Map.scala:58) 
>         at scala.collection.MapLike$class.apply(MapLike.scala:141) 
>         at scala.collection.AbstractMap.apply(Map.scala:58) 
>         at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply$mcDJ$sp(BisectingKMeans.scala:338)
>         at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
>         at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
>         at scala.collection.TraversableOnce$$anonfun$minBy$1.apply(TraversableOnce.scala:231) 
>         at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111) 
>         at scala.collection.immutable.List.foldLeft(List.scala:84) 
>         at scala.collection.LinearSeqOptimized$class.reduceLeft(LinearSeqOptimized.scala:125) 
>         at scala.collection.immutable.List.reduceLeft(List.scala:84) 
>         at scala.collection.TraversableOnce$class.minBy(TraversableOnce.scala:231) 
>         at scala.collection.AbstractTraversable.minBy(Traversable.scala:105) 
>         at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:337) 
>         at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:334) 
>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
>         at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389) 
> {code}
> The issue is that it fails without giving any explicit message as to why it failed.
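
The stack trace shows a Map lookup (Map.apply) being evaluated inside minBy while assignments are updated. The plain-Scala sketch below reproduces that failure pattern in isolation and shows one possible guard. It is not Spark's code and not the actual fix: AssignmentSketch, assignUnsafe, assignSafe, and the index numbering (parent 83, children 166 and 167) are hypothetical and chosen only for illustration.

```scala
// Hypothetical illustration of the failure pattern, not Spark's implementation.
object AssignmentSketch {
  type Vec = Seq[Double]

  // Squared Euclidean distance between two vectors of equal length.
  def sqDist(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Unguarded: looks up every child's center with Map.apply inside minBy.
  // Throws java.util.NoSuchElementException if a child has no center,
  // for example because that child cluster ended up empty after a split.
  def assignUnsafe(point: Vec, children: Seq[Long], centers: Map[Long, Vec]): Long =
    children.minBy(child => sqDist(centers(child), point))

  // Guarded: only considers children that have a center, and leaves the
  // point with its parent when no child center exists.
  def assignSafe(point: Vec, parent: Long, children: Seq[Long],
                 centers: Map[Long, Vec]): Long = {
    val present = children.filter(centers.contains)
    if (present.isEmpty) parent
    else present.minBy(child => sqDist(centers(child), point))
  }
}
```

With a centers map that contains only child 167, assignUnsafe fails on the lookup of 166 with "key not found: 166", while assignSafe assigns the point to 167.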
