Posted to issues@spark.apache.org by "Alok Bhandari (JIRA)" <ji...@apache.org> on 2016/07/10 11:10:11 UTC
[jira] [Created] (SPARK-16473) BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found
Alok Bhandari created SPARK-16473:
-------------------------------------
Summary: BisectingKMeans Algorithm failing with java.util.NoSuchElementException: key not found
Key: SPARK-16473
URL: https://issues.apache.org/jira/browse/SPARK-16473
Project: Spark
Issue Type: Bug
Components: MLlib
Affects Versions: 1.6.1
Environment: AWS EC2 linux instance.
Reporter: Alok Bhandari
Hello,
I am using Apache Spark 1.6.1.
I am running the bisecting k-means algorithm on a specific dataset.
Dataset details:
K = 100
Input vector = 100K*100k
Memory assigned = 16 GB per node
Number of nodes = 2
Up to K=75 it works fine, but when I set K=100 it fails with java.util.NoSuchElementException: key not found.
*I suspect it is failing because of a lack of some resource, but the exception does not convey why this Spark job failed.*
Can someone please point me to the root cause of this exception and explain why it is failing?
This is the exception stack trace:
{code}
java.util.NoSuchElementException: key not found: 166
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:58)
at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply$mcDJ$sp(BisectingKMeans.scala:338)
at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1$$anonfun$2.apply(BisectingKMeans.scala:337)
at scala.collection.TraversableOnce$$anonfun$minBy$1.apply(TraversableOnce.scala:231)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at scala.collection.LinearSeqOptimized$class.reduceLeft(LinearSeqOptimized.scala:125)
at scala.collection.immutable.List.reduceLeft(List.scala:84)
at scala.collection.TraversableOnce$class.minBy(TraversableOnce.scala:231)
at scala.collection.AbstractTraversable.minBy(Traversable.scala:105)
at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:337)
at org.apache.spark.mllib.clustering.BisectingKMeans$$anonfun$org$apache$spark$mllib$clustering$BisectingKMeans$$updateAssignments$1.apply(BisectingKMeans.scala:334)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
{code}
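For context, the top four frames of the trace are Scala's own Map.apply, which throws NoSuchElementException with the message "key not found: <key>" whenever the requested key is absent. The sketch below only reproduces that general behaviour in plain Scala. The object name, the map contents, and the helper names are hypothetical and are not taken from the Spark source; it does not show why key 166 is missing inside BisectingKMeans.

```scala
import scala.util.Try

object KeyNotFoundDemo {
  // Hypothetical stand-in for a map keyed by cluster index.
  // The real map used by BisectingKMeans is not shown in this report.
  val clusterValues: Map[Long, Double] = Map(164L -> 1.5, 165L -> 2.0)

  // Map.apply throws java.util.NoSuchElementException when the key is absent.
  def strictLookup(index: Long): Double = clusterValues(index)

  // Map.get returns an Option, so a missing key can be handled or reported explicitly.
  def safeLookup(index: Long): Option[Double] = clusterValues.get(index)

  def main(args: Array[String]): Unit = {
    // Wrap the strict lookup in Try so the demo does not abort on the missing key.
    println(Try(strictLookup(166L)))
    println(safeLookup(166L))
  }
}
```

Read this way, the exception indicates a lookup of an index that is not in a map, which may or may not be related to available resources.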
The issue is that the job fails without giving any explicit message as to why it failed.