You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Michal Laclavik (JIRA)" <ji...@apache.org> on 2015/08/19 17:15:45 UTC

[jira] [Created] (SPARK-10115) MLlib ALS training fails with java.lang.ClassCastException

Michal Laclavik created SPARK-10115:
---------------------------------------

             Summary: MLlib ALS training fails with java.lang.ClassCastException
                 Key: SPARK-10115
                 URL: https://issues.apache.org/jira/browse/SPARK-10115
             Project: Spark
          Issue Type: Bug
         Environment: first experienced on spark 1.2.1 but then also with latest
spark-1.4.1-bin-hadoop2.6
            Reporter: Michal Laclavik


I am running ALS collaborative filtering training on data which looks as follows (sample by running "user_product.take(10)":
{code}
[(1205640308657491975, 50233468418, 1.0),
 (4743366459073625989, 50233472294, 1.0),
 (4743366459073625989, 50233473253, 1.0),
 (4743366459073625989, 75586230246, 1.0),
 (4743366459073625989, 50233473248, 1.0),
 (56766162624422850, 74848929776, 1.0),
 (56766162624422850, 50233473397, 1.0),
 (56766162624422850, 78185852309, 1.0),
 (56766162624422850, 73533710263, 1.0),
 (56766162624422850, 78185852319, 1.0)]
{code}

and then I call training on that RDD and I get following error:
{code}
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-54-4e711b94952d> in <module>()
      2 rank = 12
      3 iterations=5
----> 4 model = ALS.train(user_product, rank, iterations)

/opt/spark/python/pyspark/mllib/recommendation.py in train(cls, ratings, rank, iterations, lambda_, blocks, nonnegative, seed)
    192               seed=None):
    193         model = callMLlibFunc("trainALSModel", cls._prepare(ratings), rank, iterations,
--> 194                               lambda_, blocks, nonnegative, seed)
    195         return MatrixFactorizationModel(model)
    196 

/opt/spark/python/pyspark/mllib/common.py in callMLlibFunc(name, *args)
    126     sc = SparkContext._active_spark_context
    127     api = getattr(sc._jvm.PythonMLLibAPI(), name)
--> 128     return callJavaFunc(sc, api, *args)
    129 
    130 

/opt/spark/python/pyspark/mllib/common.py in callJavaFunc(sc, func, *args)
    119     """ Call Java Function """
    120     args = [_py2java(sc, a) for a in args]
--> 121     return _java2py(sc, func(*args))
    122 
    123 

/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    536         answer = self.gateway_client.send_command(command)
    537         return_value = get_return_value(answer, self.gateway_client,
--> 538                 self.target_id, self.name)
    539 
    540         for temp_arg in temp_args:

/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    298                 raise Py4JJavaError(
    299                     'An error occurred while calling {0}{1}{2}.\n'.
--> 300                     format(target_id, '.', name), value)
    301             else:
    302                 raise Py4JError(

Py4JJavaError: An error occurred while calling o448.trainALSModel.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 57.0 failed 1 times, most recent failure: Lost task 9.0 in stage 57.0 (TID 4187, localhost): java.lang.ClassCastException

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org