Posted to user@spark.apache.org by sooraj <so...@gmail.com> on 2015/07/08 09:05:29 UTC

PySpark MLlib: py4j cannot find trainImplicitALSModel method

Hi,

I am using the MLlib collaborative filtering API on an implicit-preference data
set. From a PySpark notebook, I am iteratively creating a matrix
factorization model, with the aim of measuring the RMSE for each combination
of the API's parameters (rank, lambda and alpha). The code
successfully completed six iterations, but on the seventh call of
ALS.trainImplicit I get a confusing exception saying that py4j cannot
find the method trainImplicitALSModel. The full trace is included at the
end of this email.
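For reference, the RMSE measurement described above can be sketched in plain Python. In PySpark the usual pattern is to key both the held-out ratings and the output of model.predictAll(...) by (user, product) and join them; here ordinary dicts stand in for those RDDs so the arithmetic is easy to follow. The function name rmse and the dict inputs are illustrative assumptions, not part of the MLlib API.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error over the (user, product) keys present in both dicts.

    `actual` and `predicted` map (user, product) -> rating. This mirrors an
    inner join of two RDDs keyed by (user, product): keys missing from
    either side are ignored.
    """
    common = set(actual) & set(predicted)
    if not common:
        raise ValueError("no overlapping (user, product) pairs")
    squared = [(actual[k] - predicted[k]) ** 2 for k in common]
    return math.sqrt(sum(squared) / len(squared))


# Usage: two observed preferences and the model's predictions for them.
observed = {(1, 10): 1.0, (1, 20): 0.0}
predictions = {(1, 10): 0.5, (1, 20): 0.5, (2, 10): 0.9}
print(rmse(observed, predictions))  # 0.5
```

The extra prediction for (2, 10) has no observed rating, so it is dropped, just as it would be by an RDD join.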

I am running Spark on YARN (yarn-client mode) with five executors. The
error seems to occur entirely on the driver, as I don't see any error in
the Spark web UI. I have tried changing the spark.yarn.am.memory
configuration value, but it doesn't help. Any suggestions on how to debug
this would be very helpful.

Thank you,
Sooraj

Here is the full error trace:

---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
<ipython-input-8-ad6ca35e7521> in <module>()
      3
      4 for index, (r, l, a, i) in enumerate(itertools.product(ranks, lambdas, alphas, iters)):
----> 5     model = ALS.trainImplicit(scoreTableTrain, rank = r, iterations = i, lambda_ = l, alpha = a)
      6
      7     predictionsTrain = model.predictAll(userProductTrainRDD)

/usr/local/spark-1.4/spark-1.4.0-bin-hadoop2.6/python/pyspark/mllib/recommendation.pyc in trainImplicit(cls, ratings, rank, iterations, lambda_, blocks, alpha, nonnegative, seed)
    198                       nonnegative=False, seed=None):
    199         model = callMLlibFunc("trainImplicitALSModel", cls._prepare(ratings), rank,
--> 200                               iterations, lambda_, blocks, alpha, nonnegative, seed)
    201         return MatrixFactorizationModel(model)
    202

/usr/local/spark-1.4/spark-1.4.0-bin-hadoop2.6/python/pyspark/mllib/common.pyc in callMLlibFunc(name, *args)
    126     sc = SparkContext._active_spark_context
    127     api = getattr(sc._jvm.PythonMLLibAPI(), name)
--> 128     return callJavaFunc(sc, api, *args)
    129
    130

/usr/local/spark-1.4/spark-1.4.0-bin-hadoop2.6/python/pyspark/mllib/common.pyc in callJavaFunc(sc, func, *args)
    119     """ Call Java Function """
    120     args = [_py2java(sc, a) for a in args]
--> 121     return _java2py(sc, func(*args))
    122
    123

/usr/local/lib/python2.7/site-packages/py4j/java_gateway.pyc in __call__(self, *args)
    536         answer = self.gateway_client.send_command(command)
    537         return_value = get_return_value(answer, self.gateway_client,
--> 538                 self.target_id, self.name)
    539
    540         for temp_arg in temp_args:

/usr/local/lib/python2.7/site-packages/py4j/protocol.pyc in get_return_value(answer, gateway_client, target_id, name)
    302                 raise Py4JError(
    303                     'An error occurred while calling {0}{1}{2}. Trace:\n{3}\n'.
--> 304                     format(target_id, '.', name, value))
    305         else:
    306             raise Py4JError(

Py4JError: An error occurred while calling o667.trainImplicitALSModel. Trace:
py4j.Py4JException: Method trainImplicitALSModel([class org.apache.spark.api.java.JavaRDD, class java.lang.Integer, class java.lang.Integer, class java.lang.Integer, class java.lang.Integer, class java.lang.Double, class java.lang.Boolean, null]) does not exist
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
	at py4j.Gateway.invoke(Gateway.java:252)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Thread.java:724)
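One way to debug this kind of Py4J error is to line up the Java classes it reports against the order in which trainImplicit forwards its arguments (ratings, rank, iterations, lambda_, blocks, alpha, nonnegative, seed, as the callMLlibFunc call in the trace shows). The sketch below does that mechanically. The expected Java types in EXPECTED are an assumption based on how each parameter is used (integers for rank, iterations and blocks; doubles for lambda_ and alpha), not something read from the Spark source, and the helper name mismatched_params is hypothetical. Applied to the message above, it flags the fourth argument, lambda_, which is reported as java.lang.Integer, while alpha arrives as java.lang.Double; so in this particular trace the Python int appears to be lambda_ rather than alpha.

```python
import re

# Order in which pyspark.mllib's trainImplicit forwards its arguments to the JVM
# (taken from the callMLlibFunc call in the trace).
PARAM_NAMES = ["ratings", "rank", "iterations", "lambda_", "blocks",
               "alpha", "nonnegative", "seed"]

# ASSUMPTION: the Java class each checked parameter should arrive as.
# "ratings" (the RDD) and "seed" (may be null) are deliberately not checked.
EXPECTED = {
    "rank": "java.lang.Integer",
    "iterations": "java.lang.Integer",
    "lambda_": "java.lang.Double",
    "blocks": "java.lang.Integer",
    "alpha": "java.lang.Double",
    "nonnegative": "java.lang.Boolean",
}


def mismatched_params(py4j_message):
    """Return names of parameters whose reported Java class differs from EXPECTED."""
    match = re.search(r"\(\[(.*)\]\) does not exist", py4j_message)
    if match is None:
        raise ValueError("not a Py4J 'Method ... does not exist' message")
    # Each entry looks like "class java.lang.Integer" or "null".
    reported = [part.strip() for part in match.group(1).split(",")]
    classes = [None if r == "null" else r.replace("class ", "", 1) for r in reported]
    bad = []
    for name, cls in zip(PARAM_NAMES, classes):
        if name not in EXPECTED or cls is None:
            continue  # unchecked parameter, or a null we cannot judge
        if cls != EXPECTED[name]:
            bad.append(name)
    return bad


message = ("py4j.Py4JException: Method trainImplicitALSModel([class "
           "org.apache.spark.api.java.JavaRDD, class java.lang.Integer, class "
           "java.lang.Integer, class java.lang.Integer, class java.lang.Integer, "
           "class java.lang.Double, class java.lang.Boolean, null]) does not exist")
print(mismatched_params(message))  # ['lambda_']
```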

Re: PySpark MLlib: py4j cannot find trainImplicitALSModel method

Posted by Ashish Dutt <as...@gmail.com>.
Hello Sooraj,
I see you are using an IPython notebook.
Are you on Windows or on a Linux-based OS? I am using Windows
7 and I am new to Spark.
I am trying to connect IPython to my local cluster, which is based on CDH5.4. I
followed these tutorials, but they are written for a Linux environment and
hence are not much help to me.
I am able to launch IPython on localhost but cannot get it to work on the
cluster.


Sincerely,
Ashish Dutt

On Wed, Jul 8, 2015 at 5:49 PM, sooraj <so...@gmail.com> wrote:

> That turned out to be a silly data type mistake. At one point in the
> iterative calls, I was passing an integer value for the 'alpha' parameter
> of the ALS train API, which expects a Double. So py4j was in fact
> complaining that it could not find a method that takes an integer value
> for that parameter.

Re: PySpark MLlib: py4j cannot find trainImplicitALSModel method

Posted by Ashish Dutt <as...@gmail.com>.
Hello Sooraj,
Thank you for your response. It does give me a ray of hope.
Can you please suggest any good tutorials for installing and working with
an IPython notebook server on a cluster node?
Thank you
Ashish
On 08-Jul-2015 6:16 PM, "sooraj" <so...@gmail.com> wrote:
>
> Hi Ashish,
>
> I am running an IPython notebook server on one of the nodes of the cluster
> (HDP). Setting it up was quite straightforward, and I think I followed the
> same references that you linked to. I then access the notebook remotely
> from my development PC. I have never tried connecting a local IPython (on a
> PC) to a remote Spark cluster, so I am not sure whether that is possible.
>
> - Sooraj

Re: PySpark MLlib: py4j cannot find trainImplicitALSModel method

Posted by sooraj <so...@gmail.com>.
Hi Ashish,

I am running an IPython notebook server on one of the nodes of the cluster
(HDP). Setting it up was quite straightforward, and I think I followed the
same references that you linked to. I then access the notebook remotely
from my development PC. I have never tried connecting a local IPython (on a
PC) to a remote Spark cluster, so I am not sure whether that is possible.

- Sooraj
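For readers in Ashish's position, one common way to run the notebook on a cluster node with Spark 1.x is to start bin/pyspark with PYSPARK_DRIVER_PYTHON=ipython and PYSPARK_DRIVER_PYTHON_OPTS set to the notebook arguments, then open it in a browser from the PC. The Python sketch below only assembles that command and environment. The helper name notebook_launch and the default port are hypothetical, the Spark path is the one from the trace, and --ip, --port and --no-browser are IPython notebook options from that era, so check them against the installed IPython version.

```python
import os


def notebook_launch(spark_home, port=8888, master="yarn-client"):
    """Build (argv, env) for starting pyspark with an IPython notebook as the driver.

    Hypothetical helper: it only assembles the command; on the cluster node
    it could be run with subprocess.call(argv, env=env).
    """
    env = dict(os.environ)
    env["PYSPARK_DRIVER_PYTHON"] = "ipython"
    # Listen on all interfaces so the notebook can be opened from another
    # machine, and do not try to open a browser on the headless node.
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = (
        "notebook --ip=0.0.0.0 --port=%d --no-browser" % port)
    argv = [os.path.join(spark_home, "bin", "pyspark"), "--master", master]
    return argv, env


argv, env = notebook_launch("/usr/local/spark-1.4/spark-1.4.0-bin-hadoop2.6")
print(argv)
print(env["PYSPARK_DRIVER_PYTHON_OPTS"])
```

Binding to 0.0.0.0 exposes the notebook to the whole network, so restricting access (for example with a firewall or an SSH tunnel) is worth considering.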

On 8 July 2015 at 15:31, Ashish Dutt <as...@gmail.com> wrote:

> My apologies for double posting, but I missed the web links that I followed,
> which are 1
> <http://ramhiser.com/2015/02/01/configuring-ipython-notebook-support-for-pyspark/>,
> 2
> <http://blog.cloudera.com/blog/2014/08/how-to-use-ipython-notebook-with-apache-spark/>,
> 3
> <http://nbviewer.ipython.org/gist/fperez/6384491/00-Setup-IPython-PySpark.ipynb>
>
> Thanks,
> Ashish

Re: PySpark MLlib: py4j cannot find trainImplicitALSModel method

Posted by Ashish Dutt <as...@gmail.com>.
My apologies for double posting, but I missed the web links that I followed,
which are 1
<http://ramhiser.com/2015/02/01/configuring-ipython-notebook-support-for-pyspark/>,
2
<http://blog.cloudera.com/blog/2014/08/how-to-use-ipython-notebook-with-apache-spark/>,
3
<http://nbviewer.ipython.org/gist/fperez/6384491/00-Setup-IPython-PySpark.ipynb>

Thanks,
Ashish

On Wed, Jul 8, 2015 at 5:49 PM, sooraj <so...@gmail.com> wrote:

> That turned out to be a silly data type mistake. At one point in the
> iterative calls, I was passing an integer value for the 'alpha' parameter
> of the ALS train API, which expects a Double. So py4j was in fact
> complaining that it could not find a method that takes an integer value
> for that parameter.

Re: PySpark MLlib: py4j cannot find trainImplicitALSModel method

Posted by sooraj <so...@gmail.com>.
That turned out to be a silly data type mistake. At one point in the
iterative calls, I was passing an integer value for the 'alpha' parameter
of the ALS train API, which expects a Double. So py4j was in fact
complaining that it could not find a method that takes an integer value
for that parameter.
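A minimal way to avoid this kind of mismatch is to coerce the floating-point hyperparameters before they reach ALS.trainImplicit, so that Py4J always sends a Double. Coercing both lambda_ and alpha covers either parameter. The sketch below builds the same kind of grid as the loop in the trace; the grid values and the helper name als_param_grid are placeholders, and the Spark call is left as a comment because it needs a live SparkContext and the scoreTableTrain RDD from the original notebook.

```python
import itertools


def als_param_grid(ranks, lambdas, alphas, iters):
    """Yield (rank, lambda_, alpha, iterations) with consistent Python types.

    rank and iterations are passed to the JVM as integers; lambda_ and alpha
    need to arrive as doubles, so they are coerced with float(). Without
    this, an entry such as 1 in `lambdas` or `alphas` is sent as
    java.lang.Integer and Py4J finds no matching method.
    """
    for r, l, a, i in itertools.product(ranks, lambdas, alphas, iters):
        yield int(r), float(l), float(a), int(i)


# Placeholder grid; the bare integer 1 in lambdas and alphas would otherwise
# break the call.
grid = list(als_param_grid([8, 12], [0.01, 1], [1, 40.0], [10]))

for rank, lambda_, alpha, iterations in grid:
    # With a live SparkContext this would be:
    # model = ALS.trainImplicit(scoreTableTrain, rank=rank, iterations=iterations,
    #                           lambda_=lambda_, alpha=alpha)
    print(rank, lambda_, alpha, iterations)
```

Doing the conversion in one generator keeps the fix in a single place instead of relying on every list literal being written with a decimal point.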
