Posted to user@spark.apache.org by Yanbo Liang <ya...@gmail.com> on 2014/09/13 09:00:20 UTC

Re: How to save mllib model to hdfs and reload it

Shixiong,
These two snippets behave differently in Scala.
In the second snippet, you define a variable named m and evaluate the
right-hand side as part of the definition.
In other words, the variable is replaced by the pre-computed value
Array(1.0) in the subsequent code.
So in the second snippet, the class does not need to be serialized, and it
works even in a distributed environment because only the pre-computed value,
rather than the whole class, is sent to the executor nodes.
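
To make the difference concrete, here is a minimal sketch that needs no Spark
at all. It assumes Scala 2.12 or later (where function literals are
serializable) and uses plain Java serialization as a stand-in for what Spark
does when it ships a closure to the executors. The names ClosureCaptureSketch,
canSerialize, capturesValue and capturesObject are made up for illustration;
only Foo comes from the snippets in this thread.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Deliberately NOT Serializable, like the Foo in the quoted snippets.
class Foo { def foo(): Array[Double] = Array(1.0) }

object ClosureCaptureSketch {
  // Hypothetical helper: true if Java serialization accepts the object.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  // The closure captures only the pre-computed Array[Double].
  def capturesValue(): Int => Double = {
    val m = new Foo().foo()      // right-hand side is evaluated here, once
    (x: Int) => x + m(0)
  }

  // The closure captures the Foo instance itself.
  def capturesObject(): Int => Double = {
    val t = new Foo
    (x: Int) => x + t.foo()(0)   // Foo must travel with the closure
  }

  def main(args: Array[String]): Unit = {
    println(s"captures value:  ${canSerialize(capturesValue())}")
    println(s"captures object: ${canSerialize(capturesObject())}")
  }
}
```

Under those assumptions, the first closure should serialize and the second
should be rejected because Foo is not Serializable. The same canSerialize
check is also one way to do the "manual" serialization test suggested further
down the thread, for example on a loaded model object.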

2014-08-14 22:54 GMT+08:00 Shixiong Zhu <zs...@gmail.com>:

> I think I can reproduce this error.
>
> The following code does not work; it reports that "Foo" cannot be
> serialized (log in gist https://gist.github.com/zsxwing/4f9f17201d4378fe3e16):
>
> class Foo { def foo() = Array(1.0) }
> val t = new Foo
> val m = t.foo
> val r1 = sc.parallelize(List(1, 2, 3))
> val r2 = r1.map(_ + m(0))
> r2.toArray
>
> But the following code works (log in gist
> https://gist.github.com/zsxwing/802cade0facb36a37656):
>
> class Foo { def foo() = Array(1.0) }
> var m: Array[Double] = null
> {
>     val t = new Foo
>     m = t.foo
> }
> val r1 = sc.parallelize(List(1, 2, 3))
> val r2 = r1.map(_ + m(0))
> r2.toArray
>
>
> Best Regards,
> Shixiong Zhu
>
>
> 2014-08-14 22:11 GMT+08:00 Christopher Nguyen <ct...@adatao.com>:
>
>> Hi Hoai-Thu, the private default constructor is unlikely to be the
>> cause here, since Lance was already able to load/deserialize the model
>> object.
>>
>> And on that side topic, I wish all serdes libraries would just use
>> constructor.setAccessible(true) by default :-) Most of the time, that
>> privacy is not meant to restrict serdes reflection.
>>
>> Sent while mobile. Pls excuse typos etc.
>> On Aug 14, 2014 1:58 AM, "Hoai-Thu Vuong" <th...@gmail.com> wrote:
>>
>>> Someone in this community gave me a video:
>>> https://www.youtube.com/watch?v=sPhyePwo7FA. I had the same question
>>> here, and others helped me solve the problem. I was trying to load a
>>> MatrixFactorizationModel from an object file, but the compiler said
>>> that I could not create the object because the constructor is private.
>>> To solve this, I put my new object in the same package as
>>> MatrixFactorizationModel. Luckily, it works.
>>>
>>>
>>> On Wed, Aug 13, 2014 at 9:20 PM, Christopher Nguyen <ct...@adatao.com>
>>> wrote:
>>>
>>>> Lance, some debugging ideas: you might try model.predict(RDD[Vector])
>>>> to isolate the cause to serialization of the loaded model. And also try to
>>>> serialize the deserialized (loaded) model "manually" to see if that throws
>>>> any visible exceptions.
>>>>
>>>> Sent while mobile. Pls excuse typos etc.
>>>> On Aug 13, 2014 7:03 AM, "lancezhange" <la...@gmail.com> wrote:
>>>>
>>>>> My prediction code is simple, as follows:
>>>>>
>>>>>   val labelsAndPredsOnGoodData = goodDataPoints.map { point =>
>>>>>     val prediction = model.predict(point.features)
>>>>>     (point.label, prediction)
>>>>>   }
>>>>>
>>>>> When model is the loaded one, the above code does not work. Can you
>>>>> spot the error?
>>>>> Thanks.
>>>>>
>>>>> PS. I use spark-shell in standalone mode, version 1.0.0.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-mllib-model-to-hdfs-and-reload-it-tp11953p12035.html
>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>> Nabble.com.
>>>>>
>>>
>>>
>>> --
>>> Thu.
>>>
>>
>