Posted to user@spark.apache.org by jamborta <ja...@gmail.com> on 2014/09/23 17:48:45 UTC

access javaobject in rdd map

Hi all,

I have a Java object that contains an ML model which I would like to use for
prediction (in Python). I just want to iterate the data through a mapper and
predict for each value. Unfortunately, this fails when it tries to serialise
the object to send it to the nodes.
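
Roughly, the pattern I am trying looks like this (just a sketch: the class
name, load path and parsing are made up, and `model` here is a Py4J
JavaObject handle):

    # Illustrative sketch only; the class name and paths are made up.
    # `model` is a Py4J JavaObject wrapping the Scala/Java model.
    model = sc._jvm.com.example.ml.MyModel.load("/models/my_model")

    data = sc.textFile("hdfs:///data/features.csv") \
             .map(lambda line: [float(x) for x in line.split(",")])

    # This is where it fails: the lambda captures `model`, and PySpark
    # cannot pickle a Py4J JavaObject to ship it to the worker nodes.
    predictions = data.map(lambda features: model.predict(features))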

Is there a trick around this? Surely this object could be picked up by
reference on the nodes.

many thanks,




Re: access javaobject in rdd map

Posted by jamborta <ja...@gmail.com>.
Great. Thanks a lot.
On 23 Sep 2014 18:44, Davies Liu wrote:

> Right now, there is no way to access the JVM from a Python worker. To
> make this happen, we would need to:
>
> 1. set up Py4J in the Python worker
> 2. serialize the JVM objects and transfer them to the executors
> 3. link the JVM objects and Py4J together to get an interface
>
> Before that happens, maybe you could try setting up a service for the
> model (such as a RESTful service) and accessing it from map() via RPC.

Re: access javaobject in rdd map

Posted by Davies Liu <da...@databricks.com>.
Right now, there is no way to access the JVM from a Python worker. To
make this happen, we would need to:

1. set up Py4J in the Python worker
2. serialize the JVM objects and transfer them to the executors
3. link the JVM objects and Py4J together to get an interface

Before that happens, maybe you could try setting up a service for the
model (such as a RESTful service) and accessing it from map() via RPC.
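
Just as a sketch of the RPC idea (the endpoint URL and the JSON layout are
assumptions about how the model service would be exposed, and `data` is the
RDD of feature vectors):

    def predict_partition(rows):
        # Imports happen on the workers; urllib2 is Python 2 (current PySpark).
        import json
        import urllib2
        url = "http://model-host:8080/predict"   # hypothetical model service
        for features in rows:
            req = urllib2.Request(url,
                                  json.dumps({"features": features}),
                                  {"Content-Type": "application/json"})
            # The service is assumed to answer {"prediction": <value>}.
            yield json.loads(urllib2.urlopen(req).read())["prediction"]

    # mapPartitions keeps per-partition setup out of the per-record loop.
    predictions = data.mapPartitions(predict_partition)

If the per-record round trip is too slow, the service could also accept a
batch of rows per request.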

On Tue, Sep 23, 2014 at 9:48 AM, Tamas Jambor <ja...@gmail.com> wrote:
> Hi Davies,
>
> Thanks for the reply. I saw that you do it that way in the code. Is
> there no other way?
>
> I have implemented all the predict functions in Scala, so I would prefer
> not to reimplement the whole thing in Python.
>
> thanks,


Re: access javaobject in rdd map

Posted by Tamas Jambor <ja...@gmail.com>.
Hi Davies,

Thanks for the reply. I saw that you do it that way in the code. Is
there no other way?

I have implemented all the predict functions in Scala, so I would prefer
not to reimplement the whole thing in Python.

thanks,


On Tue, Sep 23, 2014 at 5:40 PM, Davies Liu <da...@databricks.com> wrote:
> You should create a pure Python object (copy the attributes from the Java
> object); then it can be used in map().
>
> Davies


Re: access javaobject in rdd map

Posted by Davies Liu <da...@databricks.com>.
You should create a pure Python object (copy the attributes from the Java
object); then it can be used in map().
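
A minimal sketch of what I mean, assuming a linear model whose parameters can
be read out through Py4J on the driver (`java_model` and the accessors
getWeights/getIntercept are illustrative names):

    # Copy the parameters out of the JVM model on the driver.
    weights = [float(w) for w in java_model.getWeights()]
    intercept = float(java_model.getIntercept())

    class PyModel(object):
        """Plain Python copy of the model; picklable, so usable in map()."""
        def __init__(self, weights, intercept):
            self.weights = weights
            self.intercept = intercept

        def predict(self, features):
            return sum(w * x for w, x in zip(self.weights, features)) + self.intercept

    py_model = PyModel(weights, intercept)
    # The closure below only references plain Python data, so PySpark can
    # pickle it and ship it to the workers with each task.
    predictions = data.map(lambda features: py_model.predict(features))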

Davies

On Tue, Sep 23, 2014 at 8:48 AM, jamborta <ja...@gmail.com> wrote:
> Hi all,
>
> I have a Java object that contains an ML model which I would like to use for
> prediction (in Python). I just want to iterate the data through a mapper and
> predict for each value. Unfortunately, this fails when it tries to serialise
> the object to send it to the nodes.
>
> Is there a trick around this? Surely this object could be picked up by
> reference on the nodes.
>
> many thanks,