Posted to dev@spark.apache.org by Todd Gao <to...@gmail.com> on 2015/02/11 09:32:49 UTC

CallbackServer in PySpark Streaming

Hi all,

I am reading the code of PySpark and its Streaming module.

In PySpark Streaming, when the `compute` method of a
PythonTransformedDStream instance is invoked, a connection to the
CallbackServer is created internally.
I wonder where the CallbackServer for each PythonTransformedDStream
instance lives on the slave nodes in a distributed environment.
Is there a CallbackServer running on every slave node?

thanks
Todd

Re: CallbackServer in PySpark Streaming

Posted by Todd Gao <to...@gmail.com>.

Oh I see! Thank you very much, Davies. You corrected some of my
misunderstandings.

Re: CallbackServer in PySpark Streaming

Posted by Davies Liu <da...@databricks.com>.

Yes.

On Wed, Feb 11, 2015 at 5:44 PM, Todd Gao <to...@gmail.com> wrote:
> Thanks Davies.
> I am not quite familiar with Spark Streaming. Do you mean that the compute
> routine of a DStream is only invoked on the driver node,
> while only the compute routines of the RDDs are distributed to the slaves?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org

Re: CallbackServer in PySpark Streaming

Posted by Todd Gao <to...@gmail.com>.

Thanks Davies.
I am not quite familiar with Spark Streaming. Do you mean that the compute
routine of a DStream is only invoked on the driver node,
while only the compute routines of the RDDs are distributed to the slaves?

On Thu, Feb 12, 2015 at 2:38 AM, Davies Liu <da...@databricks.com> wrote:

> The CallbackServer is part of Py4J; it is only used in the driver, not
> in the slaves or workers.

Re: CallbackServer in PySpark Streaming

Posted by Davies Liu <da...@databricks.com>.

The CallbackServer is part of Py4J; it is only used in the driver, not
in the slaves or workers.
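
To make the driver/worker split concrete, here is a minimal sketch, not
taken from the Spark source. It assumes a local PySpark installation, and
the names `where_am_i`, `tag_record`, `per_batch`, and `main` are made up
for illustration. The function passed to `transform` is called once per
batch in the driver's Python process (this is the call the JVM makes
through the Py4J callback server), while the function passed to `rdd.map`
is shipped to the workers and run there.

```python
import os
import socket


def where_am_i(label):
    """Return a short tag naming the host and process running this call."""
    return "%s@%s:%d" % (label, socket.gethostname(), os.getpid())


def tag_record(x):
    # Runs inside an RDD task, i.e. in a Python worker process on an executor.
    return (x, where_am_i("worker"))


def per_batch(rdd):
    # Runs once per batch interval in the driver's Python process; the JVM
    # reaches this function through the Py4J callback server.
    print(where_am_i("driver"))
    # Only the function given to map() is serialized and sent to the workers.
    return rdd.map(tag_record)


def main():
    # Assumption: PySpark is installed; master, app name, and sizes are
    # illustrative values only.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "callback-server-demo")
    ssc = StreamingContext(sc, 1)  # 1-second batch interval
    batches = [sc.parallelize(range(3)) for _ in range(3)]
    ssc.queueStream(batches).transform(per_batch).pprint()
    ssc.start()
    ssc.awaitTerminationOrTimeout(5)
    ssc.stop(stopSparkContext=True, stopGraceFully=False)
```

Calling `main()` should print one "driver" tag per batch from the driver
process, and the records printed by `pprint()` should carry "worker" tags
with different process ids. In local mode the host name is the same for
both; on a cluster the worker tags would name the executor hosts.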