You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@apex.apache.org by Max Bridgewater <ma...@gmail.com> on 2016/11/25 16:33:27 UTC

Apex Communication Protocols

Hi Folks,

I was giving an Apex demo the other day and people asked following
questions:

1) what is the communication protocol between operators when they are on
distant nodes. That means, how does Apex transport the tuples from one node
to the other?
Is it a custom protocol on top of TCP/IP or is it RPC?
2) What is the serialization algorithm used?
3) What is the addressing scheme between operators? That means how does
Apex know where an operator is located and how to route data to it? Is
there an operator registry? If so, where does it reside?

Thoughts?

Thanks,
Max.

Re: Apex Communication Protocols

Posted by Max Bridgewater <ma...@gmail.com>.
Thanks Amol. That's a good summary.

On Fri, Nov 25, 2016 at 1:04 PM, Amol Kekre <am...@datatorrent.com> wrote:

>
> Max,
> Between two operators the communication depends on stream locality. For
> thread local, the tuple is passed via thread stack; for container local
> there is a queue in between. For the rest there is a buffer server. This is
> effectively a pub-sub mechanism built over tcp-ip. Default is kryo
> serialization, but you can add your own. Addressing is via pub-sub; i.e.
> sender does not bother about addressing, the receiver connects to the
> buffer server showing interest. Routing is kicked off during launch by the
> master (Stram) and is a launch time or "re-do physical plan" time decision.
> Re-do will happen in outage, or dynamic changes.
>
> Thks
> Amol
>
>
> On Fri, Nov 25, 2016 at 8:33 AM, Max Bridgewater <
> max.bridgewater@gmail.com> wrote:
>
>> Hi Folks,
>>
>> I was giving an Apex demo the other day and people asked following
>> questions:
>>
>> 1) what is the communication protocol between operators when they are on
>> distant nodes. That means, how does Apex transport the tuples from one node
>> to the other?
>> Is it a custom protocol on top of TCP/IP or is it RPC?
>> 2) What is the serialization algorithm used?
>> 3) What is the addressing scheme between operators? That means how does
>> Apex know where an operator is located and how to route data to it? Is
>> there an operator registry? If so, where does it reside?
>>
>> Thoughts?
>>
>> Thanks,
>> Max.
>>
>>
>

Re: Apex Communication Protocols

Posted by Amol Kekre <am...@datatorrent.com>.
Max,
Between two operators the communication depends on stream locality. For
thread local, the tuple is passed via thread stack; for container local
there is a queue in between. For the rest there is a buffer server. This is
effectively a pub-sub mechanism built over tcp-ip. Default is kryo
serialization, but you can add your own. Addressing is via pub-sub; i.e.
sender does not bother about addressing, the receiver connects to the
buffer server showing interest. Routing is kicked off during launch by the
master (Stram) and is a launch time or "re-do physical plan" time decision.
Re-do will happen in outage, or dynamic changes.

Thks
Amol


On Fri, Nov 25, 2016 at 8:33 AM, Max Bridgewater <ma...@gmail.com>
wrote:

> Hi Folks,
>
> I was giving an Apex demo the other day and people asked following
> questions:
>
> 1) what is the communication protocol between operators when they are on
> distant nodes. That means, how does Apex transport the tuples from one node
> to the other?
> Is it a custom protocol on top of TCP/IP or is it RPC?
> 2) What is the serialization algorithm used?
> 3) What is the addressing scheme between operators? That means how does
> Apex know where an operator is located and how to route data to it? Is
> there an operator registry? If so, where does it reside?
>
> Thoughts?
>
> Thanks,
> Max.
>
>

Re: Apex Communication Protocols

Posted by Munagala Ramanath <ra...@datatorrent.com>.
A brief description is here:
http://docs.datatorrent.com/beginner/#buffer-server

It is indeed part of apex-core; see:
https://github.com/apache/apex-core/tree/master/engine/src/main/java/com/datatorrent/stram/
as well as the *stream* subdirectory.

Ram

On Fri, Nov 25, 2016 at 11:13 AM, Max Bridgewater <max.bridgewater@gmail.com
> wrote:

> Awesome. Thanks for the additional details. I can't find any reference to
> Buffer Server online. Could you give some pointers? or is it part of the
> Apex core?
>
> Regards,
> Max.
>
> On Fri, Nov 25, 2016 at 1:57 PM, Ashwin Chandra Putta <
> ashwinchandrap@gmail.com> wrote:
>
>> Max,
>>
>> Apex deploys a fast pub-sub server called buffer server in every yarn
>> container it gets (except AM) before deploying operators on it. For all the
>> operators which are connected downstream to operators outside the current
>> container, their output ports become publishers to the buffer server. The
>> downstream operators' input ports become subscribers to the buffer server.
>> So there is no concept a central operator/port registry, however all the
>> downstream operators do register their input ports with the buffer server.
>> The serialization algorithm is kryo based. The data transport protocol is
>> based on Netlet: https://github.com/DataTorrent/Netlet which is on top
>> of TCP/IP.
>>
>> Regards,
>> Ashwin.
>>
>> On Fri, Nov 25, 2016 at 8:33 AM, Max Bridgewater <
>> max.bridgewater@gmail.com> wrote:
>>
>>> Hi Folks,
>>>
>>> I was giving an Apex demo the other day and people asked following
>>> questions:
>>>
>>> 1) what is the communication protocol between operators when they are on
>>> distant nodes. That means, how does Apex transport the tuples from one node
>>> to the other?
>>> Is it a custom protocol on top of TCP/IP or is it RPC?
>>> 2) What is the serialization algorithm used?
>>> 3) What is the addressing scheme between operators? That means how does
>>> Apex know where an operator is located and how to route data to it? Is
>>> there an operator registry? If so, where does it reside?
>>>
>>> Thoughts?
>>>
>>> Thanks,
>>> Max.
>>>
>>>
>>
>>
>> --
>>
>> Regards,
>> Ashwin.
>>
>
>

Re: Apex Communication Protocols

Posted by Max Bridgewater <ma...@gmail.com>.
Awesome. Thanks for the additional details. I can't find any reference to
Buffer Server online. Could you give some pointers? or is it part of the
Apex core?

Regards,
Max.

On Fri, Nov 25, 2016 at 1:57 PM, Ashwin Chandra Putta <
ashwinchandrap@gmail.com> wrote:

> Max,
>
> Apex deploys a fast pub-sub server called buffer server in every yarn
> container it gets (except AM) before deploying operators on it. For all the
> operators which are connected downstream to operators outside the current
> container, their output ports become publishers to the buffer server. The
> downstream operators' input ports become subscribers to the buffer server.
> So there is no concept a central operator/port registry, however all the
> downstream operators do register their input ports with the buffer server.
> The serialization algorithm is kryo based. The data transport protocol is
> based on Netlet: https://github.com/DataTorrent/Netlet which is on top of
> TCP/IP.
>
> Regards,
> Ashwin.
>
> On Fri, Nov 25, 2016 at 8:33 AM, Max Bridgewater <
> max.bridgewater@gmail.com> wrote:
>
>> Hi Folks,
>>
>> I was giving an Apex demo the other day and people asked following
>> questions:
>>
>> 1) what is the communication protocol between operators when they are on
>> distant nodes. That means, how does Apex transport the tuples from one node
>> to the other?
>> Is it a custom protocol on top of TCP/IP or is it RPC?
>> 2) What is the serialization algorithm used?
>> 3) What is the addressing scheme between operators? That means how does
>> Apex know where an operator is located and how to route data to it? Is
>> there an operator registry? If so, where does it reside?
>>
>> Thoughts?
>>
>> Thanks,
>> Max.
>>
>>
>
>
> --
>
> Regards,
> Ashwin.
>

Re: Apex Communication Protocols

Posted by Ashwin Chandra Putta <as...@gmail.com>.
Max,

Apex deploys a fast pub-sub server called buffer server in every yarn
container it gets (except AM) before deploying operators on it. For all the
operators which are connected downstream to operators outside the current
container, their output ports become publishers to the buffer server. The
downstream operators' input ports become subscribers to the buffer server.
So there is no concept a central operator/port registry, however all the
downstream operators do register their input ports with the buffer server.
The serialization algorithm is kryo based. The data transport protocol is
based on Netlet: https://github.com/DataTorrent/Netlet which is on top of
TCP/IP.

Regards,
Ashwin.

On Fri, Nov 25, 2016 at 8:33 AM, Max Bridgewater <ma...@gmail.com>
wrote:

> Hi Folks,
>
> I was giving an Apex demo the other day and people asked following
> questions:
>
> 1) what is the communication protocol between operators when they are on
> distant nodes. That means, how does Apex transport the tuples from one node
> to the other?
> Is it a custom protocol on top of TCP/IP or is it RPC?
> 2) What is the serialization algorithm used?
> 3) What is the addressing scheme between operators? That means how does
> Apex know where an operator is located and how to route data to it? Is
> there an operator registry? If so, where does it reside?
>
> Thoughts?
>
> Thanks,
> Max.
>
>


-- 

Regards,
Ashwin.