Posted to dev@apex.apache.org by "Ganelin, Ilya" <Il...@capitalone.com> on 2017/03/02 03:53:47 UTC

Operator Node Affinity

Hello, all – is there any way to deploy a given operator to a specific Node? E.g. if I’m trying to create a listener for a TCP socket that can then pipe data to a DAG, is there any way for the location of that listener to be deterministic so an upstream entity knows what to connect to?

- Ilya Ganelin
________________________________________________________

The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

RE: Operator Node Affinity

Posted by "Ganelin, Ilya" <Il...@capitalone.com>.
Thanks, Amol – that's the solution in a nutshell. I think static ports are fine for our environment, so the simpler solution is preferable. I'm hoping to roll out the app tomorrow to give it a shot. Will keep you posted!




________________________________
From: Amol Kekre <am...@datatorrent.com>
Sent: Thursday, March 2, 2017 5:15:42 PM
To: dev@apex.apache.org
Subject: Re: Operator Node Affinity

Ilya,
Put all nodes on the load-balancer list; only the ones that get the
operator JVM will respond to the load balancer's status URL. The one place
where you have to bend the "do not depend on host/port of a distributed OS"
rule is the port number: I believe the port the load balancer uses is
fixed. You could use a proxy that periodically figures out the host and
port and redirects, but then you have an extra hardware hop in between
(an uptime issue?), which negates the load-balancer approach a little. You
could also run two proxy servers.

Thks
Amol





Re: Operator Node Affinity

Posted by Amol Kekre <am...@datatorrent.com>.
Ilya,
Put all nodes on the load-balancer list; only the ones that get the
operator JVM will respond to the load balancer's status URL. The one place
where you have to bend the "do not depend on host/port of a distributed OS"
rule is the port number: I believe the port the load balancer uses is
fixed. You could use a proxy that periodically figures out the host and
port and redirects, but then you have an extra hardware hop in between
(an uptime issue?), which negates the load-balancer approach a little. You
could also run two proxy servers.
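
One way to give the load balancer something to probe, sketched with the
HTTP server bundled with the JDK (com.sun.net.httpserver). The /status path,
the port value and the StatusEndpoint class are assumptions for
illustration, not part of Apex. The operator would start this next to its
TCP listener, so nodes that did not receive the operator refuse the health
check and drop out of rotation.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a health-check endpoint for the load balancer to poll.
final class StatusEndpoint {
    private StatusEndpoint() {}

    /**
     * Starts a small HTTP server that answers any request to /status with
     * 200 and the body "OK". Use the fixed port configured in the load
     * balancer; port 0 picks a free port, which is handy for local tests.
     */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/status", exchange -> {
            byte[] body = "OK".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

The returned server should be stopped with server.stop(0) when the operator
shuts down, so a stale node stops passing the health check.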

Thks
Amol




On Thu, Mar 2, 2017 at 8:59 AM, Ganelin, Ilya <Il...@capitalone.com>
wrote:

> Thanks – the solution I’m leaning towards is to deploy a load balancer
> with a list of the nodes in the cluster. Once Apex spins up, the load
> balancer should be able to establish connections to the deployed operators
> and route data appropriately.
>
> - Ilya Ganelin

Re: Operator Node Affinity

Posted by "Ganelin, Ilya" <Il...@capitalone.com>.
Thanks – the solution I’m leaning towards is to deploy a load balancer with a list of the nodes in the cluster. Once Apex spins up, the load balancer should be able to establish connections to the deployed operators and route data appropriately.

- Ilya Ganelin


On 3/2/17, 8:34 AM, "Amol Kekre" <am...@datatorrent.com> wrote:

    Ilya,
    As Thomas says, pinning an operator's JVM to a specific node is doable,
    but it goes against the norm in a distributed cluster. A distributed OS
    cannot guarantee a node: it could be down, or not have resources, and so
    on. Using ZooKeeper, or any other way of discovering the endpoint after
    deployment, is the way to go. I think a web service call through Stram
    to get the specifics would work too.
    
    Thks
    Amol
    
    
    
    


Re: Operator Node Affinity

Posted by Amol Kekre <am...@datatorrent.com>.
Ilya,
As Thomas says, pinning an operator's JVM to a specific node is doable,
but it goes against the norm in a distributed cluster. A distributed OS
cannot guarantee a node: it could be down, or not have resources, and so
on. Using ZooKeeper, or any other way of discovering the endpoint after
deployment, is the way to go. I think a web service call through Stram
to get the specifics would work too.

Thks
Amol




On Wed, Mar 1, 2017 at 8:16 PM, Thomas Weise <th...@apache.org> wrote:

> If I understand it correctly you want to run a server in an operator,
> discover its endpoint and push data to it? The preferred way of doing that
> would be to announce the endpoint through a discovery mechanism (such as
> ZooKeeper or a shared file) that the upstream entity can use to find the
> endpoint.
>
> If you are looking for a way to force deploy on a specific node, then have
> a look at the OperatorContext.LOCALITY_HOST attribute (and also
> AffinityRulesTest). AFAIK you can use a specific host name and the
> scheduler will make a best effort to get a container on that host, but there
> isn't a guarantee. Generally, services running on the cluster shouldn't
> make assumptions about hosts and ports; they should use discovery instead.
>
> HTH,
> Thomas

Re: Operator Node Affinity

Posted by Thomas Weise <th...@apache.org>.
If I understand it correctly you want to run a server in an operator,
discover its endpoint and push data to it? The preferred way of doing that
would be to announce the endpoint through a discovery mechanism (such as
ZooKeeper or a shared file) that the upstream entity can use to find the
endpoint.
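
A minimal sketch of the shared-file variant of this idea, assuming a path
that both the operator and the upstream entity can reach (for example an
NFS mount; a file on HDFS would go through the Hadoop FileSystem API
instead). The EndpointFile class, the host:port record format and the file
location are illustrative assumptions, not part of Apex.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: publishes and resolves a listener endpoint via a shared file.
final class EndpointFile {
    private EndpointFile() {}

    /** Called by the listener once its server socket is bound. */
    static void announce(Path file, String host, int port) throws IOException {
        // Write a sibling temp file and rename it into place, so a reader
        // never sees a half-written record.
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, (host + ":" + port).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, file,
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Called by the upstream entity before it connects. */
    static InetSocketAddress lookup(Path file) throws IOException {
        String record = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        int sep = record.lastIndexOf(':');
        if (sep <= 0) {
            throw new IOException("Malformed endpoint record: " + record);
        }
        String host = record.substring(0, sep);
        int port = Integer.parseInt(record.substring(sep + 1));
        // Leave DNS resolution to the caller at connect time.
        return InetSocketAddress.createUnresolved(host, port);
    }
}
```

The listener would call EndpointFile.announce(...) after binding, and the
upstream entity would call EndpointFile.lookup(...) before each (re)connect,
so a redeployed operator is picked up on the next attempt.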

If you are looking for a way to force deploy on a specific node, then have
a look at the OperatorContext.LOCALITY_HOST attribute (and also
AffinityRulesTest). AFAIK you can use a specific host name and the
scheduler will make a best effort to get a container on that host, but there
isn't a guarantee. Generally, services running on the cluster shouldn't
make assumptions about hosts and ports; they should use discovery instead.
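
Because the host attribute is a request rather than a guarantee, an
operator that cares about placement could compare the host it actually
landed on with the one that was requested, and fall back to discovery when
they differ. A rough sketch, assuming both names use the same form (both
short or both fully qualified); LocalityCheck and its methods are
hypothetical helpers, not Apex API.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

// Hypothetical helper: checks whether a best-effort host request was honored.
final class LocalityCheck {
    private LocalityCheck() {}

    /** True when both names match, ignoring case, padding and a trailing dot. */
    static boolean sameHost(String requested, String actual) {
        return normalize(requested).equals(normalize(actual));
    }

    /** True when this JVM runs on the requested host, per the local host name. */
    static boolean onRequestedHost(String requested) throws UnknownHostException {
        return sameHost(requested, InetAddress.getLocalHost().getHostName());
    }

    private static String normalize(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        return h.endsWith(".") ? h.substring(0, h.length() - 1) : h;
    }
}
```

If onRequestedHost(...) returns false, the operator is running somewhere
else and the upstream entity needs another way to find it, which is the
case the discovery advice covers.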

HTH,
Thomas


On Wed, Mar 1, 2017 at 7:53 PM, Ganelin, Ilya <Il...@capitalone.com>
wrote:

> Hello, all – is there any way to deploy a given operator to a specific
> Node? E.g. if I’m trying to create a listener for a TCP socket that can
> then pipe data to a DAG, is there any way for the location of that listener
> to be deterministic so an upstream entity knows what to connect to?
>
>
>
> - Ilya Ganelin