Posted to dev@samza.apache.org by Tommy Becker <to...@tivo.com> on 2015/07/30 20:35:06 UTC

Coordinator URL always 127.0.0.1

We are testing some jobs on a YARN grid and noticed they often fail to start because they cannot connect to the job coordinator. After some investigation, it seems the jobs always get a coordinator URL of http://127.0.0.1:<port>. My understanding is that the coordinator runs only in the AM, so I'd expect these URLs to point to some other machine more often than not. Looking at the code, however, I'm not sure how that would ever happen, since the URL for the coordinator always comes from InetAddress.getLocalHost().getHostAddress() in org.apache.samza.coordinator.server.HttpServer#getUrl.

Am I off base here?  Because I don't see how this is ever going to work in scenarios where the AM is on a different node than the containers.
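
To see what a given machine reports, a minimal sketch like the following prints the address returned by the same call chain and checks whether a remote client could use it. The class and method names (LocalHostCheck, isUnusableForRemoteClients) are made up for illustration and are not part of Samza.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class LocalHostCheck {
    /** Returns true if the address only makes sense on the local machine. */
    static boolean isUnusableForRemoteClients(InetAddress addr) {
        // Loopback (127.0.0.0/8, ::1) and wildcard (0.0.0.0, ::) addresses
        // cannot be used by another host to connect back to this one.
        return addr.isLoopbackAddress() || addr.isAnyLocalAddress();
    }

    public static void main(String[] args) {
        try {
            // The same JDK call chain the post attributes to HttpServer#getUrl.
            InetAddress local = InetAddress.getLocalHost();
            System.out.println("getLocalHost().getHostAddress() = " + local.getHostAddress());
            System.out.println("unusable for remote clients: " + isUnusableForRemoteClients(local));
        } catch (UnknownHostException e) {
            // Thrown when the local hostname cannot be resolved at all.
            System.out.println("local hostname does not resolve: " + e.getMessage());
        }
    }
}
```

Running this on the node that hosts the AM should show whether the address handed to the containers is a loopback address.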

--
Tommy Becker
Senior Software Engineer

Digitalsmiths
A TiVo Company

www.digitalsmiths.com<http://www.digitalsmiths.com>
tobecker@tivo.com<ma...@tivo.com>

________________________________

This email and any attachments may contain confidential and privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments) by others is prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete this email and any attachments. No employee or agent of TiVo Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo Inc. may only be made by a signed written agreement.

Re: Coordinator URL always 127.0.0.1

Posted by Yan Fang <ya...@gmail.com>.
Created https://issues.apache.org/jira/browse/SAMZA-748

Fang, Yan
yanfang724@gmail.com

On Thu, Jul 30, 2015 at 7:17 PM, Yi Pan <ni...@gmail.com> wrote:

> +1 on the fix in 0.10.0. It should be an easy one.

Re: Coordinator URL always 127.0.0.1

Posted by Yi Pan <ni...@gmail.com>.
+1 on the fix in 0.10.0. It should be an easy one.
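
The fix discussed in this thread, filtering loopback addresses out of what the JVM reports, could look roughly like the sketch below. It is an illustration under assumptions, not necessarily the change made for SAMZA-748: RoutableAddressFinder and its methods are hypothetical names, and dropping link-local addresses is an extra choice that the thread does not mention.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

final class RoutableAddressFinder {
    /** Keeps only the addresses a remote host could plausibly connect to. */
    static List<InetAddress> filterUsable(List<InetAddress> candidates) {
        List<InetAddress> usable = new ArrayList<>();
        for (InetAddress addr : candidates) {
            // Skip loopback, wildcard, and link-local addresses (assumed policy).
            if (addr.isLoopbackAddress() || addr.isAnyLocalAddress() || addr.isLinkLocalAddress()) {
                continue;
            }
            usable.add(addr);
        }
        return usable;
    }

    /** Collects the addresses bound to every network interface that is up. */
    static List<InetAddress> allInterfaceAddresses() throws SocketException {
        List<InetAddress> all = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return all; // no interfaces reported
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            if (nic.isUp()) {
                all.addAll(Collections.list(nic.getInetAddresses()));
            }
        }
        return all;
    }

    public static void main(String[] args) {
        try {
            for (InetAddress addr : filterUsable(allInterfaceAddresses())) {
                System.out.println(addr.getHostAddress());
            }
        } catch (SocketException e) {
            System.out.println("could not enumerate interfaces: " + e.getMessage());
        }
    }
}
```

A host with several usable addresses still needs a policy for choosing one; this sketch only narrows the candidates.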

On Thu, Jul 30, 2015 at 7:08 PM, Yan Fang <ya...@gmail.com> wrote:

> Hi Tommy,
>
> {quote}
> Because I don't see how this is ever going to work in scenarios where the
> AM is on a different node than the containers.
> {quote}
>
> -- I do not quite understand this part. AM essentially is running in a
> container as well. And the http server is brought up in the same container.
>
> {quote}
> even if we can't get a better address for the AM from YARN, we could at
> least filter the addresses we get back from the JVM to exclude loopbacks.
> {quote}
>
> -- You are right. InetAddress.getLocalHost() sometimes returns a loopback
> address. We should filter it out. A quick search turned up one possible
> solution <http://www.coderanch.com/t/491883/java/java/IP>.
>
> + @Yi, @Navina,
>
> Also, I think this fix should go to the 0.10.0 release.
>
> What do you guys think?
>
> Thanks,
>
> Fang, Yan
> yanfang724@gmail.com
>
> On Thu, Jul 30, 2015 at 6:39 PM, Yan Fang <ya...@gmail.com> wrote:
>
> > Just one point to add:
> >
> > {quote}
> > AM gets notified of container status from the RM.
> > {quote}
> >
> > I think this is not 100% correct. AM can communicate with NM through
> > NMClientAsync
> > <https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/yarn/client/api/async/NMClientAsync.html>
> > to get container status, though Samza does not implement the
> > CallbackHandler.
> >
> > Thanks,
> >
> > Fang, Yan
> > yanfang724@gmail.com
> >
> > On Thu, Jul 30, 2015 at 6:06 PM, Navina Ramesh <nramesh@linkedin.com.invalid> wrote:
> >
> >> The NM (and hence, by extension, the container) heartbeats to the RM, not
> >> the AM. AM gets notified of container status from the RM.
> >> The AM starts / stops / releases a container process by communicating with
> >> the NM.
> >>
> >> Navina
> >>
> >>
> >> On Thu, Jul 30, 2015 at 5:55 PM, Thomas Becker <to...@tivo.com> wrote:
> >>
> >> > Ok, I thought there was some communication from the container to the AM,
> >> > it sounds like you're saying it's in the other direction only?  Don't
> >> > containers heartbeat to the AM?  Regardless, even if we can't get a better
> >> > address for the AM from YARN, we could at least filter the addresses we get
> >> > back from the JVM to exclude loopbacks.
> >> >
> >> > -Tommy
> >> > ________________________________________
> >> > From: Navina Ramesh [nramesh@linkedin.com.INVALID]
> >> > Sent: Thursday, July 30, 2015 8:40 PM
> >> > To: dev@samza.apache.org
> >> > Subject: Re: Coordinator URL always 127.0.0.1
> >> >
> >> > Hi Tommy,
> >> > Yi is right. Container start is coordinated by the AppMaster using an
> >> > NMClient. The container host name and port are provided by the RM during
> >> > allocation.
> >> > In YARN (at least, afaik), when a node joins a cluster, the NM registers
> >> > itself with the RM. So the NM might still be using
> >> > InetAddress.getLocalHost().
> >> >
> >> > I don't know of any other way to programmatically fetch the machine's
> >> > hostname (apart from some hacky shell commands).
> >> >
> >> > Cheers,
> >> > Navina
> >> >
> >> > On Thu, Jul 30, 2015 at 5:23 PM, Yi Pan <ni...@gmail.com> wrote:
> >> >
> >> > > Hi, Tommy,
> >> > >
> >> > > Yeah, I agree that the current implementation is not robust against
> >> > > differing network configurations on the host. As for the AM <-> container
> >> > > communication, if I am not mistaken, it goes through the NMClient, and the
> >> > > node HTTP address is wrapped within the Container object returned from the
> >> > > RM. I am not very familiar with that part of the source code. Navina may be
> >> > > able to help more here.
> >> > >
> >> > > -Yi
> >> > >
> >> > > On Thu, Jul 30, 2015 at 4:27 PM, Thomas Becker <to...@tivo.com> wrote:
> >> > >
> >> > > > Hi Yi,
> >> > > > Thanks a lot for your reply.  I don't doubt we can get it to work by
> >> > > > mucking with the networking configuration, but to me this feels like a
> >> > > > workaround, not a solution.  InetAddress.getLocalHost().getHostAddress() is
> >> > > > not a reliable way of obtaining an IP that other machines can connect to.
> >> > > > Just today I tested on several Linux distros and it did not work on any of
> >> > > > them.  Can we do something more robust here?  How does the container
> >> > > > communicate status to the AM?
> >> > > >
> >> > > > -Tommy
> >> > > >
> >> > > > ________________________________________
> >> > > > From: Yi Pan [nickpan47@gmail.com]
> >> > > > Sent: Thursday, July 30, 2015 6:48 PM
> >> > > > To: dev@samza.apache.org
> >> > > > Subject: Re: Coordinator URL always 127.0.0.1
> >> > > >
> >> > > > Hi, Tommy,
> >> > > >
> >> > > > I think this is a commonly asked question about multiple IPs on a
> >> > > > single host. A common trick that requires no code change is (copied from SO:
> >> > > > http://stackoverflow.com/questions/2381316/java-inetaddress-getlocalhost-returns-127-0-0-1-how-to-get-real-ip
> >> > > > )
> >> > > >
> >> > > > {code}
> >> > > >    1. Find your host name by typing: hostname. For example, you find that
> >> > > >       your hostname is mycomputer.xzy.com
> >> > > >
> >> > > >    2. Put your host name in your hosts file, /etc/hosts, for example:
> >> > > >
> >> > > >       10.50.16.136 mycomputer.xzy.com
> >> > > > {code}
> >> > > >
> >> > > > -Yi
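
Whether the hosts-file trick is needed can be checked by looking at which addresses /etc/hosts assigns to the machine's hostname. The following sketch parses hosts-file text; HostsFileCheck and its methods are hypothetical helpers, and the sample entries reuse the hostname and IP from the example above, plus a 127.0.1.1 line of the kind some Linux distributions add by default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HostsFileCheck {
    /** Returns every address that the hosts-file text assigns to hostname. */
    static List<String> addressesFor(String hostsFileText, String hostname) {
        List<String> result = new ArrayList<>();
        for (String line : hostsFileText.split("\n")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // drop trailing comments
            }
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // blank line or address without names
            }
            // fields[0] is the address; the remaining fields are names and aliases.
            if (Arrays.asList(fields).subList(1, fields.length).contains(hostname)) {
                result.add(fields[0]);
            }
        }
        return result;
    }

    /** Returns true if any entry maps hostname to an IPv4 or IPv6 loopback literal. */
    static boolean mapsToLoopback(String hostsFileText, String hostname) {
        for (String ip : addressesFor(hostsFileText, hostname)) {
            if (ip.startsWith("127.") || ip.equals("::1")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample text only; a real check would read /etc/hosts instead.
        String sample = "127.0.0.1 localhost\n"
                + "127.0.1.1 mycomputer.xzy.com mycomputer\n"
                + "10.50.16.136 mycomputer.xzy.com\n";
        System.out.println(addressesFor(sample, "mycomputer.xzy.com"));
        System.out.println(mapsToLoopback(sample, "mycomputer.xzy.com"));
    }
}
```

If the hostname still maps to a loopback entry, adding the routable line alone may not be enough, since the resolver can return the loopback entry first.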
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Navina R.
> >> >
> >>
> >>
> >>
> >> --
> >> Navina R.
> >>
> >
> >
>

Re: Coordinator URL always 127.0.0.1

Posted by Tommy Becker <to...@tivo.com>.
I realize the JC runs in the AM YARN container.  The SamzaContainers run in their own YARN containers, which may or may not be on the same node as the AM in the YARN grid.  My point is simply that when they are not, it does not work.  This is because the AM is telling the SamzaContainers to fetch the config from 127.0.0.1.

-Tommy
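
A container could also guard against this on its own side. The sketch below assumes the coordinator URL is available as a plain string and flags one whose host is a loopback address; CoordinatorUrlCheck, pointsAtLoopback, and the port number are hypothetical and not Samza APIs.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

final class CoordinatorUrlCheck {
    /** Returns true if the host part of the URL is (or resolves to) a loopback address. */
    static boolean pointsAtLoopback(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("no host in URL: " + url);
        }
        try {
            // For a literal IP such as 127.0.0.1 this does not perform a DNS lookup.
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot resolve host: " + host, e);
        }
    }

    public static void main(String[] args) {
        String url = "http://127.0.0.1:12345/"; // hypothetical coordinator URL
        if (pointsAtLoopback(url)) {
            System.out.println(url + " is only reachable from the coordinator's own node");
        }
    }
}
```

Failing fast with a message like this would make the cause visible in the container log instead of surfacing as a connection error.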

On 07/31/2015 02:31 PM, Navina Ramesh wrote:

> Hi Tommy,
> I am not sure what you mean below:
> {quote}
> this context by container I meant the SamzaContainer.  What we are seeing
> is that jobs only start when YARN happens to place the AM and
> SamzaContainer(s) on the same node.  Which is increasingly unlikely as you
> increase container count for your job and/or expand your YARN grid.
> {quote}
> Any YARN application finds a yarn container to run the AM. In our case, JC
> is running in the same container as the AM. So, I don't understand why this
> will cause an issue on cluster expansion. I can understand your concern if
> JC and AM are running on 2 separate containers.
>
> Please correct me if I have misunderstood your statement.
>
> Thanks!
> Navina
>
> On Fri, Jul 31, 2015 at 6:05 AM, Tommy Becker <to...@tivo.com> wrote:
>
> > Hey Yan,
> >
> > -- I do not quite understand this part. AM essentially is running in a
> > container as well. And the http server is brought up in the same container
> >
> > Sorry, the term "container" is overloaded.  In this context by container I
> > meant the SamzaContainer.  What we are seeing is that jobs only start when
> > YARN happens to place the AM and SamzaContainer(s) on the same node.  Which
> > is increasingly unlikely as you increase container count for your job
> > and/or expand your YARN grid.
> >
> > -Tommy

On 07/30/2015 10:08 PM, Yan Fang wrote:

Hi Thommy,

{quote}
Because I don't see how this is ever going to work in scenarios where the
AM is on a different node than the containers.
{quote}

-- I do not quite understand this part. AM essentially is running in a
container as well. And the http server is brought up in the same container.

{quote}
even if we can't get a better address for the AM from YARN, we could at
least filter the addresses we get back from the JVM to exclude loopbacks.
{quote}

-- You are right. InetAddress.getLocalHost() gives back loopback address
sometimes. We should filter this out. Just googling one possible solution
<http://www.coderanch.com/t/491883/java/java/IP><http://www.coderanch.com/t/491883/java/java/IP><
http://www.coderanch.com/t/491883/java/java/IP><http://www.coderanch.com/t/491883/java/java/IP> .

+ @Yi, @Navina,

Also, I think this fix should go to the 0.10.0 release.

What do you guys think?

Thanks,

Fang, Yan
yanfang724@gmail.com<ma...@gmail.com>

On Thu, Jul 30, 2015 at 6:39 PM, Yan Fang <ya...@gmail.com><mailto:
yanfang724@gmail.com><ma...@gmail.com> wrote:



Just one point to add:

{quote}
AM gets notified of container status from the RM.
{quote}

I think this is not 100% correct. AM can communicate with NM through
NMClientAsync
<
https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/yarn/client/api/async/NMClientAsync.html


<


https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/yarn/client/api/async/NMClientAsync.html>
to
get container status, though Samza does not implement the CallbackHandler.

Thanks,

Fang, Yan
yanfang724@gmail.com<ma...@gmail.com>

On Thu, Jul 30, 2015 at 6:06 PM, Navina Ramesh <
nramesh@linkedin.com.invalid<ma...@linkedin.com.invalid>> wrote:



The NM (and hence, by extension the container) heartbeats to the RM, not
the AM. AM gets notified of container status from the RM.
The AM starts / stops /releases a container process by communicating to
the
NM.

Navina


On Thu, Jul 30, 2015 at 5:55 PM, Thomas Becker <to...@tivo.com><mailto:
tobecker@tivo.com><ma...@tivo.com> wrote:



Ok, I thought there was some communication from the container to the AM,
it sounds like you're saying it's in the other direction only?  Don't
containers heartbeat to the AM?  Regardless, even if we can't get a


better


address for the AM from YARN, we could at least filter the addresses we


get


back from the JVM to exclude loopbacks.

-Tommy
________________________________________
From: Navina Ramesh [nramesh@linkedin.com.INVALID<ma...@linkedin.com.INVALID><mailto:
nramesh@linkedin.com.INVALID><ma...@linkedin.com.INVALID>]
Sent: Thursday, July 30, 2015 8:40 PM
To: dev@samza.apache.org<ma...@samza.apache.org>
Subject: Re: Coordinator URL always 127.0.0.1

Hi Tommy,
Yi is right. Container start is coordinated by the AppMaster using an
NMClient. Container host name and port is provided by the RM during
allocation.
In Yarn (at least, afaik), when the node joins a cluster, the NM


registers


itself with the RM. So, the NM might still be using
getLocalhost.getAddress().

I don't know of any other way to programmatically fetch the machine's
hostname (apart from some hacky shell commands).

Cheers,
Navina

On Thu, Jul 30, 2015 at 5:23 PM, Yi Pan <ni...@gmail.com><mailto:
nickpan47@gmail.com><ma...@gmail.com> wrote:



Hi, Tommy,

Yeah, I agree that the current implementation is not bullet-proof to


any


different networking configuration on the host. As for the AM <->


container


communication, if I am not mistaken, it is through the NMClient and


the


node HTTP address is wrapped within the Container object returned from


RM.


I am not very familiar with that part of source code. Navina may be


able


to


help more here.

-Yi

On Thu, Jul 30, 2015 at 4:27 PM, Thomas Becker <to...@tivo.com><mailto:
tobecker@tivo.com><ma...@tivo.com>


wrote:





Hi Yi,
Thanks a lot for your reply.  I don't doubt we can get it to work by
mucking with the networking configuration, but to me this feels


like a


workaround, not a solution.


InetAddress.getLocalHost().getHostAddress()


is


not a reliable way of obtaining an IP that other machines can


connect


to.


Just today I tested on several Linux distros and it did not work on


any


of


them.  Can we do something more robust here?  How does the container
communicate status to the AM?

-Tommy

________________________________________
From: Yi Pan [nickpan47@gmail.com<ma...@gmail.com>]
Sent: Thursday, July 30, 2015 6:48 PM
To: dev@samza.apache.org<ma...@samza.apache.org>
Subject: Re: Coordinator URL always 127.0.0.1

Hi, Tommy,

I think that it might be a commonly asked question regarding to


multiple


IPs on a single host. A common trick w/o changing code is (copied


from


SO:













http://stackoverflow.com/questions/2381316/java-inetaddress-getlocalhost-returns-127-0-0-1-how-to-get-real-ip


)

{code}

  1.

  Find your host name. Type: hostname. For example, you find your


hostname


  is mycomputer.xzy.com
  2.

  Put your host name in your hosts file. /etc/hosts . Such as

  10.50.16.136 mycomputer.xzy.com


{code}

-Yi

On Thu, Jul 30, 2015 at 11:35 AM, Tommy Becker <to...@tivo.com><mailto:
tobecker@tivo.com><ma...@tivo.com>



wrote:





We are testing some jobs on a YARN grid and noticed they are often


not


starting up properly due to being unable to connect to the job


coordinator.


After some investigation it seems as if the jobs are always


getting a


coordinator URL of http://127.0.0.1:<port>  But my understanding


is


that


the coordinator runs only in the AM, so I'd expect these URLs to


more


often


than not be to some other machine.  Looking at the code however,


I'm


not


sure how that would ever happen since the URL for the coordinator


always


comes from InetAddress.getLocalHost().getHostAddress() in
org.apache.samza.coordinator.server.HttpServer#getUrl

Am I off base here?  Because I don't see how this is ever going to


work


in


scenarios where the AM is on a different node than the containers.

--
Tommy Becker
Senior Software Engineer

Digitalsmiths
A TiVo Company

www.digitalsmiths.com<http://www.digitalsmiths.com><http://www.digitalsmiths.com><http://www.digitalsmiths.com><
http://www.digitalsmiths.com><http://www.digitalsmiths.com><http://www.digitalsmiths.com><http://www.digitalsmiths.com>
tobecker@tivo.com<ma...@tivo.com><mailto:tobecker@tivo.com


<ma...@tivo.com>




________________________________

This email and any attachments may contain confidential and


privileged


material for the sole use of the intended recipient. Any review,


copying,


or distribution of this email (or any attachments) by others is


prohibited.


If you are not the intended recipient, please contact the sender
immediately and permanently delete this email and any


attachments. No


employee or agent of TiVo Inc. is authorized to conclude any


binding


--
Tommy Becker
Senior Software Engineer

Digitalsmiths
A TiVo Company

www.digitalsmiths.com<http://www.digitalsmiths.com>
tobecker@tivo.com<ma...@tivo.com>

Re: Coordinator URL always 127.0.0.1

Posted by Navina Ramesh <nr...@linkedin.com.INVALID>.
Hi Tommy,
I am not sure what you mean by the following:
{quote}
In this context by container I meant the SamzaContainer. What we are seeing
is that jobs only start when YARN happens to place the AM and
SamzaContainer(s) on the same node. Which is increasingly unlikely as you
increase container count for your job and/or expand your YARN grid.
{quote}
Any YARN application gets a YARN container in which to run the AM. In our
case, the JC (job coordinator) runs in the same container as the AM, so I
don't understand why this would cause an issue on cluster expansion. I could
understand your concern if the JC and the AM were running in two separate
containers.

Please correct me if I have misunderstood your statement.

Thanks!
Navina

On Fri, Jul 31, 2015 at 6:05 AM, Tommy Becker <to...@tivo.com> wrote:

> Hey Yan,
>
> > -- I do not quite understand this part. AM essentially is running in a
> > container as well. And the http server is brought up in the same container
>
> Sorry, the term "container" is overloaded.  In this context by container I
> meant the SamzaContainer.  What we are seeing is that jobs only start when
> YARN happens to place the AM and SamzaContainer(s) on the same node.  Which
> is increasingly unlikely as you increase container count for your job
> and/or expand your YARN grid.
>
> -Tommy



-- 
Navina R.

Re: Coordinator URL always 127.0.0.1

Posted by Tommy Becker <to...@tivo.com>.
Hey Yan,

> -- I do not quite understand this part. AM essentially is running in a
> container as well. And the http server is brought up in the same container

Sorry, the term "container" is overloaded. In this context, by container I meant the SamzaContainer. What we are seeing is that jobs only start when YARN happens to place the AM and the SamzaContainer(s) on the same node, which becomes increasingly unlikely as you increase the container count for your job and/or expand your YARN grid.

-Tommy
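
The failure mode described above can be checked with a small diagnostic. This is only a sketch under stated assumptions: the class name CoordinatorUrlCheck, the method pointsAtLoopback, and the port 12345 are hypothetical and are not part of Samza. It tests whether the host in an advertised coordinator URL resolves to a loopback address, in which case a SamzaContainer on any other node could not use it to reach the AM.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

// Hypothetical diagnostic, not part of Samza.
final class CoordinatorUrlCheck {

    // Returns true when the URL's host resolves to a loopback address,
    // i.e. the URL is only usable from the machine that produced it.
    static boolean pointsAtLoopback(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return false; // no host component to inspect
        }
        try {
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false; // unresolvable is a different failure than loopback
        }
    }

    public static void main(String[] args) {
        // The port is a placeholder; pass the real coordinator URL as the first argument.
        String advertised = args.length > 0 ? args[0] : "http://127.0.0.1:12345/";
        System.out.println(advertised + " points at loopback: " + pointsAtLoopback(advertised));
    }
}
```

Running it on the AM host against the URL that the containers receive would show whether the address is usable from other nodes.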

--
Tommy Becker
Senior Software Engineer

Digitalsmiths
A TiVo Company

www.digitalsmiths.com<http://www.digitalsmiths.com>
tobecker@tivo.com<ma...@tivo.com>

Re: Coordinator URL always 127.0.0.1

Posted by Yan Fang <ya...@gmail.com>.
Hi Tommy,

{quote}
Because I don't see how this is ever going to work in scenarios where the
AM is on a different node than the containers.
{quote}

-- I do not quite understand this part. The AM is essentially running in a
container as well, and the HTTP server is brought up in that same container.

{quote}
even if we can't get a better address for the AM from YARN, we could at
least filter the addresses we get back from the JVM to exclude loopbacks.
{quote}

-- You are right. InetAddress.getLocalHost() sometimes returns a loopback
address. We should filter this out. A quick search turned up one possible
solution <http://www.coderanch.com/t/491883/java/java/IP> .
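
A minimal sketch of the loopback filtering suggested above, assuming the coordinator only needs an IPv4 address that other nodes can route to. The class CoordinatorAddress and its methods are hypothetical names for illustration and are not the actual Samza fix; only the java.net calls are real. It enumerates the network interfaces and falls back to InetAddress.getLocalHost() when nothing better is found.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Optional;

// Hypothetical helper, not part of Samza.
final class CoordinatorAddress {

    // An address is a candidate only if it is not loopback (127.0.0.0/8, ::1),
    // not link-local (169.254.0.0/16, fe80::/10) and not the wildcard address.
    static boolean isUsable(InetAddress addr) {
        return !addr.isLoopbackAddress()
            && !addr.isLinkLocalAddress()
            && !addr.isAnyLocalAddress();
    }

    // Walks every interface that is up and returns the first usable IPv4 address, if any.
    static Optional<InetAddress> findNonLoopback() {
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return Optional.empty(); // no interfaces reported on this machine
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp() || nic.isLoopback()) {
                    continue;
                }
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    if (addr instanceof Inet4Address && isUsable(addr)) {
                        return Optional.of(addr);
                    }
                }
            }
        } catch (SocketException e) {
            // Interface enumeration failed; the caller decides on a fallback.
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        // Fall back to the original behaviour when no better address exists.
        InetAddress chosen = findNonLoopback().orElse(InetAddress.getLocalHost());
        System.out.println("Coordinator would advertise: " + chosen.getHostAddress());
    }
}
```

On a host with several interfaces the first match is not necessarily the one the rest of the grid can reach, so a real fix might also need a configuration override.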

+ @Yi, @Navina,

Also, I think this fix should go into the 0.10.0 release.

What do you guys think?

Thanks,

Fang, Yan
yanfang724@gmail.com

On Thu, Jul 30, 2015 at 6:39 PM, Yan Fang <ya...@gmail.com> wrote:

> Just one point to add:
>
> {quote}
> AM gets notified of container status from the RM.
> {quote}
>
> I think this is not 100% correct. AM can communicate with NM through
> NMClientAsync
> <https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/yarn/client/api/async/NMClientAsync.html> to
> get container status, though Samza does not implement the CallbackHandler.
>
> Thanks,
>
> Fang, Yan
> yanfang724@gmail.com
>
> On Thu, Jul 30, 2015 at 6:06 PM, Navina Ramesh <nramesh@linkedin.com.invalid> wrote:
>
>> The NM (and hence, by extension the container) heartbeats to the RM, not
>> the AM. AM gets notified of container status from the RM.
>> The AM starts / stops /releases a container process by communicating to
>> the NM.
>>
>> Navina
>>
>>
>> On Thu, Jul 30, 2015 at 5:55 PM, Thomas Becker <to...@tivo.com> wrote:
>>
>> > Ok, I thought there was some communication from the container to the AM,
>> > it sounds like you're saying it's in the other direction only?  Don't
>> > containers heartbeat to the AM?  Regardless, even if we can't get a better
>> > address for the AM from YARN, we could at least filter the addresses we get
>> > back from the JVM to exclude loopbacks.
>> >
>> > -Tommy
>> > ________________________________________
>> > From: Navina Ramesh [nramesh@linkedin.com.INVALID]
>> > Sent: Thursday, July 30, 2015 8:40 PM
>> > To: dev@samza.apache.org
>> > Subject: Re: Coordinator URL always 127.0.0.1
>> >
>> > Hi Tommy,
>> > Yi is right. Container start is coordinated by the AppMaster using an
>> > NMClient. Container host name and port is provided by the RM during
>> > allocation.
>> > In Yarn (at least, afaik), when the node joins a cluster, the NM registers
>> > itself with the RM. So, the NM might still be using
>> > getLocalhost.getAddress().
>> >
>> > I don't know of any other way to programmatically fetch the machine's
>> > hostname (apart from some hacky shell commands).
>> >
>> > Cheers,
>> > Navina
>> >
>> > On Thu, Jul 30, 2015 at 5:23 PM, Yi Pan <ni...@gmail.com> wrote:
>> >
>> > > Hi, Tommy,
>> > >
>> > > Yeah, I agree that the current implementation is not bullet-proof to any
>> > > different networking configuration on the host. As for the AM <-> container
>> > > communication, if I am not mistaken, it is through the NMClient and the
>> > > node HTTP address is wrapped within the Container object returned from RM.
>> > > I am not very familiar with that part of source code. Navina may be able to
>> > > help more here.
>> > >
>> > > -Yi
>> > >
>> > > On Thu, Jul 30, 2015 at 4:27 PM, Thomas Becker <to...@tivo.com> wrote:
>> > >
>> > > > Hi Yi,
>> > > > Thanks a lot for your reply.  I don't doubt we can get it to work by
>> > > > mucking with the networking configuration, but to me this feels like a
>> > > > workaround, not a solution. InetAddress.getLocalHost().getHostAddress() is
>> > > > not a reliable way of obtaining an IP that other machines can connect to.
>> > > > Just today I tested on several Linux distros and it did not work on any of
>> > > > them.  Can we do something more robust here?  How does the container
>> > > > communicate status to the AM?
>> > > >
>> > > > -Tommy
>> > > >
>> > > > ________________________________________
>> > > > From: Yi Pan [nickpan47@gmail.com]
>> > > > Sent: Thursday, July 30, 2015 6:48 PM
>> > > > To: dev@samza.apache.org
>> > > > Subject: Re: Coordinator URL always 127.0.0.1
>> > > >
>> > > > Hi, Tommy,
>> > > >
>> > > > I think that it might be a commonly asked question regarding to multiple
>> > > > IPs on a single host. A common trick w/o changing code is (copied from SO:
>> > > > http://stackoverflow.com/questions/2381316/java-inetaddress-getlocalhost-returns-127-0-0-1-how-to-get-real-ip
>> > > > )
>> > > >
>> > > > {code}
>> > > >
>> > > >    1. Find your host name. Type: hostname. For example, you find your hostname
>> > > >       is mycomputer.xzy.com
>> > > >    2. Put your host name in your hosts file. /etc/hosts . Such as
>> > > >
>> > > >       10.50.16.136 mycomputer.xzy.com
>> > > >
>> > > > {code}
>> > > >
>> > > > -Yi
>> > > >
>> > > > On Thu, Jul 30, 2015 at 11:35 AM, Tommy Becker <to...@tivo.com> wrote:
>> > > >
>> > > > > We are testing some jobs on a YARN grid and noticed they are often not
>> > > > > starting up properly due to being unable to connect to the job coordinator.
>> > > > > After some investigation it seems as if the jobs are always getting a
>> > > > > coordinator URL of http://127.0.0.1:<port>  But my understanding is that
>> > > > > the coordinator runs only in the AM, so I'd expect these URLs to more often
>> > > > > than not be to some other machine.  Looking at the code however, I'm not
>> > > > > sure how that would ever happen since the URL for the coordinator always
>> > > > > comes from InetAddress.getLocalHost().getHostAddress() in
>> > > > > org.apache.samza.coordinator.server.HttpServer#getUrl
>> > > > >
>> > > > > Am I off base here?  Because I don't see how this is ever going to work in
>> > > > > scenarios where the AM is on a different node than the containers.
>> > > > >
>> > > > > --
>> > > > > Tommy Becker
>> > > > > Senior Software Engineer
>> > > > >
>> > > > > Digitalsmiths
>> > > > > A TiVo Company
>> > > > >
>> > > > > www.digitalsmiths.com<http://www.digitalsmiths.com>
>> > > > > tobecker@tivo.com<ma...@tivo.com>
>> > >
>> >
>> >
>> >
>> > --
>> > Navina R.
>> >
>>
>>
>>
>> --
>> Navina R.
>>
>
>

Re: Coordinator URL always 127.0.0.1

Posted by Navina Ramesh <nr...@linkedin.com.INVALID>.
+1 for the fix!
On Jul 30, 2015 9:55 PM, "Navina Ramesh" <nr...@linkedin.com> wrote:

> Yes, Yan. But that communication is initiated by the AM. Whether or not an
> application's AM does so, the NM always heartbeats the status of
> its containers to the RM.
> On Jul 30, 2015 6:40 PM, "Yan Fang" <ya...@gmail.com> wrote:
>
>> Just one point to add:
>>
>> {quote}
>> AM gets notified of container status from the RM.
>> {quote}
>>
>> I think this is not 100% correct. AM can communicate with NM through
>> NMClientAsync
>> <https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/yarn/client/api/async/NMClientAsync.html> to
>> get container status, though Samza does not implement the CallbackHandler.
>>
>> Thanks,
>>
>> Fang, Yan
>> yanfang724@gmail.com
>>
>> On Thu, Jul 30, 2015 at 6:06 PM, Navina Ramesh <nramesh@linkedin.com.invalid> wrote:
>>
>> > The NM (and hence, by extension the container) heartbeats to the RM, not
>> > the AM. AM gets notified of container status from the RM.
>> > The AM starts / stops /releases a container process by communicating to
>> the
>> > NM.
>> >
>> > Navina
>> >
>> >
>> > On Thu, Jul 30, 2015 at 5:55 PM, Thomas Becker <to...@tivo.com>
>> wrote:
>> >
>> > > Ok, I thought there was some communication from the container to the
>> AM,
>> > > it sounds like you're saying it's in the other direction only?  Don't
>> > > containers heartbeat to the AM?  Regardless, even if we can't get a
>> > better
>> > > address for the AM from YARN, we could at least filter the addresses
>> we
>> > get
>> > > back from the JVM to exclude loopbacks.
>> > >
>> > > -Tommy
>> > > ________________________________________
>> > > From: Navina Ramesh [nramesh@linkedin.com.INVALID]
>> > > Sent: Thursday, July 30, 2015 8:40 PM
>> > > To: dev@samza.apache.org
>> > > Subject: Re: Coordinator URL always 127.0.0.1
>> > >
>> > > Hi Tommy,
>> > > Yi is right. Container start is coordinated by the AppMaster using an
>> > > NMClient. Container host name and port is provided by the RM during
>> > > allocation.
>> > > In Yarn (at least, afaik), when the node joins a cluster, the NM
>> > registers
>> > > itself with the RM. So, the NM might still be using
>> > > getLocalhost.getAddress().
>> > >
>> > > I don't know of any other way to programmatically fetch the machine's
>> > > hostname (apart from some hacky shell commands).
>> > >
>> > > Cheers,
>> > > Navina
>> > >
>> > > On Thu, Jul 30, 2015 at 5:23 PM, Yi Pan <ni...@gmail.com> wrote:
>> > >
>> > > > Hi, Tommy,
>> > > >
>> > > > Yeah, I agree that the current implementation is not bullet-proof to
>> > any
>> > > > different networking configuration on the host. As for the AM <->
>> > > container
>> > > > communication, if I am not mistaken, it is through the NMClient and
>> the
>> > > > node HTTP address is wrapped within the Container object returned
>> from
>> > > RM.
>> > > > I am not very familiar with that part of source code. Navina may be
>> > able
>> > > to
>> > > > help more here.
>> > > >
>> > > > -Yi
>> > > >
>> > > > On Thu, Jul 30, 2015 at 4:27 PM, Thomas Becker <to...@tivo.com>
>> > > wrote:
>> > > >
>> > > > > Hi Yi,
>> > > > > Thanks a lot for your reply.  I don't doubt we can get it to work
>> by
>> > > > > mucking with the networking configuration, but to me this feels
>> like
>> > a
>> > > > > workaround, not a solution.
>> > > InetAddress.getLocalHost().getHostAddress()
>> > > > is
>> > > > > not a reliable way of obtaining an IP that other machines can
>> connect
>> > > to.
>> > > > > Just today I tested on several Linux distros and it did not work
>> on
>> > any
>> > > > of
>> > > > > them.  Can we do something more robust here?  How does the
>> container
>> > > > > communicate status to the AM?
>> > > > >
>> > > > > -Tommy
>> > > > >
>> > > > > ________________________________________
>> > > > > From: Yi Pan [nickpan47@gmail.com]
>> > > > > Sent: Thursday, July 30, 2015 6:48 PM
>> > > > > To: dev@samza.apache.org
>> > > > > Subject: Re: Coordinator URL always 127.0.0.1
>> > > > >
>> > > > > Hi, Tommy,
>> > > > >
>> > > > > I think that it might be a commonly asked question regarding to
>> > > multiple
>> > > > > IPs on a single host. A common trick w/o changing code is (copied
>> > from
>> > > > SO:
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> http://stackoverflow.com/questions/2381316/java-inetaddress-getlocalhost-returns-127-0-0-1-how-to-get-real-ip
>> > > > > )
>> > > > >
>> > > > > {code}
>> > > > >
>> > > > >    1.
>> > > > >
>> > > > >    Find your host name. Type: hostname. For example, you find your
>> > > > hostname
>> > > > >    is mycomputer.xzy.com
>> > > > >    2.
>> > > > >
>> > > > >    Put your host name in your hosts file. /etc/hosts . Such as
>> > > > >
>> > > > >    10.50.16.136 mycomputer.xzy.com
>> > > > >
>> > > > >
>> > > > > {code}
>> > > > >
>> > > > > -Yi
>> > > > >
>> > > > > On Thu, Jul 30, 2015 at 11:35 AM, Tommy Becker <tobecker@tivo.com
>> >
>> > > > wrote:
>> > > > >
>> > > > > > We are testing some jobs on a YARN grid and noticed they are
>> often
>> > > not
>> > > > > > starting up properly due to being unable to connect to the job
>> > > > > coordinator.
>> > > > > > After some investigation it seems as if the jobs are always
>> > getting a
>> > > > > > coordinator URL of http://127.0.0.1:<port>  But my
>> understanding
>> > is
>> > > > that
>> > > > > > the coordinator runs only in the AM, so I'd expect these URLs to
>> > more
>> > > > > often
>> > > > > > than not be to some other machine.  Looking at the code however,
>> > I'm
>> > > > not
>> > > > > > sure how that would ever happen since the URL for the
>> coordinator
>> > > > always
>> > > > > > comes from InetAddress.getLocalHost().getHostAddress() in
>> > > > > > org.apache.samza.coordinator.server.HttpServer#getUrl
>> > > > > >
>> > > > > > Am I off base here?  Because I don't see how this is ever going
>> to
>> > > work
>> > > > > in
>> > > > > > scenarios where the AM is on a different node than the
>> containers.
>> > > > > >
>> > > > > > --
>> > > > > > Tommy Becker
>> > > > > > Senior Software Engineer
>> > > > > >
>> > > > > > Digitalsmiths
>> > > > > > A TiVo Company
>> > > > > >
>> > > > > > www.digitalsmiths.com<http://www.digitalsmiths.com>
>> > > > > > tobecker@tivo.com<ma...@tivo.com>
>> > > > > >
>> > > > > > ________________________________
>> > > > > >
>> > > > > > This email and any attachments may contain confidential and
>> > > privileged
>> > > > > > material for the sole use of the intended recipient. Any review,
>> > > > copying,
>> > > > > > or distribution of this email (or any attachments) by others is
>> > > > > prohibited.
>> > > > > > If you are not the intended recipient, please contact the sender
>> > > > > > immediately and permanently delete this email and any
>> attachments.
>> > No
>> > > > > > employee or agent of TiVo Inc. is authorized to conclude any
>> > binding
>> > > > > > agreement on behalf of TiVo Inc. by email. Binding agreements
>> with
>> > > TiVo
>> > > > > > Inc. may only be made by a signed written agreement.
>> > > > > >
>> > > > >
>> > > > > ________________________________
>> > > > >
>> > > > > This email and any attachments may contain confidential and
>> > privileged
>> > > > > material for the sole use of the intended recipient. Any review,
>> > > copying,
>> > > > > or distribution of this email (or any attachments) by others is
>> > > > prohibited.
>> > > > > If you are not the intended recipient, please contact the sender
>> > > > > immediately and permanently delete this email and any
>> attachments. No
>> > > > > employee or agent of TiVo Inc. is authorized to conclude any
>> binding
>> > > > > agreement on behalf of TiVo Inc. by email. Binding agreements with
>> > TiVo
>> > > > > Inc. may only be made by a signed written agreement.
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Navina R.
>> > >
>> > > ________________________________
>> > >
>> > > This email and any attachments may contain confidential and privileged
>> > > material for the sole use of the intended recipient. Any review,
>> copying,
>> > > or distribution of this email (or any attachments) by others is
>> > prohibited.
>> > > If you are not the intended recipient, please contact the sender
>> > > immediately and permanently delete this email and any attachments. No
>> > > employee or agent of TiVo Inc. is authorized to conclude any binding
>> > > agreement on behalf of TiVo Inc. by email. Binding agreements with
>> TiVo
>> > > Inc. may only be made by a signed written agreement.
>> > >
>> >
>> >
>> >
>> > --
>> > Navina R.
>> >
>>
>

Re: Coordinator URL always 127.0.0.1

Posted by Navina Ramesh <nr...@linkedin.com.INVALID>.
Yes, Yan. But that communication is initiated by the AM. Whether or not an
application's AM does it, the NM always heartbeats the status of
its containers to the RM.

Re: Coordinator URL always 127.0.0.1

Posted by Yan Fang <ya...@gmail.com>.
Just one point to add:

{quote}
AM gets notified of container status from the RM.
{quote}

I think this is not 100% correct. AM can communicate with NM through
NMClientAsync
<https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/yarn/client/api/async/NMClientAsync.html>
to get container status, though Samza does not implement the CallbackHandler.

Thanks,

Fang, Yan
yanfang724@gmail.com

Re: Coordinator URL always 127.0.0.1

Posted by Navina Ramesh <nr...@linkedin.com.INVALID>.
The NM (and hence, by extension, the container) heartbeats to the RM, not
the AM. AM gets notified of container status from the RM.
The AM starts / stops / releases a container process by communicating with the
NM.

Navina


-- 
Navina R.

RE: Coordinator URL always 127.0.0.1

Posted by Thomas Becker <to...@Tivo.com>.
OK, I thought there was some communication from the container to the AM, but it sounds like you're saying it's in the other direction only?  Don't containers heartbeat to the AM?  Regardless, even if we can't get a better address for the AM from YARN, we could at least filter the addresses we get back from the JVM to exclude loopbacks.

-Tommy
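
The loopback filtering suggested above could look roughly like the following. This is a minimal sketch under stated assumptions, not Samza's actual code: the class name CoordinatorAddress and its methods are hypothetical, and only standard java.net calls are used. It walks the machine's network interfaces and keeps the first address that is not loopback, link-local, wildcard, or multicast, falling back to InetAddress.getLocalHost() only when nothing better is found.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Optional;

// Hypothetical helper (not part of Samza): picks an address that other hosts
// could plausibly connect to, instead of trusting InetAddress.getLocalHost().
final class CoordinatorAddress {

  private CoordinatorAddress() {}

  // True when the address is none of: loopback (127.0.0.0/8, ::1),
  // link-local (169.254.0.0/16, fe80::/10), wildcard (0.0.0.0, ::), multicast.
  static boolean isCandidate(InetAddress address) {
    return !address.isLoopbackAddress()
        && !address.isLinkLocalAddress()
        && !address.isAnyLocalAddress()
        && !address.isMulticastAddress();
  }

  // Returns the first candidate among the given addresses, if there is one.
  static Optional<InetAddress> firstCandidate(List<InetAddress> addresses) {
    return addresses.stream().filter(CoordinatorAddress::isCandidate).findFirst();
  }

  // Collects the addresses of every interface that is up and not a loopback device.
  static List<InetAddress> localAddresses() throws SocketException {
    List<InetAddress> result = new ArrayList<>();
    Enumeration<NetworkInterface> interfaces = NetworkInterface.getNetworkInterfaces();
    if (interfaces == null) {
      return result; // defensive: treat "no interfaces" as an empty list
    }
    for (NetworkInterface nic : Collections.list(interfaces)) {
      if (nic.isUp() && !nic.isLoopback()) {
        result.addAll(Collections.list(nic.getInetAddresses()));
      }
    }
    return result;
  }

  // Parses an IP literal (no DNS lookup happens for literals); handy for experiments.
  static InetAddress parse(String literal) {
    try {
      return InetAddress.getByName(literal);
    } catch (UnknownHostException e) {
      throw new IllegalArgumentException(literal, e);
    }
  }

  public static void main(String[] args) throws Exception {
    Optional<InetAddress> picked = firstCandidate(localAddresses());
    // Fall back to the current behavior only when no better address exists.
    InetAddress address = picked.isPresent() ? picked.get() : InetAddress.getLocalHost();
    System.out.println("candidate address: " + address.getHostAddress());
  }
}
```

On a host with several usable interfaces this simply takes the first one the JVM reports, which may not be the one the rest of the grid can reach, so a real fix would probably still want a configuration override. IPv6 results would also need brackets (and possibly scope-id handling) before being placed in a URL.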

On Thu, Jul 30, 2015 at 5:23 PM, Yi Pan <ni...@gmail.com> wrote:

> Hi, Tommy,
>
> Yeah, I agree that the current implementation is not bullet-proof to any
> different networking configuration on the host. As for the AM <-> container
> communication, if I am not mistaken, it is through the NMClient and the
> node HTTP address is wrapped within the Container object returned from RM.
> I am not very familiar with that part of source code. Navina may be able to
> help more here.
>
> -Yi
>



--
Navina R.

Re: Coordinator URL always 127.0.0.1

Posted by Navina Ramesh <nr...@linkedin.com.INVALID>.
Hi Tommy,
Yi is right. Container start is coordinated by the AppMaster using an
NMClient. The container host name and port are provided by the RM during
allocation.
In YARN (at least, afaik), when a node joins a cluster, the NM registers
itself with the RM. So, the NM might still be using
InetAddress.getLocalHost().

I don't know of any other way to programmatically fetch the machine's
hostname (apart from some hacky shell commands).
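
One programmatic option that avoids shell commands is to enumerate the
machine's network interfaces and pick an address that is not loopback. The
following is only a minimal sketch under stated assumptions: the class and
method names are hypothetical (they are not part of Samza or YARN), it
considers IPv4 only, and "first interface that is up" may not be the right
choice on a multi-homed host.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Optional;

// Hypothetical helper for illustration; not part of Samza or YARN.
final class RoutableAddressFinder {

    // Returns the first IPv4 address bound to an interface that is up and is
    // not a loopback interface, or Optional.empty() if none is found.
    static Optional<InetAddress> firstNonLoopbackIPv4() {
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return Optional.empty(); // no interfaces reported by the platform
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp() || nic.isLoopback()) {
                    continue; // skip interfaces that are down, and "lo"
                }
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    if (addr instanceof Inet4Address && !addr.isLoopbackAddress()) {
                        return Optional.of(addr);
                    }
                }
            }
        } catch (SocketException e) {
            // Interface enumeration failed; treat it as "nothing found".
        }
        return Optional.empty();
    }
}
```

A caller could fall back to InetAddress.getLocalHost() when the result is
empty, for example
RoutableAddressFinder.firstNonLoopbackIPv4().map(InetAddress::getHostAddress).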

Cheers,
Navina

-- 
Navina R.

Re: Coordinator URL always 127.0.0.1

Posted by Yi Pan <ni...@gmail.com>.
Hi, Tommy,

Yeah, I agree that the current implementation is not robust against
differing network configurations on the host. As for the AM <-> container
communication, if I am not mistaken, it goes through the NMClient, and the
node HTTP address is wrapped within the Container object returned from the RM.
I am not very familiar with that part of the source code. Navina may be able to
help more here.

-Yi

RE: Coordinator URL always 127.0.0.1

Posted by Thomas Becker <to...@Tivo.com>.
Hi Yi,
Thanks a lot for your reply.  I don't doubt we can get it to work by mucking with the networking configuration, but to me this feels like a workaround, not a solution.  InetAddress.getLocalHost().getHostAddress() is not a reliable way of obtaining an IP that other machines can connect to.  Just today I tested on several Linux distros and it did not work on any of them.  Can we do something more robust here?  How does the container communicate status to the AM?
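
To make the failure mode concrete, here is a small diagnostic sketch that
reports what InetAddress.getLocalHost() resolves to on a machine and whether
that address is one another host could plausibly connect to. The class and
method names are hypothetical and this is not Samza code. A likely cause of
the loopback result, stated here as an assumption about the affected
machines, is an /etc/hosts entry that maps the hostname to 127.0.0.1 or
127.0.1.1.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper for illustration; not part of Samza.
final class LocalHostCheck {

    // True if the address is not loopback (127.0.0.0/8, ::1), not the
    // wildcard address (0.0.0.0) and not link-local.
    static boolean looksRoutable(InetAddress addr) {
        return !(addr.isLoopbackAddress()
                || addr.isAnyLocalAddress()
                || addr.isLinkLocalAddress());
    }

    // Convenience overload intended for IP literals such as "127.0.0.1".
    // Returns false if the string cannot be resolved.
    static boolean looksRoutable(String ipLiteral) {
        try {
            return looksRoutable(InetAddress.getByName(ipLiteral));
        } catch (UnknownHostException e) {
            return false;
        }
    }

    // Describes what InetAddress.getLocalHost() resolves to on this machine.
    static String describeLocalHost() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return local.getHostName() + " -> " + local.getHostAddress()
                    + (looksRoutable(local)
                        ? " (looks routable)"
                        : " (not usable by other hosts)");
        } catch (UnknownHostException e) {
            return "local host name could not be resolved: " + e.getMessage();
        }
    }
}
```

Printing LocalHostCheck.describeLocalHost() on the AM node shows whether the
coordinator URL would be built from a loopback address on that host.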

-Tommy

Re: Coordinator URL always 127.0.0.1

Posted by Yi Pan <ni...@gmail.com>.
Hi, Tommy,

I think this is a commonly asked question regarding multiple IPs on a
single host. A common trick that needs no code change is (copied from SO:
http://stackoverflow.com/questions/2381316/java-inetaddress-getlocalhost-returns-127-0-0-1-how-to-get-real-ip
):

{code}

   1. Find your host name. Type: hostname. For example, you find your hostname
      is mycomputer.xzy.com

   2. Put your host name in your hosts file, /etc/hosts. Such as

      10.50.16.136 mycomputer.xzy.com

{code}

-Yi
