Posted to user@tez.apache.org by Johannes Zillmann <jz...@googlemail.com> on 2014/08/11 16:43:14 UTC

Problems accessing tez on ec2

Hey guys,

We have a test infrastructure for Hadoop on EC2. The client usually sits outside of EC2.
Using plain MapReduce on YARN, everything works fine.
Using Tez, I run into the following exception:

INFO [2014-07-29 00:09:06.653] [MrPlanRunnerV2] (TezClient.java:507) - Failed to retrieve AM Status via proxy
com.google.protobuf.ServiceException: org.apache.hadoop.net.ConnectTimeoutException: Call From ip-10-73-6-154.ec2.internal/10.73.6.154 to ec2-54-81-245-144.compute-1.amazonaws.com:60914 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: connect timed out; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout

        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:216)
        at com.sun.proxy.$Proxy116.getAMStatus(Unknown Source)
        at org.apache.tez.client.TezClient.getAppMasterStatus(TezClient.java:500)
        at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:586)


I could resolve the problem for Tez by changing the hostname of the instances to their public DNS names. However, that causes problems with other components.
Do you know of any place in Tez that is related to this? Any tweak that would make changing the hostname unnecessary?

best
Johannes

Re: Problems accessing tez on ec2

Posted by Johannes Zillmann <jz...@googlemail.com>.
It's https://issues.apache.org/jira/browse/TEZ-1422

Johannes

On 14 Aug 2014, at 11:53, Siddharth Seth <ss...@apache.org> wrote:

> Please create a jira to at least change the call to make use of NetUtils.
> 
> 
> Beyond this, I believe this is a use case which Tez should support.


Re: Problems accessing tez on ec2

Posted by Siddharth Seth <ss...@apache.org>.
Please create a jira to at least change the call to make use of NetUtils.


Beyond this, I believe this is a use case which Tez should support.


On Thu, Aug 14, 2014 at 12:59 AM, Johannes Zillmann <jzillmann@googlemail.com> wrote:

> Hey Sid,
>
> On 13 Aug 2014, at 23:18, Siddharth Seth <ss...@apache.org> wrote:
>
> > Johannes,
> > NetUtils.addStaticResolution() would be invoked in your client code, correct? I'm not sure Tez can do anything useful with this, unless we add some other parameters which can be configured by users.
>
> Yes, I would invoke that independently of Tez.
>
> > Setting "hadoop.rpc.socket.factory.class.default" should take effect in Tez as well, as long as it's set in the configuration that is used to create the TezClient.
>
> It didn’t look like this has any effect. I guess that's because the IP resolution takes place before any socket is created.
>
> > I'm not sure why "yarn.ipc.client.factory.class" needs to be changed, but that isn't relevant to Tez in any case.
> > Does the custom socket factory that you use take care of the static mapping?
>
> Yes. When creating a socket to an address, it checks a map for the address translation.
>
> Johannes

Re: Problems accessing tez on ec2

Posted by Johannes Zillmann <jz...@googlemail.com>.
Hey Sid,

On 13 Aug 2014, at 23:18, Siddharth Seth <ss...@apache.org> wrote:

> Johannes,
> NetUtils.addStaticResolution() would be invoked in your client code, correct? I'm not sure Tez can do anything useful with this, unless we add some other parameters which can be configured by users.

Yes, I would invoke that independently of Tez.

> Setting "hadoop.rpc.socket.factory.class.default" should take effect in Tez as well, as long as it's set in the configuration that is used to create the TezClient.

It didn’t look like this has any effect. I guess that's because the IP resolution takes place before any socket is created.

> I'm not sure why "yarn.ipc.client.factory.class" needs to be changed, but that isn't relevant to Tez in any case.
> Does the custom socket factory that you use take care of the static mapping?

Yes. When creating a socket to an address, it checks a map for the address translation.

Johannes
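
For illustration, a socket factory of the kind described above could look roughly like the following Java sketch. It is not the factory used in this thread: the class name TranslatingSocketFactory, the translate() helper, and the choice to key the map by host string are all assumptions. The sketch also assumes that the caller obtains an unconnected socket and connects it afterwards, so the translation is applied in connect().

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketAddress;
import java.util.HashMap;
import java.util.Map;
import javax.net.SocketFactory;

/** Hypothetical sketch: rewrites internal host names to public ones at connect time. */
class TranslatingSocketFactory extends SocketFactory {
    // Assumption: keys are internal host strings, values are public names or IPs.
    private final Map<String, String> internalToPublic;

    TranslatingSocketFactory(Map<String, String> internalToPublic) {
        this.internalToPublic = new HashMap<>(internalToPublic);
    }

    /** Returns the mapped address, or the input unchanged when no mapping applies. */
    SocketAddress translate(SocketAddress endpoint) {
        if (endpoint instanceof InetSocketAddress) {
            InetSocketAddress inet = (InetSocketAddress) endpoint;
            String mapped = internalToPublic.get(inet.getHostString());
            if (mapped != null) {
                return new InetSocketAddress(mapped, inet.getPort());
            }
        }
        return endpoint;
    }

    @Override
    public Socket createSocket() {
        // Unconnected socket; the translation happens when connect() is called.
        return new Socket() {
            @Override
            public void connect(SocketAddress endpoint, int timeout) throws IOException {
                super.connect(translate(endpoint), timeout);
            }
        };
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        Socket s = createSocket();
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override
    public Socket createSocket(String host, int port, InetAddress localHost, int localPort)
            throws IOException {
        Socket s = createSocket();
        s.bind(new InetSocketAddress(localHost, localPort));
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override
    public Socket createSocket(InetAddress host, int port) throws IOException {
        Socket s = createSocket();
        s.connect(new InetSocketAddress(host, port));
        return s;
    }

    @Override
    public Socket createSocket(InetAddress address, int port, InetAddress localAddress,
            int localPort) throws IOException {
        Socket s = createSocket();
        s.bind(new InetSocketAddress(localAddress, localPort));
        s.connect(new InetSocketAddress(address, port));
        return s;
    }
}
```

A factory like this only helps if the client reaches the socket layer with an address it can still translate. If host resolution fails earlier, as reported above, the factory is never consulted.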



Re: Problems accessing tez on ec2

Posted by Siddharth Seth <ss...@apache.org>.
Johannes,
NetUtils.addStaticResolution() would be invoked in your client code,
correct? I'm not sure Tez can do anything useful with this, unless we add
some other parameters which can be configured by users.

Setting "hadoop.rpc.socket.factory.class.default" should take effect in Tez
as well, as long as it's set in the configuration that is used to create
the TezClient. I'm not sure why "yarn.ipc.client.factory.class" needs to
be changed, but that isn't relevant to Tez in any case.
Does the custom socket factory that you use take care of the static
mapping?



On Wed, Aug 13, 2014 at 4:56 AM, Johannes Zillmann <jzillmann@googlemail.com> wrote:

> Hey Sid,
>
> As background: for MapReduce we’re configuring
>         yarn.ipc.client.factory.class
>         hadoop.rpc.socket.factory.class.default
> to use a homegrown socket factory which translates EC2-internal addresses to external ones.
>
> For Tez this does not seem to have any effect, even if I’m using NetUtils.createSocketAddrForHost() like you suggested.
> However, as I figured out, using NetUtils.createSocketAddrForHost() would make it possible to add the address translation through NetUtils.addStaticResolution(). So making this change would help.
> Should I create a ticket for that?
>
> Johannes

Re: Problems accessing tez on ec2

Posted by Johannes Zillmann <jz...@googlemail.com>.
Hey Sid,

As background: for MapReduce we’re configuring
	yarn.ipc.client.factory.class
	hadoop.rpc.socket.factory.class.default
to use a homegrown socket factory which translates EC2-internal addresses to external ones.

For Tez this does not seem to have any effect, even if I’m using NetUtils.createSocketAddrForHost() like you suggested.
However, as I figured out, using NetUtils.createSocketAddrForHost() would make it possible to add the address translation through NetUtils.addStaticResolution(). So making this change would help.
Should I create a ticket for that?

Johannes
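
The idea behind the static resolution mentioned above can be sketched in plain Java. This is a simplified stand-in, not Hadoop's NetUtils code: the class StaticHostTable is hypothetical, only the two method names are borrowed from the thread, and the real methods may behave differently in detail. The point it illustrates is that the override is applied when the address is built, before any socket exists, which would explain why it can help where a socket factory does not.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a static host-resolution table; not Hadoop's implementation. */
class StaticHostTable {
    private final Map<String, String> overrides = new ConcurrentHashMap<>();

    /** Registers that addresses for {@code host} should point at {@code resolvedName}. */
    void addStaticResolution(String host, String resolvedName) {
        overrides.put(host, resolvedName);
    }

    /** Builds an address for host:port, applying any registered override first. */
    InetSocketAddress createSocketAddrForHost(String host, int port) {
        String target = overrides.getOrDefault(host, host);
        // createUnresolved skips the DNS lookup so the sketch runs offline;
        // a real client would need a resolved address before connecting.
        return InetSocketAddress.createUnresolved(target, port);
    }
}
```

With a table like this, a client would register one mapping per cluster node up front, for example addStaticResolution("internal-host", "public-host.example.com") with placeholder names, and then build every address through createSocketAddrForHost().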


On 13 Aug 2014, at 10:30, Siddharth Seth <ss...@apache.org> wrote:

> Johannes
> Getting the client to pick the correct IP to use is a little tricky. Hadoop itself has some utilities for this, which we could try using. Could you open a jira for this, please? We'll need some help trying it out.
> 
> If you're building the Tez code base locally, could you try the following change:
> 
> TezClientUtils:820
> Replace
> final InetSocketAddress serviceAddr = new InetSocketAddress(amHost, amRpcPort);
> with 
> final InetSocketAddress serviceAddr = NetUtils.createSocketAddrForHost(amHost, amRpcPort);


Re: Problems accessing tez on ec2

Posted by Siddharth Seth <ss...@apache.org>.
Johannes,
Getting the client to pick the correct IP to use is a little tricky. Hadoop
itself has some utilities for this, which we could try using. Could you
open a jira for this, please? We'll need some help trying it out.

If you're building the Tez code base locally, could you try the following
change:

TezClientUtils:820
Replace
final InetSocketAddress serviceAddr = new InetSocketAddress(amHost, amRpcPort);
with
final InetSocketAddress serviceAddr = NetUtils.createSocketAddrForHost(amHost, amRpcPort);


On Wed, Aug 13, 2014 at 12:58 AM, Johannes Zillmann <jzillmann@googlemail.com> wrote:

> Hey Hitesh,
>
> so without changing the hostname of the EC2 instances to their public DNS names, the log looks like:
>         2014-08-13 03:53:15,310 INFO [ServiceThread:DAGClientRPCServer] org.apache.tez.dag.api.client.DAGClientServer: Instantiated DAGClientRPCServer at domU-12-31-39-0F-30-03/10.193.51.241:31000
>         2014-08-13 03:53:15,332 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
>         2014-08-13 03:53:15,336 INFO [IPC Server listener on 50192] org.apache.hadoop.ipc.Server: IPC Server listener on 50192: starting
>
> and the exception on the client is then:
> com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "domU-12-31-39-0F-30-03":31000; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:216)
>         at com.sun.proxy.$Proxy37.getAMStatus(Unknown Source)
>         at org.apache.tez.client.TezClient.getAppMasterStatus(TezClient.java:503)
>         at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:576)
>
> HTH
> Johannes
>
> On 12 Aug 2014, at 22:05, Hitesh Shah <hi...@apache.org> wrote:
>
> > It seems like the AM is binding to the external/public hostname and not the internal IP.
> >
> > Could you look for this log message in the AM logs: "Instantiated DAGClientRPCServer at". This will provide some information as to what the AM is binding to.
> >
> > thanks
> > -- Hitesh
> >
> > On Aug 11, 2014, at 7:43 AM, Johannes Zillmann <jz...@googlemail.com>
> wrote:
> >
> >> Hey guys,
> >>
> >> having a test-infrastructure for Hadoop on ec2. The client sits usually
> outside of ec2.
> >> Using plain map-reduce on YARN everything works fine.
> >> Using Tez i run into following exception:
> >>
> >> INFO [2014-07-29 00:09:06.653] [MrPlanRunnerV2] (TezClient.java:507) -
> Failed to retrieve AM Status via proxy
> >> com.google.protobuf.ServiceException:
> org.apache.hadoop.net.ConnectTimeoutException: Call From
> ip-10-73-6-154.ec2.internal/10.73.6.154 to
> ec2-54-81-245-144.compute-1.amazonaws.com:60914 failed on socket timeout
> exception: org.apache.hadoop.net.ConnectTimeoutException: connect timed
> out; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
> >>
> >>       at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:216)
> >>       at com.sun.proxy.$Proxy116.getAMStatus(Unknown Source)
> >>       at
> org.apache.tez.client.TezClient.getAppMasterStatus(TezClient.java:500)
> >>       at
> org.apache.tez.client.TezClient.waitTillReady(TezClient.java:586)
> >>
> >>
> >> I could resolve the problem for Tez by changing the hostname of the
> instances to their public DNS names. However, that causes problems with
> other components.
> >> Do you know of any place in Tez that is related to this? Any tweak
> that could make changing the hostname superfluous?
> >>
> >> best
> >> Johannes
> >
>
>

Re: Problems accessing tez on ec2

Posted by Johannes Zillmann <jz...@googlemail.com>.
Hey Hitesh,

so without changing the hostname of the EC2 instances to their public DNS names, the log looks like:
	2014-08-13 03:53:15,310 INFO [ServiceThread:DAGClientRPCServer] org.apache.tez.dag.api.client.DAGClientServer: Instantiated DAGClientRPCServer at domU-12-31-39-0F-30-03/10.193.51.241:31000
	2014-08-13 03:53:15,332 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
	2014-08-13 03:53:15,336 INFO [IPC Server listener on 50192] org.apache.hadoop.ipc.Server: IPC Server listener on 50192: starting

and the exception on the client is then:
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "domU-12-31-39-0F-30-03":31000; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:216)
	at com.sun.proxy.$Proxy37.getAMStatus(Unknown Source)
	at org.apache.tez.client.TezClient.getAppMasterStatus(TezClient.java:503)
	at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:576)
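
The exception above is what one would expect when the client receives a host name it cannot turn into an IP address. A minimal sketch of that situation using only the JDK follows; the class and method names are hypothetical, and `createUnresolved` merely stands in for the failed DNS lookup of the internal EC2 name on a machine outside EC2. It is not the Tez or Hadoop code path itself.

```java
import java.net.InetSocketAddress;

public class UnresolvedAddressDemo {

    /** Describes whether the advertised endpoint carries a usable IP address. */
    public static String describe(InetSocketAddress addr) {
        return addr.isUnresolved()
            ? "unresolved: " + addr.getHostString() + ":" + addr.getPort()
            : "resolved: " + addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }

    public static void main(String[] args) {
        // Assumption: the internal name from the AM log cannot be looked up
        // from outside EC2; createUnresolved simulates that without touching DNS.
        InetSocketAddress internalName =
            InetSocketAddress.createUnresolved("domU-12-31-39-0F-30-03", 31000);

        // An IP literal needs no DNS lookup, so this address is always resolved.
        InetSocketAddress byIp = new InetSocketAddress("10.193.51.241", 31000);

        System.out.println(describe(internalName));
        System.out.println(describe(byIp));
    }
}
```

Note that a resolved private IP such as 10.193.51.241 is still not reachable from outside EC2, so resolving the name is necessary but not sufficient.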

HTH
Johannes

On 12 Aug 2014, at 22:05, Hitesh Shah <hi...@apache.org> wrote:

> It seems like the AM is binding to the external/public hostname and not the internal IP.
> 
> Could you look for this log message in the AM logs: "Instantiated DAGClientRPCServer at". This will provide some information as to what the AM is binding to.
> 
> thanks
> — Hitesh
> 
> On Aug 11, 2014, at 7:43 AM, Johannes Zillmann <jz...@googlemail.com> wrote:
> 
>> Hey guys,
>> 
>> I have a test infrastructure for Hadoop on EC2. The client usually sits outside of EC2.
>> Using plain MapReduce on YARN, everything works fine.
>> Using Tez, I run into the following exception:
>> 
>> INFO [2014-07-29 00:09:06.653] [MrPlanRunnerV2] (TezClient.java:507) - Failed to retrieve AM Status via proxy
>> com.google.protobuf.ServiceException: org.apache.hadoop.net.ConnectTimeoutException: Call From ip-10-73-6-154.ec2.internal/10.73.6.154 to ec2-54-81-245-144.compute-1.amazonaws.com:60914 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: connect timed out; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
>> 
>>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:216)
>>       at com.sun.proxy.$Proxy116.getAMStatus(Unknown Source)
>>       at org.apache.tez.client.TezClient.getAppMasterStatus(TezClient.java:500)
>>       at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:586)
>> 
>> 
>> I could resolve the problem for Tez by changing the hostname of the instances to their public DNS names. However, that causes problems with other components.
>> Do you know of any place in Tez that is related to this? Any tweak that could make changing the hostname superfluous?
>> 
>> best
>> Johannes
> 


Re: Problems accessing tez on ec2

Posted by Hitesh Shah <hi...@apache.org>.
It seems like the AM is binding to the external/public hostname and not the internal IP.

Could you look for this log message in the AM logs: "Instantiated DAGClientRPCServer at". This will provide some information as to what the AM is binding to.
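
To pull the bind address out of an AM log programmatically, something like the following sketch may help. It assumes the message ends in the `<host>/<ip>:<port>` form that `java.net.InetSocketAddress.toString()` produces, as in the AM log line quoted elsewhere in this thread; the class name `AmBindAddress` and its `parse` method are hypothetical and not part of Tez.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AmBindAddress {

    // Assumed shape of the log message: "... Instantiated DAGClientRPCServer at <host>/<ip>:<port>"
    private static final Pattern BIND = Pattern.compile(
        "Instantiated DAGClientRPCServer at ([^/\\s]*)/([^:\\s]+):(\\d+)");

    /** Returns {host, ip, port} if the line contains the bind message, otherwise empty. */
    public static Optional<String[]> parse(String logLine) {
        Matcher m = BIND.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(1), m.group(2), m.group(3) });
    }

    public static void main(String[] args) {
        String line = "2014-08-13 03:53:15,310 INFO [ServiceThread:DAGClientRPCServer] "
            + "org.apache.tez.dag.api.client.DAGClientServer: Instantiated "
            + "DAGClientRPCServer at domU-12-31-39-0F-30-03/10.193.51.241:31000";

        // Print the three parts so it is obvious which name the AM advertises.
        parse(line).ifPresent(p ->
            System.out.println("host=" + p[0] + " ip=" + p[1] + " port=" + p[2]));
    }
}
```

If the host part is an internal name and the IP is a private address, a client outside EC2 will likely be unable to reach the AM with either.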
 
thanks
— Hitesh

On Aug 11, 2014, at 7:43 AM, Johannes Zillmann <jz...@googlemail.com> wrote:

> Hey guys,
> 
> I have a test infrastructure for Hadoop on EC2. The client usually sits outside of EC2.
> Using plain MapReduce on YARN, everything works fine.
> Using Tez, I run into the following exception:
> 
> INFO [2014-07-29 00:09:06.653] [MrPlanRunnerV2] (TezClient.java:507) - Failed to retrieve AM Status via proxy
> com.google.protobuf.ServiceException: org.apache.hadoop.net.ConnectTimeoutException: Call From ip-10-73-6-154.ec2.internal/10.73.6.154 to ec2-54-81-245-144.compute-1.amazonaws.com:60914 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: connect timed out; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
> 
>        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:216)
>        at com.sun.proxy.$Proxy116.getAMStatus(Unknown Source)
>        at org.apache.tez.client.TezClient.getAppMasterStatus(TezClient.java:500)
>        at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:586)
> 
> 
> I could resolve the problem for Tez by changing the hostname of the instances to their public DNS names. However, that causes problems with other components.
> Do you know of any place in Tez that is related to this? Any tweak that could make changing the hostname superfluous?
> 
> best
> Johannes