Posted to user@giraph.apache.org by Bojan Babic <gb...@gmail.com> on 2014/10/17 22:01:52 UTC
Issue with Giraph on multinode cluster
Hi guys,
I'm risking posting an issue that has already been reported, but I'll take the
risk of being ridiculed :)
I have a small Hadoop cluster on Digital Ocean (1 master, 4 nodes). I was
able to set up the cluster and run the word count example, as well as the
single-node example from the Quick Start.
When I bring more nodes into play, I run into an issue where the TaskTracker
spawns Child processes
    hduser@hdnode-2:~# jps
    13839 TaskTracker
    13697 DataNode
    14067 Jps
    13962 Child
    13961 Child
that listen on the loopback interface
    Proto Recv-Q Send-Q Local Address     Foreign Address  State   User    Inode     PID/Program name
    tcp        0      0 127.0.0.1:1337    0.0.0.0:*        LISTEN  root    21544925  29912/python
    tcp        0      0 0.0.0.0:50010     0.0.0.0:*        LISTEN  hduser  21691552  13697/java
    tcp        0      0 127.0.0.1:30011   0.0.0.0:*        LISTEN  hduser  21693578  13962/java
    tcp        0      0 0.0.0.0:50075     0.0.0.0:*        LISTEN  hduser  21691554  13697/java
    tcp        0      0 0.0.0.0:50020     0.0.0.0:*        LISTEN  hduser  21691557  13697/java
    tcp        0      0 127.0.0.1:50118   0.0.0.0:*        LISTEN  hduser  21691870  13839/java
    tcp        0      0 0.0.0.0:41640     0.0.0.0:*        LISTEN  hduser  21691296  13697/java
    tcp        0      0 127.0.0.1:31337   0.0.0.0:*        LISTEN  root    20432660  1514/python
    tcp        0      0 0.0.0.0:50060     0.0.0.0:*        LISTEN  hduser  21692144  13839/java
    tcp        0      0 0.0.0.0:http-alt  0.0.0.0:*        LISTEN  root    20431897  1421/python
    tcp        0      0 127.0.0.1:30001   0.0.0.0:*        LISTEN  hduser  21370004  7856/ssh
    tcp        0      0 127.0.0.1:30003   0.0.0.0:*        LISTEN  hduser  21693562  13961/java
    tcp        0      0 127.0.0.1:58741   0.0.0.0:*        LISTEN  hduser  21370000  7856/ssh
    tcp        0      0 127.0.0.1:58742   0.0.0.0:*        LISTEN  hduser  21369982  7845/autossh
    tcp        0      0 0.0.0.0:ssh       0.0.0.0:*        LISTEN  root    9130      834/sshd
    tcp6       0      0 ::1:30001         :::*             LISTEN  hduser  21370003  7856/ssh
    tcp6       0      0 ::1:58741         :::*             LISTEN  hduser  21369999  7856/ssh
    tcp6       0      0 :::ssh            :::*             LISTEN  root    9165      834/sshd
instead of on all interfaces (0.0.0.0).
As a result, the node (here hdnode-2) is unreachable from the other nodes:
    2014-10-17 14:10:31,146 WARN org.apache.giraph.comm.netty.NettyClient:
    2014-10-17 14:10:31,159 WARN org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Future failed to connect with hdnode-2/XXX.XXX.XXX.XXX:30003 with 1 failures because of java.net.ConnectException: Connection refused: hdnode-2/XXX.XXX.XXX.XXX:30003
    2014-10-17 14:10:31,159 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 2 failed, 2 failures total.
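To check which processes on a node are affected, one can scan the netstat listing for LISTEN sockets bound to a loopback address. The Python sketch below assumes the column layout of the listing above (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, User, Inode, PID/Program name); `split_host_port` and `loopback_listeners` are hypothetical helpers, not part of Giraph or Hadoop.

```python
import ipaddress


def split_host_port(local_address):
    """Split a netstat address like '127.0.0.1:30003' or '::1:30001' into (host, port)."""
    # The port is whatever follows the last colon, which also handles the
    # unbracketed IPv6 form netstat prints ('::1:30001' -> '::1', '30001').
    host, _, port = local_address.rpartition(":")
    return host, port


def loopback_listeners(netstat_text, program="java"):
    """Return (pid, host, port) for LISTEN sockets of `program` bound to a loopback address."""
    found = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Columns: Proto Recv-Q Send-Q Local Foreign State User Inode PID/Program
        if len(fields) < 9 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        pid, _, prog = fields[8].partition("/")
        if prog != program:
            continue
        host, port = split_host_port(fields[3])
        try:
            if ipaddress.ip_address(host).is_loopback:
                found.append((int(pid), host, port))
        except ValueError:
            pass  # host part is not a literal IP address; ignore the line
    return found


if __name__ == "__main__":
    # A few rows copied from the listing in this thread.
    sample = """\
tcp        0      0 127.0.0.1:30011   0.0.0.0:*        LISTEN  hduser  21693578  13962/java
tcp        0      0 0.0.0.0:50010     0.0.0.0:*        LISTEN  hduser  21691552  13697/java
tcp        0      0 127.0.0.1:30003   0.0.0.0:*        LISTEN  hduser  21693562  13961/java
tcp6       0      0 ::1:30001         :::*             LISTEN  hduser  21370003  7856/ssh
"""
    for pid, host, port in loopback_listeners(sample):
        print(f"PID {pid} listens only on {host}:{port}")
```

Applied to the full listing, it flags PIDs 13961 (127.0.0.1:30003) and 13962 (127.0.0.1:30011), which jps identifies as the two Child processes, plus the TaskTracker (13839) on 127.0.0.1:50118.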
If I stop all processes and start nc on port 30003, I can telnet to hdnode-2 on that port.
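The nc experiment suggests the network path between the nodes is fine and the bind address is what matters: a socket bound to 127.0.0.1 only accepts connections addressed to 127.0.0.1, so a peer connecting to the node's external address is refused, while a socket bound to 0.0.0.0 accepts connections on every local address. The Python sketch below reproduces that difference on a single machine. It assumes Linux, where the whole 127.0.0.0/8 block is local, so that 127.0.0.2 can stand in for "the node's external address"; `can_connect` and `reachable_via_other_address` are hypothetical helpers.

```python
import socket


def can_connect(host, port, timeout=1.0):
    """Try a TCP connection; return True on success, False if refused or timed out."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError and timeouts are both OSError subclasses
        return False


def reachable_via_other_address(bind_host, other_host="127.0.0.2"):
    """Listen on bind_host, then try to reach the same port via other_host.

    other_host stands in for "the node's external address"; using 127.0.0.2
    for this assumes Linux, where all of 127.0.0.0/8 is routed to loopback.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((bind_host, 0))  # port 0: let the kernel pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        return can_connect(other_host, port)


if __name__ == "__main__":
    # Bound like the Child process in the netstat listing (refused on Linux).
    print("bound to 127.0.0.1:", reachable_via_other_address("127.0.0.1"))
    # Bound to all interfaces, as a listener reachable from other hosts must be.
    print("bound to 0.0.0.0:  ", reachable_via_other_address("0.0.0.0"))
```

On such a machine the first check should fail and the second succeed, mirroring the "Connection refused" error in the Giraph log versus the successful nc test.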
Is there any setting that will make the Child process listen on 0.0.0.0
instead of the loopback interface?
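One common cause of this symptom in Java services generally (an assumption here; the thread does not confirm it for Giraph, and the hosts file is not shown) is that the machine's own hostname resolves to a loopback address, for example through an /etc/hosts line mapping it to 127.0.0.1 or 127.0.1.1, so a server that binds to "my hostname" ends up on loopback. A minimal Python check to run on each node might look like this; `resolves_to_loopback` is a hypothetical helper.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname):
    """Return True if `hostname` resolves (system resolver, IPv4) to a loopback address."""
    address = socket.gethostbyname(hostname)  # consults /etc/hosts on typical Linux setups
    return ipaddress.ip_address(address).is_loopback


if __name__ == "__main__":
    name = socket.gethostname()
    try:
        loopback = resolves_to_loopback(name)
    except OSError as exc:  # socket.gaierror: the name does not resolve at all
        print(f"{name} does not resolve: {exc}")
    else:
        print(f"{name} resolves to {socket.gethostbyname(name)} (loopback: {loopback})")
```

If this reports `loopback: True` on a worker, peers cannot reach any service that binds to the address of that hostname.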
Thanks in advance
Re: Issue with Giraph on multinode cluster
Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
Please create a JIRA and attach your patch to it.
Thanks,
Roman.
On Mon, Oct 20, 2014 at 2:23 AM, Bojan Babic <gb...@gmail.com> wrote:
> I've made a patch that worked for me. I'm not sure whether I should open a JIRA
> issue. The hack is attached.
Re: Issue with Giraph on multinode cluster
Posted by Bojan Babic <gb...@gmail.com>.
I've made a patch that worked for me. I'm not sure whether I should open a JIRA
issue. The hack is attached.
On Fri, Oct 17, 2014 at 5:52 PM, Bojan Babic <gb...@gmail.com> wrote:
> I'm using Giraph 1.1.0-SNAPSHOT for Hadoop 1.2.1
Re: Issue with Giraph on multinode cluster
Posted by Bojan Babic <gb...@gmail.com>.
I'm using Giraph 1.1.0-SNAPSHOT for Hadoop 1.2.1.
--
--------------------------------
Bojan Babic, M.Sc.E.E
Software developer
twitter: @bojanbabic
mobile: +1312 8602944