Posted to user@flume.apache.org by Jonathan Hsieh <jo...@cloudera.com> on 2011/08/05 00:00:19 UTC

Fwd: Multiple Flume Masters on EC2

Forwarding direct response to flume-user@incubator.apache.org and
cdh-user@cloudera.org.

---------- Forwarded message ----------
From: flume collector <fl...@collector.org>
Date: Wed, Aug 3, 2011 at 5:56 AM
Subject: Multiple Flume Masters on EC2
To: jon@cloudera.com


Jon,

Thanks for replying to
https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/6a8c44a782518a88#.
I wasn't able to reply in the Google group since this email address is not a
Google account.
Just to follow up, though: I added the internal IP addresses, and there are
(and were) no extra spaces in flume.master.servers:

  <property>
    <name>flume.master.servers</name>
    <value>10.192.122.191,10.254.23.16,10.2.31.65</value>
    <description>This is the address for the config servers status
    server (http) </description>
  </property>
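
A quick way to double-check the value for stray whitespace is a small script along these lines. This is only a sketch: the function name parse_master_servers is made up for illustration, the <configuration> wrapper element is an assumption about the surrounding file, and the XML is read from a string rather than from Flume itself.

```python
import xml.etree.ElementTree as ET

# Assumed layout: the property sits inside a <configuration> root element.
SITE_XML = """
<configuration>
  <property>
    <name>flume.master.servers</name>
    <value>10.192.122.191,10.254.23.16,10.2.31.65</value>
  </property>
</configuration>
"""

def parse_master_servers(xml_text):
    """Return (servers, problems) for the flume.master.servers property.

    'problems' lists the raw entries that are empty or carry leading or
    trailing whitespace.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "flume.master.servers":
            raw = prop.findtext("value") or ""
            entries = raw.split(",")
            problems = [e for e in entries if e != e.strip() or not e.strip()]
            return [e.strip() for e in entries], problems
    raise KeyError("flume.master.servers not found")

servers, problems = parse_master_servers(SITE_XML)
print(servers)   # the cleaned-up host list
print(problems)  # raw entries with whitespace issues, if any
```

An empty problems list only says the value itself is clean; it says nothing about whether the hosts can reach each other.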

Also, each master has its own master server id, e.g.:

  <property>
    <name>flume.master.serverid</name>
    <value>2</value>
    <description>The unique identifier for a machine in a
      Flume Master ensemble. Must be different on every
      master instance.</description>
  </property>
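
Judging from the ZooKeeper configuration printed in the logs further down, each entry in flume.master.servers becomes a server.N line, numbered from 0 in list order, with ports 3182 and 3183. The sketch below reproduces that mapping so the ids can be compared against the list. That flume.master.serverid should equal this zero-based position is an assumption drawn from the logs, not something confirmed in this thread, and the function name is hypothetical.

```python
def zk_server_entries(master_servers, quorum_port=3182, election_port=3183):
    """Map a comma-separated master list to ZooKeeper-style server.N entries.

    The port numbers are the ones that appear in the logged configuration;
    the zero-based numbering mirrors server.0 .. server.2 in those logs.
    """
    hosts = [h.strip() for h in master_servers.split(",")]
    return {
        f"server.{i}": f"{host}:{quorum_port}:{election_port}"
        for i, host in enumerate(hosts)
    }

entries = zk_server_entries("10.192.122.191,10.254.23.16,10.2.31.65")
for key in sorted(entries):
    print(f"{key}={entries[key]}")
```

If the list order differs between the masters, the same id would point at different hosts on different machines, so it may be worth confirming the value is identical everywhere.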

I'm not sure how EC2 sets up its machines; there may be multiple NICs on
them.

Below are the errors I get with the IP addresses as the values in
flume.master.servers:

Thanks,
-Steve


Master 1:

java.net.SocketTimeoutException
    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:109)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2011-08-03 08:33:07,060 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification time out: 400
2011-08-03 08:33:08,893 INFO org.apache.zookeeper.server.NIOServerCnxn: Accepted socket connection from /10.192.122.191:49107
2011-08-03 08:33:08,893 INFO org.apache.zookeeper.server.NIOServerCnxn: Processing stat command from /10.192.122.191:49107
2011-08-03 08:33:08,894 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /10.192.122.191:49107 (no session established for client)
2011-08-03 08:33:10,896 INFO org.apache.zookeeper.server.NIOServerCnxn: Accepted socket connection from /10.192.122.191:49108
2011-08-03 08:33:10,897 INFO org.apache.zookeeper.server.NIOServerCnxn: Processing stat command from /10.192.122.191:49108
2011-08-03 08:33:10,897 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /10.192.122.191:49108 (no session established for client)
2011-08-03 08:33:12,060 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Cannot open channel to 2 at election address /10.2.31.65:3183
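
The SocketTimeoutException and the "Cannot open channel" warning both concern the election port (3183) on 10.2.31.65. One way to narrow this down is to try a plain TCP connect from each master to each peer's ports. The following is a minimal sketch using only the Python standard library; the addresses and ports are taken from the logs, while the function names are made up for illustration.

```python
import socket

# Addresses and ports as they appear in the logged ZooKeeper configuration.
PEERS = ["10.192.122.191", "10.254.23.16", "10.2.31.65"]
PORTS = [3181, 3182, 3183]  # client, quorum and election ports

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refusals, timeouts and unreachable hosts
        return False

def probe(peers, ports, timeout=3.0):
    """Try every (host, port) pair and return {(host, port): reachable}."""
    return {(h, p): can_connect(h, p, timeout) for h in peers for p in ports}

# Example use, run on each master in turn:
#   for (host, port), ok in probe(PEERS, PORTS).items():
#       print(host, port, "reachable" if ok else "NOT reachable")
```

A connect that times out rather than being refused often suggests packets are being dropped somewhere along the way (a firewall rule, for instance), though nothing in these logs confirms that. A port will also look unreachable if the peer's ZooKeeper simply has not started listening yet.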



Master 2:

557 [main] INFO com.cloudera.flume.master.FlumeMaster - Inferred master server index 1
607 [main] INFO com.cloudera.flume.master.FlumeMaster - Distributed master, disabling all config translations
618 [main] INFO com.cloudera.flume.master.FlumeMaster - Starting flume master on: domU-12-31-39-00-16-E2.compute-1.internal
618 [main] INFO com.cloudera.flume.master.FlumeMaster -  Working Directory is: /usr/lib/flume/.
621 [main] INFO com.cloudera.flume.master.ZooKeeperService - Starting ZooKeeper server as part of ensemble
631 [main] INFO com.cloudera.flume.master.ZKInProcessServer - Creating /var/flumedata/zk/server-1/myid
632 [main] INFO com.cloudera.flume.master.ZKInProcessServer - configuration: {server.2=10.2.31.65:3182:3183, server.1=10.254.23.16:3182:3183, server.0=10.192.122.191:3182:3183, initLimit=10, syncLimit=10, maxClientCnxns=0, clientPort=3181, tickTime=2000, electionAlg=3, dataDir=/var/flumedata/zk/server-1}
649 [main] INFO com.cloudera.flume.master.ZKInProcessServer - server 0.0.0.0:3181 not up yet
649 [ZooKeeper thread] INFO com.cloudera.flume.master.ZKInProcessServer - Starting ZooKeeper server
16689 [main] ERROR com.cloudera.flume.master.FlumeMaster - IO problem: ZooKeeper server did not come up within 15 seconds



Master 3:

425 [main] INFO com.cloudera.flume.master.FlumeMaster - Inferred master server index 2
463 [main] INFO com.cloudera.flume.master.FlumeMaster - Distributed master, disabling all config translations
473 [main] INFO com.cloudera.flume.master.FlumeMaster - Starting flume master on: ip-10-2-31-65.ec2.internal
475 [main] INFO com.cloudera.flume.master.FlumeMaster -  Working Directory is: /usr/lib/flume/.
478 [main] INFO com.cloudera.flume.master.ZooKeeperService - Starting ZooKeeper server as part of ensemble
491 [main] INFO com.cloudera.flume.master.ZKInProcessServer - Creating /var/flumedata/zk/server-2/myid
493 [main] INFO com.cloudera.flume.master.ZKInProcessServer - configuration: {server.2=10.2.31.65:3182:3183, server.1=10.254.23.16:3182:3183, server.0=10.192.122.191:3182:3183, initLimit=10, syncLimit=10, maxClientCnxns=0, clientPort=3181, tickTime=2000, electionAlg=3, dataDir=/var/flumedata/zk/server-2}
501 [ZooKeeper thread] INFO com.cloudera.flume.master.ZKInProcessServer - Starting ZooKeeper server
505 [main] INFO com.cloudera.flume.master.ZKInProcessServer - server 0.0.0.0:3181 not up yet
16532 [main] ERROR com.cloudera.flume.master.FlumeMaster - IO problem: ZooKeeper server did not come up within 15 seconds
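
Masters 2 and 3 both give up after waiting 15 seconds for the local ZooKeeper on client port 3181. The Master 1 log shows "stat" commands arriving on that port; stat is one of ZooKeeper's four-letter commands, and it can also be sent by hand to see what a server answers. Below is a minimal sketch, assuming the server closes the connection once it has replied; the helper name four_letter is hypothetical.

```python
import socket

def four_letter(host, port, word="stat", timeout=3.0):
    """Send a ZooKeeper four-letter command and return the reply as text.

    Reads until the peer closes the connection, which is assumed to mark
    the end of the reply.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

# Example use on a master, against its own client port:
#   print(four_letter("localhost", 3181))
```

What the reply looks like while the ensemble has no quorum is not shown in this thread, so the sketch makes no attempt to interpret it.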




-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: Multiple Flume Masters on EC2

Posted by Jonathan Hsieh <jo...@cloudera.com>.
Steve,

Please note that multi-master support is still at an early stage, so we
haven't ironed out all the problems yet.

Have you tried restarting the masters that don't connect?  Does it
eventually succeed?

Jon.


-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com