Posted to users@qpid.apache.org by Jason Stelzer <ja...@gmail.com> on 2010/09/24 15:16:15 UTC

Looking for advice on getting clustering to work

Hi, I'm reaching out for a little help and pointers with regard to
qpid clustering.

I'm coming into this with nearly zero qpid experience so I will try to
be as complete as possible. I am attempting to set up a qpid cluster
so that we can scale out our qpid clients across multiple qpid
servers. Is it best practice to have a primary enqueue node and
dequeue from the secondary nodes in the cluster?

My understanding is that replication is geared more for fault
tolerance and disaster recovery, and that clustering is geared towards
supporting large numbers of concurrent activity.

I am currently working on getting qpid clustering working as described here:
https://cwiki.apache.org/qpid/starting-a-cluster.html

I am running qpid v 0.5 on Fedora 12. I have the following rpms installed:
qpidc-0.5.829175-2.fc12.x86_64
qpidd-0.5.829175-2.fc12.x86_64
qpidd-cluster-0.5.829175-2.fc12.x86_64

When I start qpidd and pass the --cluster-name=TEST_CLUSTER option,
qpidd aborts with the following error:
Starting Qpid AMQP daemon: Daemon startup failed: Cannot join CPG
group DEV_CLUSTER: try again (6)

I believe I have corosync and pacemaker working.

If I start corosync, it takes a bit of time before the crm commands
work, but once everything spins up I don't see any warnings when I
run:

crm_verify -L
(no output/warnings)

crm configure show
node edisondev3
property $id="cib-bootstrap-options" \
        dc-version="1.0.5-ee19d8e83c2a5d45988f1cee36d334a631d84fc7" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        stonith-enable="false"



I've double checked my bindnetaddr in corosync.conf. It lines up
with the wiki article and agrees with the output of /sbin/route.

I double checked my uidgid.d/qpid file. Initially I had the uid wrong
and was getting a security error when I started qpid. Now that I have
the correct uid/gid, I am seeing the 'try again' error above.

Any tips would be appreciated.

-- 
J.

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org


Re: Looking for advice on getting clustering to work

Posted by Jason Stelzer <ja...@gmail.com>.
On Fri, Sep 24, 2010 at 10:16 AM, Gordon Sim <gs...@redhat.com> wrote:

> Some things I've tripped up on in the past in case any of these help:
>
> * does firewall allow UDP on the desired port?

There's no firewall (physical or iptables). This is on an open LAN segment.

> * is SELinux in use?

/etc/selinux/config contains
SELINUX=disabled

So selinux should not be an issue.

> * is the bind address correct for the network mask?
I think so, it agrees with what /sbin/route thinks the local network
is as well as what the wiki indicated you should base the bindnetaddr
on.
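
For readers checking the same thing, here is a minimal sketch of that comparison. It assumes bindnetaddr is meant to be the network address of the interface (the interface IP masked with its netmask); the function name and the sample addresses are hypothetical and do not come from this thread.

```python
import ipaddress


def expected_bindnetaddr(ip, netmask):
    """Return the network address for an interface IP and netmask.

    Assumption: corosync's bindnetaddr should equal this value.
    """
    iface = ipaddress.ip_interface("{}/{}".format(ip, netmask))
    return str(iface.network.network_address)


# Hypothetical interface values, as they might appear in ifconfig output.
print(expected_bindnetaddr("192.168.1.17", "255.255.255.0"))
print(expected_bindnetaddr("10.20.35.9", "255.255.240.0"))
```

If the printed value differs from the bindnetaddr line in corosync.conf, that mismatch is worth ruling out before looking elsewhere.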




> (* is multicast enabled?)
ifconfig claims the following for the interface in question, so yes.
Multicast seems good.
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1


-- 
J.



Re: Looking for advice on getting clustering to work

Posted by Gordon Sim <gs...@redhat.com>.
On 09/24/2010 02:31 PM, Jason Stelzer wrote:
> I'm looking to create a cluster of qpidd servers that are in a
> synchronized state so that I can allow many agents to listen for and
> consume events that are enqueued by another system.
>
> So, to borrow from some concepts that I've used to cluster mysql into
> writer/reader nodes, if I were to create a 2 node cluster I would
> imagine it would work something like this:
>
> Node A would be the enqueue node. System A would send events to Node A
> where they would be enqueued. Essentially, this node is write only.
>
> Node B would be the dequeue node. The clustering would take care of
> the message propagation and all listeners in System B would dequeue
> messages out of the cluster via Node B.
>
> My current problem is that I'm not sure what step I have done
> incorrectly. I'm fairly sure that I've done all I can as far as the
> wiki goes. But, it could be a misconfiguration of a lower level
> service since I am not yet that experienced with the underlying
> corosync/heartbeat software.

Some things I've tripped up on in the past in case any of these help:

* does firewall allow UDP on the desired port?
* is SELinux in use?
* is the bind address correct for the network mask?
(* is multicast enabled?)

The security setup was another one, but you've explicitly ruled that out.



Re: Looking for advice on getting clustering to work

Posted by Jason Stelzer <ja...@gmail.com>.
I'm looking to create a cluster of qpidd servers that are in a
synchronized state so that I can allow many agents to listen for and
consume events that are enqueued by another system.

So, to borrow from some concepts that I've used to cluster mysql into
writer/reader nodes, if I were to create a 2 node cluster I would
imagine it would work something like this:

Node A would be the enqueue node. System A would send events to Node A
where they would be enqueued. Essentially, this node is write only.

Node B would be the dequeue node. The clustering would take care of
the message propagation and all listeners in System B would dequeue
messages out of the cluster via Node B.

My current problem is that I'm not sure what step I have done
incorrectly. I'm fairly sure that I've done all I can as far as the
wiki goes. But, it could be a misconfiguration of a lower level
service since I am not yet that experienced with the underlying
corosync/heartbeat software.





-- 
J.



Re: Looking for advice on getting clustering to work

Posted by Lahiru Gunathilake <gl...@gmail.com>.
Hi Jason,

By clustering what are you going to achieve ? You want to replicate the Qpid
state (I mean your message store) among the cluster nodes ? I am actually
asking for my clarification because I am trying to find an easier solution
for Qpid clustering ?

Lahiru


Re: Looking for advice on getting clustering to work

Posted by Alan Conway <ac...@redhat.com>.
On 09/24/2010 02:09 PM, David Hawthorne wrote:
> I myself am more worried about the scaling aspect.  How is that solved?

Federation might help you. It allows messages to be routed automatically between
brokers, which could be used to distribute load.

Partitioning your application is another thing to think about: i.e. separate the 
load into chunks that can be run on separate brokers.

If you need both scaling and reliability, think of a qpid cluster as a *single 
reliable broker* and then federate or partition over multiple clusters.
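
As a rough illustration of the partitioning idea, the sketch below maps a routing key to one of several clusters with a stable hash, so that each key always lands on the same cluster. The function name, the keys and the broker addresses are all hypothetical; this is not a qpid API, only one way an application might split its load before connecting.

```python
import hashlib


def pick_cluster(routing_key, clusters):
    """Map a routing key to one cluster address, stably across runs."""
    if not clusters:
        raise ValueError("at least one cluster address is required")
    # sha256 is used instead of hash() so the mapping is identical in
    # every process and on every host.
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[index]


# Hypothetical addresses; each entry stands for one whole qpid cluster.
CLUSTERS = ["cluster-a.example.com:5672", "cluster-b.example.com:5672"]

for key in ("orders", "billing", "audit"):
    print(key, "->", pick_cluster(key, CLUSTERS))
```

A plain modulo mapping like this reshuffles most keys when a cluster is added, so it suits a fixed set of clusters best.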

I'm very interested in discussing how the reliability and scalability
aspects could live together more harmoniously. I'm thinking about some reworking
to make the cluster more flexible, and I think there will be scope to address
scalability and reliability with a more coherent solution. The cluster effort has
been very reliability-focused until now, so it would do me good to think a bit
more about scalability while I work on an improved cluster design.



Re: Looking for advice on getting clustering to work

Posted by David Hawthorne <dh...@3crowd.com>.
I myself am more worried about the scaling aspect.  How is that solved?




Re: Looking for advice on getting clustering to work

Posted by Jason Stelzer <ja...@gmail.com>.
On Fri, Sep 24, 2010 at 5:19 PM, Alan Conway <ac...@redhat.com> wrote:
> Lots of newcomers stumble over this. I occasionally forget something myself
> when setting up on a new host. Shout if you have any ideas on how the doc
> could be improved so that you would have figured this out more quickly.
>

This doesn't really seem like a qpid issue. It's more that if you don't
understand the relationship between corosync, multicast traffic and
qpid, you will more than likely miss this kind of configuration issue.
Perhaps calling it out explicitly on the wiki page, with some links to
the related projects, would be a good addition for those of us new to
the whole suite of software.

I was struggling to understand how everything hooked together never
having used corosync or qpid. In a perfect world, there would be a way
to test corosync and multicast traffic besides running something as
complex as qpid.

A tool that did some kind of multicast ping/heads-up, just to show that
traffic was routed correctly, would be handy. Then the install
process would essentially consist of ensuring things were correctly
configured each step of the way rather than waiting until you touched
a bunch of files and installed a bunch of packages to see if it all
works perfectly together.
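
A minimal sketch of what such a multicast ping could look like follows. It is only an illustration using the standard Python socket module, not an existing corosync or qpid tool. MCAST_GROUP and MCAST_PORT are placeholders that would need to be replaced with the mcastaddr and mcastport values from corosync.conf, and all function names are hypothetical. If the same port as corosync is used, corosync should probably be stopped while testing.

```python
import socket
import struct

# Placeholders: replace with mcastaddr / mcastport from corosync.conf.
MCAST_GROUP = "239.255.0.1"
MCAST_PORT = 5405


def encode_probe(sender, seq):
    """Build the probe payload: '<sender>:<sequence number>'."""
    return "{}:{}".format(sender, seq).encode("utf-8")


def decode_probe(data):
    """Split a probe payload back into (sender, sequence number)."""
    sender, _, seq = data.decode("utf-8").rpartition(":")
    return sender, int(seq)


def open_receiver(group=MCAST_GROUP, port=MCAST_PORT, timeout=5.0):
    """Open a UDP socket that has joined the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # ip_mreq: group address followed by the local interface (any).
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    sock.settimeout(timeout)
    return sock


def wait_for_probe(sock):
    """Return (sender, seq, source_ip), or None if nothing arrives in time."""
    try:
        data, addr = sock.recvfrom(1024)
    except socket.timeout:
        return None
    sender, seq = decode_probe(data)
    return sender, seq, addr[0]


def send_probe(seq, group=MCAST_GROUP, port=MCAST_PORT, ttl=1):
    """Send one probe to the group; ttl=1 keeps it on the local segment."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("b", ttl))
        sock.sendto(encode_probe(socket.gethostname(), seq), (group, port))
    finally:
        sock.close()
```

On one node, call wait_for_probe(open_receiver()); on the other, call send_probe(1). If the receiver returns None while both hosts are on the same segment, a firewall rule dropping the multicast traffic is one likely cause.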

I guess what I'm saying is that solving this problem at the qpid level
is fantastic for qpid, but there are lots of other things that need
corosync/heartbeat configured correctly. Tools to better support and
troubleshoot the underlying functionality of corosync would make
setting up a vip and stonith much easier (for instance).

To be fair, such tools may exist and I am simply ignorant of them. So
if you know of resources, I'm happy to hear about them.

Either way, thanks again for pointing me in the right direction. I
have a bunch more to read up on. These tools open up a lot of
possibilities for me.

-- 
J.



Re: Looking for advice on getting clustering to work

Posted by Alan Conway <ac...@redhat.com>.
On 09/24/2010 01:22 PM, Jason Stelzer wrote:
> On Fri, Sep 24, 2010 at 10:41 AM, Alan Conway<ac...@redhat.com>  wrote:
>>
>> The cluster is active-active, you can enqueue and dequeue on any nodes in
>> the cluster.
>>
>
> Nice. Thanks for clarifying. Given my needs, the scaling aspect isn't
> nearly as important as fault tolerance.
>
>> Did you check your firewall and selinux settings? If it still doesnt work
>> send the follow:
>>
>>
>
> Thanks for the thorough list of questions. It was, point of fact, the
> firewall. I was unaware of the fact that there were rules configured
> rules that were preventing the multicast traffic from working.
>
> Thanks a bunch to everyone who chimed in. I feel like a pinhead for
> not double checking everything sooner.
>

Lots of newcomers stumble over this. I occasionally forget something myself when 
setting up on a new host. Shout if you have any ideas on how the doc could be 
improved so that you would have figured this out more quickly.



Re: Looking for advice on getting clustering to work

Posted by Jason Stelzer <ja...@gmail.com>.
On Fri, Sep 24, 2010 at 10:41 AM, Alan Conway <ac...@redhat.com> wrote:
>
> The cluster is active-active, you can enqueue and dequeue on any nodes in
> the cluster.
>

Nice. Thanks for clarifying. Given my needs, the scaling aspect isn't
nearly as important as fault tolerance.

> Did you check your firewall and selinux settings? If it still doesnt work
> send the follow:
>
>

Thanks for the thorough list of questions. It was, in point of fact, the
firewall. I was unaware that there were rules configured that were
preventing the multicast traffic from working.

Thanks a bunch to everyone who chimed in. I feel like a pinhead for
not double checking everything sooner.

It all works now, I'll resume lurking :)


-- 
J.



Re: Looking for advice on getting clustering to work

Posted by Alan Conway <ac...@redhat.com>.
On 09/24/2010 09:16 AM, Jason Stelzer wrote:
> Hi, I'm reaching out for a little help and pointers with regard to
> qpid clustering.
>
> I'm coming into this with nearly zero qpid experience so I will try to
> be as complete as possible. I am attempting to set up a qpid cluster
> so that we can scale out our qpid clients across multiple qpid
> servers. Is it best practice to have a primary enqueue node and
> dequeue from the secondary nodes in the cluster?

The cluster is active-active: you can enqueue and dequeue on any node in the
cluster.

> My understanding is that replication is geared more for fault
> tolerance and disaster recovery, and that clustering is geared towards
> supporting large numbers of concurrent activity.

The cluster is geared towards fault tolerance rather than load sharing. All the
brokers effectively do all the work for every client of the cluster, regardless
of where the client is connected, so adding more nodes won't enable the cluster
to handle a greater volume of messages. It will, however, increase the cluster's
tolerance for failures.

> I am currently working on getting qpid clustering working as described here:
> https://cwiki.apache.org/qpid/starting-a-cluster.html
>
> I am running qpid v 0.5 on Fedora 12. I have the following rpms installed:
> qpidc-0.5.829175-2.fc12.x86_64
> qpidd-0.5.829175-2.fc12.x86_64
> qpidd-cluster-0.5.829175-2.fc12.x86_64
>
> When I start qpidd and pass the --cluster-name=TEST_CLUSTER option,
> qpidd aborts with the following error:
> Starting Qpid AMQP daemon: Daemon startup failed: Cannot join CPG
> group DEV_CLUSTER: try again (6)
>
> I believe I have corosync and pacemaker working.
>
> If I start corosync, it takes a bit of time before the crm commands
> work, but once everything spins up I don't see any warnings when I
> run:
>
> crm_verify -L
> (no output/warnings)
>
> crm configure show
> node edisondev3
> property $id="cib-bootstrap-options" \
>          dc-version="1.0.5-ee19d8e83c2a5d45988f1cee36d334a631d84fc7" \
>          cluster-infrastructure="openais" \
>          expected-quorum-votes="2" \
>          stonith-enabled="false" \
>          stonith-enable="false"
>
>
>
> I've double checked my bindnetaddress in corosync.conf. It lines up
> with the wiki article and agrees with the output of /sbin/route.
>
> I double checked my uidgid.d/qpid file. Initially I had the uid wrong
> and was getting a security error when I started qpid. Now that I have
> the correct uid/gid, I am seeing the 'try again' error above.
>
> Any tips would be appreciated.
>

Did you check your firewall and SELinux settings? If it still doesn't work, send
the output of the following:

# getenforce
# iptables -L
# ifconfig
# ls -l /etc/corosync
# tail -n +0 /etc/corosync/corosync.conf /etc/corosync/uidgid.d/*
