Posted to dev@cassandra.apache.org by Ondřej Černoš <ce...@gmail.com> on 2013/05/02 17:35:03 UTC

Extending Ec2Snitch for custom availability zone format

Hi all,

We use Cassandra in a mixed EC2/OpenStack environment. Unfortunately, due to
decisions made long ago, the OpenStack availability zone name obtainable
through http://169.254.169.254/latest/meta-data/placement/availability-zone is
not compatible with Cassandra's parsing in o.a.c.locator.Ec2Snitch: the
format uses a dot instead of a dash as the field separator. I currently
maintain my own fork of Cassandra's snitches, which is error-prone. I thought
I might patch Cassandra so that it understands custom formats:

- make the format a regex configurable in cassandra.yaml, with the default
(option not set at all) being the current implementation
- keep it simple - presume three groups (us-east-1a,
openstack.something-computenode and the like), where the first two groups
form the datacenter name and the last one the rack (while keeping the
CASSANDRA-4026 functionality in place)

For users who do not configure the regex nothing will change; others, like me,
will have the option to parse different availability zone names.
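
A minimal sketch of the idea, under assumptions that are not part of any
existing Cassandra API: the class name, both patterns and the dcSeparator
parameter are hypothetical, and the CASSANDRA-4026 handling is left out. It
only shows a three-group regex being turned into a datacenter name and a rack.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not Cassandra code.
final class ZoneFormatSketch {
    // Assumed default: three dash-separated fields, e.g. us-east-1a.
    static final Pattern AWS_STYLE = Pattern.compile("^([^-]+)-([^-]+)-([^-]+)$");
    // Assumed custom format: a dot between the first and second field.
    static final Pattern DOT_STYLE = Pattern.compile("^([^.]+)\\.([^-]+)-(.+)$");

    // Returns {datacenter, rack}. Groups 1 and 2 are joined with dcSeparator
    // to form the datacenter name; group 3 is the rack.
    static String[] parse(Pattern format, String dcSeparator, String availabilityZone) {
        Matcher m = format.matcher(availabilityZone);
        if (m.groupCount() != 3 || !m.matches()) {
            throw new IllegalArgumentException(
                "Unparseable availability zone: " + availabilityZone);
        }
        return new String[] { m.group(1) + dcSeparator + m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse(AWS_STYLE, "-", "us-east-1a")));
        System.out.println(Arrays.toString(
            parse(DOT_STYLE, ".", "openstack.something-computenode")));
    }
}
```

Whether the two groups should be joined with the original separator or a
fixed one is an open design choice; the sketch passes it in explicitly.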

What do you think? Does it have a chance of being accepted?

regards,

ondřej černoš

Re: Extending Ec2Snitch for custom availability zone format

Posted by Jonathan Ellis <jb...@gmail.com>.
On Fri, May 3, 2013 at 3:38 AM, Ondřej Černoš <ce...@gmail.com> wrote:
> My setup currently uses my own snitch which extends my fork
> of Ec2MultiRegionSnitch. It tries to guess which datacenter the snitch is
> running on and parses the availability zone according to the guess, using
> AWS or our OpenStack specific regex.

It sounds like this is the crucial piece.  If this works reliably,
then we don't need to add any configurability since we can just build
your OpenStack compatibility into the mainline.  If it doesn't, then
adding configurable regexps won't help.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced

Re: Extending Ec2Snitch for custom availability zone format

Posted by Ondřej Černoš <ce...@gmail.com>.
Hi Jonathan,

let me explain my itch.

Our Cassandra deployment consists of 2 data centers: AWS us-east and a
private cloud at Rackspace built on OpenStack. OpenStack emulates the EC2 APIs
pretty well and Cassandra's EC2 support works on top of it - with
Ec2MultiRegionSnitch all the private/public IP handling works, the cluster
connects fine, etc. The only problem is the naming convention presumed
by Ec2Snitch. The snitch presumes the AWS naming convention and splits the
availability zone name using "-" as the token separator - see the constructor
of Ec2Snitch. Our other DC, the OpenStack one, doesn't follow
this convention (a decision made long ago and set in stone). Its
availability zone is in almost the same format as the EC2 one, but slightly
different - it follows this regex: ^(na)\.(.*)-.*$. Please note the dot
between the first and second group. With the stock Cassandra EC2 support,
Cassandra incorrectly uses the whole availability zone as the DC name,
so every node in the OpenStack-based DC is treated as if it were in
a different DC.
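
To illustrate the mismatch, here is a small sketch that only splits the two
example names on "-". It is not the Ec2Snitch code; the class and method names
are made up for the illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Cassandra code.
final class DashSplitDemo {
    // Splits an availability zone name on "-", the separator described above.
    static String[] tokens(String availabilityZone) {
        return availabilityZone.split("-");
    }

    public static void main(String[] args) {
        // AWS-style name: three dash-separated fields.
        System.out.println(Arrays.toString(tokens("us-east-1a")));
        // OpenStack-style name: the dot is not a separator here,
        // so "na.prod" stays glued together as a single field.
        System.out.println(Arrays.toString(tokens("na.prod-hostname")));
    }
}
```

The AWS name yields three fields, while the OpenStack name yields only two,
so logic built around the AWS layout cannot pick out the DC correctly.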

My setup currently uses my own snitch, which extends my fork
of Ec2MultiRegionSnitch. It tries to guess which datacenter the snitch is
running in and parses the availability zone according to the guess, using
either the AWS regex or our OpenStack-specific one. Besides parsing the
availability zone name, the snitch does nothing and delegates all the real
work to the hierarchy above. Unfortunately I had to fork Ec2MultiRegionSnitch
and Ec2Snitch in order to avoid code duplication - in the original versions a
lot of work is done in constructors and there is no clean way to extend the
classes.
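
The guessing step could look roughly like the sketch below. This is not the
code of my snitch: the class, the enum and the AWS pattern are assumptions
made for the illustration; only the OpenStack regex is the one quoted above.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not Cassandra code.
final class ZoneGuessSketch {
    enum Provider { AWS, OPENSTACK, UNKNOWN }

    // Assumed shape of AWS names such as us-east-1a.
    static final Pattern AWS = Pattern.compile("^[a-z]+-[a-z]+-\\d+[a-z]$");
    // The OpenStack availability zone format quoted in this message.
    static final Pattern OPENSTACK = Pattern.compile("^(na)\\.(.*)-.*$");

    // Guesses the environment from the availability zone name alone.
    static Provider guess(String availabilityZone) {
        if (AWS.matcher(availabilityZone).matches()) {
            return Provider.AWS;
        }
        if (OPENSTACK.matcher(availabilityZone).matches()) {
            return Provider.OPENSTACK;
        }
        return Provider.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(guess("us-east-1a"));
        System.out.println(guess("na.prod-hostname"));
    }
}
```

Once the environment is guessed, the matching regex is used to extract the DC
and rack, and everything else is left to the parent snitch classes.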

I'd love to get rid of the fork of these classes I have to maintain with
every Cassandra release (for instance
https://issues.apache.org/jira/browse/CASSANDRA-5432). What I suggest is
the following:

- make the Ec2 snitch parsing format configurable, with the default parser
being the current one (so that pure Ec2 users don't have to do anything and
the support just works as it does today)
- keep it simple - let the parser always presume three groups, as in
us-east-1a or our naming na.prod-hostname
- add the format as an optional configuration parameter in cassandra.yaml

If done this way, my configuration would use Ec2MultiRegionSnitch as is on
the AWS side and configured with a custom regex on the OpenStack side.

If accepted, Cassandra will support more deployment use cases, I will get
rid of my private fork, and current users will not be affected.

I will do the coding.

regards,
ondřej černoš




Re: Extending Ec2Snitch for custom availability zone format

Posted by Jonathan Ellis <jb...@gmail.com>.
I don't understand what you're trying to solve.  A snitch can get
asked for an endpoint from any DC, so you can't just configure
different nodes with different snitches and figure it will all be
good.




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced