Posted to user@cassandra.apache.org by William Oberman <ob...@civicscience.com> on 2011/04/12 20:15:55 UTC

Ec2Snitch + NetworkTopologyStrategy if only in one region?

Hi,

I'm getting closer to committing to Cassandra, and now I'm into system/IT
issues and questions.  I'm in the Amazon EC2 cloud.  I previously used this
forum to discover the best practice for disk layouts (large instance + the
two ephemeral disks in RAID0 for data + root volume for everything else).
Now I'm hoping to confirm bits and pieces of things I've read about
snitch/replication strategies.  I was thinking of using:
endpoint_snitch: org.apache.cassandra.locator.Ec2Snitch
placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'
(For people hitting this from the mailing list or Google, I feel obligated
to note that the former setting is in cassandra.yaml, and the latter is an
option on a keyspace.)

But I'm only in one region.  Is using the Amazon snitch/NetworkTopologyStrategy
overkill given that everything I have is in one DC (I believe region==DC and
availability_zone==rack)?  I'm using multiple availability zones for some
level of redundancy; I'm just not yet at the point of using multiple
regions.  If someday I move to multiple regions, would that change the
answer?

Thanks!

-- 
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) oberman@civicscience.com

Re: Ec2Snitch + NetworkTopologyStrategy if only in one region?

Posted by William Oberman <ob...@civicscience.com>.
Also, for new users like me: don't assume DC1 is a keyword like I did.  A
working example of a keyspace in EC2 is:

create keyspace test with replication_factor=3 and strategy_options =
[{us-east:3}] and
placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy';

That is for a single-DC deployment in EC2.  I felt silly afterwards, but I
couldn't find official docs on the structure of strategy_options anywhere.
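
For readers wondering where the name us-east comes from: as far as I
understand it, Ec2Snitch derives the datacenter and rack from the instance's
availability zone name, splitting at the last hyphen, so us-east-1a gives
datacenter us-east and rack 1a.  The Python sketch below only illustrates
that naming rule and how it feeds strategy_options; the function names are
hypothetical and this is not the snitch's actual code.  Other Cassandra
versions may name datacenters differently, so check what your own cluster
reports.

```python
def split_availability_zone(az):
    """Split an EC2 availability zone name into (datacenter, rack).

    Illustration only: mirrors the naming rule described above,
    not the actual Ec2Snitch source.
    """
    region, sep, zone = az.rpartition("-")
    if not sep or not region or not zone:
        raise ValueError("unexpected availability zone name: %r" % az)
    return region, zone


def strategy_options(az, replicas):
    """Build the strategy_options fragment for a single-DC keyspace."""
    datacenter, _rack = split_availability_zone(az)
    return "[{%s:%d}]" % (datacenter, replicas)


print(split_availability_zone("us-east-1a"))
print(strategy_options("us-east-1a", 3))
```

The key in strategy_options has to be the datacenter name the snitch
reports, which is why a placeholder such as DC1 does not work here.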

will

On Wed, Apr 13, 2011 at 5:14 PM, William Oberman
<ob...@civicscience.com> wrote:

> One last coda, for other noobs to Cassandra like me.  If you use
> NetworkTopologyStrategy with replication_factor > 1, make sure you have EC2
> instances in multiple availability zones.  I was taking baby steps, and tried
> running a cluster in one AZ (before spreading to multiple AZs) and was getting
> the most baffling errors ("cassandra_UnavailableException").  I finally
> thought to check the Cassandra server logs (after painstakingly debugging the
> client code, firewalls, etc. for connectivity problems), and it turns out
> my Cassandra cluster was considering itself "unavailable" because it couldn't
> replicate as much as it wanted to.  I kind of wish a different word than
> "unavailable" had been chosen for this error condition :-)
>
> will

Re: Ec2Snitch + NetworkTopologyStrategy if only in one region?

Posted by William Oberman <ob...@civicscience.com>.
One last coda, for other noobs to Cassandra like me.  If you use
NetworkTopologyStrategy with replication_factor > 1, make sure you have EC2
instances in multiple availability zones.  I was taking baby steps, and tried
running a cluster in one AZ (before spreading to multiple AZs) and was getting
the most baffling errors ("cassandra_UnavailableException").  I finally
thought to check the Cassandra server logs (after painstakingly debugging the
client code, firewalls, etc. for connectivity problems), and it turns out
my Cassandra cluster was considering itself "unavailable" because it couldn't
replicate as much as it wanted to.  I kind of wish a different word than
"unavailable" had been chosen for this error condition :-)

will
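
To make the error above less baffling: as I understand it, the coordinator
raises UnavailableException before attempting a request when it counts fewer
live replicas than the consistency level requires.  The Python sketch below
is a simplified model of that check, not Cassandra's code; the function
names and the small set of consistency levels are assumptions made for
illustration.

```python
def required_replicas(consistency_level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: %s" % consistency_level)


def is_available(consistency_level, replication_factor, live_replicas):
    """True if enough replicas are alive to attempt the request."""
    needed = required_replicas(consistency_level, replication_factor)
    return live_replicas >= needed


# With RF=3, QUORUM needs 2 live replicas, so 1 is not enough.
print(is_available("QUORUM", 3, 2))
print(is_available("QUORUM", 3, 1))
```

If the replication strategy can place fewer replicas than the consistency
level needs, the request fails this check even though every node is up and
reachable, which is why it looks like a connectivity problem from the client.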

On Tue, Apr 12, 2011 at 10:37 PM, aaron morton <aa...@thelastpickle.com> wrote:

> If you can use standard + encoded I would go with that.
>
> Aaron

Re: Ec2Snitch + NetworkTopologyStrategy if only in one region?

Posted by aaron morton <aa...@thelastpickle.com>.
If you can use standard + encoded I would go with that. 

Aaron

On 13 Apr 2011, at 07:07, William Oberman wrote:

> Excellent to know! (and yes, I figure I'll expand someday, so I'm glad I found this out before digging a hole).
> 
> The other issue I've been pondering is a normal column family of encoded objects (in my case JSON) vs. a super column.  Based on my use case, things I've read, etc...  right now I'm coming down on normal + encoded.
> 
> will

Re: Ec2Snitch + NetworkTopologyStrategy if only in one region?

Posted by William Oberman <ob...@civicscience.com>.
Excellent to know! (and yes, I figure I'll expand someday, so I'm glad I
found this out before digging a hole).

The other issue I've been pondering is a normal column family of encoded
objects (in my case JSON) vs. a super column.  Based on my use case, things
I've read, etc., right now I'm coming down on the side of normal + encoded.
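
A rough picture of the two layouts being compared, using plain Python dicts
as stand-ins for rows.  The row contents, field names and values are made up
for illustration, and no Cassandra client is involved.  With a normal column
family the whole object is one JSON-encoded column value; with a super
column each field is its own subcolumn.

```python
import json

# Hypothetical object to store in a single row.
profile = {"name": "Ada", "city": "Pittsburgh", "visits": 3}

# Normal column family + encoded object: one column holding a JSON string.
standard_row = {"profile": json.dumps(profile, sort_keys=True)}

# Super column: one named group of subcolumns, one per field.
super_row = {"profile": {k: str(v) for k, v in profile.items()}}

# Reading one field from the encoded layout means decoding the whole value.
decoded = json.loads(standard_row["profile"])
print(decoded["city"])
print(super_row["profile"]["city"])
```

The encoded layout keeps the schema simple at the cost of reading and
rewriting the whole object whenever any field changes.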

will

On Tue, Apr 12, 2011 at 2:57 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> NTS is overkill in the sense that it doesn't really benefit you in a
> single DC, but if you think you may expand to another DC in the future
> it's much simpler if you were already using NTS, than first migrating
> to NTS (changing strategy is painful).
>
> I can't think of any downsides to using NTS in a single-DC
> environment, so that's the "safe" option.

Re: Ec2Snitch + NetworkTopologyStrategy if only in one region?

Posted by Jonathan Ellis <jb...@gmail.com>.
NTS is overkill in the sense that it doesn't really benefit you in a
single DC, but if you think you may expand to another DC in the future,
it's much simpler to already be using NTS than to have to migrate
to it first (changing strategy is painful).

I can't think of any downsides to using NTS in a single-DC
environment, so that's the "safe" option.

On Tue, Apr 12, 2011 at 1:15 PM, William Oberman
<ob...@civicscience.com> wrote:
> Hi,
>
> I'm getting closer to committing to cassandra, and now I'm in system/IT
> issues and questions.  I'm in the amazon EC2 cloud.  I previously used this
> forum to discover the best practice for disk layouts (large instance + the
> two ephemeral disks in RAID0 for data + root volume for everything else).
> Now I'm hoping to confirm bits and pieces of things I've read about for
> snitch/replication strategies.  I was thinking of using
> endpoint_snitch: org.apache.cassandra.locator.Ec2Snitch
> placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'
> (for people hitting this from the mailing list or google, I feel obligated
> to note that the former setting is in cassandra.yaml, and the latter is an
> option on a keyspace).
>
> But, I'm only in one region. Is using the amazon snitch/networktopology
> overkill given everything I have is in one DC (I believe region==DC and
> availability_zone==rack).  I'm using multiple availability zones for some
> level of redundancy, I'm just not yet to the point I'm using multiple
> regions.  If someday I move to using multiple regions, would that change the
> answer?
>
> Thanks!
>
> --
> Will Oberman
> Civic Science, Inc.
> 3030 Penn Avenue., First Floor
> Pittsburgh, PA 15201
> (M) 412-480-7835
> (E) oberman@civicscience.com
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com