Posted to user@cassandra.apache.org by Jedd Rashbrooke <je...@visualdna.com> on 2011/03/10 13:06:37 UTC

On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

 Howdi,

 Assortment of questions relating to an upgrade combined with a
 possible migration between Data Centers (or perhaps a multi-DC
 redesign).  Apologies if some of these have been asked before - I
 have kept half an eye on the list in recent times but haven't seen
 anything covering these particular aspects.


 Upgrade path:
 We're running a 16 node cluster on Amazon EC2, in a single DC
 (US) using 0.6.6.  We didn't do the 0.6.x upgrades mostly because
 things have 'just worked' (and it took a while to get to that stage).
 My question is whether it's considered safer to upgrade via 0.6.12
 to 0.7, or if a direct 0.6.6 -> 0.7 upgrade is safe enough?


 Copying a cluster between AWS DC's:
 We have ~ 150-250GB per node, with a Replication Factor of 4.
 I ack that 0.6 -> 0.7 is necessarily stop-the-world (STW), so in an attempt to
 minimise that outage period I was wondering if it's possible to
 drain & stop the cluster, then copy over only the 1st, 5th, 9th,
 and 13th nodes' worth of data (which should be a full copy of
 all our actual data - we are nicely partitioned, despite the
 disparity in GB per node) and have Cassandra re-populate the
 new destination 16 nodes from those four data sets.  If this is
 feasible, is it likely to be more expensive (in terms of time the
 new cluster is unresponsive as it rebuilds) than just copying
 across all 16 sets of data - about 2.7TB?
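
 For what it's worth, a quick simulation supports the every-4th-node
 intuition, assuming SimpleStrategy-style placement (each range's
 replicas are its primary owner plus the next RF-1 nodes clockwise on
 the ring) - a sketch, not a guarantee about our actual layout:

    # Sketch: check that nodes 1, 5, 9, 13 (0-indexed: 0, 4, 8, 12) of a
    # 16-node ring collectively hold every range at RF=4, assuming
    # SimpleStrategy-style placement: a range's replicas are its primary
    # owner plus the next RF-1 nodes clockwise.
    NODES, RF = 16, 4
    replicas = {r: {(r + i) % NODES for i in range(RF)} for r in range(NODES)}
    chosen = {0, 4, 8, 12}
    missing = [r for r in replicas if not (replicas[r] & chosen)]
    print("uncovered ranges:", missing)   # prints [] - every range is covered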


 Chattiness / gossip traffic requirements on DC-aware:
 I haven't pondered deeply on a 0.7 design yet, so this question is
 even more nebulous.  We're seeing growth (raw) of about 100GB
 per month on our 16 node RF4 cluster - say about 25GB of 'actual'
 data growth.  We don't delete (much) data.  Amazon's calculator
 suggests even 100GB in/out of a data center is modestly priced,
 but I'm cautious in case the replication traffic is particularly chatty
 or excessive.  I'm also wondering how expensive (in terms of traffic)
 a compaction or repair would be across data centers.  Has anyone had any
 experience with an EC2 cluster running 0.7 and traversing the
 pond?  Either in terms of traffic to cluster size, or $-cost to cluster
 size ratios would be fantastic.
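
 (The arithmetic I'm doing is roughly the following - the per-GB rate
 here is only an assumed placeholder for illustration, not a quoted
 AWS price:)

    # Back-of-envelope monthly inter-DC transfer cost.  RATE_PER_GB is an
    # assumed placeholder; check the current AWS inter-region price list.
    RATE_PER_GB = 0.10          # assumed $/GB
    raw_growth_gb = 100         # observed raw monthly growth (RF=4 cluster)
    print("if all raw growth crosses the pond: $%.2f/month"
          % (raw_growth_gb * RATE_PER_GB))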

 taa,
 Jedd.

Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

Posted by David Boxenhorn <da...@taotown.com>.
How do you write to two versions of Cassandra from the same client? Two
versions of Hector?

On Mon, Mar 14, 2011 at 6:46 PM, Robert Coli <rc...@digg.com> wrote:

> On Mon, Mar 14, 2011 at 8:39 AM, Jedd Rashbrooke <je...@visualdna.com>
> wrote:
> >  But more importantly for us it would mean we'd have just the
> >  one major outage, rather than two (relocation and 0.6 -> 0.7)
>
> Take zero major outages instead? :D
>
> a) Set up new cluster on new version.
> b) Fork application writes, so all writes go to both clusters.
> c) Backfill old data to new cluster via API writes.
> d) Flip the switch to read from the new cluster.
> e) Turn off old cluster.
>
> =Rob
>

Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

Posted by Robert Coli <rc...@digg.com>.
On Mon, Mar 14, 2011 at 8:39 AM, Jedd Rashbrooke <je...@visualdna.com> wrote:
>  But more importantly for us it would mean we'd have just the
>  one major outage, rather than two (relocation and 0.6 -> 0.7)

Take zero major outages instead? :D

a) Set up new cluster on new version.
b) Fork application writes, so all writes go to both clusters.
c) Backfill old data to new cluster via API writes.
d) Flip the switch to read from the new cluster.
e) Turn off old cluster.
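
Step (b) is the only part that needs application changes; a minimal
sketch of the fork (the two client objects are hypothetical wrappers
around whatever drivers talk to each cluster - e.g. a 0.6-era Thrift
client on one side and a 0.7-compatible client on the other):

    # Sketch of step (b): fork each write to both clusters.  The client
    # objects are hypothetical wrappers (one per driver/protocol version);
    # only the control flow matters here.
    import logging

    class DualWriter(object):
        def __init__(self, old_client, new_client):
            self.old = old_client      # 0.6 cluster: source of truth
            self.new = new_client      # 0.7 cluster: shadow copy

        def insert(self, key, columns):
            self.old.insert(key, columns)
            try:
                self.new.insert(key, columns)   # best-effort shadow write
            except Exception:
                # Don't fail the request on a new-cluster hiccup; the
                # backfill in step (c) repairs any gaps.
                logging.exception("shadow write failed for %r", key)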

=Rob

Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

Posted by Jonathan Ellis <jb...@gmail.com>.
Right.

The only subtlety is the system keyspace; the cleanest approach is to
start from scratch there (which means rebuilding the schema), but you
could also start with a copy of a single existing node's system
keyspace and start up with -Dcassandra.load_ring_state=false.
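
Something like this, roughly (both paths are assumptions based on
0.7-era defaults; adjust for your install):

    # Sketch: seed a fresh node with ONE existing node's system keyspace.
    # Both paths are assumptions (0.7-era defaults), not gospel.
    import shutil
    shutil.copytree("/backups/old-node-1/data/system",    # from one old node
                    "/var/lib/cassandra/data/system")     # new node's data dir
    # Then start with stale ring state ignored, e.g.:
    #   JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false" bin/cassandra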

On Fri, Mar 18, 2011 at 2:29 PM, Jeremiah Jordan
<JE...@morningstar.com> wrote:
> So can one just take all of the *.db files from all the machines in a cluster, put them in a folder together (renaming ones with the same number?) and start up a node which will then have access to all the data?
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbellis@gmail.com]
> Sent: Wednesday, March 16, 2011 1:59 PM
> To: user@cassandra.apache.org
> Cc: Jedd Rashbrooke
> Subject: Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer
>
> That should work then, assuming SimpleStrategy/RackUnawareStrategy.
> Otherwise figuring out which machines share which data gets
> complicated.
>
> Note that if you have room on the machines, it's going to be faster to
> copy the entire data set to each machine and run cleanup, than to have
> repair fix 3 of 4 replicas from scratch.  Repair would work,
> eventually, but it's kind of a worst-case scenario for it.
>
> On Mon, Mar 14, 2011 at 10:39 AM, Jedd Rashbrooke <je...@visualdna.com> wrote:
>>  Jonathan, thank you for your answers here.
>>
>>  To explain this bit ...
>>
>> On 11 March 2011 20:46, Jonathan Ellis <jb...@gmail.com> wrote:
>>> On Thu, Mar 10, 2011 at 6:06 AM, Jedd Rashbrooke <je...@visualdna.com> wrote:
>>>>  Copying a cluster between AWS DC's:
>>>>  We have ~ 150-250GB per node, with a Replication Factor of 4.
>>>>  I ack that 0.6 -> 0.7 is necessarily STW, so in an attempt to
>>>>  minimise that outage period I was wondering if it's possible to
>>>>  drain & stop the cluster, then copy over only the 1st, 5th, 9th,
>>>>  and 13th nodes' worth of data (which should be a full copy of
>>>>  all our actual data - we are nicely partitioned, despite the
>>>>  disparity in GB per node) and have Cassandra re-populate the
>>>>  new destination 16 nodes from those four data sets.  If this is
>>>>  feasible, is it likely to be more expensive (in terms of time the
>>>>  new cluster is unresponsive as it rebuilds) than just copying
>>  across all 16 sets of data - about 2.7TB?
>>>
>>> I'm confused.  You're trying to upgrade and add a DC at the same time?
>>
>>  Yeah, I know, it's probably not the sanest route - but the hardware
>>  (virtualised, Amazonish EC2 that it is) will be the same between
>>  the two sites, so that reduces some of the usual roll in / roll out
>>  migration risk.
>>
>>  But more importantly for us it would mean we'd have just the
>>  one major outage, rather than two (relocation and 0.6 -> 0.7)
>>
>>  cheers,
>>  Jedd.
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

RE: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

Posted by Jeremiah Jordan <JE...@morningstar.com>.
So can one just take all of the *.db files from all the machines in a cluster, put them in a folder together (renaming ones with the same number?) and start up a node which will then have access to all the data?

-----Original Message-----
From: Jonathan Ellis [mailto:jbellis@gmail.com] 
Sent: Wednesday, March 16, 2011 1:59 PM
To: user@cassandra.apache.org
Cc: Jedd Rashbrooke
Subject: Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

That should work then, assuming SimpleStrategy/RackUnawareStrategy.
Otherwise figuring out which machines share which data gets
complicated.

Note that if you have room on the machines, it's going to be faster to
copy the entire data set to each machine and run cleanup, than to have
repair fix 3 of 4 replicas from scratch.  Repair would work,
eventually, but it's kind of a worst-case scenario for it.

On Mon, Mar 14, 2011 at 10:39 AM, Jedd Rashbrooke <je...@visualdna.com> wrote:
>  Jonathan, thank you for your answers here.
>
>  To explain this bit ...
>
> On 11 March 2011 20:46, Jonathan Ellis <jb...@gmail.com> wrote:
>> On Thu, Mar 10, 2011 at 6:06 AM, Jedd Rashbrooke <je...@visualdna.com> wrote:
>>>  Copying a cluster between AWS DC's:
>>>  We have ~ 150-250GB per node, with a Replication Factor of 4.
>>>  I ack that 0.6 -> 0.7 is necessarily STW, so in an attempt to
>>>  minimise that outage period I was wondering if it's possible to
>>>  drain & stop the cluster, then copy over only the 1st, 5th, 9th,
>>>  and 13th nodes' worth of data (which should be a full copy of
>>>  all our actual data - we are nicely partitioned, despite the
>>>  disparity in GB per node) and have Cassandra re-populate the
>>>  new destination 16 nodes from those four data sets.  If this is
>>>  feasible, is it likely to be more expensive (in terms of time the
>>>  new cluster is unresponsive as it rebuilds) than just copying
>>>  across all 16 sets of data - about 2.7TB?
>>
>> I'm confused.  You're trying to upgrade and add a DC at the same time?
>
>  Yeah, I know, it's probably not the sanest route - but the hardware
>  (virtualised, Amazonish EC2 that it is) will be the same between
>  the two sites, so that reduces some of the usual roll in / roll out
>  migration risk.
>
>  But more importantly for us it would mean we'd have just the
>  one major outage, rather than two (relocation and 0.6 -> 0.7)
>
>  cheers,
>  Jedd.
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

Posted by Jonathan Ellis <jb...@gmail.com>.
That should work then, assuming SimpleStrategy/RackUnawareStrategy.
Otherwise figuring out which machines share which data gets
complicated.

Note that if you have room on the machines, it's going to be faster to
copy the entire data set to each machine and run cleanup, than to have
repair fix 3 of 4 replicas from scratch.  Repair would work,
eventually, but it's kind of a worst-case scenario for it.
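
Rough numbers with the figures from the original post (all assumptions,
ignoring compression and per-node imbalance):

    # Back-of-envelope transfer for the two approaches; pure assumptions.
    total_tb, nodes, rf = 2.7, 16, 4
    # Full-copy approach: ship the whole set across the WAN once, fan it
    # out inside the new DC, then `nodetool cleanup` drops unowned ranges.
    wan_full, lan_full = total_tb, total_tb * (nodes - 1)
    # Four-node approach: ship ~one replica's worth, then let repair
    # rebuild the other RF-1 replicas (repair's worst case, as above).
    wan_four = total_tb / rf
    print("full copy: %.1f TB WAN + %.1f TB LAN" % (wan_full, lan_full))
    print("4-node + repair: %.2f TB WAN + repair traffic" % wan_four)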

On Mon, Mar 14, 2011 at 10:39 AM, Jedd Rashbrooke <je...@visualdna.com> wrote:
>  Jonathan, thank you for your answers here.
>
>  To explain this bit ...
>
> On 11 March 2011 20:46, Jonathan Ellis <jb...@gmail.com> wrote:
>> On Thu, Mar 10, 2011 at 6:06 AM, Jedd Rashbrooke <je...@visualdna.com> wrote:
>>>  Copying a cluster between AWS DC's:
>>>  We have ~ 150-250GB per node, with a Replication Factor of 4.
>>>  I ack that 0.6 -> 0.7 is necessarily STW, so in an attempt to
>>>  minimise that outage period I was wondering if it's possible to
>>>  drain & stop the cluster, then copy over only the 1st, 5th, 9th,
>>>  and 13th nodes' worth of data (which should be a full copy of
>>>  all our actual data - we are nicely partitioned, despite the
>>>  disparity in GB per node) and have Cassandra re-populate the
>>>  new destination 16 nodes from those four data sets.  If this is
>>>  feasible, is it likely to be more expensive (in terms of time the
>>>  new cluster is unresponsive as it rebuilds) than just copying
>>>  across all 16 sets of data - about 2.7TB?
>>
>> I'm confused.  You're trying to upgrade and add a DC at the same time?
>
>  Yeah, I know, it's probably not the sanest route - but the hardware
>  (virtualised, Amazonish EC2 that it is) will be the same between
>  the two sites, so that reduces some of the usual roll in / roll out
>  migration risk.
>
>  But more importantly for us it would mean we'd have just the
>  one major outage, rather than two (relocation and 0.6 -> 0.7)
>
>  cheers,
>  Jedd.
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

Posted by Jedd Rashbrooke <je...@visualdna.com>.
 Jonathan, thank you for your answers here.

 To explain this bit ...

On 11 March 2011 20:46, Jonathan Ellis <jb...@gmail.com> wrote:
> On Thu, Mar 10, 2011 at 6:06 AM, Jedd Rashbrooke <je...@visualdna.com> wrote:
>>  Copying a cluster between AWS DC's:
>>  We have ~ 150-250GB per node, with a Replication Factor of 4.
>>  I ack that 0.6 -> 0.7 is necessarily STW, so in an attempt to
>>  minimise that outage period I was wondering if it's possible to
>>  drain & stop the cluster, then copy over only the 1st, 5th, 9th,
>>  and 13th nodes' worth of data (which should be a full copy of
>>  all our actual data - we are nicely partitioned, despite the
>>  disparity in GB per node) and have Cassandra re-populate the
>>  new destination 16 nodes from those four data sets.  If this is
>>  feasible, is it likely to be more expensive (in terms of time the
>>  new cluster is unresponsive as it rebuilds) than just copying
>>  across all 16 sets of data - about 2.7TB?
>
> I'm confused.  You're trying to upgrade and add a DC at the same time?

 Yeah, I know, it's probably not the sanest route - but the hardware
 (virtualised, Amazonish EC2 that it is) will be the same between
 the two sites, so that reduces some of the usual roll in / roll out
 migration risk.

 But more importantly for us it would mean we'd have just the
 one major outage, rather than two (relocation and 0.6 -> 0.7)

 cheers,
 Jedd.

Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

Posted by Chris Burroughs <ch...@gmail.com>.
On 03/11/2011 03:46 PM, Jonathan Ellis wrote:
> Repair is not yet WAN-optimized but is still cheap if your replicas
> are close to consistent, since only merkle trees + inconsistent ranges
> are sent over the network.
> 

What is the ticket number for WAN-optimized repair?

Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

Posted by Jonathan Ellis <jb...@gmail.com>.
On Thu, Mar 10, 2011 at 6:06 AM, Jedd Rashbrooke <je...@visualdna.com> wrote:
>  My question is whether it's considered safer to upgrade via 0.6.12
>  to 0.7, or if a direct 0.6.6 -> 0.7 upgrade is safe enough?

You don't need the latest 0.6 before upgrading.

>  Copying a cluster between AWS DC's:
>  We have ~ 150-250GB per node, with a Replication Factor of 4.
>  I ack that 0.6 -> 0.7 is necessarily STW, so in an attempt to
>  minimise that outage period I was wondering if it's possible to
>  drain & stop the cluster, then copy over only the 1st, 5th, 9th,
>  and 13th nodes' worth of data (which should be a full copy of
>  all our actual data - we are nicely partitioned, despite the
>  disparity in GB per node) and have Cassandra re-populate the
>  new destination 16 nodes from those four data sets.  If this is
>  feasible, is it likely to be more expensive (in terms of time the
>  new cluster is unresponsive as it rebuilds) than just copying
>  across all 16 sets of data - about 2.7TB?

I'm confused.  You're trying to upgrade and add a DC at the same time?

>  Chattiness / gossip traffic requirements on DC-aware:
>  I haven't pondered deeply on a 0.7 design yet, so this question is
>  even more nebulous.  We're seeing growth (raw) of about 100GB
>  per month on our 16 node RF4 cluster - say about 25GB of 'actual'
>  data growth.  We don't delete (much) data.  Amazon's calculator
>  suggests even 100GB in/out of a data center is modestly priced,
>  but I'm cautious in case the replication traffic is particularly chatty
>  or excessive.  I'm also wondering how expensive (in terms of traffic)
>  a compaction or repair would be across data centers.

Compactions are node-local.

Normal writes are optimized for the WAN (only one copy will be sent
between DCs; the recipient in the other DC will then forward it to
other replicas there).

Repair is not yet WAN-optimized but is still cheap if your replicas
are close to consistent, since only merkle trees + inconsistent ranges
are sent over the network.
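
(A toy illustration of why that's cheap when replicas mostly agree -
not Cassandra's actual implementation: each side hashes its ranges,
only the small trees cross the wire, and only mismatched ranges get
streamed:)

    # Toy merkle-style comparison (NOT Cassandra's real code): hash key
    # ranges on each side, exchange only the hashes, stream only the
    # ranges whose hashes differ.
    import hashlib

    def range_hashes(data, num_ranges=8):
        buckets = [[] for _ in range(num_ranges)]
        for key in sorted(data):
            buckets[hash(key) % num_ranges].append((key, data[key]))
        return [hashlib.md5(repr(b).encode()).hexdigest() for b in buckets]

    dc1 = {"key%d" % i: "v%d" % i for i in range(1000)}
    dc2 = dict(dc1)
    dc2["key42"] = "stale"                     # one inconsistent column
    mismatched = [i for i, (a, b) in
                  enumerate(zip(range_hashes(dc1), range_hashes(dc2)))
                  if a != b]
    print("ranges to stream:", mismatched)     # one bucket, not 1000 keys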

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com