Posted to user@cassandra.apache.org by Matthew Stump <mr...@gmail.com> on 2010/03/29 19:31:06 UTC

How reliable is cassandra?

Am I crazy to want to switch our server's primary data store from postgres to cassandra?  This is a system used by banks and governments to store crypto keys which absolutely can not be lost.

Re: How reliable is cassandra?

Posted by Tatu Saloranta <ts...@gmail.com>.
On Mon, Mar 29, 2010 at 10:31 AM, Matthew Stump <mr...@gmail.com> wrote:
> Am I crazy to want to switch our server's primary data store from postgres to cassandra?  This is a system used by banks and governments to store crypto keys which absolutely can not be lost.

Back to the original question: in my purely personal opinion, you
should probably only consider this if you have a very good reason to
do it.
Cassandra is a very promising system, but given how much development is
still going on, I would not risk my career on it for absolutely-never-to-be-lost data.
One benefit of existing RDBMSs is that they come with good data
export/backup tools, DBAs who can take over responsibility for
managing these aspects, and so on.

One important follow-up question is this: do you really need only one
system? Most successful large-scale systems I have worked on use a
two-part division:

(a) Management system on RDBMS (oracle etc) -- ACID modifications, backups
(b) Delivery system possibly using something else (extensive caching;
key/value store; read-only replicas)

with one-way synchronization from (a) to (b), with some delay --
be that via message queue, snapshots, or whatever. Main benefit: linear
scalability for (b); ACID guarantees for (a).
The "one system to store it all" view tends to be overvalued nowadays,
compared with "right tool for the job".
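A minimal Java sketch of the one-way flow from (a) to (b), assuming an in-memory queue in place of a real message queue and plain maps in place of an RDBMS and a key/value store. `ManagementStore`, `DeliveryStore` and `Replicator` are hypothetical names, not part of any library.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the management system (a): the only place writes go.
class ManagementStore {
    private final Map<String, String> rows = new ConcurrentHashMap<>();
    private final BlockingQueue<String[]> changeLog;

    ManagementStore(BlockingQueue<String[]> changeLog) {
        this.changeLog = changeLog;
    }

    // Record the row, then publish the change. Synchronized so the queue
    // order matches the order in which writes were applied to (a).
    synchronized void write(String key, String value) {
        rows.put(key, value);
        changeLog.add(new String[] {key, value});
    }

    String read(String key) {
        return rows.get(key);
    }
}

// Hypothetical stand-in for the delivery system (b): clients only read it.
class DeliveryStore {
    private final Map<String, String> replica = new ConcurrentHashMap<>();

    String read(String key) {
        return replica.get(key);
    }

    // Called only by the replicator, never by clients.
    void apply(String key, String value) {
        replica.put(key, value);
    }
}

// Moves changes from (a) to (b); (b) lags (a) until the queue is drained.
class Replicator implements Runnable {
    private final BlockingQueue<String[]> changeLog;
    private final DeliveryStore target;

    Replicator(BlockingQueue<String[]> changeLog, DeliveryStore target) {
        this.changeLog = changeLog;
        this.target = target;
    }

    // Applies every change queued so far, in order; returns how many.
    int drainAvailable() {
        int applied = 0;
        String[] change;
        while ((change = changeLog.poll()) != null) {
            target.apply(change[0], change[1]);
            applied++;
        }
        return applied;
    }

    // Continuous variant: block until (a) publishes, apply, repeat.
    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                String[] change = changeLog.take();
                target.apply(change[0], change[1]);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit when asked to stop
        }
    }
}
```

Only `ManagementStore` accepts writes, which is what makes the synchronization one-way. Running the `Replicator` on its own thread gives the "message queue" style; calling `drainAvailable()` on a schedule is closer to the "snapshots" style.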

And for (b) I would seriously consider Cassandra or other
NoSQL systems.
But, at least for now, I would not use it for (a).

Just my 2c,

-+ Tatu +-

Re: How reliable is cassandra?

Posted by Ted Zlatanov <tz...@lifelogs.com>.
On Mon, 29 Mar 2010 10:31:06 -0700 Matthew Stump <mr...@gmail.com> wrote: 

MS> Am I crazy to want to switch our server's primary data store from
MS> postgres to cassandra?  This is a system used by banks and
MS> governments to store crypto keys which absolutely can not be lost.

Run a test pilot for N months (depending on internal factors, N can be
3-12).  I think you'll find out more that way than by asking people who
have not seen your environment, your data, or your code.

Ted


Re: How reliable is cassandra?

Posted by Avinash Lakshman <av...@gmail.com>.
We use ZK for some incrementing counters, and this is the method that does it
(it is wrapped in a Thrift call):

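 import org.apache.zookeeper.KeeperException;
 import org.apache.zookeeper.ZooKeeper;
 import org.apache.zookeeper.data.Stat;

 // zk_ is a connected ZooKeeper handle; the znode at path must already exist.
 // Exceptions are propagated so the caller (the Thrift wrapper) can report
 // them, instead of falling through to a NullPointerException on stat.
 public long getNextSequenceId() throws KeeperException, InterruptedException
    {
      // One slash between the cluster name and the node; "//" is not a
      // valid ZooKeeper path.
      String path = "/<cluster name>/SequenceId";
      // Version -1 matches any version, so every call succeeds and is
      // committed as a new transaction with a new, larger zxid.
      Stat stat = zk_.setData(path, new byte[0], -1);
      // mzxid is the zxid of the last modification: unique and increasing
      // across the ensemble, though not gap-free.
      return stat.getMzxid();
    }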


Avinash




On Mon, Mar 29, 2010 at 11:40 AM, Eric Hauser <ew...@gmail.com> wrote:

> That's good to know.  I've often seen high latency between availability
> zones.
>
> BTW, does anyone from Digg patrol the list?  I'm really interested in some
> additional details on the implementation of atomic counters with ZooKeeper.
>
>
> On Mon, Mar 29, 2010 at 1:58 PM, Joe Stump <jo...@joestump.net> wrote:
>
>>
>> On Mar 29, 2010, at 11:55 AM, Eric Hauser wrote:
>>
>> > Does the information in the link below about Cassandra and replication
>> over WAN have any merit or is it just FUD?
>>
>> I can attest Cassandra works fine over inter-DC connections. We have ~20
>> nodes spread across three Amazon "Availability Zones". I know Digg is doing
>> bi-coastal Cassandra as well without issues.
>>
>> Cassandra was specifically built to work well over WAN / intermittently
>> failing connections / setups.
>>
>> --Joe
>>
>>
>

Re: Write times

Posted by Carlos Sanchez <ca...@riskmetrics.com>.
Thanks a lot, David.


On Mar 29, 2010, at 6:53 PM, David Strauss wrote:

> The partitioner *is* the method by which Cassandra selects the node to
> write to. Even if the client picks a node and requests a write there,
> Cassandra will still do the write where it knows it belongs. Every node
> is a gateway to do anything, anywhere in the cluster.
>
> On 2010-03-29 23:31, Carlos Sanchez wrote:
>> Would it be best then for the client to select the node to write to when using OPP in order to evenly distribute the keys?
>>
>> On Mar 29, 2010, at 6:05 PM, David Timothy Strauss wrote:
>>
>>> OPP should only affect write speed if OPP's tendency to unevenly distribute load causes some nodes to be overworked.
>>>
>>> In other words, OPP vs. RP on a single node system should have no real effect.
>>>
>>> -----Original Message-----
>>> From: Carlos Sanchez <ca...@riskmetrics.com>
>>> Date: Mon, 29 Mar 2010 18:58:50
>>> To: user@cassandra.apache.org<us...@cassandra.apache.org>
>>> Subject: Write times
>>>
>>> Are writes on OrderPreservingPartitioner always slower than RandomPartitioner? Is the replication factor a 'factor' in the write times?
>
> --
> David Strauss
>   | david@fourkitchens.com
>   | +1 512 577 5827 [mobile]
> Four Kitchens
>   | http://fourkitchens.com
>   | +1 512 454 6659 [office]
>   | +1 512 870 8453 [direct]
>


This email message and any attachments are for the sole use of the intended recipients and may contain proprietary and/or confidential information which may be privileged or otherwise protected from disclosure. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not an intended recipient, please contact the sender by reply email and destroy the original message and any copies of the message as well as any attachments to the original message.

Re: Write times

Posted by David Strauss <da...@fourkitchens.com>.
The partitioner *is* the method by which Cassandra selects the node to
write to. Even if the client picks a node and requests a write there,
Cassandra will still do the write where it knows it belongs. Every node
is a gateway to do anything, anywhere in the cluster.
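To make this concrete, here is a minimal Java sketch of a ring under an order-preserving partitioner, assuming one string token per node and ignoring replication to additional nodes. `OppRing`, `ownerOf` and `coordinateWrite` are hypothetical simplifications, not Cassandra's actual classes.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical OPP ring: tokens are plain strings, and a node owns every key
// that sorts after the previous node's token and up to its own token.
class OppRing {
    private final TreeMap<String, String> tokenToNode = new TreeMap<>();

    void addNode(String token, String node) {
        tokenToNode.put(token, node);
    }

    // Ownership is decided by the partitioner alone: the first node whose
    // token is >= the key, wrapping around to the lowest token.
    String ownerOf(String key) {
        Map.Entry<String, String> e = tokenToNode.ceilingEntry(key);
        return (e != null ? e : tokenToNode.firstEntry()).getValue();
    }

    // The node the client happened to contact acts as coordinator and
    // forwards the write to the owner, so the client's choice of node
    // does not change where the data ends up.
    String coordinateWrite(String contactedNode, String key) {
        String owner = ownerOf(key);
        return contactedNode.equals(owner)
                ? owner + " stores " + key + " locally"
                : contactedNode + " forwards " + key + " to " + owner;
    }
}
```

With tokens `g`, `p` and `z` on node1..node3, a write for `apple` lands on node1 whether the client contacts node1 or node3. In this model, picking the "right" node from the client saves at most one forwarding hop; it cannot move keys, so evening out OPP load comes down to how node tokens are chosen.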

On 2010-03-29 23:31, Carlos Sanchez wrote:
> Would it be best then for the client to select the node to write to when using OPP in order to evenly distribute the keys?
> 
> On Mar 29, 2010, at 6:05 PM, David Timothy Strauss wrote:
> 
>> OPP should only affect write speed if OPP's tendency to unevenly distribute load causes some nodes to be overworked.
>>
>> In other words, OPP vs. RP on a single node system should have no real effect.
>>
>> -----Original Message-----
>> From: Carlos Sanchez <ca...@riskmetrics.com>
>> Date: Mon, 29 Mar 2010 18:58:50
>> To: user@cassandra.apache.org<us...@cassandra.apache.org>
>> Subject: Write times
>>
>> Are writes on OrderPreservingPartitioner always slower than RandomPartitioner? Is the replication factor a 'factor' in the write times?

-- 
David Strauss
   | david@fourkitchens.com
   | +1 512 577 5827 [mobile]
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]


Re: Write times

Posted by Carlos Sanchez <ca...@riskmetrics.com>.
Would it be best then for the client to select the node to write to when using OPP in order to evenly distribute the keys?

On Mar 29, 2010, at 6:05 PM, David Timothy Strauss wrote:

> OPP should only affect write speed if OPP's tendency to unevenly distribute load causes some nodes to be overworked.
>
> In other words, OPP vs. RP on a single node system should have no real effect.
>
> -----Original Message-----
> From: Carlos Sanchez <ca...@riskmetrics.com>
> Date: Mon, 29 Mar 2010 18:58:50
> To: user@cassandra.apache.org<us...@cassandra.apache.org>
> Subject: Write times
>
> Are writes on OrderPreservingPartitioner always slower than RandomPartitioner? Is the replication factor a 'factor' in the write times?
>
> Thanks,
>
> Carlos
>

Re: Write times

Posted by David Timothy Strauss <da...@fourkitchens.com>.
OPP should only affect write speed if OPP's tendency to unevenly distribute load causes some nodes to be overworked.

In other words, OPP vs. RP on a single node system should have no real effect.
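The uneven distribution can be illustrated with a small Java sketch that counts how many sequential keys (`user0000`..`user0999`) each of three nodes would own. The RandomPartitioner-style token (the MD5 digest of the key read as a non-negative BigInteger) mirrors how RandomPartitioner hashes keys; `PartitionSkew` and its methods are hypothetical, and each node is assumed to hold a single token.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PartitionSkew {
    // RandomPartitioner-style token: MD5 of the key as a non-negative
    // BigInteger, so tokens fall in [0, 2^127].
    static BigInteger md5Token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    // Sequential keys, the worst case for an order-preserving partitioner.
    static List<String> sequentialKeys(int n) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add(String.format("user%04d", i));
        }
        return keys;
    }

    // Counts keys per node: a key belongs to the first node whose token is
    // >= the key's token, wrapping around to the lowest token.
    static <T extends Comparable<T>> Map<String, Integer> countOwners(
            TreeMap<T, String> ring, Iterable<T> keyTokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (T token : keyTokens) {
            Map.Entry<T, String> e = ring.ceilingEntry(token);
            String node = (e != null ? e : ring.firstEntry()).getValue();
            counts.merge(node, 1, Integer::sum);
        }
        return counts;
    }
}
```

Under OPP with node tokens `g`, `p` and `z`, every `user…` key sorts between `p` and `z`, so all of them go to a single node. Under RP with three equally spaced tokens in [0, 2^127], the MD5 tokens spread the same keys across all three nodes. On a single-node cluster both cases put everything on the one node, which is why the choice should not matter there.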

-----Original Message-----
From: Carlos Sanchez <ca...@riskmetrics.com>
Date: Mon, 29 Mar 2010 18:58:50 
To: user@cassandra.apache.org<us...@cassandra.apache.org>
Subject: Write times

Are writes on OrderPreservingPartitioner always slower than RandomPartitioner? Is the replication factor a 'factor' in the write times?

Thanks,

Carlos


Write times

Posted by Carlos Sanchez <ca...@riskmetrics.com>.
Are writes on OrderPreservingPartitioner always slower than RandomPartitioner? Is the replication factor a 'factor' in the write times?

Thanks,

Carlos


Re: How reliable is cassandra?

Posted by Eric Hauser <ew...@gmail.com>.
Thanks to all that responded.  That was helpful information.


On Mon, Mar 29, 2010 at 3:45 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> On Mon, Mar 29, 2010 at 2:41 PM, Joe Stump <jo...@joestump.net> wrote:
> > I know at least three Diggers patrol the list and one of them is a
> committer to Cassandra. Last I heard from my former coworkers at Digg was
> that ZooKeeper can add more overhead than wanted when doing locks in a high
> write environment.
>
> ZK basically does 2PC for each write, so all the ZK members need to be
> in the same datacenter.  Digg is operating 2 data centers now, so
> that's not a good fit -- either you have bad latency every time a
> client in the 2nd DC needs to access ZK or everyone has terrible write
> performance.
>
> ZK should be fine in a single datacenter though.
>
> -Jonathan
>

Re: How reliable is cassandra?

Posted by Jonathan Ellis <jb...@gmail.com>.
On Mon, Mar 29, 2010 at 2:41 PM, Joe Stump <jo...@joestump.net> wrote:
> I know at least three Diggers patrol the list and one of them is a committer to Cassandra. Last I heard from my former coworkers at Digg was that ZooKeeper can add more overhead than wanted when doing locks in a high-write environment.

ZK basically does 2PC for each write, so all the ZK members need to be
in the same datacenter.  Digg is operating 2 data centers now, so
that's not a good fit -- either you have bad latency every time a
client in the 2nd DC needs to access ZK or everyone has terrible write
performance.

ZK should be fine in a single datacenter though.
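A rough Java sketch of why member placement matters, assuming a simplified leader-based quorum commit: the leader counts as one vote and must collect acks from enough followers to reach a strict majority, so commit latency is roughly the round-trip time to the slowest follower in the fastest majority. ZooKeeper's actual protocol (Zab) is more involved; this model ignores disk syncs and the client's own round trip, and `QuorumLatency` is a hypothetical helper.

```java
import java.util.Arrays;

class QuorumLatency {
    // followerRttMs: round-trip time from the leader to each follower.
    // Returns the approximate time until a majority has acknowledged a write.
    static double commitLatencyMs(double[] followerRttMs) {
        int ensembleSize = followerRttMs.length + 1; // followers plus leader
        int acksNeeded = ensembleSize / 2;           // majority minus the leader's own vote
        if (acksNeeded == 0) {
            return 0.0;                              // single-member ensemble
        }
        double[] sorted = followerRttMs.clone();
        Arrays.sort(sorted);
        // The leader waits for the acksNeeded fastest followers.
        return sorted[acksNeeded - 1];
    }
}
```

With five members and a leader whose followers are at 1, 1, 70 and 70 ms (a majority in one datacenter), commits cost about 1 ms, but clients in the remote datacenter pay the WAN round trip to reach the ensemble. Move members so the leader's datacenter no longer holds a majority (1, 70, 70, 70 ms) and every commit pays the WAN round trip. The millisecond figures are illustrative only.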

-Jonathan

Re: How reliable is cassandra?

Posted by Joe Stump <jo...@joestump.net>.
On Mar 29, 2010, at 12:40 PM, Eric Hauser wrote:

> BTW, does anyone from Digg patrol the list?  I'm really interested in some additional details on the implementation of atomic counters with ZooKeeper.

I know at least three Diggers patrol the list and one of them is a committer to Cassandra. Last I heard from my former coworkers at Digg was that ZooKeeper can add more overhead than wanted when doing locks in a high-write environment. I think Ian quoted something obscene like 100ms+ to get a lock. I've also heard the Python implementation is lacking.

That being said, I've heard lots of good things from many people about ZK.

--Joe


Re: How reliable is cassandra?

Posted by Eric Hauser <ew...@gmail.com>.
That's good to know.  I've often seen high latency between availability
zones.

BTW, does anyone from Digg patrol the list?  I'm really interested in some
additional details on the implementation of atomic counters with ZooKeeper.


On Mon, Mar 29, 2010 at 1:58 PM, Joe Stump <jo...@joestump.net> wrote:

>
> On Mar 29, 2010, at 11:55 AM, Eric Hauser wrote:
>
> > Does the information in the link below about Cassandra and replication
> over WAN have any merit or is it just FUD?
>
> I can attest Cassandra works fine over inter-DC connections. We have ~20
> nodes spread across three Amazon "Availability Zones". I know Digg is doing
> bi-coastal Cassandra as well without issues.
>
> Cassandra was specifically built to work well over WAN / intermittently
> failing connections / setups.
>
> --Joe
>
>

Re: How reliable is cassandra?

Posted by Joe Stump <jo...@joestump.net>.
On Mar 29, 2010, at 11:55 AM, Eric Hauser wrote:

> Does the information in the link below about Cassandra and replication over WAN have any merit or is it just FUD?

I can attest Cassandra works fine over inter-DC connections. We have ~20 nodes spread across three Amazon "Availability Zones". I know Digg is doing bi-coastal Cassandra as well without issues.

Cassandra was specifically built to work well over WAN / intermittently failing connections / setups.

--Joe


Re: How reliable is cassandra?

Posted by Benjamin Black <b...@b3k.us>.
That post is nonsense, start to finish.  Disregard everything it says
about both Cassandra and HBase.

On Mon, Mar 29, 2010 at 10:55 AM, Eric Hauser <ew...@gmail.com> wrote:
> Does the information in the link below about Cassandra and replication over
> WAN have any merit or is it just FUD?
> http://www.roadtofailure.com/2009/10/29/hbase-vs-cassandra-nosql-battle/
>
>
> On Mon, Mar 29, 2010 at 1:51 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> Cassandra is an excellent choice for systems that Can't Lose Data.
>>
>>  - real single-server durability (set CommitLogSync to "batch"), not
>> just "hope it replicates somewhere before you lose power"
>>  - best multi-DC replication anywhere
>>  - immutable data files mean it's very difficult to introduce corruption
>> bugs
>>  - SnapshotBeforeCompaction option allows taking a snapshot of all
>> your data files before replacing them with newer ones, so you can roll
>> back to just before any compaction (as long as you have enough disk
>> space to hold the snapshots of course)
>>
>> -Jonathan
>>
>> On Mon, Mar 29, 2010 at 12:31 PM, Matthew Stump <mr...@gmail.com>
>> wrote:
>> > Am I crazy to want to switch our server's primary data store from
>> > postgres to cassandra?  This is a system used by banks and governments to
>> > store crypto keys which absolutely can not be lost.
>
>

Re: How reliable is cassandra?

Posted by Jonathan Ellis <jb...@gmail.com>.
FUD is a good description of that piece to use in polite company. :)

On Mon, Mar 29, 2010 at 12:55 PM, Eric Hauser <ew...@gmail.com> wrote:
> Does the information in the link below about Cassandra and replication over
> WAN have any merit or is it just FUD?
> http://www.roadtofailure.com/2009/10/29/hbase-vs-cassandra-nosql-battle/
>
>
> On Mon, Mar 29, 2010 at 1:51 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> Cassandra is an excellent choice for systems that Can't Lose Data.
>>
>>  - real single-server durability (set CommitLogSync to "batch"), not
>> just "hope it replicates somewhere before you lose power"
>>  - best multi-DC replication anywhere
>>  - immutable data files mean it's very difficult to introduce corruption
>> bugs
>>  - SnapshotBeforeCompaction option allows taking a snapshot of all
>> your data files before replacing them with newer ones, so you can roll
>> back to just before any compaction (as long as you have enough disk
>> space to hold the snapshots of course)
>>
>> -Jonathan
>>
>> On Mon, Mar 29, 2010 at 12:31 PM, Matthew Stump <mr...@gmail.com>
>> wrote:
>> > Am I crazy to want to switch our server's primary data store from
>> > postgres to cassandra?  This is a system used by banks and governments to
>> > store crypto keys which absolutely can not be lost.
>
>

Re: How reliable is cassandra?

Posted by Eric Hauser <ew...@gmail.com>.
Does the information in the link below about Cassandra and replication over
WAN have any merit or is it just FUD?

http://www.roadtofailure.com/2009/10/29/hbase-vs-cassandra-nosql-battle/


On Mon, Mar 29, 2010 at 1:51 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Cassandra is an excellent choice for systems that Can't Lose Data.
>
>  - real single-server durability (set CommitLogSync to "batch"), not
> just "hope it replicates somewhere before you lose power"
>  - best multi-DC replication anywhere
>  - immutable data files mean it's very difficult to introduce corruption
> bugs
>  - SnapshotBeforeCompaction option allows taking a snapshot of all
> your data files before replacing them with newer ones, so you can roll
> back to just before any compaction (as long as you have enough disk
> space to hold the snapshots of course)
>
> -Jonathan
>
> On Mon, Mar 29, 2010 at 12:31 PM, Matthew Stump <mr...@gmail.com>
> wrote:
> > Am I crazy to want to switch our server's primary data store from
> postgres to cassandra?  This is a system used by banks and governments to
> store crypto keys which absolutely can not be lost.
>

Re: How reliable is cassandra?

Posted by Jonathan Ellis <jb...@gmail.com>.
Cassandra is an excellent choice for systems that Can't Lose Data.

 - real single-server durability (set CommitLogSync to "batch"), not
just "hope it replicates somewhere before you lose power"
 - best multi-DC replication anywhere
 - immutable data files mean it's very difficult to introduce corruption bugs
 - SnapshotBeforeCompaction option allows taking a snapshot of all
your data files before replacing them with newer ones, so you can roll
back to just before any compaction (as long as you have enough disk
space to hold the snapshots of course)
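The difference between "batch" commit-log syncing and hoping a write replicates somewhere can be sketched in Java with `FileChannel.force`, which asks the OS to write file contents to the storage device. This is a hypothetical model, not Cassandra's commit log: `CommitLogSketch` and `SyncMode` are illustrative names, and Cassandra's batch mode additionally groups writes arriving within a short window into one sync, whereas this sketch forces on every append for simplicity.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class CommitLogSketch implements AutoCloseable {
    enum SyncMode { BATCH, PERIODIC }

    private final FileChannel channel;
    private final SyncMode mode;

    CommitLogSketch(Path file, SyncMode mode) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        this.mode = mode;
    }

    // In BATCH mode this returns (the write is acknowledged) only after the
    // record has been forced to stable storage. In PERIODIC mode it returns
    // at once and relies on sync() from a timer, so a power loss can drop
    // the most recently acknowledged writes.
    synchronized void append(String record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        if (mode == SyncMode.BATCH) {
            channel.force(true); // flush data and metadata (e.g. file length)
        }
    }

    // Called periodically in PERIODIC mode.
    synchronized void sync() throws IOException {
        channel.force(true);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

The trade-off is the one the setting name suggests: BATCH pays a disk sync on the write path in exchange for never acknowledging a write that is only in memory.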

-Jonathan

On Mon, Mar 29, 2010 at 12:31 PM, Matthew Stump <mr...@gmail.com> wrote:
> Am I crazy to want to switch our server's primary data store from postgres to cassandra?  This is a system used by banks and governments to store crypto keys which absolutely can not be lost.

Re: How reliable is cassandra?

Posted by Matthew Stump <mr...@gmail.com>.
We are actually fairly write-heavy.  User enrollment, auditing, grouping, and key maintenance all involve writing a fair amount of metadata to disk.  If we were performing mostly read operations, then postgres/clustering performance wouldn't be an issue.

On Mar 29, 2010, at 4:49 PM, David Strauss wrote:

On 2010-03-29 17:31, Matthew Stump wrote:
> Am I crazy to want to switch our server's primary data store from postgres to cassandra?  This is a system used by banks and governments to store crypto keys which absolutely can not be lost.

This sounds like an LDAP problem. There are very nice LDAP systems
available that support multi-master replication and are commonly used as
key stores. Also, I imagine reads are more important than writes for
you, and LDAP tools tend to be optimized for reads.

-- 
David Strauss
  | david@fourkitchens.com
Four Kitchens
  | http://fourkitchens.com
  | +1 512 454 6659 [office]
  | +1 512 870 8453 [direct]



Re: How reliable is cassandra?

Posted by David Strauss <da...@fourkitchens.com>.
On 2010-03-29 17:31, Matthew Stump wrote:
> Am I crazy to want to switch our server's primary data store from postgres to cassandra?  This is a system used by banks and governments to store crypto keys which absolutely can not be lost.

This sounds like an LDAP problem. There are very nice LDAP systems
available that support multi-master replication and are commonly used as
key stores. Also, I imagine reads are more important than writes for
you, and LDAP tools tend to be optimized for reads.

-- 
David Strauss
   | david@fourkitchens.com
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]


Re: How reliable is cassandra?

Posted by Matthew Stump <mr...@gmail.com>.
* Higher write throughput is one benefit.  User enrollment, auditing, keeping track of client state and replication all generate a fair number of writes which degrades postgres performance.

* Built-in clustering.  Postgres clustering is immature, and even when things start to settle down, probably next year, we will still be left with a cluster that can only provide single-master writes.  Right now we have our own clustering tech, which has its pluses and minuses; the minuses being that we have to maintain the code, and even though our clustering works fairly well, it's not our core competency.

* Lower cost of deployment.  In order to scale postgres you need fast disk solutions and a lot of memory.  If we were to switch to another database (Oracle, DB2), the cost of deployment would go up even further.  With Cassandra we can use commodity hardware.

* Map/reduce will be in 0.6, allowing us to better distribute jobs such as key maintenance which is fairly expensive computationally.

* Postgres can't scale well to the demands of our largest customers.  We need to rejigger our storage architecture anyways, so now is a good time to look at what Cassandra can offer.


On Mar 29, 2010, at 10:47 AM, Joe Van Dyk wrote:

On Mon, Mar 29, 2010 at 10:31 AM, Matthew Stump <mr...@gmail.com> wrote:
> Am I crazy to want to switch our server's primary data store from postgres to cassandra?  This is a system used by banks and governments to store crypto keys which absolutely can not be lost.

What benefits would you get from switching?

Joe


Re: How reliable is cassandra?

Posted by Ned Wolpert <ne...@imemories.com>.
The real question is whether you can handle 'eventual consistency' in this
situation. Cassandra is not designed to lose data... quite the opposite.

On Mon, Mar 29, 2010 at 10:47 AM, Joe Van Dyk <jo...@gmail.com> wrote:

> On Mon, Mar 29, 2010 at 10:31 AM, Matthew Stump <mr...@gmail.com>
> wrote:
> > Am I crazy to want to switch our server's primary data store from
> postgres to cassandra?  This is a system used by banks and governments to
> store crypto keys which absolutely can not be lost.
>
> What benefits would you get from switching?
>
> Joe
>



-- 
Virtually, Ned Wolpert

"Settle thy studies, Faustus, and begin..."   --Marlowe

Re: How reliable is cassandra?

Posted by Joe Van Dyk <jo...@gmail.com>.
On Mon, Mar 29, 2010 at 10:31 AM, Matthew Stump <mr...@gmail.com> wrote:
> Am I crazy to want to switch our server's primary data store from postgres to cassandra?  This is a system used by banks and governments to store crypto keys which absolutely can not be lost.

What benefits would you get from switching?

Joe

Re: How reliable is cassandra?

Posted by Matthew Stump <mr...@gmail.com>.
I'm not too worried about ACLs, I'm going to have to tunnel Cassandra through SSL and for most deployments the data that matters will be encrypted using fairly large key sizes.  The nodes that aren't allowed to store private keys will probably access data through a Thrift API which will use our own permission scheme.


On Mar 29, 2010, at 10:33 AM, Joe Stump wrote:


On Mar 29, 2010, at 11:31 AM, Matthew Stump wrote:

> Am I crazy to want to switch our server's primary data store from postgres to cassandra?  This is a system used by banks and governments to store crypto keys which absolutely can not be lost.

You might be crazy. PostgreSQL has all sorts of ACLs and other features for locking down various portions of the system. As far as I'm aware, Cassandra has none of that.

That being said, if you're looking for fault-tolerance across data centers Cassandra is probably a better option. Just make sure you're writing and reading with the highest consistency level possible.

Finally, you have a really rad last name. I approve.

--Joe


Re: How reliable is cassandra?

Posted by Joe Stump <jo...@joestump.net>.
On Mar 29, 2010, at 11:31 AM, Matthew Stump wrote:

> Am I crazy to want to switch our server's primary data store from postgres to cassandra?  This is a system used by banks and governments to store crypto keys which absolutely can not be lost.

You might be crazy. PostgreSQL has all sorts of ACLs and other features for locking down various portions of the system. As far as I'm aware, Cassandra has none of that.

That being said, if you're looking for fault-tolerance across data centers Cassandra is probably a better option. Just make sure you're writing and reading with the highest consistency level possible.
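The arithmetic behind consistency levels such as ONE, QUORUM and ALL can be sketched in Java. With N replicas, a read that consults R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N; the higher the level, the fewer replicas can be down before requests fail. `ConsistencyMath` is a hypothetical helper, not part of Cassandra's API.

```java
class ConsistencyMath {
    // Replicas needed for a strict majority of the replication factor.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True if every read set must intersect every acknowledged write set.
    static boolean readSeesLatestWrite(int n, int w, int r) {
        return r + w > n;
    }

    // Replicas that may be unavailable while a request needing
    // `required` acknowledgements still succeeds.
    static int toleratedFailures(int n, int required) {
        return n - required;
    }
}
```

With a replication factor of 3, QUORUM writes plus QUORUM reads (2 + 2 > 3) always overlap while tolerating one replica down; ALL tolerates none, and ONE for both reads and writes (1 + 1) gives no overlap guarantee.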

Finally, you have a really rad last name. I approve.

--Joe