Posted to user@cassandra.apache.org by Craig Ching <cr...@gmail.com> on 2010/10/14 21:45:57 UTC

Does Cassandra fit my requirements?

Hi!

I've been reading the wiki and some posts to this mailing list and writing
some tests to discover if Cassandra can be made to fit my needs.

For the most part, things are looking good.  However, I have one issue that
I am currently having problems with and it's making me think that maybe
Cassandra isn't a good fit for my needs.  For some of my larger users,
having a cluster of Cassandra nodes that needs to be separately maintained
might not be a huge issue.  However, for some smaller users, a separately
maintained Cassandra cluster will be too much of a management headache.

For those users, what I was thinking I could do is run an embedded Cassandra
node in our application in replicated mode.  So, there are two problems that
I'm seeing with this:

1.  Embedding doesn't seem to be a normal use-case for people (at least
that's the impression I got from the wiki in the section on embedding).
This probably isn't a huge issue, since I could just install a Cassandra node
along with our application, but I'd like to hear whether people think
embedding is a good idea and what problems I might run into if I do choose to
embed.

2.  What we really want for small users is just a replicated data store with
some caching.  Each of our application instances would have all the data
available to it.  If one application instance goes down, the other nodes
should continue to function just fine.

So, with number 2, I've run into a situation that I don't think is going to
work for me, but hopefully I'm just missing something.  To test 2., I
installed a 2 node cluster.  On node A, I ran some code to populate one of
the nodes.  On both A and B, I can get data.  But if I shut down node A and
request data from node B, I get a timed-out exception from Hector.  If I
leave A running but shut down B, I get an IllegalStateException: replication
factor (2) exceeds number of endpoints (1).  I have set my
replication_factor to 2.

Can I do what I want with Cassandra?  I appreciate any advice!  And I
apologize in advance if I've missed something obvious; I have all of about
1.5 days of experience and I'm trying to learn fast ;-)

Cheers,
Craig

Re: Does Cassandra fit my requirements?

Posted by Craig Ching <cr...@gmail.com>.
Hi Peter, thanks for the response!  I appreciate your validation of our
needs.  Sorry it's taken me so long to get back; I've been knocked on my ass
with a nasty cold since late last week.

On Sat, Oct 16, 2010 at 7:37 AM, Peter Schuller <peter.schuller@infidyne.com> wrote:

> > 2.  What we really want for small users is just a replicated data store with
> > some caching.  Each of our application instances would have all the data
> > available to it.  If one application instance goes down, the other nodes
> > should continue to function just fine.
>
> Sounds reasonable, but make sure your application logic is okay to
> handle concurrent requests to multiple nodes for the same data. I.e.,
> that there is no problem associated with your application not having
> the only authoritative view of the data in-memory.
>
>
Yeah, understood.  I think we're OK with that: we won't have a lot of writes.
We're more read-centric, which concerns me a bit given that I've heard
Cassandra is more write-centric.  But we do have a need to store large
amounts of data, so I think Cassandra is right for us based on that.



> > So, with number 2, I've run into a situation that I don't think is going to
> > work for me, but hopefully I'm just missing something.  To test 2., I
> > installed a 2 node cluster.  On node A, I ran some code to populate one of
> > the nodes.  On both A and B, I can get data.  But if I shut down node A and
> > request data from node B, I get a timed-out exception from Hector.  If I
> > leave A running but shut down B, I get an IllegalStateException: replication
> > factor (2) exceeds number of endpoints (1).  I have set my
> > replication_factor to 2.
> > Can I do what I want with Cassandra?  I appreciate any advice!  And I
> > apologize in advance if I've missed something obvious; I have all of about
> > 1.5 days of experience and I'm trying to learn fast ;-)
>
> Do both A and B agree about the ring state (i.e., they both know
> about both nodes) prior to you shutting one of them down?
>
>
Actually, I figured out the problem on Friday night (sort of).  My problem
was either a bug in Hector (which I think is more likely at this point) or a
misunderstanding of Hector on my part.  It seems Hector wasn't applying my
ConsistencyLevelPolicy (or whatever it's called in Hector).  I switched to
Pelops and things worked much more as I thought they should.  So it
does appear that Cassandra will do what I need thus far.
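
For anyone following along, here is a rough sketch of why the consistency
level matters so much in a 2 node test.  It assumes the usual quorum rule
(replication_factor / 2, rounded down, plus 1) and only covers the ONE, QUORUM
and ALL levels.  The function names are made up for illustration and are not
part of Hector, Pelops or Cassandra, and I can't say for sure this is exactly
what went wrong in my Hector setup.

```python
def replicas_required(consistency_level: str, replication_factor: int) -> int:
    """Number of replicas that must answer for a request to succeed."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        # A majority of the replicas: floor(RF / 2) + 1.
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def can_serve(consistency_level: str, replication_factor: int, live_replicas: int) -> bool:
    """True if enough replicas are alive to satisfy the consistency level."""
    return live_replicas >= replicas_required(consistency_level, replication_factor)


if __name__ == "__main__":
    # My test: replication_factor = 2, one of the two nodes shut down.
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, can_serve(level, replication_factor=2, live_replicas=1))
```

With replication_factor 2, QUORUM needs both replicas, so with one node down
only ONE can succeed.  That would fit what I saw if the client was not
actually using the consistency level I asked for.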


> --
> / Peter Schuller
>

Re: Does Cassandra fit my requirements?

Posted by Peter Schuller <pe...@infidyne.com>.
> 2.  What we really want for small users is just a replicated data store with
> some caching.  Each of our application instances would have all the data
> available to it.  If one application instance goes down, the other nodes
> should continue to function just fine.

Sounds reasonable, but make sure your application logic is okay to
handle concurrent requests to multiple nodes for the same data. I.e.,
that there is no problem associated with your application not having
the only authoritative view of the data in-memory.

> So, with number 2, I've run into a situation that I don't think is going to
> work for me, but hopefully I'm just missing something.  To test 2., I
> installed a 2 node cluster.  On node A, I ran some code to populate one of
> the nodes.  On both A and B, I can get data.  But if I shut down node A and
> request data from node B, I get a timed-out exception from Hector.  If I
> leave A running but shut down B, I get an IllegalStateException: replication
> factor (2) exceeds number of endpoints (1).  I have set my
> replication_factor to 2.
> Can I do what I want with Cassandra?  I appreciate any advice!  And I
> apologize in advance if I've missed something obvious; I have all of about
> 1.5 days of experience and I'm trying to learn fast ;-)

Do both A and B agree about the ring state (i.e., they both know
about both nodes) prior to you shutting one of them down?

-- 
/ Peter Schuller