Posted to user@cassandra.apache.org by Mubarak Seyed <mu...@gmail.com> on 2010/11/08 02:05:50 UTC
Design Question
Hi All,
Can someone please validate and recommend a solution for the given design
problem?
*Problem statement:* I need to de-queue data from Cassandra (from a Standard
ColumnFamily) using a job. Multiple instances of the job can run
simultaneously (like multiple threads) and may try to access the same row, but
I need to make sure that only one instance of the job (thread) can access a row
at a time: if job A is accessing Row #1, then job B can't access Row #1.
*Possible solutions:*
*Solution #1:* Use Cages (and ZooKeeper) to make sure that only one job at
a time can access a row in the CF. How do we make sure that Cages (a transaction
coordinator using ZooKeeper) is not a single point of failure? What is the
performance impact on writes/reads on the nodes? There is a blog post on
building a distributed concurrent queue at
http://www.cloudera.com/blog/2009/05/building-a-distributed-concurrent-queue-with-apache-zookeeper/
*Solution #2:* Use a home-grown approach to store/maintain who is
accessing what, meaning which job is accessing which row.
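A minimal sketch of what the "who is accessing what" bookkeeping in Solution #2 could look like. It is not from the thread: the class name RowClaimRegistry and its methods are hypothetical, and it only works inside a single process. With jobs on separate machines, the claim step would need an atomic check-and-set in some shared store, which is the hard part this sketch does not solve.

```python
import threading


class RowClaimRegistry:
    """Tracks which job currently holds which row (single-process sketch only)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the ownership table below
        self._owners = {}               # row key -> id of the job holding it

    def try_claim(self, row_key, job_id):
        """Return True if job_id owns row_key after the call, False otherwise."""
        with self._guard:
            owner = self._owners.get(row_key)
            if owner is None:
                self._owners[row_key] = job_id
                return True
            # Re-claiming a row you already hold succeeds; anyone else is refused.
            return owner == job_id

    def release(self, row_key, job_id):
        """Release row_key, but only if job_id is the current owner."""
        with self._guard:
            if self._owners.get(row_key) == job_id:
                del self._owners[row_key]
                return True
            return False


if __name__ == "__main__":
    registry = RowClaimRegistry()
    print(registry.try_claim("row1", "jobA"))  # job A claims Row #1
    print(registry.try_claim("row1", "jobB"))  # job B is refused while A holds it
    registry.release("row1", "jobA")
    print(registry.try_claim("row1", "jobB"))  # job B can claim it after release
```

A job would call try_claim before touching a row and skip the row if the call returns False. A real version would also need some way to expire claims held by a job that crashed.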
Are there any other solutions to the above problem?
Can someone please help me validate the design?
--
Thanks,
Mubarak Seyed.
Re: RE: Design Question
Posted by Aaron Morton <aa...@thelastpickle.com>.
Here they are
http://www.riptano.com/blog/slides-and-videos-cassandra-summit-2010
Aaron
On 09 Nov, 2010, at 09:36 AM, Jeremiah Jordan <JE...@morningstar.com> wrote:
Is the slide deck for this presentation online somewhere?
RE: Design Question
Posted by Jeremiah Jordan <JE...@morningstar.com>.
Is the slide deck for this presentation online somewhere?
-----Original Message-----
From: Jonathan Ellis [mailto:jbellis@gmail.com]
Sent: Monday, November 08, 2010 2:02 PM
To: user
Subject: Re: Design Question
Hi Mubarak,
Did you see David Strauss's talk on queuing at the Summit?
http://riptano.blip.tv/file/4015190/
What specifics can you give as to how your use case is similar to /
different from what David covered?
Re: Design Question
Posted by Jonathan Ellis <jb...@gmail.com>.
Hi Mubarak,
Did you see David Strauss's talk on queuing at the Summit?
http://riptano.blip.tv/file/4015190/
What specifics can you give as to how your use case is similar to /
different from what David covered?
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: Design Question
Posted by Dan Retzlaff <dr...@gmail.com>.
If you go the home-grown route, check out these musings on adapting
Lamport's Bakery algorithm to a similar problem:
http://wiki.apache.org/cassandra/Locking
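For readers unfamiliar with the algorithm, here is a rough sketch of the classic in-memory form of Lamport's Bakery algorithm, using Python threads as stand-ins for jobs. It is not the Cassandra adaptation described on the wiki page. The names BakeryLock and run_jobs are hypothetical, and the sketch assumes CPython, where the GIL makes the plain list reads and writes behave as the algorithm expects.

```python
import threading
import time


class BakeryLock:
    """Lamport's bakery algorithm for a fixed number of participants."""

    def __init__(self, n):
        self.n = n
        self.choosing = [False] * n  # True while participant i is picking a ticket
        self.ticket = [0] * n        # 0 means "not waiting for the lock"

    def acquire(self, i):
        # Take a ticket one higher than any ticket currently visible.
        self.choosing[i] = True
        self.ticket[i] = max(self.ticket) + 1
        self.choosing[i] = False
        for j in range(self.n):
            if j == i:
                continue
            # Wait until participant j has finished picking its ticket.
            while self.choosing[j]:
                time.sleep(0)
            # Wait while j holds an earlier ticket; ties are broken by index.
            while self.ticket[j] != 0 and (self.ticket[j], j) < (self.ticket[i], i):
                time.sleep(0)

    def release(self, i):
        self.ticket[i] = 0


def run_jobs(n_jobs=3, rounds=50):
    """Run n_jobs threads through the lock and record how many were inside at once."""
    lock = BakeryLock(n_jobs)
    state = {"inside": 0, "max_inside": 0, "processed": 0}

    def job(i):
        for _ in range(rounds):
            lock.acquire(i)
            state["inside"] += 1
            state["max_inside"] = max(state["max_inside"], state["inside"])
            state["processed"] += 1  # stands in for "work on the row"
            state["inside"] -= 1
            lock.release(i)

    threads = [threading.Thread(target=job, args=(i,)) for i in range(n_jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state


if __name__ == "__main__":
    print(run_jobs())
```

The appeal for a home-grown approach is that each participant only writes its own slot and only reads the others, so no compare-and-set primitive is needed. The cost is that every participant must be known in advance and must be able to read every other participant's slot.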
Re: Design Question
Posted by Aaron Morton <aa...@thelastpickle.com>.
FWIW, I would recommend first trying to solve the issue in your application rather than with Cages or ZooKeeper. I do not have experience with either, but they add another major server component to your stack.
If you really do have a queue with multiple simultaneous readers, consider using something like RabbitMQ http://www.rabbitmq.com/ . Or try something like Redis http://code.google.com/p/redis/ or Gearman http://gearman.org/ to get a quick prototype going.
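To illustrate why a real queue sidesteps the locking question, here is a small sketch of several workers competing for items from one queue. It uses Python's standard-library queue.Queue purely as a stand-in for a broker such as RabbitMQ or Redis. It does not use either product's API, and the function name drain_with_workers is made up for this example.

```python
import queue
import threading


def drain_with_workers(items, n_workers=4):
    """Let n_workers threads pull from one queue.

    Returns (handled, seen_twice): a dict mapping each item to the worker
    that took it, and a list of items that were delivered more than once.
    """
    work = queue.Queue()
    for item in items:
        work.put(item)

    handled = {}      # item -> id of the worker that took it
    seen_twice = []   # should stay empty: the queue hands out each item once
    guard = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                item = work.get_nowait()  # atomic: only one worker gets this item
            except queue.Empty:
                return                    # nothing left to do
            with guard:
                if item in handled:
                    seen_twice.append(item)
                handled[item] = worker_id

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled, seen_twice


if __name__ == "__main__":
    handled, seen_twice = drain_with_workers(range(20))
    print(len(handled), "items handled;", len(seen_twice), "delivered twice")
```

The workers never coordinate with each other about who owns what. The queue's dequeue operation is the only point of exclusion, which is the property a dedicated queueing system provides out of the box.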
Hope that helps.
Aaron