Posted to user@cassandra.apache.org by Roshni Rajagopal <ro...@hotmail.com> on 2012/09/19 14:16:33 UTC

Data Model - Consistency question

Hi Folks,

In the relational world, if I needed to model a students-courses relationship, I might have created
a students master table
a courses master table
a bridge table students-course which gives me the ids of the students and the courses they are taking. This can answer both 'which students take course A' and 'which courses are taken by student B'.

In the Cassandra world, I might design it like this
a static student column family
a static course column family
a student-course column family with student id as key and a dynamic list of course ids, to answer 'which courses are taken by student B'
a course-student column family with course id as key and a dynamic list of student ids, to answer 'which students take course A'

A screen which displays some student entity details as well as all the courses she is taking will need to refer to 2 column families.

Suppose an application inserts a new row in the student column family and a new row in the student-course column family. As transactions or consistency across column families are not guaranteed, there is a chance that the client reads from the student-course column family that a student is attending a course, while that student does not yet exist in the student column family.

If we use strong consistency from the reads + writes combination, will this scenario not occur?
And if we don't, can this scenario occur?

Regards,
Roshni

Re: Data Model - Consistency question

Posted by "Hiller, Dean" <De...@nrel.gov>.
Thinking a little more on your issue, you can also do that in playOrm, as
OneToMany is represented with a few columns in the owning table/entity,
unlike in JPA and an RDBMS.

I.e.

Student.java {
 // These course primary keys are saved one per column in the student's row
 List<Course> courses;
}

Course.java {
 // These student primary keys are saved one per column in the course's row
 List<Student> students;
}

We sometimes do this with playOrm and don't even bother with the S-SQL it
has, which also means you don't need to worry about partitioning in that
case.
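
The layout described above (each side of the relationship keeping the other side's keys, one per column) could be sketched in plain Java as follows. This is only an in-memory illustration and not playOrm's or Cassandra's API; the class EnrollmentIndex and its methods enroll, coursesOf and studentsOf are hypothetical names made up for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the two wide rows; not a real client API.
class EnrollmentIndex {
    // Row key = student id, column names = course ids.
    private final Map<String, SortedSet<String>> coursesByStudent = new HashMap<>();
    // Row key = course id, column names = student ids.
    private final Map<String, SortedSet<String>> studentsByCourse = new HashMap<>();

    // Two independent writes, one per row. Against a real cluster these
    // would be separate mutations, as discussed in this thread.
    void enroll(String studentId, String courseId) {
        coursesByStudent.computeIfAbsent(studentId, k -> new TreeSet<>()).add(courseId);
        studentsByCourse.computeIfAbsent(courseId, k -> new TreeSet<>()).add(studentId);
    }

    // Answers 'which courses are taken by student B'.
    SortedSet<String> coursesOf(String studentId) {
        return coursesByStudent.getOrDefault(studentId, Collections.emptySortedSet());
    }

    // Answers 'which students take course A'.
    SortedSet<String> studentsOf(String courseId) {
        return studentsByCourse.getOrDefault(courseId, Collections.emptySortedSet());
    }
}
```

Each question is answered by reading a single row, which is why no join or query language is needed for this access pattern.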

Later,
Dean

On 9/19/12 6:46 AM, "Hiller, Dean" <De...@nrel.gov> wrote:

>Yes, this scenario can occur (even with quorum writes/reads, as you are
>dealing with different rows), as one write may be complete and the other
>not while someone else is reading from the cluster.  Generally though,
>you can do a read repair of your own when you read it in ;).  I.e., see
>if things are inconsistent when reading and either 1. wait and read again
>or 2. figure out a way to display the results correctly based on merging
>data.  In general, #1 is not a bad approach on a lot of systems when you
>can't merge the data yourself, because the conflicts do not happen too
>often (maybe < 1% of the time in a lot of cases).
>
>With playOrm, if you have partitions, you can still have the relational
>model you described, provided you can figure out a partition strategy,
>and you can then query on it with joins and such.
>
>Later,
>Dean


Re: Data Model - Consistency question

Posted by "Hiller, Dean" <De...@nrel.gov>.
Yes, this scenario can occur (even with quorum writes/reads, as you are dealing with different rows), as one write may be complete and the other not while someone else is reading from the cluster.  Generally though, you can do a read repair of your own when you read it in ;).  I.e., see if things are inconsistent when reading and either 1. wait and read again or 2. figure out a way to display the results correctly based on merging data.  In general, #1 is not a bad approach on a lot of systems when you can't merge the data yourself, because the conflicts do not happen too often (maybe < 1% of the time in a lot of cases).
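
Option 1 (wait and read again) could look roughly like the sketch below. It is a generic retry helper in plain Java, not part of playOrm or any Cassandra client; RetryingReader, readWithRetry, maxAttempts and waitMillis are hypothetical names, and the Supplier stands in for whatever call reads the student row.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper: re-reads a row that another row refers to but that
// is not visible yet, instead of failing on the first miss.
class RetryingReader {
    static <T> Optional<T> readWithRetry(Supplier<Optional<T>> read,
                                         int maxAttempts,
                                         long waitMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = read.get();
            if (result.isPresent()) {
                return result; // the referenced row is now visible
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(waitMillis); // give the other write time to land
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return Optional.empty();
                }
            }
        }
        // Still missing: the caller decides how to display partial data.
        return Optional.empty();
    }
}
```

A screen that finds a student id in the student-course row but no matching student row would call readWithRetry with a small number of attempts and fall back to showing partial data if the result is still empty.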

With playOrm, if you have partitions, you can still have the relational model you described, provided you can figure out a partition strategy, and you can then query on it with joins and such.

Later,
Dean

