You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Roshni Rajagopal <ro...@hotmail.com> on 2012/09/13 19:54:51 UTC

Data Model

I want to learn how we can model a mix of static and dynamic columns in a family.
Consider a course_students col family which gives a list of students for a coursewith row key- Course IdColumns - Name, Teach_Nm, StudID1, StudID2, StudID3Values - Maths, Prof. Abc, 20,21,25 where 20,21,25 are IDs of students.
We have
fixed columns like Course Name, Teacher Name, and a dynamic number of
columns like 'StudID1', 'StudID2' etc, and my thoughts were that we could
look for 'StudID' and get all the columns with the student Ids in Hector. But the
question was how would we determine the number for the column, like to add
StudID3 we need to read the row and identify that 2 students are there,
and this is the third one.


So we can remove the number in the column name, altogether and keep
columns like Course Name, Teacher Name, Student:20,Student:21, Student:25,
where the second part is the actual student id. However here we run into
the second issue that we cannot have some columns of a composite format
and some of another format, when we use static column families- all
columns would need to be in the format UTF8:integer We may want to treat
it as a composite column key and not use a delimiter- to get sorting,
validate the types of the parts of the key, not have to search for the
delimiter and separate the 2 components  manually etc. 


A third option is to put only data in the column name for students like
Course Name, Teacher Name, 20,21,25 - it would be difficult to identify
that columns with name 20, 21, 25 actually stand for student names - a bit
unreadable.


I hope this is not confusing, and would like to hear your thoughts on this.The question isaround when you de-normalize & want to have some static info like name ,and a dynamic list - whats the best way to model this.
Regards,Roshni 		 	   		  

Re: Data Model

Posted by "Hiller, Dean" <De...@nrel.gov>.
playOrm uses EXACTLY that pattern where @OneToMany becomes student.rowkeyStudent1 student.rowkeyStudent2 and the other fields are fixed.  It is a common pattern in noSQL.

Dean

From: aaron morton <aa...@thelastpickle.com>>
Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Date: Friday, September 14, 2012 3:00 AM
To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Subject: Re: Data Model

Consider a course_students col family which gives a list of students for a course

I would use two CF's:

Course CF:
* Each row is one course
* Columns are the properties and values of the course

CourseEnrolements CF
* Each row is one course
* Column name is the student ID.
* Column value may be blank or some useful value.

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/09/2012, at 3:19 PM, Michael Morris <mi...@gmail.com>> wrote:

I'm fairly new to Cassandra myself, but had to solve a similar problem.  If ordering of the student number values is not important to you, you can store them as UTF8 values (Ascii would work too, may be a better choice?), and the resulting columns would be sorted by the lexical ordering of the numeric values (as opposed to numeric sorting), so it would be 1, 10, 11, 2, 3, 31, 5...

In my situation, I had a document, composed of digitized images of the pages.  My row key is the doc id number, and there were document level attributes, as well as page level attributes I wanted to capture in the row.  So in my model, I used composite columns of 3 UTF8 values.  The first is an attribute descriptor, I used 'a' to indicate a document level attribute, and 'p' as a page level attribute.  The second composite value depends on the 1st, for 'a' types, the 2nd value is the actual attribute identifier (ex, form type, scanner number, etc...).  For 'p' types, it refers to the page number.  Having Cassandra preserve the order of the page numbers is not a priority for me, so I can deal with the page number being sorted in String order in the database, I'll deal with numerical sorting that in the app logic (since the largest documents we process are only about 100 pages long).  The 3rd composite value is empty for all 'a' types, and for 'p' types, it refers to a page level attribute (page landmark information, image skew angle, location on disk, etc...).

As an example:

key1 => a:form=x, a:scanner=1425436, p:1:skew=0.0042142, p:1:file=page1.png, p:2:skew=0.0042412, p:2:file=page2.png
key2 => a:form=q, a:scanner=935625, p:1:skew=0.00032352, p:1:file=other1.png, p:2:skew=:0.0002355, p:2:file=other2.png

It's been working well for me when using the Hector client.

Thanks,

Mike

On Thu, Sep 13, 2012 at 12:59 PM, Soumya Acharya <cs...@gmail.com>> wrote:
I just started learning Cassandra any suggestion where to start with ??


Thanks
Soumya

On Thu, Sep 13, 2012 at 10:54 AM, Roshni Rajagopal <ro...@hotmail.com>> wrote:
I want to learn how we can model a mix of static and dynamic columns in a family.

Consider a course_students col family which gives a list of students for a course
with row key- Course Id
Columns - Name, Teach_Nm, StudID1, StudID2, StudID3
Values - Maths, Prof. Abc, 20,21,25
where 20,21,25 are IDs of students.

We have

fixed columns like Course Name, Teacher Name, and a dynamic number of

columns like 'StudID1', 'StudID2' etc, and my thoughts were that we could

look for 'StudID' and get all the columns with the student Ids in Hector. But the

question was how would we determine the number for the column, like to add

StudID3 we need to read the row and identify that 2 students are there,

and this is the third one.


So we can remove the number in the column name, altogether and keep

columns like Course Name, Teacher Name, Student:20,Student:21, Student:25,

where the second part is the actual student id. However here we run into

the second issue that we cannot have some columns of a composite format

and some of another format, when we use static column families- all

columns would need to be in the format UTF8:integer We may want to treat

it as a composite column key and not use a delimiter- to get sorting,

validate the types of the parts of the key, not have to search for the

delimiter and separate the 2 components  manually etc.


A third option is to put only data in the column name for students like

Course Name, Teacher Name, 20,21,25 - it would be difficult to identify

that columns with name 20, 21, 25 actually stand for student names - a bit

unreadable.


I hope this is not confusing, and would like to hear your thoughts on this.The question is

around when you de-normalize & want to have some static info like name ,

and a dynamic list - whats the best way to model this.


Regards,

Roshni



--
Regards and Thanks
Soumya Kanti Acharya



Re: Data Model

Posted by aaron morton <aa...@thelastpickle.com>.
> Consider a course_students col family which gives a list of students for a course

I would use two CF's:

Course CF:
	* Each row is one course
	* Columns are the properties and values of the course

CourseEnrolements CF
	* Each row is one course
	* Column name is the student ID. 
	* Column value may be blank or some useful value. 

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/09/2012, at 3:19 PM, Michael Morris <mi...@gmail.com> wrote:

> I'm fairly new to Cassandra myself, but had to solve a similar problem.  If ordering of the student number values is not important to you, you can store them as UTF8 values (Ascii would work too, may be a better choice?), and the resulting columns would be sorted by the lexical ordering of the numeric values (as opposed to numeric sorting), so it would be 1, 10, 11, 2, 3, 31, 5...
> 
> In my situation, I had a document, composed of digitized images of the pages.  My row key is the doc id number, and there were document level attributes, as well as page level attributes I wanted to capture in the row.  So in my model, I used composite columns of 3 UTF8 values.  The first is an attribute descriptor, I used 'a' to indicate a document level attribute, and 'p' as a page level attribute.  The second composite value depends on the 1st, for 'a' types, the 2nd value is the actual attribute identifier (ex, form type, scanner number, etc...).  For 'p' types, it refers to the page number.  Having Cassandra preserve the order of the page numbers is not a priority for me, so I can deal with the page number being sorted in String order in the database, I'll deal with numerical sorting that in the app logic (since the largest documents we process are only about 100 pages long).  The 3rd composite value is empty for all 'a' types, and for 'p' types, it refers to a page level attribute (page landmark information, image skew angle, location on disk, etc...).
> 
> As an example:
> 
> key1 => a:form=x, a:scanner=1425436, p:1:skew=0.0042142, p:1:file=page1.png, p:2:skew=0.0042412, p:2:file=page2.png
> key2 => a:form=q, a:scanner=935625, p:1:skew=0.00032352, p:1:file=other1.png, p:2:skew=:0.0002355, p:2:file=other2.png
> 
> It's been working well for me when using the Hector client.
> 
> Thanks,
> 
> Mike
> 
> On Thu, Sep 13, 2012 at 12:59 PM, Soumya Acharya <cs...@gmail.com> wrote:
> I just started learning Cassandra any suggestion where to start with ??
> 
> 
> Thanks
> Soumya 
> 
> On Thu, Sep 13, 2012 at 10:54 AM, Roshni Rajagopal <ro...@hotmail.com> wrote:
> I want to learn how we can model a mix of static and dynamic columns in a family.
> 
> Consider a course_students col family which gives a list of students for a course
> with row key- Course Id
> Columns - Name, Teach_Nm, StudID1, StudID2, StudID3
> Values - Maths, Prof. Abc, 20,21,25 
> where 20,21,25 are IDs of students.
> 
> We have
> fixed columns like Course Name, Teacher Name, and a dynamic number of
> 
> columns like 'StudID1', 'StudID2' etc, and my thoughts were that we could
> 
> look for 'StudID' and get all the columns with the student Ids in Hector. But the
> 
> question was how would we determine the number for the column, like to add
> 
> StudID3 we need to read the row and identify that 2 students are there,
> 
> and this is the third one.
> 
> 
> 
> So we can remove the number in the column name, altogether and keep
> 
> columns like Course Name, Teacher Name, Student:20,Student:21, Student:25,
> 
> where the second part is the actual student id. However here we run into
> 
> the second issue that we cannot have some columns of a composite format
> 
> and some of another format, when we use static column families- all
> 
> columns would need to be in the format UTF8:integer We may want to treat
> 
> it as a composite column key and not use a delimiter- to get sorting,
> 
> validate the types of the parts of the key, not have to search for the
> 
> delimiter and separate the 2 components  manually etc. 
> 
> 
> 
> A third option is to put only data in the column name for students like
> 
> Course Name, Teacher Name, 20,21,25 - it would be difficult to identify
> 
> that columns with name 20, 21, 25 actually stand for student names - a bit
> 
> unreadable.
> 
> 
> 
> I hope this is not confusing, and would like to hear your thoughts on this.The question is
> 
> around when you de-normalize & want to have some static info like name ,
> 
> and a dynamic list - whats the best way to model this.
> 
> 
> 
> Regards,
> 
> Roshni
> 
> 
> 
> 
> -- 
> Regards and Thanks 
> Soumya Kanti Acharya
> 


Re: Data Model

Posted by Michael Morris <mi...@gmail.com>.
I'm fairly new to Cassandra myself, but had to solve a similar problem.  If
ordering of the student number values is not important to you, you can
store them as UTF8 values (Ascii would work too, may be a better choice?),
and the resulting columns would be sorted by the lexical ordering of the
numeric values (as opposed to numeric sorting), so it would be 1, 10, 11,
2, 3, 31, 5...

In my situation, I had a document, composed of digitized images of the
pages.  My row key is the doc id number, and there were document level
attributes, as well as page level attributes I wanted to capture in the
row.  So in my model, I used composite columns of 3 UTF8 values.  The first
is an attribute descriptor, I used 'a' to indicate a document level
attribute, and 'p' as a page level attribute.  The second composite value
depends on the 1st, for 'a' types, the 2nd value is the actual attribute
identifier (ex, form type, scanner number, etc...).  For 'p' types, it
refers to the page number.  Having Cassandra preserve the order of the page
numbers is not a priority for me, so I can deal with the page number being
sorted in String order in the database, I'll deal with numerical sorting
that in the app logic (since the largest documents we process are only
about 100 pages long).  The 3rd composite value is empty for all 'a' types,
and for 'p' types, it refers to a page level attribute (page landmark
information, image skew angle, location on disk, etc...).

As an example:

key1 => a:form=x, a:scanner=1425436, p:1:skew=0.0042142,
p:1:file=page1.png, p:2:skew=0.0042412, p:2:file=page2.png
key2 => a:form=q, a:scanner=935625, p:1:skew=0.00032352,
p:1:file=other1.png, p:2:skew=:0.0002355, p:2:file=other2.png

It's been working well for me when using the Hector client.

Thanks,

Mike

On Thu, Sep 13, 2012 at 12:59 PM, Soumya Acharya <cs...@gmail.com>wrote:

> I just started learning Cassandra any suggestion where to start with ??
>
>
> Thanks
> Soumya
>
> On Thu, Sep 13, 2012 at 10:54 AM, Roshni Rajagopal <
> roshni_rajagopal@hotmail.com> wrote:
>
>>  I want to learn how we can model a mix of static and dynamic columns in
>> a family.
>>
>> Consider a course_students col family which gives a list of students for
>> a course
>> with row key- Course Id
>> Columns - Name, Teach_Nm, StudID1, StudID2, StudID3
>> Values - Maths, Prof. Abc, 20,21,25
>> where 20,21,25 are IDs of students.
>>
>> We have
>>
>> fixed columns like Course Name, Teacher Name, and a dynamic number of
>>
>> columns like 'StudID1', 'StudID2' etc, and my thoughts were that we could
>>
>> look for 'StudID' and get all the columns with the student Ids in Hector.
>> But the
>>
>> question was how would we determine the number for the column, like to add
>>
>> StudID3 we need to read the row and identify that 2 students are there,
>>
>> and this is the third one.
>>
>>
>> So we can remove the number in the column name, altogether and keep
>>
>> columns like Course Name, Teacher Name, Student:20,Student:21, Student:25,
>>
>> where the second part is the actual student id. However here we run into
>>
>> the second issue that we cannot have some columns of a composite format
>>
>> and some of another format, when we use static column families- all
>>
>> columns would need to be in the format UTF8:integer We may want to treat
>>
>> it as a composite column key and not use a delimiter- to get sorting,
>>
>> validate the types of the parts of the key, not have to search for the
>>
>> delimiter and separate the 2 components  manually etc.
>>
>>
>> A third option is to put only data in the column name for students like
>>
>> Course Name, Teacher Name, 20,21,25 - it would be difficult to identify
>>
>> that columns with name 20, 21, 25 actually stand for student names - a bit
>>
>> unreadable.
>>
>>
>> I hope this is not confusing, and would like to hear your thoughts on
>> this.The question is
>>
>> around when you de-normalize & want to have some static info like name ,
>>
>> and a dynamic list - whats the best way to model this.
>>
>>
>> Regards,
>>
>> Roshni
>>
>
>
>
> --
> Regards and Thanks
> Soumya Kanti Acharya
>

Re: Data Model

Posted by Soumya Acharya <cs...@gmail.com>.
I just started learning Cassandra any suggestion where to start with ??


Thanks
Soumya

On Thu, Sep 13, 2012 at 10:54 AM, Roshni Rajagopal <
roshni_rajagopal@hotmail.com> wrote:

>  I want to learn how we can model a mix of static and dynamic columns in
> a family.
>
> Consider a course_students col family which gives a list of students for a
> course
> with row key- Course Id
> Columns - Name, Teach_Nm, StudID1, StudID2, StudID3
> Values - Maths, Prof. Abc, 20,21,25
> where 20,21,25 are IDs of students.
>
> We have
>
> fixed columns like Course Name, Teacher Name, and a dynamic number of
>
> columns like 'StudID1', 'StudID2' etc, and my thoughts were that we could
>
> look for 'StudID' and get all the columns with the student Ids in Hector.
> But the
>
> question was how would we determine the number for the column, like to add
>
> StudID3 we need to read the row and identify that 2 students are there,
>
> and this is the third one.
>
>
> So we can remove the number in the column name, altogether and keep
>
> columns like Course Name, Teacher Name, Student:20,Student:21, Student:25,
>
> where the second part is the actual student id. However here we run into
>
> the second issue that we cannot have some columns of a composite format
>
> and some of another format, when we use static column families- all
>
> columns would need to be in the format UTF8:integer We may want to treat
>
> it as a composite column key and not use a delimiter- to get sorting,
>
> validate the types of the parts of the key, not have to search for the
>
> delimiter and separate the 2 components  manually etc.
>
>
> A third option is to put only data in the column name for students like
>
> Course Name, Teacher Name, 20,21,25 - it would be difficult to identify
>
> that columns with name 20, 21, 25 actually stand for student names - a bit
>
> unreadable.
>
>
> I hope this is not confusing, and would like to hear your thoughts on this.The
> question is
>
> around when you de-normalize & want to have some static info like name ,
>
> and a dynamic list - whats the best way to model this.
>
>
> Regards,
>
> Roshni
>



-- 
Regards and Thanks
Soumya Kanti Acharya