You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Michael Dagaev <mi...@gmail.com> on 2008/09/24 15:46:40 UTC
Hbase schema for many-to-many association
Hi All
How would you design an Hbase table for many-to-many association
between two entities, for example Student and Course?
I would define two tables:
Student:
student id
student data (name, address, ...)
courses (use course ids as column qualifiers here)
Course:
course id
course data (name, syllabus, ...)
students (use student ids as column qualifiers here)
Does it make sense?
Thank you,
Michael
Re: Hbase schema for many-to-many association
Posted by Fuad Efendi <fu...@efendi.ca>.
Sorry for typo: HBase (not Hadoop)
Quoting Fuad Efendi <fu...@efendi.ca>:
> I agree with your design...
>
> Without traditional RDBMS indexes from row-oriented world (which are
> indeed column-oriented structures very similar to Hadoop) we should
> have tables like:
> STUDENT_COURSE: student, course
> COURSE_STUDENT: course, student
>
> I feel we can think about Hadoop tables as of index structures from
> traditional RDBMS...
>
> Even with traditional RDBMS, it is possible to design full
> (non-normalized) schema using single table only:
> CREATE TABLE my_database_schema
> table_name VARCHAR(256) NOT NULL,
> column_name VARCHAR(256) NOT NULL,
> cell_data BLOB
>
>
> The only problem is surrogate primary keys... modern pattern says: do
> not use primary keys having business meaning (such as Social Insurance
> Number, Driving License Number, and etc). With RDBMS, transactional
> changing primary key in 'parent' table will put a lock on virtually
> infinite number of records from child table, so that we should use
> generated STUDENT_ID instead of natural PKs.
>
> With HBase we should probably revert back to old pattern: find natural
> primary key, do not use surrogate auto-generated STUDENT_ID...
> (student_id could be used in their official transcripts - in this case
> it is 'natural' (having business meaning))
>
> -Fuad
>
> Quoting Michael Dagaev <mi...@gmail.com>:
>
>> Hi All
>>
>> How would you design an Hbase table for many-to-many association
>> between two entities, for example Student and Course?
>>
>> I would define two tables:
>>
>> Student:
>> student id
>> student data (name, address, ...)
>> courses (use course ids as column qualifiers here)
>>
>> Course:
>> course id
>> course data (name, syllabus, ...)
>> students (use student ids as column qualifiers here)
>>
>> Does it make sense?
>>
>> Thank you,
>> Michael
>>
Re: Hbase schema for many-to-many association
Posted by Michael Dagaev <mi...@gmail.com>.
Fuad
Thank you for the answer. I think I understand the pattern.
Best Regards,
Michael
On Wed, Sep 24, 2008 at 5:27 PM, Fuad Efendi <fu...@efendi.ca> wrote:
> I agree with your design...
>
> Without traditional RDBMS indexes from row-oriented world (which are indeed
> column-oriented structures very similar to Hadoop) we should have tables
> like:
> STUDENT_COURSE: student, course
> COURSE_STUDENT: course, student
>
> I feel we can think about Hadoop tables as of index structures from
> traditional RDBMS...
>
> Even with traditional RDBMS, it is possible to design full (non-normalized)
> schema using single table only:
> CREATE TABLE my_database_schema
> table_name VARCHAR(256) NOT NULL,
> column_name VARCHAR(256) NOT NULL,
> cell_data BLOB
>
>
> The only problem is surrogate primary keys... modern pattern says: do not
> use primary keys having business meaning (such as Social Insurance Number,
> Driving License Number, and etc). With RDBMS, transactional changing primary
> key in 'parent' table will put a lock on virtually infinite number of
> records from child table, so that we should use generated STUDENT_ID instead
> of natural PKs.
>
> With HBase we should probably revert back to old pattern: find natural
> primary key, do not use surrogate auto-generated STUDENT_ID... (student_id
> could be used in their official transcripts - in this case it is 'natural'
> (having business meaning))
>
> -Fuad
>
> Quoting Michael Dagaev <mi...@gmail.com>:
>
>> Hi All
>>
>> How would you design an Hbase table for many-to-many association
>> between two entities, for example Student and Course?
>>
>> I would define two tables:
>>
>> Student:
>> student id
>> student data (name, address, ...)
>> courses (use course ids as column qualifiers here)
>>
>> Course:
>> course id
>> course data (name, syllabus, ...)
>> students (use student ids as column qualifiers here)
>>
>> Does it make sense?
>>
>> Thank you,
>> Michael
>>
>
>
>
>
Re: Hbase schema for many-to-many association
Posted by Fuad Efendi <fu...@efendi.ca>.
I agree with your design...
Without traditional RDBMS indexes from row-oriented world (which are
indeed column-oriented structures very similar to Hadoop) we should
have tables like:
STUDENT_COURSE: student, course
COURSE_STUDENT: course, student
I feel we can think about Hadoop tables as of index structures from
traditional RDBMS...
Even with traditional RDBMS, it is possible to design full
(non-normalized) schema using single table only:
CREATE TABLE my_database_schema
table_name VARCHAR(256) NOT NULL,
column_name VARCHAR(256) NOT NULL,
cell_data BLOB
The only problem is surrogate primary keys... modern pattern says: do
not use primary keys having business meaning (such as Social Insurance
Number, Driving License Number, and etc). With RDBMS, transactional
changing primary key in 'parent' table will put a lock on virtually
infinite number of records from child table, so that we should use
generated STUDENT_ID instead of natural PKs.
With HBase we should probably revert back to old pattern: find natural
primary key, do not use surrogate auto-generated STUDENT_ID...
(student_id could be used in their official transcripts - in this case
it is 'natural' (having business meaning))
-Fuad
Quoting Michael Dagaev <mi...@gmail.com>:
> Hi All
>
> How would you design an Hbase table for many-to-many association
> between two entities, for example Student and Course?
>
> I would define two tables:
>
> Student:
> student id
> student data (name, address, ...)
> courses (use course ids as column qualifiers here)
>
> Course:
> course id
> course data (name, syllabus, ...)
> students (use student ids as column qualifiers here)
>
> Does it make sense?
>
> Thank you,
> Michael
>
Re: Hbase schema for many-to-many association
Posted by Michael Dagaev <mi...@gmail.com>.
Jonathan
Thank you for the answer. I am just a little bit concerned about
performance implications of calling Hbase twice if I want to retrieve
data about all classes for a student.
Best Regards,
Michael
On Wed, Sep 24, 2008 at 5:20 PM, Jonathan Gray <jl...@streamy.com> wrote:
> Michael,
>
> Your design does make sense.
>
> As you said, you'd probably have two column-families in each of the Student
> and Course tables. One for the data, another with a column per student or
> course.
>
> For example, a student row might look like:
>
> Student :
>
> id/row/key = 1001
> data:name = Student Name
> data:address = 123 ABC St
> courses:2001 = (If you need more information about this association, for
> example, if they are on the waiting list)
> courses:2002 = ...
>
>
> This schema gives you fast access to the queries, show all classes for a
> student (student table, courses family), or all students for a class
> (courses table, students family).
>
>
> Jonathan Gray
>
>
> -----Original Message-----
> From: Michael Dagaev [mailto:michael.dagaev@gmail.com]
> Sent: Wednesday, September 24, 2008 6:47 AM
> To: hbase-user@hadoop.apache.org
> Subject: Hbase schema for many-to-many association
>
> Hi All
>
> How would you design an Hbase table for many-to-many association
> between two entities, for example Student and Course?
>
> I would define two tables:
>
> Student:
> student id
> student data (name, address, ...)
> courses (use course ids as column qualifiers here)
>
> Course:
> course id
> course data (name, syllabus, ...)
> students (use student ids as column qualifiers here)
>
> Does it make sense?
>
> Thank you,
> Michael
>
>
RE: Hbase schema for many-to-many association
Posted by Jonathan Gray <jl...@streamy.com>.
Michael,
Your design does make sense.
As you said, you'd probably have two column-families in each of the Student
and Course tables. One for the data, another with a column per student or
course.
For example, a student row might look like:
Student :
id/row/key = 1001
data:name = Student Name
data:address = 123 ABC St
courses:2001 = (If you need more information about this association, for
example, if they are on the waiting list)
courses:2002 = ...
This schema gives you fast access to the queries, show all classes for a
student (student table, courses family), or all students for a class
(courses table, students family).
Jonathan Gray
-----Original Message-----
From: Michael Dagaev [mailto:michael.dagaev@gmail.com]
Sent: Wednesday, September 24, 2008 6:47 AM
To: hbase-user@hadoop.apache.org
Subject: Hbase schema for many-to-many association
Hi All
How would you design an Hbase table for many-to-many association
between two entities, for example Student and Course?
I would define two tables:
Student:
student id
student data (name, address, ...)
courses (use course ids as column qualifiers here)
Course:
course id
course data (name, syllabus, ...)
students (use student ids as column qualifiers here)
Does it make sense?
Thank you,
Michael