You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hbase.apache.org by Michael Dagaev <mi...@gmail.com> on 2008/09/24 15:46:40 UTC

Hbase schema for many-to-many association

Hi All

How would you design an Hbase table for many-to-many association
between two entities, for example Student and Course?

I would define two tables:

Student:
    student id
    student data (name, address, ...)
    courses  (use course ids as column qualifiers here)

Course:
   course id
   course data (name, syllabus, ...)
   students (use student ids as column qualifiers here)

Does it make sense?

Thank you,
Michael

Re: Hbase schema for many-to-many association

Posted by Fuad Efendi <fu...@efendi.ca>.

Sorry for typo: HBase (not Hadoop)


Quoting Fuad Efendi <fu...@efendi.ca>:

> I agree with your design...
>
> Without traditional RDBMS indexes from row-oriented world (which are
> indeed column-oriented structures very similar to Hadoop) we should
> have tables like:
> STUDENT_COURSE: student, course
> COURSE_STUDENT: course, student
>
> I feel we can think about Hadoop tables as of index structures from
> traditional RDBMS...
>
> Even with traditional RDBMS, it is possible to design full
> (non-normalized) schema using single table only:
> CREATE TABLE my_database_schema
>    table_name  VARCHAR(256) NOT NULL,
>    column_name VARCHAR(256) NOT NULL,
>    cell_data   BLOB
>
>
> The only problem is surrogate primary keys... modern pattern says: do
> not use primary keys having business meaning (such as Social Insurance
> Number, Driving License Number, and etc). With RDBMS, transactional
> changing primary key in 'parent' table will put a lock on virtually
> infinite number of records from child table, so that we should use
> generated STUDENT_ID instead of natural PKs.
>
> With HBase we should probably revert back to old pattern: find natural
> primary key, do not use surrogate auto-generated STUDENT_ID...
> (student_id could be used in their official transcripts - in this case
> it is 'natural' (having business meaning))
>
> -Fuad
>
> Quoting Michael Dagaev <mi...@gmail.com>:
>
>> Hi All
>>
>> How would you design an Hbase table for many-to-many association
>> between two entities, for example Student and Course?
>>
>> I would define two tables:
>>
>> Student:
>>    student id
>>    student data (name, address, ...)
>>    courses  (use course ids as column qualifiers here)
>>
>> Course:
>>   course id
>>   course data (name, syllabus, ...)
>>   students (use student ids as column qualifiers here)
>>
>> Does it make sense?
>>
>> Thank you,
>> Michael
>>

Re: Hbase schema for many-to-many association

Posted by Michael Dagaev <mi...@gmail.com>.

Fuad

    Thank you for the answer. I think I understand the pattern.

Best Regards,
Michael

On Wed, Sep 24, 2008 at 5:27 PM, Fuad Efendi <fu...@efendi.ca> wrote:
> I agree with your design...
>
> Without traditional RDBMS indexes from row-oriented world (which are indeed
> column-oriented structures very similar to Hadoop) we should have tables
> like:
> STUDENT_COURSE: student, course
> COURSE_STUDENT: course, student
>
> I feel we can think about Hadoop tables as of index structures from
> traditional RDBMS...
>
> Even with traditional RDBMS, it is possible to design full (non-normalized)
> schema using single table only:
> CREATE TABLE my_database_schema
>   table_name  VARCHAR(256) NOT NULL,
>   column_name VARCHAR(256) NOT NULL,
>   cell_data   BLOB
>
>
> The only problem is surrogate primary keys... modern pattern says: do not
> use primary keys having business meaning (such as Social Insurance Number,
> Driving License Number, and etc). With RDBMS, transactional changing primary
> key in 'parent' table will put a lock on virtually infinite number of
> records from child table, so that we should use generated STUDENT_ID instead
> of natural PKs.
>
> With HBase we should probably revert back to old pattern: find natural
> primary key, do not use surrogate auto-generated STUDENT_ID... (student_id
> could be used in their official transcripts - in this case it is 'natural'
> (having business meaning))
>
> -Fuad
>
> Quoting Michael Dagaev <mi...@gmail.com>:
>
>> Hi All
>>
>> How would you design an Hbase table for many-to-many association
>> between two entities, for example Student and Course?
>>
>> I would define two tables:
>>
>> Student:
>>    student id
>>    student data (name, address, ...)
>>    courses  (use course ids as column qualifiers here)
>>
>> Course:
>>   course id
>>   course data (name, syllabus, ...)
>>   students (use student ids as column qualifiers here)
>>
>> Does it make sense?
>>
>> Thank you,
>> Michael
>>
>
>
>
>

Re: Hbase schema for many-to-many association

Posted by Fuad Efendi <fu...@efendi.ca>.

I agree with your design...

Without traditional RDBMS indexes from row-oriented world (which are  
indeed column-oriented structures very similar to Hadoop) we should  
have tables like:
STUDENT_COURSE: student, course
COURSE_STUDENT: course, student

I feel we can think about Hadoop tables as of index structures from  
traditional RDBMS...

Even with traditional RDBMS, it is possible to design full  
(non-normalized) schema using single table only:
CREATE TABLE my_database_schema
    table_name  VARCHAR(256) NOT NULL,
    column_name VARCHAR(256) NOT NULL,
    cell_data   BLOB


The only problem is surrogate primary keys... modern pattern says: do  
not use primary keys having business meaning (such as Social Insurance  
Number, Driving License Number, and etc). With RDBMS, transactional  
changing primary key in 'parent' table will put a lock on virtually  
infinite number of records from child table, so that we should use  
generated STUDENT_ID instead of natural PKs.

With HBase we should probably revert back to old pattern: find natural  
primary key, do not use surrogate auto-generated STUDENT_ID...  
(student_id could be used in their official transcripts - in this case  
it is 'natural' (having business meaning))

-Fuad

Quoting Michael Dagaev <mi...@gmail.com>:

> Hi All
>
> How would you design an Hbase table for many-to-many association
> between two entities, for example Student and Course?
>
> I would define two tables:
>
> Student:
>     student id
>     student data (name, address, ...)
>     courses  (use course ids as column qualifiers here)
>
> Course:
>    course id
>    course data (name, syllabus, ...)
>    students (use student ids as column qualifiers here)
>
> Does it make sense?
>
> Thank you,
> Michael
>

Re: Hbase schema for many-to-many association

Posted by Michael Dagaev <mi...@gmail.com>.

Jonathan

    Thank you for the answer.  I am just a little bit concerned about
performance implications of calling Hbase twice if I want to retrieve
data about all classes for a student.

Best Regards,
Michael

On Wed, Sep 24, 2008 at 5:20 PM, Jonathan Gray <jl...@streamy.com> wrote:
> Michael,
>
> Your design does make sense.
>
> As you said, you'd probably have two column-families in each of the Student
> and Course tables.  One for the data, another with a column per student or
> course.
>
> For example, a student row might look like:
>
> Student :
>
> id/row/key = 1001
> data:name = Student Name
> data:address = 123 ABC St
> courses:2001 = (If you need more information about this association, for
> example, if they are on the waiting list)
> courses:2002 = ...
>
>
> This schema gives you fast access to the queries, show all classes for a
> student (student table, courses family), or all students for a class
> (courses table, students family).
>
>
> Jonathan Gray
>
>
> -----Original Message-----
> From: Michael Dagaev [mailto:michael.dagaev@gmail.com]
> Sent: Wednesday, September 24, 2008 6:47 AM
> To: hbase-user@hadoop.apache.org
> Subject: Hbase schema for many-to-many association
>
> Hi All
>
> How would you design an Hbase table for many-to-many association
> between two entities, for example Student and Course?
>
> I would define two tables:
>
> Student:
>    student id
>    student data (name, address, ...)
>    courses  (use course ids as column qualifiers here)
>
> Course:
>   course id
>   course data (name, syllabus, ...)
>   students (use student ids as column qualifiers here)
>
> Does it make sense?
>
> Thank you,
> Michael
>
>

RE: Hbase schema for many-to-many association

Posted by Jonathan Gray <jl...@streamy.com>.

Michael,

Your design does make sense.

As you said, you'd probably have two column-families in each of the Student
and Course tables.  One for the data, another with a column per student or
course.

For example, a student row might look like:

Student : 

id/row/key = 1001
data:name = Student Name
data:address = 123 ABC St
courses:2001 = (If you need more information about this association, for
example, if they are on the waiting list)
courses:2002 = ...


This schema gives you fast access to the queries, show all classes for a
student (student table, courses family), or all students for a class
(courses table, students family).


Jonathan Gray


-----Original Message-----
From: Michael Dagaev [mailto:michael.dagaev@gmail.com] 
Sent: Wednesday, September 24, 2008 6:47 AM
To: hbase-user@hadoop.apache.org
Subject: Hbase schema for many-to-many association

Hi All

How would you design an Hbase table for many-to-many association
between two entities, for example Student and Course?

I would define two tables:

Student:
    student id
    student data (name, address, ...)
    courses  (use course ids as column qualifiers here)

Course:
   course id
   course data (name, syllabus, ...)
   students (use student ids as column qualifiers here)

Does it make sense?

Thank you,
Michael