Posted to user@hbase.apache.org by kranthi reddy <kr...@gmail.com> on 2010/03/31 12:05:03 UTC

Porting SQL DB into HBASE

Hi all,

        I have run into some trouble while trying to port a SQL DB to HBase.
The problem is that my SQL DB has around 500 tables (approx) and it is very
badly designed. Around 45-50 tables could be denormalised into a single table,
and the remaining tables are static tables. My doubts are:

1) Is it possible to port this DB (tables) to HBase? If so, how?
2) How many tables can HBase support with tolerance towards failure?
3) When so many tables are inserted, how is performance going to be
affected? Will it remain the same or degrade?

One possible solution I can think of is using a column family for each table.
But as per my knowledge and previous experiments, I found HBase isn't stable
when there are more than 5 column families.

Since large quantities of data are ported into the database every day,
a stable and fail-proof system is the highest priority.

Hoping for a positive response.

Thank you,
kranthi

RE: Porting SQL DB into HBASE

Posted by Michael Segel <mi...@hotmail.com>.


> Date: Wed, 14 Apr 2010 12:03:56 +0600
> Subject: Re: Porting SQL DB into HBASE
> From: imyousuf@gmail.com
> To: hbase-user@hadoop.apache.org
> 
> On Mon, Apr 12, 2010 at 2:55 PM, kranthi reddy <kr...@gmail.com> wrote:
> >
> > <snip />
> > The problem is denormalising these 20% tables is also extremely difficult
> > and we are planning to port them directly into hbase. And also denormalising
> > these tables would lead to a lot of redundant data.
> >
> 
> When denormalisation is mentioned, redundant data is implied. The idea
> is that, since there are no joins, keeping redundant data lets you do a
> single lookup instead of N lookups (one per replaced join); furthermore,
> HBase is great at scaling huge data sets.
> 

From reading his last post, I suspect it's less an issue of denormalization than one of poor database design.

Paraphrasing his example, he has one table for users who access his system by phone. He has one table for users who access the system by van. 

Without looking at his table structures, it's hard to see why he can't combine the two and then have a single field to denote access type (phone, van, etc.). Even if there are fields unique to phone and fields unique to van, those fields can simply be null for the other type.
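A minimal sketch of that consolidation, with invented field names (the real table structures aren't shown in this thread):

```python
# Hypothetical "phone" and "van" user records merged into one table shape
# with an access_type discriminator; all field names here are invented.
phone_user = {"user_id": 1, "name": "Asha", "phone_number": "555-0101"}
van_user = {"user_id": 2, "name": "Ravi", "van_route": "R7"}

ALL_FIELDS = ["user_id", "name", "phone_number", "van_route"]

def merge(record, access_type):
    # Fields unique to the other access type are simply left as None (NULL).
    row = {f: record.get(f) for f in ALL_FIELDS}
    row["access_type"] = access_type
    return row

combined = [merge(phone_user, "phone"), merge(van_user, "van")]
```

One combined table, one discriminator column, nullable type-specific fields.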

Again, sometimes you have to look at alternatives to how you achieve your physical model of your database.
If you have a parent/child relationship between data, you can easily use a hierarchical model like Pick (U2, Revelation, etc.). Not that I'm really a fan of Dick Pick (RIP), but this model would fit within HBase and work well. (I should add a caveat on column width and table size, but that's a different issue.)

Going back to the problem the OP is having, he really needs to rethink his design. 

IMHO, one important issue that doesn't get addressed is thinking of your database as something more than a way to persist your objects. ;-) [And that is the sort of thing you debate at a bar, over beers (or your favorite beverage) :-) ]

HTH

-Mike

  
 		 	   		  

Re: Porting SQL DB into HBASE

Posted by kranthi reddy <kr...@gmail.com>.
Hi,

The amount of data being added is around 6-8 GB per day. If we keep redundant
data, the size grows much faster; we expect it to at least double, if not
more.

Eg: Table 1 has 50 columns with unique entries, and suppose "Column X" is the
primary key.
      Suppose we have Table 2 with 15 columns, each row carrying "Column X"
as a foreign key.

      If, for an entry "Y" in Table 1, we have 15 entries in Table 2 with
foreign key "Y", we end up with 1 row in Table 1 (50 cells filled)
and 15 rows in Table 2 (15*15 = 225 cells filled).


    If these 2 tables are denormalized, we end up with 15 rows carrying
redundant data (15*50 cells + 15*15 cells = 975 cells filled).

Hope my example is clear.
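That arithmetic can be checked with a quick sketch (column and row counts taken from the example above):

```python
# Cell counts for the normalized vs. denormalized layouts in the example:
# Table 1 has 50 columns, Table 2 has 15 columns, and one Table 1 entry
# has 15 child rows in Table 2.
t1_cols, t2_cols, child_rows = 50, 15, 15

normalized = 1 * t1_cols + child_rows * t2_cols    # 50 + 225 = 275 cells
denormalized = child_rows * (t1_cols + t2_cols)    # 15 * 65  = 975 cells

print(normalized, denormalized)  # 275 975
```

So denormalizing this one parent/child pair roughly triples the stored cells.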

Regards,
kranthi

On Wed, Apr 14, 2010 at 11:33 AM, Imran M Yousuf <im...@gmail.com> wrote:

> On Mon, Apr 12, 2010 at 2:55 PM, kranthi reddy <kr...@gmail.com>
> wrote:
> >
> > <snip />
> > The problem is denormalising these 20% tables is also extremely difficult
> > and we are planning to port them directly into hbase. And also
> denormalising
> > these tables would lead to a lot of redundant data.
> >
>
> When denormalisation is mentioned, redundant data is implied. The idea
> is that, since there are no joins, keeping redundant data lets you do a
> single lookup instead of N lookups (one per replaced join); furthermore,
> HBase is great at scaling huge data sets.
>
> When I started reading http://wiki.apache.org/hadoop/Hbase/FAQ#A20 it
> helped me understand it further.
>
> Hope this helps.
>
> Best regards,
>
> --
> Imran M Yousuf
> Entrepreneur & Software Engineer
> Smart IT Engineering
> Dhaka, Bangladesh
> Email: imran@smartitengineering.com
> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
> Mobile: +880-1711402557
>



-- 
Kranthi Reddy. B

http://www.setusoftware.com/setu/index.htm

Re: Porting SQL DB into HBASE

Posted by Imran M Yousuf <im...@gmail.com>.
On Mon, Apr 12, 2010 at 2:55 PM, kranthi reddy <kr...@gmail.com> wrote:
>
> <snip />
> The problem is denormalising these 20% tables is also extremely difficult
> and we are planning to port them directly into hbase. And also denormalising
> these tables would lead to a lot of redundant data.
>

When denormalisation is mentioned, redundant data is implied. The idea
is that, since there are no joins, keeping redundant data lets you do a
single lookup instead of N lookups (one per replaced join); furthermore,
HBase is great at scaling huge data sets.
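A minimal sketch of the idea, with invented data:

```python
# Normalized layout: answering "name + orders for user u1" takes one lookup
# per table (N lookups standing in for N joins).
users = {"u1": {"name": "kranthi"}}
orders = {"u1": ["o1", "o2"]}

name = users["u1"]["name"]      # lookup 1
order_ids = orders["u1"]        # lookup 2

# Denormalized layout: a redundant copy of the name is stored next to the
# orders, so a single lookup returns everything at once.
denormalized = {"u1": {"name": "kranthi", "orders": ["o1", "o2"]}}
row = denormalized["u1"]        # one lookup
```

The redundancy is the price paid for collapsing the read path to one get.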

Reading http://wiki.apache.org/hadoop/Hbase/FAQ#A20 helped me
understand this further.

Hope this helps.

Best regards,

--
Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: imran@smartitengineering.com
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557

Re: Porting SQL DB into HBASE

Posted by Jonathan Gray <jg...@facebook.com>.
Why split all the static data across 400 tables?  You could combine  
things into fewer tables by prefixing keys with something (maybe the  
original table names?).
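A minimal sketch of this key-prefixing idea (table and key names here are invented):

```python
# Fold many small static tables into one HBase table by prepending the
# original table name to each row key. HBase keeps rows sorted by key,
# so each original table becomes a contiguous, prefix-scannable range.
def make_row_key(orig_table: str, row_id: str) -> bytes:
    return f"{orig_table}:{row_id}".encode("utf-8")

keys = [
    make_row_key("counties", "travis"),
    make_row_key("doctors", "d042"),
]

# A prefix scan over b"counties:" would then retrieve only that table's rows.
counties = [k for k in keys if k.startswith(b"counties:")]
```

This works even when the tables share no keys, since the prefix alone keeps them apart.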

Are your dynamic tables very large? Can you address AK's question...
Why HBase?

JG

On Apr 12, 2010, at 9:59 AM, "Amandeep Khurana" <am...@gmail.com>  
wrote:

> Kranthi,
>
> Your tables seem to be small. Why do you want to port them to HBase?
>
> -Amandeep
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> On Mon, Apr 12, 2010 at 1:55 AM, kranthi reddy <kranthili2020@gmail.com 
> >wrote:
>
>> HI jonathan,
>>
>> Sorry for the late response. Missed your reply.
>>
>> The problem is, around 80% (400) of the tables are static tables and the
>> remaining 20% (100) are dynamic tables that are updated on a daily basis.
>> The problem is denormalising these 20% tables is also extremely difficult
>> and we are planning to port them directly into hbase. And also denormalising
>> these tables would lead to a lot of redundant data.
>>
>> Static tables have entries numbering in the hundreds, mostly fewer than
>> 1000 (rows), whereas the dynamic tables have more than 20,000 entries,
>> and each entry might be updated/modified at least once a week.
>>
>> Regards,
>> kranthi
>>
>>
>> On Wed, Mar 31, 2010 at 10:23 PM, Jonathan Gray <jg...@facebook.com>
>> wrote:
>>
>>> Kranthi,
>>>
>>> HBase can handle a good number of tables, but tens or maybe a  
>>> hundred.
>> If
>>> you have 500 tables you should definitely be rethinking your schema
>> design.
>>> The issue is less about HBase being able to handle lots of tables,  
>>> and
>> much
>>> more about whether scattering your data across lots of tables will  
>>> be
>>> performant at read time.
>>>
>>>
>>> 1)  Impossible to answer that question without knowing the schemas  
>>> of the
>>> existing tables.
>>>
>>> 2)  Not really any relation between fault tolerance and the number  
>>> of
>>> tables except potentially for recovery time but this would be the  
>>> same
>> with
>>> few, very large tables.
>>>
>>> 3)  No difference in write performance.  Read performance if doing  
>>> simple
>>> key lookups would not be impacted, but most like having data  
>>> spread out
>> like
>>> this will mean you'll need joins of some sort.
>>>
>>> Can you tell more about your data and queries?
>>>
>>> JG
>>>
>>>> <snip />
>>>
>>
>>
>>
>> --
>> Kranthi Reddy. B
>> Room No : 98
>> Old Boys Hostel
>> IIIT-HYD
>>
>> -----------
>>
>> I don't know the key to success, but the key to failure is trying to
>> impress
>> others.
>>

Re: Porting SQL DB into HBASE

Posted by Amandeep Khurana <am...@gmail.com>.
Kranthi,

Your tables seem to be small. Why do you want to port them to HBase?

-Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Mon, Apr 12, 2010 at 1:55 AM, kranthi reddy <kr...@gmail.com>wrote:

> HI jonathan,
>
> Sorry for the late response. Missed your reply.
>
> The problem is, around 80% (400) of the tables are static tables and the
> remaining 20% (100) are dynamic tables that are updated on a daily basis.
> The problem is denormalising these 20% tables is also extremely difficult
> and we are planning to port them directly into hbase. And also
> denormalising
> these tables would lead to a lot of redundant data.
>
> Static tables have number of entries varying in hundreds and mostly less
> than 1000 entries (rows). Where as the dynamic tables have more than 20,000
> entries and each entry might be updated/modified at least once in a week.
>
> Regards,
> kranthi
>
>
> On Wed, Mar 31, 2010 at 10:23 PM, Jonathan Gray <jg...@facebook.com>
> wrote:
>
> > <snip />
> >
>
>
>
> --
> Kranthi Reddy. B
> Room No : 98
> Old Boys Hostel
> IIIT-HYD
>
> -----------
>
> I don't know the key to success, but the key to failure is trying to
> impress
> others.
>

Re: Porting SQL DB into HBASE

Posted by kranthi reddy <kr...@gmail.com>.
Hi Amandeep,

 I get your point, but the situation is a bit more complex. I have tried to
explain it better below.

We have around 10 databases (each with 20-500 tables) which maintain
information about the people of a state. Each database maintains
information for a different kind of service (e.g. the VAN DB maintains
information about users who availed the facility through parked VANs, and the
TELECOMMUNICATION DB maintains information about users who availed the
facility by TELEPHONE).

Since a user can access the service through various channels, he ends
up having different IDs in each database. We now plan to combine all these
databases into a single database with one master table, based on a few
heuristics such as username and date of birth (if the username and date of
birth of a person match across different databases, we treat him as a single
user, and all his information from the different databases can be stored as
one single entry).
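A minimal sketch of that matching heuristic, with invented records:

```python
# Group records from the different service databases under one master key
# when username and date of birth match; all names and IDs are invented.
from collections import defaultdict

van_db = [{"id": 10, "username": "jsmith", "dob": "1980-01-02", "service": "van"}]
telecom_db = [{"id": 77, "username": "jsmith", "dob": "1980-01-02", "service": "phone"}]

master = defaultdict(list)
for record in van_db + telecom_db:
    # (username, dob) acts as the heuristic identity for the master table.
    master[(record["username"], record["dob"])].append(record)

# One master entry now holds both service records for the same person.
```

In HBase terms, that tuple (or a hash of it) would become the master-table row key.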

The problem at hand is that, since we have different databases and the
data is growing daily, it would be nearly impossible to maintain and
improve the system in the future. We might also end up losing track of the
databases and of the information about a particular user. This is why
we were planning to use HBase.

Hope I am a bit clearer now :) .
Regards,
kranthi

On Tue, Apr 13, 2010 at 11:01 AM, Amandeep Khurana <am...@gmail.com> wrote:

> You are mentioning 2 different reasons:
>
> Open source... Well, get MySQL..
>
> Large datasets? The table sizes that you reported in the earlier mails don't
> seem to justify a move to HBase. Keep in mind that to run HBase stably in
> production you would ideally want at least 10 nodes, and you will
> have no SQL available. Make sure you are aware of the trade-offs between
> HBase and an RDBMS before you decide... Even 100 million rows can be handled
> by a relational database if it is tuned properly.
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> On Mon, Apr 12, 2010 at 10:17 PM, kranthi reddy <kranthili2020@gmail.com
> >wrote:
>
> > <snip />
>



-- 
Kranthi Reddy. B
Room No : 98
Old Boys Hostel
IIIT-HYD

-----------

I don't know the key to success, but the key to failure is trying to impress
others.

Re: Porting SQL DB into HBASE

Posted by Amandeep Khurana <am...@gmail.com>.
You are mentioning 2 different reasons:

Open source... Well, get MySQL.

Large datasets? The table sizes that you reported in the earlier mails don't
seem to justify a move to HBase. Keep in mind that to run HBase stably in
production you would ideally want at least 10 nodes, and you will
have no SQL available. Make sure you are aware of the trade-offs between
HBase and an RDBMS before you decide... Even 100 million rows can be handled
by a relational database if it is tuned properly.


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Mon, Apr 12, 2010 at 10:17 PM, kranthi reddy <kr...@gmail.com>wrote:

> Hi all,
>
>
> @Amandeep : The main reason for porting to HBase is that it is open
> source. Currently the NGO is paying a high licensing fee for Microsoft SQL
> Server. So in order to save money we planned to port to HBase, also because
> of its scalability for large datasets.
>
> @Jonathan : The problem is that these static tables can't be combined. Each
> table describes a different entity. For e.g.: one static table might
> contain information about all the counties in a country, and another table
> might contain information about all the doctors in the country.
>
> That is the reason why I don't think it is possible to combine these static
> tables: they don't have any primary/foreign keys referencing each other.
>
> The dynamic tables are pretty huge (though small compared to what HBase can
> support). But these tables will be expanded and might contain up to 100
> million rows in the coming future.
>
> Thank you,
> kranthi
>
> On Tue, Apr 13, 2010 at 12:17 AM, Michael Segel
> <mi...@hotmail.com>wrote:
>
> >
> >
> > Just an idea, take a look at a hierarchical design like Pick.
> > I know it's doable, but I don't know how well it will perform.
> >
> >
> > > <snip />
> >
>
>
>
> --
> Kranthi Reddy. B
> Room No : 98
> Old Boys Hostel
> IIIT-HYD
>
> -----------
>
> I don't know the key to success, but the key to failure is trying to
> impress
> others.
>

Re: Porting SQL DB into HBASE

Posted by kranthi reddy <kr...@gmail.com>.
Hi all,


@Amandeep : The main reason for porting to HBase is that it is open
source. Currently the NGO is paying a high licensing fee for Microsoft SQL
Server. So in order to save money we planned to port to HBase, also because
of its scalability for large datasets.

@Jonathan : The problem is that these static tables can't be combined. Each
table describes a different entity. For e.g.: one static table might
contain information about all the counties in a country, and another table
might contain information about all the doctors in the country.

That is the reason why I don't think it is possible to combine these static
tables: they don't have any primary/foreign keys referencing each other.

The dynamic tables are pretty huge (though small compared to what HBase can
support). But these tables will be expanded and might contain up to 100
million rows in the coming future.

Thank you,
kranthi

On Tue, Apr 13, 2010 at 12:17 AM, Michael Segel
<mi...@hotmail.com>wrote:

>
>
> Just an idea, take a look at a hierarchical design like Pick.
> I know it's doable, but I don't know how well it will perform.
>
>
> > <snip />
>



-- 
Kranthi Reddy. B
Room No : 98
Old Boys Hostel
IIIT-HYD

-----------

I don't know the key to success, but the key to failure is trying to impress
others.

RE: Porting SQL DB into HBASE

Posted by Michael Segel <mi...@hotmail.com>.

Just an idea: take a look at a hierarchical design like Pick. 
I know it's doable, but I don't know how well it will perform.
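
To give a rough sketch of what a hierarchical (Pick-style) layout could look like in HBase terms — the table, family and qualifier names below are all invented for illustration — each child-table row from the SQL schema becomes a group of qualifiers inside the parent's row, so the parent/child hierarchy collapses into one wide row:

```python
def to_wide_row(order, line_items):
    """Collapse a parent SQL row and its child rows into one flat
    {family:qualifier -> value} map, i.e. the shape of a single HBase row."""
    row = {f"order:{key}": value for key, value in order.items()}
    for item in line_items:
        prefix = f"items:{item['item_id']}"
        row[f"{prefix}:product"] = item["product"]
        row[f"{prefix}:qty"] = item["qty"]
    return row

order = {"id": "o1", "customer": "c42"}
items = [
    {"item_id": "i1", "product": "p9", "qty": 2},
    {"item_id": "i2", "product": "p3", "qty": 1},
]
wide = to_wide_row(order, items)
# A single get on this row key now returns the order and all its line items.
```

Whether scans and updates over such wide rows perform well enough here is exactly the open question.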

 
> Date: Mon, 12 Apr 2010 14:25:48 +0530
> Subject: Re: Porting SQL DB into HBASE
> From: kranthili2020@gmail.com
> To: hbase-user@hadoop.apache.org
> 
> HI jonathan,
> 
> Sorry for the late response. Missed your reply.
> 
> The problem is, around 80% (400) of the tables are static tables and the
> remaining 20% (100) are dynamic tables that are updated on a daily basis.
> The problem is denormalising these 20% tables is also extremely difficult
> and we are planning to port them directly into hbase. And also denormalising
> these tables would lead to a lot of redundant data.
> 
> Static tables have number of entries varying in hundreds and mostly less
> than 1000 entries (rows). Where as the dynamic tables have more than 20,000
> entries and each entry might be updated/modified at least once in a week.
> 
> Regards,
> kranthi
> 
> 
> On Wed, Mar 31, 2010 at 10:23 PM, Jonathan Gray <jg...@facebook.com> wrote:
> 
> > Kranthi,
> >
> > HBase can handle a good number of tables, but tens or maybe a hundred.  If
> > you have 500 tables you should definitely be rethinking your schema design.
> >  The issue is less about HBase being able to handle lots of tables, and much
> > more about whether scattering your data across lots of tables will be
> > performant at read time.
> >
> >
> > 1)  Impossible to answer that question without knowing the schemas of the
> > existing tables.
> >
> > 2)  Not really any relation between fault tolerance and the number of
> > tables except potentially for recovery time but this would be the same with
> > few, very large tables.
> >
> > 3)  No difference in write performance.  Read performance if doing simple
> > key lookups would not be impacted, but most like having data spread out like
> > this will mean you'll need joins of some sort.
> >
> > Can you tell more about your data and queries?
> >
> > JG
> >
> > > -----Original Message-----
> > > From: kranthi reddy [mailto:kranthili2020@gmail.com]
> > > Sent: Wednesday, March 31, 2010 3:05 AM
> > > To: hbase-user@hadoop.apache.org
> > > Subject: Porting SQL DB into HBASE
> > >
> > > Hi all,
> > >
> > >         I have run into some trouble while trying to port SQL DB to
> > > Hbase.
> > > The problem is my SQL DB has around 500 tables (approx) and it is very
> > > badly
> > > designed. Around 45-50 tables could be denormalised into a single table
> > > and
> > > the remaining tables are static tables. My doubts are
> > >
> > > 1) Is it possible to port this DB (Tables) to Hbase? If possible how?
> > > 2) How many tables can Hbase support with tolerance towards failure?
> > > 3) When so many tables are inserted, how is the performance going to be
> > > effected? Will it remain same or degrade?
> > >
> > > One possible solution I think is using column family for each table.
> > > But as
> > > per my knowledge and previous experiments, I found Hbase isn't stable
> > > when
> > > column families are more than 5.
> > >
> > > Since every day large quantities of data is ported into the DataBase,
> > > stability and fail proof system is highest priority.
> > >
> > > Hoping for a positive response.
> > >
> > > Thank you,
> > > kranthi
> >
> 
> 
> 
> -- 
> Kranthi Reddy. B
> Room No : 98
> Old Boys Hostel
> IIIT-HYD
> 
> -----------
> 
> I don't know the key to success, but the key to failure is trying to impress
> others.

Re: Porting SQL DB into HBASE

Posted by kranthi reddy <kr...@gmail.com>.
Hi Jonathan,

Sorry for the late response. Missed your reply.

The problem is that around 80% (400) of the tables are static and the
remaining 20% (100) are dynamic tables that are updated on a daily basis.
Denormalising these 20% of tables is extremely difficult, so we are planning
to port them directly into HBase; denormalising them would also lead to a lot
of redundant data.

The static tables have entry counts in the hundreds, mostly fewer than 1000
rows, whereas the dynamic tables have more than 20,000 entries, and each
entry might be updated/modified at least once a week.

Regards,
kranthi
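
Since the static tables are so small, one option — just a sketch, with invented table names, not something benchmarked here — is to fold all ~400 of them into a single HBase table by prefixing each row key with its source table name, so point reads stay single-key lookups without needing 400 separate tables:

```python
def composite_key(source_table, primary_key):
    """Build a row key like 'country_codes|IN' so hundreds of small
    static tables can share one HBase table without key collisions."""
    return f"{source_table}|{primary_key}"

# Stand-in dict for the single consolidated HBase table of static data.
static_store = {
    composite_key("country_codes", "IN"): {"info:name": "India"},
    composite_key("currencies", "INR"): {"info:name": "Indian Rupee"},
}

# A point read is still one key lookup, exactly as with one table per dataset.
value = static_store[composite_key("country_codes", "IN")]
```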


On Wed, Mar 31, 2010 at 10:23 PM, Jonathan Gray <jg...@facebook.com> wrote:

> Kranthi,
>
> HBase can handle a good number of tables, but tens or maybe a hundred.  If
> you have 500 tables you should definitely be rethinking your schema design.
>  The issue is less about HBase being able to handle lots of tables, and much
> more about whether scattering your data across lots of tables will be
> performant at read time.
>
>
> 1)  Impossible to answer that question without knowing the schemas of the
> existing tables.
>
> 2)  Not really any relation between fault tolerance and the number of
> tables except potentially for recovery time but this would be the same with
> few, very large tables.
>
> 3)  No difference in write performance.  Read performance if doing simple
> key lookups would not be impacted, but most like having data spread out like
> this will mean you'll need joins of some sort.
>
> Can you tell more about your data and queries?
>
> JG
>
> > -----Original Message-----
> > From: kranthi reddy [mailto:kranthili2020@gmail.com]
> > Sent: Wednesday, March 31, 2010 3:05 AM
> > To: hbase-user@hadoop.apache.org
> > Subject: Porting SQL DB into HBASE
> >
> > Hi all,
> >
> >         I have run into some trouble while trying to port SQL DB to
> > Hbase.
> > The problem is my SQL DB has around 500 tables (approx) and it is very
> > badly
> > designed. Around 45-50 tables could be denormalised into a single table
> > and
> > the remaining tables are static tables. My doubts are
> >
> > 1) Is it possible to port this DB (Tables) to Hbase? If possible how?
> > 2) How many tables can Hbase support with tolerance towards failure?
> > 3) When so many tables are inserted, how is the performance going to be
> > effected? Will it remain same or degrade?
> >
> > One possible solution I think is using column family for each table.
> > But as
> > per my knowledge and previous experiments, I found Hbase isn't stable
> > when
> > column families are more than 5.
> >
> > Since every day large quantities of data is ported into the DataBase,
> > stability and fail proof system is highest priority.
> >
> > Hoping for a positive response.
> >
> > Thank you,
> > kranthi
>



-- 
Kranthi Reddy. B
Room No : 98
Old Boys Hostel
IIIT-HYD

-----------

I don't know the key to success, but the key to failure is trying to impress
others.

RE: Porting SQL DB into HBASE

Posted by Jonathan Gray <jg...@facebook.com>.
Kranthi,

HBase can handle a good number of tables, but that means tens, or maybe a hundred.  If you have 500 tables you should definitely be rethinking your schema design.  The issue is less about whether HBase can handle lots of tables, and much more about whether scattering your data across lots of tables will be performant at read time.


1)  Impossible to answer that question without knowing the schemas of the existing tables.

2)  Not really any relation between fault tolerance and the number of tables, except potentially for recovery time, but this would be the same with a few very large tables.

3)  No difference in write performance.  Read performance for simple key lookups would not be impacted, but most likely having data spread out like this will mean you'll need joins of some sort.

Can you tell more about your data and queries?

JG
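
To make point 3 concrete with a toy sketch (the table and field names are invented): reading one logical record that is scattered across three tables costs one lookup per table, while a denormalised layout answers the same question with a single get:

```python
# Data scattered across three "tables" (dicts standing in for HBase tables).
users = {"u1": {"name": "kranthi"}}
hostels = {"u1": {"hostel": "Old Boys Hostel"}}
rooms = {"u1": {"room": "98"}}

def profile_with_joins(uid):
    # Client-side join: one lookup (round trip) per source table.
    return {**users[uid], **hostels[uid], **rooms[uid]}

# The same data denormalised into one wide row, at the cost of redundancy.
denormalised = {
    "u1": {"name": "kranthi", "hostel": "Old Boys Hostel", "room": "98"}
}

def profile_single_get(uid):
    # One lookup against the denormalised table.
    return denormalised[uid]
```

Both return the same record; the difference is the number of round trips, which is what dominates read latency once data is spread across many tables.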

> -----Original Message-----
> From: kranthi reddy [mailto:kranthili2020@gmail.com]
> Sent: Wednesday, March 31, 2010 3:05 AM
> To: hbase-user@hadoop.apache.org
> Subject: Porting SQL DB into HBASE
> 
> Hi all,
> 
>         I have run into some trouble while trying to port SQL DB to
> Hbase.
> The problem is my SQL DB has around 500 tables (approx) and it is very
> badly
> designed. Around 45-50 tables could be denormalised into a single table
> and
> the remaining tables are static tables. My doubts are
> 
> 1) Is it possible to port this DB (Tables) to Hbase? If possible how?
> 2) How many tables can Hbase support with tolerance towards failure?
> 3) When so many tables are inserted, how is the performance going to be
> effected? Will it remain same or degrade?
> 
> One possible solution I think is using column family for each table.
> But as
> per my knowledge and previous experiments, I found Hbase isn't stable
> when
> column families are more than 5.
> 
> Since every day large quantities of data is ported into the DataBase,
> stability and fail proof system is highest priority.
> 
> Hoping for a positive response.
> 
> Thank you,
> kranthi