Posted to user@hbase.apache.org by Kiran <ki...@gmail.com> on 2013/04/28 05:12:51 UTC

HBase and Datawarehouse

What is the difference between a NoSQL database like HBase and a data
warehouse? Don't both store data from disparate sources and formats?




Re: HBase and Datawarehouse

Posted by Michael Segel <mi...@hotmail.com>.
Hmmm...

I don't recommend HBase in situations where you are not running an M/R
framework. Sorry, as much as I love HBase, IMHO there are probably better
solutions for a standalone NoSQL database. (YMMV depending on your use case.)
The strength of HBase is that it's part of the Hadoop ecosystem.

I would think that it would probably be better to go virtual than to run
multiple region servers on bare hardware. You take a hit on I/O, but you can
work around that too.

But I'm conservative unless I have to get creative. ;-)

But something to consider when whiteboarding ideas...



On Apr 30, 2013, at 1:30 PM, Andrew Purtell <ap...@apache.org> wrote:



Re: HBase and Datawarehouse

Posted by Andrew Purtell <ap...@apache.org>.
You wouldn't do that if colocating MR. It is one way to soak up "extra" RAM
on a large-RAM box, although I'm not sure I would recommend it (I have no
personal experience trying it, yet). For more from people who are actively
considering it, see
https://issues.apache.org/jira/browse/BIGTOP-732

On Tue, Apr 30, 2013 at 11:14 AM, Michael Segel <mi...@hotmail.com> wrote:



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: HBase and Datawarehouse

Posted by Michael Segel <mi...@hotmail.com>.
Multiple RS per host? 
Huh? 

That seems very counterintuitive and potentially problematic with M/R jobs.
Could you expand on this? 

Thx

-Mike

On Apr 30, 2013, at 12:38 PM, Andrew Purtell <ap...@apache.org> wrote:



Re: HBase and Datawarehouse

Posted by Andrew Purtell <ap...@apache.org>.
Running more than one RS on a host is an option for soaking up "extra" RAM,
since that is what we are discussing, but I can't recommend it because I
have no experience with that approach. I think I do want to experiment with
it, but not on a box with fewer than something like 16 or 24 cores.


On Tue, Apr 30, 2013 at 11:19 AM, Amandeep Khurana <am...@gmail.com> wrote:





Re: HBase and Datawarehouse

Posted by Amandeep Khurana <am...@gmail.com>.
Multiple RSs per host gets you around the WAL bottleneck as well, but it's
operationally less than ideal. Do you usually recommend this approach, Andy?
I've mostly shied away from it.
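
For anyone wondering what running multiple region servers on one host even
looks like: the stock HBase tarball ships a helper script for this, aimed at
pseudo-distributed test setups rather than production. A sketch (the numeric
arguments are offsets HBase uses to pick distinct ports and data dirs):

  # start two extra region servers on this host, then stop them again
  bin/local-regionservers.sh start 2 3
  bin/local-regionservers.sh stop 2 3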

On Apr 30, 2013, at 10:38 AM, Andrew Purtell <ap...@apache.org> wrote:


Re: HBase and Datawarehouse

Posted by Andrew Purtell <ap...@apache.org>.
Rules of thumb for starting off safely and for easing support issues are
really good to have, but there are no hard barriers or singular approaches:
use Java 7 + G1GC, disable the HBase blockcache in favor of the OS
blockcache, run multiple regionservers per host. It is going to depend on
how the cluster is used and loaded. If we are talking about coprocessors,
the effective limits are even less clear: using a coprocessor to integrate
an external process implemented in native code, communicating over
memory-mapped files in /dev/shm, isn't outside what is possible (strawman
alert).
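
Concretely, those knobs map to settings along these lines. A minimal sketch:
the flags and the hfile.block.cache.size property are real for the Java 7 /
HBase 0.94 era, but the values here are illustrative, not recommendations:

  # hbase-env.sh -- run the region server on Java 7 with the G1 collector
  export HBASE_REGIONSERVER_OPTS="-Xmx16g -XX:+UseG1GC -XX:MaxGCPauseMillis=100"

  <!-- hbase-site.xml -- shrink the HBase blockcache and lean on the OS page
       cache instead (some versions want a small nonzero value rather than 0) -->
  <property>
    <name>hfile.block.cache.size</name>
    <value>0.05</value>
  </property>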


On Tue, Apr 30, 2013 at 5:01 AM, Kevin O'dell <ke...@cloudera.com>wrote:





Re: HBase and Datawarehouse

Posted by Michael Segel <mi...@hotmail.com>.
Tell me why your RS needs to be that large (> 8 GB).

I think the answer is that it depends, especially when you start to add in
coprocessors.
I'm not saying that there are no legitimate reasons, but a lot of the time
people just up the heap size without thinking about the problem.
To Kevin's point, when you exceed a certain point you're going to need to
really start to think about the tuning process.

MSLAB is now on by default, or so I am told.
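
For reference, the switch in question lives in hbase-site.xml. A sketch with
the 0.92/0.94-era property names; the values shown are, as I understand them,
the defaults:

  <property>
    <name>hbase.hregion.memstore.mslab.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.hregion.memstore.mslab.chunksize</name>
    <value>2097152</value> <!-- 2 MB allocation chunks -->
  </property>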

Just because you can do something doesn't mean it's a good idea. ;-)

On Apr 30, 2013, at 7:01 AM, Kevin O'dell <ke...@cloudera.com> wrote:



Re: HBase and Datawarehouse

Posted by Kevin O'dell <ke...@cloudera.com>.
Asaf,

  The heap barrier is something of a legend :)  You can ask 10 different
HBase committers what they think the max heap is and get 10 different
answers.  This is my take on heap sizes from the many clusters I have
dealt with:

8GB -> Standard heap size; tends to run fine without any tuning

12GB -> Needs some TLC with regards to JVM tuning if your workload tends
to cause churn (usually blockcache)

16GB -> GC tuning is a must, and now we need to start looking into MSLAB
and ZK timeouts

20GB -> Same as 16GB in regards to tuning, but we tend to need to raise
the ZK timeout a little higher

32GB -> We do have a couple of people running this high, but the pain
outweighs the gains (IMHO)

64GB -> Let me know how it goes :)
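
To make the 16GB+ rows concrete, here is a hedged sketch of the usual knobs
from that era: CMS collector flags plus a longer ZooKeeper session timeout.
The names are real; the values are illustrative starting points only:

  # hbase-env.sh -- CMS-style GC tuning for a ~16GB region server heap
  export HBASE_REGIONSERVER_OPTS="-Xmx16g -Xms16g \
    -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
    -XX:CMSInitiatingOccupancyFraction=70"

  <!-- hbase-site.xml -- give ZooKeeper enough slack that a long GC pause
       does not get the region server declared dead -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>120000</value> <!-- milliseconds -->
  </property>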




On Tue, Apr 30, 2013 at 4:07 AM, Andrew Purtell <ap...@apache.org> wrote:




-- 
Kevin O'Dell
Systems Engineer, Cloudera

Re: HBase and Datawarehouse

Posted by Andrew Purtell <ap...@apache.org>.
I don't wish to be rude, but you are stating odd claims as fact, on no
stronger basis than "mentioned in a couple of posts". That will make it
difficult to have a serious conversation. I encourage you to test your
hypotheses and let us know if in fact there is a JVM "heap barrier" (and
where it may be).

On Monday, April 29, 2013, Asaf Mesika wrote:




Re: HBase and Datawarehouse

Posted by Viral Bajaria <vi...@gmail.com>.
On Mon, Apr 29, 2013 at 10:54 PM, Asaf Mesika <as...@gmail.com> wrote:

> I think for Phoenix truly to succeed, it needs HBase to break the JVM
> heap barrier of 12G that I saw mentioned in a couple of posts. Lots of
> analytics queries utilize memory, and since that memory is shared with
> HBase, there's only so much you can do on a 12GB heap. On the other hand,
> if Phoenix were implemented outside HBase on the same machine (as Drill
> and Impala do), you could have 60GB for this process, running many OLAP
> queries in parallel, utilizing the same data set.
>

Can you shed more light on the 12GB heap barrier?

-Viral

Re: HBase and Datawarehouse

Posted by James Taylor <jt...@salesforce.com>.
Phoenix will succeed if HBase succeeds. Phoenix just makes it easier to
drive HBase to its maximum capability. IMHO, if HBase is to make further
gains in the OLAP space, scans need to be faster and new, more compressed
columnar-store-type block formats need to be developed.

Running inside HBase is what gives Phoenix most of its performance
advantage. Have you seen our numbers against Impala:
https://github.com/forcedotcom/phoenix/wiki/Performance? Drill will need
something to efficiently execute a query plan against HBase, and Phoenix
is a good fit here.

Thanks,

James
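
For readers who have not seen Phoenix: it is driven through plain JDBC. A
minimal sketch; "zkhost" is a placeholder for the ZooKeeper quorum, and
web_stat is the sample table from Phoenix's own performance examples:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class PhoenixExample {
      public static void main(String[] args) throws Exception {
          // the JDBC URL names the ZooKeeper quorum fronting the HBase cluster
          Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost");
          Statement stmt = conn.createStatement();
          // the aggregation runs inside the region servers, which is where
          // Phoenix's performance advantage comes from
          ResultSet rs = stmt.executeQuery(
              "SELECT host, COUNT(1) FROM web_stat GROUP BY host");
          while (rs.next()) {
              System.out.println(rs.getString(1) + " " + rs.getLong(2));
          }
          conn.close();
      }
  }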

On 04/29/2013 10:54 PM, Asaf Mesika wrote:


Re: HBase and Datawarehouse

Posted by Asaf Mesika <as...@gmail.com>.
I think for Phoenix truly to succeed, it needs HBase to break the JVM
heap barrier of 12GB that I saw mentioned in a couple of posts. Since lots
of analytic queries utilize memory, and that memory is shared with HBase,
there's only so much you can do on a 12GB heap. On the other hand, if
Phoenix were implemented outside HBase on the same machine (as Drill or
Impala does), you could have 60GB for this process, running many OLAP
queries in parallel, utilizing the same data set.



On Mon, Apr 29, 2013 at 9:08 PM, Andrew Purtell <ap...@apache.org> wrote:

> > HBase is not really intended for heavy data crunching
>
> Yes it is. This is why we have first-class MapReduce integration and
> optimized scanners.
>
> Recent versions, like 0.94, also do pretty well with the 'O' part of OLAP.
>
> Urban Airship's Datacube is an example of a successful OLAP project
> implemented on HBase: http://github.com/urbanairship/datacube
>
> "Urban Airship uses the datacube project to support its analytics stack for
> mobile apps. We handle about ~10K events per second per node."
>
>
> Also there is Adobe's SaasBase:
> http://www.slideshare.net/clehene/hbase-and-hadoop-at-adobe
>
> Etc.
>
> Where an HBase OLAP application will differ tremendously from a traditional
> data warehouse is of course in the interface to the datastore. You have to
> design and speak in the language of the HBase API, though Phoenix (
> https://github.com/forcedotcom/phoenix) is changing that.
>
>
> On Sun, Apr 28, 2013 at 10:21 PM, anil gupta <an...@gmail.com>
> wrote:
>
> > Hi Kiran,
> >
> > In HBase the data is denormalized, but at its core HBase is a
> > KeyValue-based database meant for lookups or queries that expect a
> > response in milliseconds. OLAP, i.e. data warehousing, usually involves
> > heavy data crunching. HBase is not really intended for heavy data
> > crunching. If you want to just store denormalized data and do simple
> > queries, then HBase is good. For OLAP kind of stuff, you can make HBase
> > work, but IMO you will be better off using Hive for data warehousing.
> >
> > HTH,
> > Anil Gupta
> >
> >
> > On Sun, Apr 28, 2013 at 8:39 PM, Kiran <ki...@gmail.com> wrote:
> >
> > > But in HBase the data can be said to be in a denormalised state, as the
> > > storage methodology is a flexible (column family:column) based schema.
> > > Also, from Google's Bigtable paper it is evident that HBase is capable
> > > of doing OLAP. So where does the difference lie?
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > >
> >
> http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172p4043216.html
> > > Sent from the HBase User mailing list archive at Nabble.com.
> > >
> >
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

Re: HBase and Datawarehouse

Posted by Andrew Purtell <ap...@apache.org>.
> HBase is not really intended for heavy data crunching

Yes it is. This is why we have first-class MapReduce integration and
optimized scanners.
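
As a minimal, untested sketch of what that integration looks like
(0.94-era APIs; the table "events", family "d", and qualifier "type" are
hypothetical placeholders), here is a rollup job that counts events per
type, running one map task per region:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer;

public class EventRollup {

  // Emits (event type, 1) per row; the reducer sums into a count per type.
  static class EventMapper extends TableMapper<Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      byte[] type = value.getValue(Bytes.toBytes("d"), Bytes.toBytes("type"));
      if (type != null) {
        ctx.write(new Text(Bytes.toString(type)), ONE);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "event-rollup");
    job.setJarByClass(EventRollup.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // batch rows per RPC instead of row-at-a-time
    scan.setCacheBlocks(false);  // a full scan shouldn't churn the block cache

    // One map task per region, scanning the table in parallel.
    TableMapReduceUtil.initTableMapperJob(
        "events", scan, EventMapper.class, Text.class, LongWritable.class, job);
    job.setReducerClass(LongSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}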

Recent versions, like 0.94, also do pretty well with the 'O' part of OLAP.

Urban Airship's Datacube is an example of a successful OLAP project
implemented on HBase: http://github.com/urbanairship/datacube

"Urban Airship uses the datacube project to support its analytics stack for
mobile apps. We handle about ~10K events per second per node."


Also there is Adobe's SaasBase:
http://www.slideshare.net/clehene/hbase-and-hadoop-at-adobe

Etc.

Where an HBase OLAP application will differ tremendously from a traditional
data warehouse is of course in the interface to the datastore. You have to
design and speak in the language of the HBase API, though Phoenix (
https://github.com/forcedotcom/phoenix) is changing that.


On Sun, Apr 28, 2013 at 10:21 PM, anil gupta <an...@gmail.com> wrote:

> Hi Kiran,
>
> In HBase the data is denormalized, but at its core HBase is a KeyValue-based
> database meant for lookups or queries that expect a response in milliseconds.
> OLAP, i.e. data warehousing, usually involves heavy data crunching. HBase is
> not really intended for heavy data crunching. If you want to just store
> denormalized data and do simple queries, then HBase is good. For OLAP kind
> of stuff, you can make HBase work, but IMO you will be better off using Hive
> for data warehousing.
>
> HTH,
> Anil Gupta
>
>
> On Sun, Apr 28, 2013 at 8:39 PM, Kiran <ki...@gmail.com> wrote:
>
> > But in HBase the data can be said to be in a denormalised state, as the
> > storage methodology is a flexible (column family:column) based schema.
> > Also, from Google's Bigtable paper it is evident that HBase is capable
> > of doing OLAP. So where does the difference lie?
> >
> >
> >
> > --
> > View this message in context:
> >
> http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172p4043216.html
> > Sent from the HBase User mailing list archive at Nabble.com.
> >
>

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: HBase and Datawarehouse

Posted by Mohammad Tariq <do...@gmail.com>.
Sorry for the late response. I totally agree with Anil. If you have
warehousing needs, I would also suggest Hive. You could easily map your
HBase tables to Hive tables and crunch the data. It would save
you from writing lengthy and tedious MR jobs. And as Anil has said, Pig is
another good choice if you have to do a lot of transformations on your data.
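
For example, here is a minimal, untested sketch of that mapping driven
over JDBC. The HiveServer URL and driver class are the HiveServer1-era
ones, and the table "events" and column family "d" are hypothetical
placeholders; it assumes the Hive HBase storage handler jars are available.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveOverHBase {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    Connection conn =
        DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
    Statement stmt = conn.createStatement();

    // Register an existing HBase table ("events", family "d") in the Hive
    // metastore via the HBase storage handler.
    stmt.execute(
        "CREATE EXTERNAL TABLE hbase_events (key STRING, type STRING) "
            + "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' "
            + "WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,d:type') "
            + "TBLPROPERTIES ('hbase.table.name' = 'events')");

    // Hive compiles this into MapReduce over the HBase table; no
    // hand-written MR job required.
    ResultSet rs = stmt.executeQuery(
        "SELECT type, COUNT(*) FROM hbase_events GROUP BY type");
    while (rs.next()) {
      System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
    }
    conn.close();
  }
}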

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Mon, Apr 29, 2013 at 10:30 PM, anil gupta <an...@gmail.com> wrote:

> Inline.
>
> On Sun, Apr 28, 2013 at 10:40 PM, Kiran <ki...@gmail.com> wrote:
>
> > Anil,
> >
> > So it means HBase can help with easy retrieval and insertion on large
> > volumes of data, but it lacks the power to analyse and summarize the
> > data?
>
> Out of the box, it can do simple aggregations like sum, avg, etc. But if
> you have complex analytical queries (lead, lag, rolling aggregates), then
> you can write your own Coprocessor for those analytical queries.
>
> > In HBase,
> > can't we write Map-Reduce jobs that can do this "data crunching"?
>
> Yes, you can. But if you are only going to do MR, then why use HBase for
> storing the data?
>
> > As per your
> > analysis, isn't that a more feasible approach than the data warehousing
> > systems?
> >
> I don't know your use case in detail, so I can't say whether it will work
> for you or not. But theoretically it is feasible. Have you evaluated
> Hive/Pig for data warehousing?
>
> >
> >
> >
> > --
> > View this message in context:
> >
> http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172p4043220.html
> > Sent from the HBase User mailing list archive at Nabble.com.
> >
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>

Re: HBase and Datawarehouse

Posted by anil gupta <an...@gmail.com>.
Inline.

On Sun, Apr 28, 2013 at 10:40 PM, Kiran <ki...@gmail.com> wrote:

> Anil,
>
> So it means HBase can help with easy retrieval and insertion on large
> volumes of data, but it lacks the power to analyse and summarize the data?

Out of the box, it can do simple aggregations like sum, avg, etc. But if
you have complex analytical queries (lead, lag, rolling aggregates), then
you can write your own Coprocessor for those analytical queries.
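
For the simple aggregations, here is a minimal, untested sketch using the
AggregationClient that ships with HBase. The table "orders" and column
"d:amount" are hypothetical placeholders, and it assumes the
AggregateImplementation coprocessor has been enabled on the region servers
(e.g. via hbase.coprocessor.region.classes in hbase-site.xml).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.util.Bytes;

public class ServerSideSumSketch {
  public static void main(String[] args) throws Throwable {
    Configuration conf = HBaseConfiguration.create();
    AggregationClient aggClient = new AggregationClient(conf);

    // Restrict the scan to the single column being aggregated; the
    // LongColumnInterpreter assumes cell values are 8-byte longs.
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("d"), Bytes.toBytes("amount"));

    // Each region computes its partial sum server-side; only the partial
    // results travel back to the client.
    long total = aggClient.sum(Bytes.toBytes("orders"),
        new LongColumnInterpreter(), scan);
    System.out.println("sum(d:amount) = " + total);
  }
}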

> In HBase,
> can't we write Map-Reduce jobs that can do this "data crunching"?

Yes, you can. But if you are only going to do MR, then why use HBase for
storing the data?

> As per your
> analysis, isn't that a more feasible approach than the data warehousing
> systems?
>
I don't know your use case in detail, so I can't say whether it will work for
you or not. But theoretically it is feasible. Have you evaluated Hive/Pig
for data warehousing?

>
>
>
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172p4043220.html
> Sent from the HBase User mailing list archive at Nabble.com.
>



-- 
Thanks & Regards,
Anil Gupta

Re: HBase and Datawarehouse

Posted by Kiran <ki...@gmail.com>.
Anil,

So it means HBase can help with easy retrieval and insertion on large volumes
of data, but it lacks the power to analyse and summarize the data? In HBase,
can't we write Map-Reduce jobs that can do this "data crunching"? As per your
analysis, isn't that a more feasible approach than the data warehousing
systems?



--
View this message in context: http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172p4043220.html
Sent from the HBase User mailing list archive at Nabble.com.

Re: HBase and Datawarehouse

Posted by anil gupta <an...@gmail.com>.
Hi Kiran,

In HBase the data is denormalized, but at its core HBase is a KeyValue-based
database meant for lookups or queries that expect a response in milliseconds.
OLAP, i.e. data warehousing, usually involves heavy data crunching. HBase is
not really intended for heavy data crunching. If you want to just store
denormalized data and do simple queries, then HBase is good. For OLAP kind
of stuff, you can make HBase work, but IMO you will be better off using Hive
for data warehousing.

HTH,
Anil Gupta


On Sun, Apr 28, 2013 at 8:39 PM, Kiran <ki...@gmail.com> wrote:

> But in HBase the data can be said to be in a denormalised state, as the
> storage methodology is a flexible (column family:column) based schema.
> Also, from Google's Bigtable paper it is evident that HBase is capable of
> doing OLAP. So where does the difference lie?
>
>
>
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172p4043216.html
> Sent from the HBase User mailing list archive at Nabble.com.
>



-- 
Thanks & Regards,
Anil Gupta

Re: HBase and Datawarehouse

Posted by Kiran <ki...@gmail.com>.
But in HBase the data can be said to be in a denormalised state, as the
storage methodology is a flexible (column family:column) based schema. Also,
from Google's Bigtable paper it is evident that HBase is capable of doing
OLAP. So where does the difference lie?



--
View this message in context: http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172p4043216.html
Sent from the HBase User mailing list archive at Nabble.com.

Re: HBase and Datawarehouse

Posted by Mohammad Tariq <do...@gmail.com>.
*Database(SQL, NoSQL, Graph, Document etc etc)*

   1. Used for Online Transaction Processing (OLTP), but can be used for
   other purposes such as data warehousing. This records the data from the
   user for history.
   2. The tables and joins are complex since they are normalized (for an
   RDBMS). This is done to reduce redundant data and to save storage space.
   3. Entity-Relationship modeling techniques are used for RDBMS database
   design.
   4. Optimized for write operations.
   5. Performance is low for analytical queries.

*Warehouse*

   1. Used for Online Analytical Processing (OLAP). This reads the
   historical data of the users for business decisions.
   2. The tables and joins are simple since they are de-normalized. This is
   done to reduce the response time for analytical queries.
   3. Data-modeling techniques are used for the data warehouse design.
   4. Optimized for read operations.
   5. High performance for analytical queries.
   6. Is *usually* a database.

                                                                HTH


Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Sun, Apr 28, 2013 at 10:30 PM, shashwat shriparv <
dwivedishashwat@gmail.com> wrote:

> HBase is a NoSQL database.
>
> *Thanks & Regards    *
>
> ∞
> Shashwat Shriparv
>
>
>
> On Sun, Apr 28, 2013 at 4:42 PM, Em <ma...@yahoo.de> wrote:
>
> > I suggest you read Google's BigTable paper to understand the
> > differences.
> >
> > Am 28.04.2013 05:12, schrieb Kiran:
> > > What is the difference between a NoSQL database like HBase and a data
> > > warehouse? Don't both store data from disparate sources and formats?
> > >
> > >
> > >
> > > --
> > > View this message in context:
> >
> http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172.html
> > > Sent from the HBase User mailing list archive at Nabble.com.
> > >
> >
>

Re: HBase and Datawarehouse

Posted by shashwat shriparv <dw...@gmail.com>.
HBase is a NoSQL database.

*Thanks & Regards    *

∞
Shashwat Shriparv



On Sun, Apr 28, 2013 at 4:42 PM, Em <ma...@yahoo.de> wrote:

> I suggest you read Google's BigTable paper to understand the
> differences.
>
> Am 28.04.2013 05:12, schrieb Kiran:
> > What is the difference between a NoSQL database like HBase and a data
> > warehouse? Don't both store data from disparate sources and formats?
> >
> >
> >
> > --
> > View this message in context:
> http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172.html
> > Sent from the HBase User mailing list archive at Nabble.com.
> >
>

Re: HBase and Datawarehouse

Posted by Em <ma...@yahoo.de>.
I suggest you read Google's BigTable paper to understand the differences.

Am 28.04.2013 05:12, schrieb Kiran:
> What is the difference between a NoSQL database like HBase and a data
> warehouse? Don't both store data from disparate sources and formats?
> 
> 
> 
> --
> View this message in context: http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172.html
> Sent from the HBase User mailing list archive at Nabble.com.
>