
Posted to dev@kylin.apache.org by Stephen Boesch <ja...@gmail.com> on 2015/07/13 02:08:29 UTC

Using a different K-V store than HBase

HBase/ZooKeeper is a heavy, complex stack for small-scale
development and testing. The Mini HBase cluster is tricky to configure
and consumes a fair amount of memory. ZooKeeper suffers from timeout
issues that complicate debugging cycles. Region server management also
complicates testing. It may be preferable to have an option to avoid these
considerations altogether when developing portions of logic that
do not interface directly with the indexing and metadata logic.

In the eBay blog there is a single sentence mentioning it "may" be possible
to use a different K-V backend than HBase:

http://www.ebaytechblog.com/2014/10/20/announcing-kylin-extreme-olap-engine-for-big-data/



   - *Storage Engine: *This engine manages the underlying storage –
   specifically the cuboids, which are stored as key-value pairs. The Storage
   Engine uses HBase ... *Kylin can also be extended to support other K-V
   systems, such as Redis <http://redis.io/>.*

Is there any documentation on how that extension may be achieved?  A
pluggable interface?  I would for example like to see Cassandra as a
drop-in replacement for HBase.

Thanks

stephenb

Re: Using a different K-V store than HBase

Posted by Ted Dunning <te...@gmail.com>.

On Wed, Jul 15, 2015 at 1:47 AM, Li Yang <li...@apache.org> wrote:

> Kylin has a clearly defined interface between the query module and the
> storage module -- IStorageEngine. So far, the only implementation is HBase.
>

Well, I have reports from MapR customers that Kylin works on MapR DB.  But
that is because MapR DB supports the HBase API.  You have to tell Kylin not
to use the co-processor, but that doesn't seem terribly critical.

Re: Using a different K-V store than HBase

Posted by Luke Han <lu...@gmail.com>.

On the term "heavy/complex stack", I have a different opinion. Kylin was
designed from the beginning to serve TB- to PB-level datasets and queries
and to scale out for high concurrency; for that scenario, the complexity is
worth it. Distributed systems plus big data is not a simple world to live
in ;-)

But I agree with you about the development cycle. That's why we are
introducing a mini cluster, and also an off-hadoop-cli installation guide,
so that people can develop on their own laptops while sharing one or more
dev clusters, even if that is just a VM.

Also, please refer to Yang's recent thread on
https://issues.apache.org/jira/browse/KYLIN-875 about a more generic storage
interface.


Thanks.


Best Regards!
---------------------

Luke Han

On Wed, Jul 15, 2015 at 4:47 PM, Li Yang <li...@apache.org> wrote:

> For dev/test purposes, I personally use HDP 2.2 on a single-node sandbox
> with 8 GB of memory. It is not lightweight at all, but it works quite well.
>
> Kylin has a clearly defined interface between the query module and the
> storage module -- IStorageEngine. So far, the only implementation is HBase.
>

Re: Using a different K-V store than HBase

Posted by Li Yang <li...@apache.org>.

For dev/test purposes, I personally use HDP 2.2 on a single-node sandbox
with 8 GB of memory. It is not lightweight at all, but it works quite well.

Kylin has a clearly defined interface between the query module and the
storage module -- IStorageEngine. So far, the only implementation is HBase.




Re: Using a different K-V store than HBase

Posted by Li Yang <li...@apache.org>.

With KYLIN-875 <https://issues.apache.org/jira/browse/KYLIN-875>, the
storage module is becoming pluggable. By implementing a few interfaces,
you can port Kylin to run on storage other than HBase. It still lacks
documentation, so give me a ping if you are interested in implementing
a different storage. I'll be glad to share more details.
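
To make the idea of a pluggable storage module concrete, here is a minimal
sketch in Python. It is not Kylin's code and does not reflect the actual
interfaces introduced by KYLIN-875: the names StorageBackend, register,
open_backend and InMemoryBackend are all hypothetical, chosen only to show
the general pattern of a small storage contract plus a registry that picks
an implementation by name.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterator, Tuple, Type


class StorageBackend(ABC):
    """Hypothetical storage contract; NOT Kylin's real IStorageEngine."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None:
        """Store one key-value pair, overwriting any existing value."""

    @abstractmethod
    def scan(self, start: bytes, stop: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield (key, value) pairs with start <= key < stop, in key order."""


# Registry mapping a backend name to the class that implements it.
_BACKENDS: Dict[str, Type[StorageBackend]] = {}


def register(name: str) -> Callable[[Type[StorageBackend]], Type[StorageBackend]]:
    """Class decorator that makes a backend selectable by name."""
    def decorator(cls: Type[StorageBackend]) -> Type[StorageBackend]:
        _BACKENDS[name] = cls
        return cls
    return decorator


@register("memory")
class InMemoryBackend(StorageBackend):
    """Dictionary-backed store, adequate for unit tests on a laptop."""

    def __init__(self) -> None:
        self._rows: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._rows[key] = value

    def scan(self, start: bytes, stop: bytes) -> Iterator[Tuple[bytes, bytes]]:
        # Sorting on every scan is fine for a test double, not for production.
        for key in sorted(self._rows):
            if start <= key < stop:
                yield key, self._rows[key]


def open_backend(name: str) -> StorageBackend:
    """Instantiate the backend registered under `name`."""
    try:
        return _BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown storage backend: {name!r}") from None
```

With something of this shape, code that only reads and writes rows could
call open_backend("memory") in tests and never start HBase or ZooKeeper,
while a real deployment would register an HBase-backed class under another
name.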

On Tue, Jul 28, 2015 at 12:49 AM, Ted Dunning <te...@gmail.com> wrote:

> On Sun, Jul 26, 2015 at 7:39 PM, hongbin ma <ma...@apache.org> wrote:
>
> > Again, what benefit will MapR DB bring us if we choose to use it?
> > Performance? Stability?
> >
>
> Conversations about MapR DB directly should be off of this list. This is
> the Kylin mailing list and talk here should center on that. Other software
> can have relevance, but only in the context of Kylin itself.
>
> I will contact the poster off-list to follow up.
>

Re: Using a different K-V store than HBase

Posted by Ted Dunning <te...@gmail.com>.

On Sun, Jul 26, 2015 at 7:39 PM, hongbin ma <ma...@apache.org> wrote:

> Again, what benefit will MapR DB bring us if we choose to use it?
> Performance? Stability?
>

Conversations about MapR DB directly should be off of this list. This is
the Kylin mailing list and talk here should center on that. Other software
can have relevance, but only in the context of Kylin itself.

I will contact the poster off-list to follow up.

Re: Using a different K-V store than HBase

Posted by hongbin ma <ma...@apache.org>.

Yang is on his summer vacation now.
Again, what benefit will MapR DB bring us if we choose to use it?
Performance? Stability?

On Sat, Jul 25, 2015 at 12:00 AM, Ted Dunning <te...@gmail.com> wrote:

> On Thu, Jul 23, 2015 at 6:45 PM, Li Yang <li...@apache.org> wrote:
>
> > I heard concerns on HBase from time to time too, but often from a
> > stability point of view.
> >
> > So what would you recommend as a replacement for HBase, and why?
> >
> > On Fri, Jul 24, 2015 at 9:24 AM, Ted Dunning <te...@gmail.com>
> > wrote:
> >
> > > That somewhat depends on what you mean by something else.
> > >
> > > For MapR-DB, the only difference is lack of support for co-processors.
> > > That means that switching from HBase is pretty easy.
> > >
> >
>
> I am heavily biased in this matter.  My red hat labels me as biased and it
> says "MapR".
>
> I think that MapR DB is an excellent choice as long as the lack of
> co-processors doesn't cause you problems.
>



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: Using a different K-V store than HBase

Posted by Ted Dunning <te...@gmail.com>.

On Thu, Jul 23, 2015 at 6:45 PM, Li Yang <li...@apache.org> wrote:

> I heard concerns on HBase from time to time too, but often from a stability
> point of view.
>
> So what would you recommend as a replacement for HBase, and why?
>
> On Fri, Jul 24, 2015 at 9:24 AM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > That somewhat depends on what you mean by something else.
> >
> > For MapR-DB, the only difference is lack of support for co-processors.
> > That means that switching from HBase is pretty easy.
> >
>

I am heavily biased in this matter.  My red hat labels me as biased and it
says "MapR".

I think that MapR DB is an excellent choice as long as the lack of
co-processors doesn't cause you problems.

Re: Using a different K-V store than HBase

Posted by Li Yang <li...@apache.org>.

I heard concerns on HBase from time to time too, but often from a stability
point of view.

So what would you recommend as a replacement for HBase, and why?

On Fri, Jul 24, 2015 at 9:24 AM, Ted Dunning <te...@gmail.com> wrote:

> That somewhat depends on what you mean by something else.
>
> For MapR-DB, the only difference is lack of support for co-processors. That
> means that switching from HBase is pretty easy.
>
> For a non-HBase key-value store that is not HBase API compatible, you will
> have a much bigger job ahead of you.  In particular, Kylin depends
> critically on having efficient range scans which is a rare design point for
> kv stores.  That means you probably won't be able to get usable performance
> from any system that doesn't support efficient key-order range scans.
>
> What KV store are you thinking of using?
>
>
>
>
> On Thu, Jul 23, 2015 at 6:07 PM, Stephen Boesch <ja...@gmail.com> wrote:
>
> > Has anyone had any thoughts on this? Also, where is the documentation
> > on how to plug in a key-value storage engine other than HBase?
> >
> > thanks!

Re: Using a different K-V store than HBase

Posted by Ted Dunning <te...@gmail.com>.

That somewhat depends on what you mean by something else.

For MapR-DB, the only difference is lack of support for co-processors. That
means that switching from HBase is pretty easy.

For a non-HBase key-value store that is not HBase API compatible, you will
have a much bigger job ahead of you.  In particular, Kylin depends
critically on having efficient range scans which is a rare design point for
kv stores.  That means you probably won't be able to get usable performance
from any system that doesn't support efficient key-order range scans.
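
To illustrate the point about range scans, here is a small Python sketch.
It is only an assumption-laden toy, not how HBase or Kylin is implemented:
OrderedStore, hash_store_scan and the "cuboid id | date" key layout are
made up for the example. The ordered store finds a key range with two
binary searches; the hash-keyed store has to look at every key.

```python
import bisect
from typing import Dict, List, Tuple


class OrderedStore:
    """Keeps keys sorted, so a range scan is two binary searches plus a slice."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []          # always kept in sorted order
        self._values: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)    # insert while preserving order
        self._values[key] = value

    def range_scan(self, start: bytes, stop: bytes) -> List[Tuple[bytes, bytes]]:
        """Return pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def hash_store_scan(rows: Dict[bytes, bytes], start: bytes,
                    stop: bytes) -> List[Tuple[bytes, bytes]]:
    """Same result from a hash-keyed store: examine every key, then sort."""
    return sorted((k, v) for k, v in rows.items() if start <= k < stop)


if __name__ == "__main__":
    store = OrderedStore()
    # Hypothetical row keys: a cuboid id followed by a dimension value.
    for key in [b"0003|2015-07-14", b"0001|2015-07-13",
                b"0003|2015-07-13", b"0007|2015-07-13"]:
        store.put(key, b"aggregated-measures")
    # All rows of cuboid 0003, found without touching the other cuboids.
    for key, _ in store.range_scan(b"0003|", b"0003|\xff"):
        print(key.decode())
```

The work done by range_scan grows with the size of the result plus a
logarithmic search, while hash_store_scan grows with the size of the whole
table, which is the gap that matters once the table is large.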

What KV store are you thinking of using?




On Thu, Jul 23, 2015 at 6:07 PM, Stephen Boesch <ja...@gmail.com> wrote:

> Has anyone had any thoughts on this? Also, where is the documentation
> on how to plug in a key-value storage engine other than HBase?
>
> thanks!

Re: Using a different K-V store than HBase

Posted by Stephen Boesch <ja...@gmail.com>.

Has anyone had any thoughts on this? Also, where is the documentation
on how to plug in a key-value storage engine other than HBase?

thanks!

