Posted to dev@hbase.apache.org by Yung-An He <ma...@gmail.com> on 2017/09/27 14:27:28 UTC

[Proposal] Supporting Query Language In HBase

Hi folks,

Currently, HBase does not support SQL syntax. Many users who are familiar
with SQL can only query data in HBase via Hive, Impala, Phoenix or other
tools that support SQL. However, some of these tools are complicated to
install and require restarting the HBase cluster. Additionally, they may
have side effects on the raw data. If HBase had a native SQL query module
that was easy to install, required no cluster restart and did not affect
the original data, it would be extremely helpful and handy for users who
are already familiar with SQL.

In HareDB, we have implemented HareQL, and we hope to contribute part of
it to HBase as a module. See the attached document for more details.

Re: [Proposal] Supporting Query Language In HBase

Posted by Sean Busbey <bu...@apache.org>.
How about a comparison with the current SparkSQL integration work?
Wouldn't finishing that work be a shorter path to a SQL REPL?

On Sep 28, 2017 01:53, "Yung-An He" <ma...@gmail.com> wrote:

Re: [Proposal] Supporting Query Language In HBase

Posted by Yung-An He <ma...@gmail.com>.
Thanks for the comments and suggestions.

I have added a section to the proposal comparing our solution with Phoenix.
Here is the link:
https://drive.google.com/file/d/0Bw6_ESGWcwIqVVgybW5OaE43bzg/view


Below are some differences between HBaseQL and Phoenix:

1. In some scenarios, users cannot or will not restart HBase in order to
install third-party tools.

2. In Phoenix, creating a table creates a real HBase table, and DROP TABLE
also drops the HBase table. In HBaseQL, neither statement creates or
deletes a native HBase table.

3. To map an existing HBase table into Phoenix, you have to create a view
in Phoenix, and that view is read-only: we can only read the data, not
modify it or insert new data through Phoenix. In HBaseQL, creating a table
means mapping an existing HBase table rather than creating a new one, and
you can not only read the data but also modify it or insert new data.

4. HBaseQL is designed to make HBase handy for users who are already
familiar with SQL, without affecting the raw data. For this reason, it
supports only single-table queries, to avoid excessive resource
consumption.
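A minimal sketch of the mapping semantics described in points 2 to 4, assuming a plain Python dict stands in for HBase storage (table -> row key -> "family:qualifier" -> value). Catalog, Mapping and their methods are hypothetical names for illustration only; this is not HareQL's or HBaseQL's actual code, and a real module would go through the HBase client API. CREATE TABLE only records a mapping onto an existing table, DROP TABLE only forgets that mapping, and each query touches exactly one table.

```python
from dataclasses import dataclass, field
from typing import Optional

# Stand-in for native HBase storage (an assumption for this sketch):
# HBase table name -> row key -> {"family:qualifier": value}.
Store = dict[str, dict[str, dict[str, str]]]


@dataclass
class Mapping:
    hbase_table: str         # existing HBase table the SQL name points at
    columns: dict[str, str]  # SQL column name -> "family:qualifier"


@dataclass
class Catalog:
    store: Store
    mappings: dict[str, Mapping] = field(default_factory=dict)

    def create_table(self, name: str, hbase_table: str,
                     columns: dict[str, str]) -> None:
        # CREATE TABLE only records a mapping; it never creates an HBase table.
        if hbase_table not in self.store:
            raise KeyError(f"no existing HBase table {hbase_table!r}")
        self.mappings[name] = Mapping(hbase_table, columns)

    def drop_table(self, name: str) -> None:
        # DROP TABLE forgets the mapping; the HBase table and its rows remain.
        del self.mappings[name]

    def select(self, name: str, cols: list[str],
               where: Optional[tuple[str, str]] = None) -> list[dict[str, Optional[str]]]:
        # Single-table query: scan one table in row-key order, with an
        # optional equality filter on one mapped column. No joins.
        m = self.mappings[name]
        rows = self.store[m.hbase_table]
        result = []
        for row_key in sorted(rows):
            cells = rows[row_key]
            if where is not None and cells.get(m.columns[where[0]]) != where[1]:
                continue
            result.append({c: cells.get(m.columns[c]) for c in cols})
        return result

    def upsert(self, name: str, row_key: str, values: dict[str, str]) -> None:
        # Writes land directly in the existing HBase table via the mapping.
        m = self.mappings[name]
        row = self.store[m.hbase_table].setdefault(row_key, {})
        for col, val in values.items():
            row[m.columns[col]] = val


store: Store = {"users": {
    "r1": {"info:name": "alice", "info:city": "Taipei"},
    "r2": {"info:name": "bob", "info:city": "Hsinchu"},
}}
catalog = Catalog(store)
catalog.create_table("users_sql", "users", {"name": "info:name", "city": "info:city"})
print(catalog.select("users_sql", ["name"], where=("city", "Taipei")))  # [{'name': 'alice'}]
catalog.upsert("users_sql", "r3", {"name": "carol", "city": "Taipei"})
catalog.drop_table("users_sql")
print("users" in store, len(store["users"]))  # True 3
```

Because the catalog never creates or deletes storage, dropping a mapping leaves the underlying rows in place, which is the property point 2 contrasts with Phoenix.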

2017-09-28 11:14 GMT+08:00 Chia-Ping Tsai <ch...@apache.org>:

Re: [Proposal] Supporting Query Language In HBase

Posted by Chia-Ping Tsai <ch...@apache.org>.
Could you share a comparison between your solution and Phoenix? I feel you will be contributing a significant improvement (and a chunk of code) to HBase, so we must understand what benefit HBaseQL can bring to HBase.


On 2017-09-27 22:27, Yung-An He <ma...@gmail.com> wrote: 

Re: [Proposal] Supporting Query Language In HBase

Posted by Yung-An He <ma...@gmail.com>.
It seems something was wrong with Google Docs. The link is available now.

2017-09-27 23:03 GMT+08:00 Ted Yu <yu...@gmail.com>:

Re: [Proposal] Supporting Query Language In HBase

Posted by Ted Yu <yu...@gmail.com>.
When I clicked on the link, I was shown 'There was a problem previewing
this document'.

I clicked the download button, but it looks like the downloaded PDF is
0 bytes.

Can you double-check?

On Wed, Sep 27, 2017 at 7:56 AM, Yung-An He <ma...@gmail.com> wrote:

Re: [Proposal] Supporting Query Language In HBase

Posted by Yung-An He <ma...@gmail.com>.
Thank you, Ted.
My bad!

Here is the link:
https://drive.google.com/file/d/0Bw6_ESGWcwIqYWpRWkFtTFBIQms/view?usp=sharing

2017-09-27 22:34 GMT+08:00 Ted Yu <yu...@gmail.com>:

Re: [Proposal] Supporting Query Language In HBase

Posted by Ted Yu <yu...@gmail.com>.
The document didn't come through.

Consider using Google Docs or similar.

On Wed, Sep 27, 2017 at 7:27 AM, Yung-An He <ma...@gmail.com> wrote:

Re: [Proposal] Supporting Query Language In HBase

Posted by Stack <st...@duboce.net>.
Thank you for making this proposal, Yung-An He.

What does "...semantic analysis using the tree data structure..." mean?

Would hbase:tableschema specify column typing? Would it allow compound
types? What about the row key as a compound type? What type system will
HBaseQL use? Will it define a new one, or use Phoenix types or the HBase
OrderedType?

Having to describe the table in hbase:tableschema is similar to how Drill
operates; could we embed a Drillbit and get what this proposal offers?

I wouldn't lean too much on the restart requirement, or even on the fact
that Phoenix owns its tables, to differentiate HBaseQL from Phoenix; in the
scheme of things, these inconveniences fade after the initial one-time pain.

The upsides, as I see them, are table scope and no joins; i.e. it is basic.

I see some downsides, though:

+ Yet another HBase SQL (YAHS) in an already crowded space, offering a new
permutation that is intentionally crimped: likely useful at first, but it
may ultimately just frustrate (no joins, etc.).
+ It is a new effort at a time when existing projects are already starved
for resources (e.g. could HBaseQL be done as a cut-down, table-scoped,
no-joins Phoenix, so that Phoenix got the benefit of some dev resources? Or
as an embedded Drillbit with HBase amenities?).
+ The user would have to run two shells: hbase sql and hbase shell.
+ It is a load of new code in a new domain (SQL) that we'd be pulling into
HBase, a project that is already broad in scope.

Thanks again for putting up the proposal, Yung-An He,
St.Ack

On Wed, Sep 27, 2017 at 7:27 AM, Yung-An He <ma...@gmail.com> wrote:

Re: [Proposal] Supporting Query Language In HBase

Posted by Josh Elser <el...@apache.org>.
Thanks for sharing.

Architecturally, this looks identical to what Apache Phoenix is today.
Why would we use this new codebase over a proven tool like Phoenix? What
does your proposal bring to the table that Phoenix doesn't?

On 9/27/17 10:27 AM, Yung-An He wrote: