Posted to dev@hbase.apache.org by Himanshu Vashishtha <va...@gmail.com> on 2010/08/19 11:47:27 UTC

HBase: project ideas

Dear All:
I have been looking around HBase (running/debugging it, etc) for a couple of
weeks now, and it is fascinating. I am in search of a good project for my
grad studies, focussing around HBase, but am not able to finalize it. I am
looking for some project idea that I can use. It can be a user or a dev
project, I am open to all :)

One idea (user specific) is to migrate an XQuery-like tool that uses a
relational db schema (there are a bunch of papers suggesting it) to HBase, but
I am not sure whether it is really a judicious use of HBase. Please suggest.


Thanks,
Himanshu

RE: HBase: project ideas

Posted by Jonathan Gray <jg...@facebook.com>.
Himanshu,

Seems like you might have an interest in using Coprocessors to do stuff like low-latency aggregates.  This is a big area of interest for some of us, but there has not been a lot of concerted effort in this direction yet.  There is plenty to do here for a research project.

Check out:

https://issues.apache.org/jira/browse/HBASE-2000

And specifically:

https://issues.apache.org/jira/browse/HBASE-1512

JG
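
To illustrate the idea of low-latency aggregates mentioned above, here is a
minimal sketch in plain Python. It only simulates the pattern: each region
reduces its own rows to a small partial result, and the client merges the
partials instead of pulling every row over the network. The names (Partial,
region_aggregate, merge) and the in-memory "regions" are assumptions made for
illustration; this is not the API proposed in the JIRA issues linked above.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Partial aggregate produced by one region (hypothetical type)."""
    total: float = 0.0
    count: int = 0


def region_aggregate(rows):
    """Simulates the server-side step: reduce one region's rows to a Partial."""
    partial = Partial()
    for _row_key, value in rows:
        partial.total += value
        partial.count += 1
    return partial


def merge(partials):
    """Simulates the client-side step: combine per-region partial results."""
    merged = Partial()
    for partial in partials:
        merged.total += partial.total
        merged.count += partial.count
    return merged


# Three made-up regions, each holding (row key, numeric value) pairs.
regions = [
    [("row-a", 3.0), ("row-b", 5.0)],
    [("row-m", 4.0)],
    [("row-x", 8.0), ("row-z", 10.0)],
]

result = merge(region_aggregate(rows) for rows in regions)
mean = result.total / result.count
print(result.total, result.count, mean)
```

Only one small Partial per region crosses the network in this model, which is
where the latency saving would come from compared with a full client-side scan.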

> -----Original Message-----
> From: Himanshu Vashishtha [mailto:vashishtha.h@gmail.com]
> Sent: Thursday, August 19, 2010 11:30 AM
> To: dev@hbase.apache.org
> Cc: user@hbase.apache.org
> Subject: Re: HBase: project ideas
> 
> Hello Stack,
> Thanks for the reply. please see inline.
> 
> Cheers,
> Himanshu
> 
> On Thu, Aug 19, 2010 at 11:22 AM, Stack <st...@duboce.net> wrote:
> 
> > On Thu, Aug 19, 2010 at 2:47 AM, Himanshu Vashishtha
> > <va...@gmail.com> wrote:
> > > Dear All:
> > > I have been looking around HBase (running/debugging it, etc) for a couple of
> > > weeks now, and it is fascinating. I am in search of a good project for my
> > > grad studies, focussing around HBase, but am not able to finalize it. I am
> > > looking for some project idea that I can use. It can be a user or a dev
> > > project, I am open to all :)
> > >
> > > One idea (user specific) is to migrate an XQuery-like tool that uses a
> > > relational db schema (there are a bunch of papers suggesting it) to HBase, but
> > > I am not sure whether it is really a judicious use of HBase. Please suggest.
> > >
> >
> > Hello Himanshu.
> >
> > It's hard to make a suggestion when I've no clue as to your interests.
> >
> Hadoop fascinates me. I wrote a tool for my lab that indexes a given
> document collection (of plain text files) and then lets the user query it with
> four predefined operations... I store those indexes on HDFS using
> MapFiles (to reduce the request-response latency).
>
> > Can you cite some of the papers you mention?
> >
> So, I want to carry it forward for XML, and I came across two approaches:
> indexing the doc, OR storing them in an RDBMS style while also considering
> schema info.
>
> Paper (for the index-based approach): "An efficient inverted index technique
> for XML documents using RDBMS", Chiyoung Seo and others, 2003.
>
> And for the RDBMS approach: "A Comprehensive XQuery to SQL Translation using
> Dynamic Interval Encoding", David DeHaan, David Toman, Mariano P. Consens,
> and others, 2003, and its references.
>
> I developed a prototype for the index-based one in HBase, but it is limited
> in usage (because of the indexing approach itself, you can't elegantly run
> operations like summing, grouping etc). It's quite raw.
>
> > + Have you looked at HIVE?  It might be more pertinent making this run
> > better atop hbase rather than making a new XQuery-like tool for hbase.
> >
>
> Not yet. I read that it runs an MR job for every query, which slows
> its response time, so I skipped past it. But yes, I see it does provide a lot
> of relational schema stuff.
>
> > + Build an app that allows various kinds of location queries using
> > geohashing+hbase combo.  There's a few fellas floating on the list who
> > might be able to help you out on this project.
> >
> > For extra points, whatever you do, build it using hbase-2000 coprocessors.
> >
> I am sorry, I couldn't get this.
>
> > Thanks for writing the list Himanshu.
> > St.Ack
> >

Re: HBase: project ideas

Posted by Himanshu Vashishtha <va...@gmail.com>.
Hello Stack,
Thanks for the reply. please see inline.

Cheers,
Himanshu

On Thu, Aug 19, 2010 at 11:22 AM, Stack <st...@duboce.net> wrote:

> On Thu, Aug 19, 2010 at 2:47 AM, Himanshu Vashishtha
> <va...@gmail.com> wrote:
> > Dear All:
> > I have been looking around HBase (running/debugging it, etc) for a couple of
> > weeks now, and it is fascinating. I am in search of a good project for my
> > grad studies, focussing around HBase, but am not able to finalize it. I am
> > looking for some project idea that I can use. It can be a user or a dev
> > project, I am open to all :)
> >
> > One idea (user specific) is to migrate an XQuery-like tool that uses a
> > relational db schema (there are a bunch of papers suggesting it) to HBase, but
> > I am not sure whether it is really a judicious use of HBase. Please suggest.
> >
>
> Hello Himanshu.
>
> It's hard to make a suggestion when I've no clue as to your interests.
>
Hadoop fascinates me. I wrote a tool for my lab that indexes a given
document collection (of plain text files) and then lets the user query it with
four predefined operations... I store those indexes on HDFS using
MapFiles (to reduce the request-response latency).

> Can you cite some of the papers you mention?
>
So, I want to carry it forward for XML, and I came across two approaches:
indexing the doc, OR storing them in an RDBMS style while also considering
schema info.
>
Paper (for the index-based approach): "An efficient inverted index technique
for XML documents using RDBMS", Chiyoung Seo and others, 2003.

And for the RDBMS approach: "A Comprehensive XQuery to SQL Translation using
Dynamic Interval Encoding", David DeHaan, David Toman, Mariano P. Consens,
and others, 2003, and its references.

I developed a prototype for the index-based one in HBase, but it is limited
in usage (because of the indexing approach itself, you can't elegantly run
operations like summing, grouping etc). It's quite raw.

> + Have you looked at HIVE?  It might be more pertinent making this run
> better atop hbase rather than making a new XQuery-like tool for hbase.
>

Not yet. I read that it runs an MR job for every query, which slows
its response time, so I skipped past it. But yes, I see it does provide a lot
of relational schema stuff.

> + Build an app that allows various kinds of location queries using
> geohashing+hbase combo.  There's a few fellas floating on the list who
> might be able to help you out on this project.
>
> For extra points, whatever you do, build it using hbase-2000 coprocessors.
>
I am sorry, I couldn't get this.


> Thanks for writing the list Himanshu.
> St.Ack
>

Re: HBase: project ideas

Posted by Stack <st...@duboce.net>.
On Thu, Aug 19, 2010 at 2:47 AM, Himanshu Vashishtha
<va...@gmail.com> wrote:
> Dear All:
> I have been looking around HBase (running/debugging it, etc) for a couple of
> weeks now, and it is fascinating. I am in search of a good project for my
> grad studies, focussing around HBase, but am not able to finalize it. I am
> looking for some project idea that I can use. It can be a user or a dev
> project, I am open to all :)
>
> One idea (user specific) is to migrate an XQuery-like tool that uses a
> relational db schema (there are a bunch of papers suggesting it) to HBase, but
> I am not sure whether it is really a judicious use of HBase. Please suggest.
>
>

Hello Himanshu.

It's hard to make a suggestion when I've no clue as to your interests.
Can you cite some of the papers you mention?

+ Have you looked at HIVE?  It might be more pertinent making this run
better atop hbase rather than making a new XQuery-like tool for hbase.
+ Build an app that allows various kinds of location queries using
geohashing+hbase combo.  There's a few fellas floating on the list who
might be able to help you out on this project.

For extra points, whatever you do, build it using hbase-2000 coprocessors.

Thanks for writing the list Himanshu.
St.Ack
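
As a rough illustration of the geohashing+hbase suggestion above, the sketch
below encodes a latitude/longitude pair with the standard geohash scheme
(interleaved longitude/latitude bits, base-32 alphabet) and uses the hash as
the leading part of a row key. Because nearby points tend to share a prefix, a
prefix scan over sorted keys approximates a "what is near here" query. The
key layout (geohash, colon, name), the sample points, and the in-memory sorted
list standing in for a table are all assumptions for illustration, not
anything prescribed in this thread.

```python
import bisect

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=8):
    """Encode a point as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with `prefix` (keys are assumed ASCII)."""
    start = bisect.bisect_left(sorted_keys, prefix)
    end = bisect.bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[start:end]


# Made-up sample points: two close together, one far away.
points = [("a", 42.6, -5.6), ("b", 42.61, -5.59), ("c", 57.64911, 10.40744)]
row_keys = sorted(geohash_encode(lat, lon) + ":" + name for name, lat, lon in points)

# Scan for everything in the 4-character cell around point "a".
hits = prefix_scan(row_keys, geohash_encode(42.6, -5.6, precision=4))
print(hits)
```

One known caveat of this layout: two points on opposite sides of a cell
boundary can be close together yet share only a short prefix, so a real query
would usually scan the neighbouring cells as well.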
