Posted to user@hbase.apache.org by Jason Rutherglen <ja...@gmail.com> on 2011/04/15 03:18:33 UTC

Re: HBase and Lucene for realtime search

Since posting this I started working on HBASE-3529, the goal of which
is to integrate Lucene into HBase, with an eye towards fully
integrating realtime search once it is available in Lucene.  Realtime
search (RT) will make HBase puts visible in the search index
immediately.  The first challenge has been how to query index files
stored in HDFS without degrading speed.
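
To illustrate what "puts immediately visible in the search index" means, here is a toy model. It is not HBase or Lucene code. Each put updates the stored row and an in-memory inverted index under the same lock, so a search issued right after the put already sees it, and an overwritten value stops matching. The class and method names (`IndexedTable`, `put`, `search`) are hypothetical. Whitespace tokenization is an assumption that stands in for a real Lucene analyzer.

```java
import java.util.*;

/** Hypothetical toy: a row store whose inverted index is updated inside each put. */
public class IndexedTable {
    private final Map<String, String> rows = new HashMap<>();          // row key -> text value
    private final Map<String, Set<String>> postings = new HashMap<>(); // term -> row keys

    // Naive lowercase/whitespace tokenizer standing in for a Lucene analyzer (assumption).
    private static Set<String> terms(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Stores the value and updates the index under one lock, so searches see it at once. */
    public synchronized void put(String row, String text) {
        String old = rows.put(row, text);
        if (old != null) {
            // Drop postings of the overwritten value so stale terms stop matching.
            for (String t : terms(old)) {
                Set<String> s = postings.get(t);
                if (s != null) {
                    s.remove(row);
                    if (s.isEmpty()) postings.remove(t);
                }
            }
        }
        for (String t : terms(text)) {
            postings.computeIfAbsent(t, k -> new TreeSet<>()).add(row);
        }
    }

    /** Returns the row keys whose current value contains the term, in sorted order. */
    public synchronized List<String> search(String term) {
        Set<String> s = postings.get(term.toLowerCase(Locale.ROOT));
        if (s == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(s);
    }

    public static void main(String[] args) {
        IndexedTable t = new IndexedTable();
        t.put("row1", "HBase and Lucene");
        System.out.println(t.search("lucene")); // [row1]
        t.put("row1", "HBase only");
        System.out.println(t.search("lucene")); // []
    }
}
```

A real RT setup would make this guarantee inside Lucene's writer and reader rather than in a hand-rolled map. The toy only shows the visibility contract.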

To solve that problem, I took the general notion of HDFS-347 and
instead now directly obtain a single block's java.io.File and
memory-map it for Lucene's use.  The benchmarks show that this
approach is viable for Lucene queries.  The code is still rough; I
will be cleaning it up and making it easier for others to assemble
and try on their own.
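
Once the block's local file is in hand, the last step is an ordinary read-only memory mapping. The sketch below assumes you already hold that `java.io.File`. It does not show how the path is obtained from the DataNode and makes no claim about an HDFS API for doing so. It uses only the JDK's `FileChannel.map`. Lucene's `MMapDirectory` performs comparable mapping internally for index files. `BlockMapper` and `mapReadOnly` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: memory-maps a local block file read-only. */
public class BlockMapper {

    /**
     * Maps the whole file into memory. The mapping stays valid after the channel
     * is closed. One MappedByteBuffer is limited to Integer.MAX_VALUE bytes,
     * which covers typical HDFS block sizes such as 64 MB or 128 MB.
     */
    public static MappedByteBuffer mapReadOnly(File blockFile) throws IOException {
        try (FileChannel ch = FileChannel.open(blockFile.toPath(), StandardOpenOption.READ)) {
            long size = ch.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("block file too large for one mapping: " + size);
            }
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
        }
    }

    public static void main(String[] args) throws IOException {
        // Temp file standing in for a DataNode block file (the real path is an assumption).
        File f = File.createTempFile("blk_", ".data");
        f.deleteOnExit();
        Files.write(f.toPath(), "lucene-bytes".getBytes(StandardCharsets.UTF_8));
        MappedByteBuffer buf = mapReadOnly(f);
        System.out.println(buf.capacity());    // 12
        System.out.println((char) buf.get(0)); // l
    }
}
```

Mapping the block directly lets the OS page cache serve Lucene's random reads. Streaming through a client would add a copy per read.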

There is still work to be done on splitting the indexes, and on
moving Lucene indexes to the local DataNode when HBase rebalances a
region.  Perhaps we can discuss these issues on the dev list.
Comments are welcome.
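
When HBase splits a region at a split key, the daughter regions cover [startKey, splitKey) and [splitKey, endKey). A per-region index would have to be partitioned the same way. The sketch below is a minimal model under a simplifying assumption: index entries are held in a sorted map keyed by row key. A real Lucene index would instead need its documents filtered by row key or rebuilt. Under that assumption the split reduces to two range views. `RegionIndexSplitter` and the `SortedMap<String, String>` representation are hypothetical. String ordering matches HBase's unsigned byte ordering only for ASCII row keys.

```java
import java.util.*;

/** Hypothetical sketch: split a row-keyed index at a region split key. */
public class RegionIndexSplitter {

    /**
     * Returns [lower, upper]: lower holds rows < splitKey, upper holds rows >= splitKey,
     * mirroring daughter regions [startKey, splitKey) and [splitKey, endKey).
     * Copies are independent of the input map.
     */
    public static List<SortedMap<String, String>> split(
            SortedMap<String, String> index, String splitKey) {
        SortedMap<String, String> lower = new TreeMap<>(index.headMap(splitKey)); // strictly less
        SortedMap<String, String> upper = new TreeMap<>(index.tailMap(splitKey)); // greater or equal
        return List.of(lower, upper);
    }

    public static void main(String[] args) {
        SortedMap<String, String> idx = new TreeMap<>();
        idx.put("apple", "doc-a");
        idx.put("mango", "doc-m");
        idx.put("zebra", "doc-z");
        List<SortedMap<String, String>> daughters = split(idx, "mango");
        System.out.println(daughters.get(0).keySet()); // [apple]
        System.out.println(daughters.get(1).keySet()); // [mango, zebra]
    }
}
```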

Re: HBase and Lucene for realtime search

Posted by Jason Rutherglen <ja...@gmail.com>.
Previously in this thread there was concern about the indexing speed
of Lucene vs. HBase.  While the throughput will certainly not be as
high when building a search index in conjunction with HBase, it should
still be quite good.

Here's a link to a discussion on this:

http://bit.ly/dGxlEp

Here are the two links at the bottom of the thread:

http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/

http://blog.mikemccandless.com/2010/09/lucenes-indexing-is-fast.html

Re: HBase and Lucene for realtime search

Posted by Jason Rutherglen <ja...@gmail.com>.
Ted, thanks!

On Thu, Apr 14, 2011 at 7:41 PM, Ted Yu <yu...@gmail.com> wrote:
> Jason:
> I logged https://issues.apache.org/jira/browse/HBASE-3786
> Feel free to comment there.

Re: HBase and Lucene for realtime search

Posted by Ted Yu <yu...@gmail.com>.
Jason:
I logged https://issues.apache.org/jira/browse/HBASE-3786
Feel free to comment there.
