Posted to user@hbase.apache.org by Geoff Hendrey <gh...@decarta.com> on 2011/07/19 18:12:53 UTC

hbase + lucene?

Hi -

At hadoop summit it was mentioned that there was a planning meeting for
a project regarding hbase and lucene. I believe the meeting was
scheduled for the day after the summit. I wasn't able to attend, but I
would like to keep abreast of what's going on in this regard. Anyone
know anything about this?

-geoff

Re: hbase + lucene?

Posted by Jason Rutherglen <ja...@gmail.com>.

> How are index updates handled, since they lead to performance degradation?

Document updates or complete re-indexing? Lucene is very fast to
index, and soon will be just as fast in realtime (LUCENE-2312).

Lucene segment merges, like HBase compactions, will carry the most overhead.

On Thu, Jul 21, 2011 at 10:20 AM, Mark Kerzner <ma...@gmail.com> wrote:
> How are index updates handled, since they lead to performance degradation?
>
> Mark
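
Jason's point that segment merges, like HBase compactions, carry most of the indexing overhead can be illustrated with a toy cost model. This is a minimal sketch assuming a simple log-style policy: each flush writes a small segment, and whenever `mergeFactor` equal-sized segments accumulate they are rewritten into one larger segment. `MergeCost` and `docsWritten` are hypothetical names; Lucene's actual merge policies (for example TieredMergePolicy) are more involved, so treat the result as an order-of-magnitude illustration only.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/**
 * Toy model (hypothetical, not Lucene code) of merge cost under a log-style
 * policy: flushed segments are merged mergeFactor at a time, level by level.
 */
public class MergeCost {

    /** Total documents written (initial flushes plus merge rewrites) to index totalDocs. */
    static long docsWritten(long totalDocs, long flushSize, int mergeFactor) {
        if (flushSize <= 0 || mergeFactor < 2) {
            throw new IllegalArgumentException("need flushSize > 0 and mergeFactor >= 2");
        }
        Deque<Long> segments = new ArrayDeque<>();  // segment sizes, newest last
        long written = 0;
        long indexed = 0;
        while (indexed < totalDocs) {
            long flush = Math.min(flushSize, totalDocs - indexed);
            indexed += flush;
            segments.addLast(flush);
            written += flush;                        // the flush itself
            // Cascade: while the newest mergeFactor segments are equal-sized, merge them.
            while (segments.size() >= mergeFactor && newestAreEqual(segments, mergeFactor)) {
                long merged = 0;
                for (int i = 0; i < mergeFactor; i++) {
                    merged += segments.removeLast();
                }
                segments.addLast(merged);
                written += merged;                   // every doc in the merge is rewritten
            }
        }
        return written;
    }

    /** True if the n newest segments all have the same size. */
    private static boolean newestAreEqual(Deque<Long> segments, int n) {
        Iterator<Long> it = segments.descendingIterator();
        long size = it.next();
        for (int i = 1; i < n; i++) {
            if (it.next() != size) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        long docs = 1_000_000;
        long written = docsWritten(docs, 1_000, 10);
        System.out.println("write amplification: " + (double) written / docs + "x"); // prints 4.0x
    }
}
```

With 1,000-document flushes and a merge factor of 10, each of the million documents is written once at flush time and then rewritten at three merge levels, so the model reports a write amplification of 4.0x; the cost grows roughly with the logarithm (base `mergeFactor`) of the index size.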

Re: hbase + lucene?

Posted by Mark Kerzner <ma...@gmail.com>.

How are index updates handled, since they lead to performance degradation?

Mark

On Thu, Jul 21, 2011 at 12:18 PM, Geoff Hendrey <gh...@decarta.com> wrote:

> Thanks for the pointer. If I understand correctly, the index
> partitioning strategy would simply mirror the region-server partitioning
> strategy. That is to say, each region would have a lucene index stored
> in HDFS to provide secondary indexing. When a user searches for "Fat
> Cat", then every region would receive the query, and execute the search
> and the results of this "query fanout" would then be merged together. Is
> that roughly the idea? Individual lucene indexes would remain reasonably
> sized because each index indexes only the data of a single region.
> Correct?
>
> This is really interesting stuff. Thanks for your help in understanding
> it.
>
> -geoff

Re: hbase + lucene?

Posted by Jason Rutherglen <ja...@gmail.com>.

Geoff,

> then every region would receive the query, and execute the search
> and the results of this "query fanout" would then be merged together. Is
> that roughly the idea?

This is one possibility, and/or querying only specific region ranges.

> Individual lucene indexes would remain reasonably
> sized because each index indexes only the data of a single region

Yes, this is the design of HBase Search.

Jason

On Thu, Jul 21, 2011 at 10:18 AM, Geoff Hendrey <gh...@decarta.com> wrote:
> Thanks for the pointer. If I understand correctly, the index
> partitioning strategy would simply mirror the region-server partitioning
> strategy. That is to say, each region would have a lucene index stored
> in HDFS to provide secondary indexing. When a user searches for "Fat
> Cat", then every region would receive the query, and execute the search
> and the results of this "query fanout" would then be merged together. Is
> that roughly the idea? Individual lucene indexes would remain reasonably
> sized because each index indexes only the data of a single region.
> Correct?
>
> This is really interesting stuff. Thanks for your help in understanding
> it.
>
> -geoff
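
Jason's alternative, querying only specific region ranges, relies on each HBase region owning a contiguous, sorted slice of the row-key space. A minimal sketch of selecting only the regions that overlap a requested row range, assuming regions are identified by their start keys (the first region starting at the empty key) and that Java `String` ordering stands in for HBase's byte-wise row-key ordering (the two agree for ASCII keys). `RegionRouter` and `regionsFor` are hypothetical names, not part of the HBase client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical router that limits a query's fanout to regions overlapping a row range. */
public class RegionRouter {
    // Start key -> region name; TreeMap keeps regions sorted by start key.
    private final NavigableMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    /** Regions whose key range overlaps the half-open row range [fromRow, toRow). */
    List<String> regionsFor(String fromRow, String toRow) {
        if (regionsByStartKey.isEmpty() || fromRow.compareTo(toRow) >= 0) {
            return List.of();  // no regions, or an empty/inverted range
        }
        // The region holding fromRow is the one with the greatest start key <= fromRow.
        String first = regionsByStartKey.floorKey(fromRow);
        if (first == null) {
            first = regionsByStartKey.firstKey();  // fromRow sorts before every region
            if (first.compareTo(toRow) >= 0) {
                return List.of();                  // range ends before the first region
            }
        }
        // Every region that starts before toRow may contain matching rows.
        return new ArrayList<>(regionsByStartKey.subMap(first, true, toRow, false).values());
    }

    public static void main(String[] args) {
        RegionRouter router = new RegionRouter();
        router.addRegion("", "region-1");   // rows from the start of the table up to "g"
        router.addRegion("g", "region-2");  // rows "g" up to "p"
        router.addRegion("p", "region-3");  // rows "p" to the end of the table
        System.out.println(router.regionsFor("h", "k"));  // prints [region-2]
        System.out.println(router.regionsFor("a", "q"));  // prints [region-1, region-2, region-3]
    }
}
```

A query restricted to rows from `h` up to (but not including) `k` reaches only one region's index instead of all three; a range spanning the whole table degenerates to the full fanout.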

RE: hbase + lucene?

Posted by Geoff Hendrey <gh...@decarta.com>.

Thanks for the pointer. If I understand correctly, the index
partitioning strategy would simply mirror the region-server partitioning
strategy. That is to say, each region would have a lucene index stored
in HDFS to provide secondary indexing. When a user searches for "Fat
Cat", then every region would receive the query, and execute the search
and the results of this "query fanout" would then be merged together. Is
that roughly the idea? Individual lucene indexes would remain reasonably
sized because each index indexes only the data of a single region.
Correct?

This is really interesting stuff. Thanks for your help in understanding
it.

-geoff 

-----Original Message-----
From: Gary Helmling [mailto:ghelmling@gmail.com] 
Sent: Tuesday, July 19, 2011 11:21 AM
To: user@hbase.apache.org
Subject: Re: hbase + lucene?

I wasn't at the day-after presentation, but I believe these are the
slides?

https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B2c-FWyLSJBCN2E5MTdmOGMtY2U5NS00NmEwLWE2NmItZTYxOTI0MTJmMzU5&hl=en_US

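
The scheme Geoff describes (one index per region, every region searched, partial results merged) is a scatter-gather pattern. The sketch below assumes a toy in-memory inverted index per region and ranks rows by summed term frequency; `RegionIndex`, `Hit`, and `fanout` are hypothetical names, not Lucene or HBase APIs. It requires Java 16 or later (records, `Stream.toList`).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of per-region indexes searched via query fanout. */
public class QueryFanout {

    /** A matching row and its relevance score. */
    record Hit(String rowKey, double score) {}

    /** Higher score first; ties broken by row key so results are deterministic. */
    static final Comparator<Hit> BY_SCORE =
            Comparator.comparingDouble(Hit::score).reversed().thenComparing(Hit::rowKey);

    /** Toy per-region inverted index: term -> (rowKey -> term frequency). */
    static final class RegionIndex {
        private final Map<String, Map<String, Integer>> postings = new HashMap<>();

        void add(String rowKey, String text) {
            for (String term : text.toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(term, t -> new HashMap<>())
                        .merge(rowKey, 1, Integer::sum);
            }
        }

        /** Local search: scores only this region's rows and keeps its top k. */
        List<Hit> search(List<String> terms, int k) {
            Map<String, Double> scores = new HashMap<>();
            for (String term : terms) {
                postings.getOrDefault(term, Map.of())
                        .forEach((row, tf) -> scores.merge(row, tf.doubleValue(), Double::sum));
            }
            return scores.entrySet().stream()
                    .map(e -> new Hit(e.getKey(), e.getValue()))
                    .sorted(BY_SCORE)
                    .limit(k)
                    .toList();
        }
    }

    /** Scatter the query to every region in parallel, then gather and merge the partial top-k lists. */
    static List<Hit> fanout(List<RegionIndex> regions, String query, int k) {
        List<String> terms = Arrays.asList(query.toLowerCase().split("\\s+"));
        return regions.parallelStream()
                .flatMap(region -> region.search(terms, k).stream())
                .sorted(BY_SCORE)
                .limit(k)
                .toList();
    }

    public static void main(String[] args) {
        RegionIndex regionA = new RegionIndex();   // e.g. rows "a".."m"
        regionA.add("apple", "fat cat");
        regionA.add("kiwi", "thin cat");
        RegionIndex regionB = new RegionIndex();   // e.g. rows "n".."z"
        regionB.add("pear", "fat cat fat");
        regionB.add("plum", "dog");

        List<Hit> hits = fanout(List.of(regionA, regionB), "Fat Cat", 2);
        System.out.println(hits.stream().map(Hit::rowKey).toList()); // prints [pear, apple]
    }
}
```

Asking each region for only its own top `k` is enough, because any row in the global top `k` must also rank in the top `k` of its own region. One caveat the toy scoring hides: with real relevance scores such as TF-IDF or BM25, term statistics are computed per index, so scores from different regions are not strictly comparable unless global statistics are shared.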

Re: hbase + lucene?

Posted by Gary Helmling <gh...@gmail.com>.

I wasn't at the day-after presentation, but I believe these are the slides?

https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B2c-FWyLSJBCN2E5MTdmOGMtY2U5NS00NmEwLWE2NmItZTYxOTI0MTJmMzU5&hl=en_US


On Tue, Jul 19, 2011 at 10:29 AM, Stack <st...@duboce.net> wrote:

> Here is the issue: https://issues.apache.org/jira/browse/HBASE-3529
>
> And let me chase Jason to post his slides.
>
> St.Ack

Re: hbase + lucene?

Posted by Stack <st...@duboce.net>.

Here is the issue: https://issues.apache.org/jira/browse/HBASE-3529

And let me chase Jason to post his slides.

St.Ack

On Tue, Jul 19, 2011 at 9:12 AM, Geoff Hendrey <gh...@decarta.com> wrote:
> Hi -
>
> At hadoop summit it was mentioned that there was a planning meeting for
> a project regarding hbase and lucene. I believe the meeting was
> scheduled for the day after the summit. I wasn't able to attend, but I
> would like to keep abreast of what's going on in this regard. Anyone
> know anything about this?
>
> -geoff