You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Paramasivam Srinivasan <sr...@indicus.net> on 2006/11/02 08:52:18 UTC

Re: Announcement: Lucene powering Monster job search index (Beta)

Hi Peter

When I use the CustomHitCollector, it affect the application performance. 
Also how you accomplish the grouping the results with out affecting 
performance. Also If possible give some code snippet for custome 
hitcollector.

TIA

Sri

"Peter Keegan" <pe...@gmail.com> wrote in message 
news:e994873a0610300831p1229ad0k69de63cba052bd22@mail.gmail.com...
> Joe,
>
> Fields with numeric values are stored in a separate file as binary values 
> in
> an internal format. Lucene is unaware of this file and unaware of the 
> range
> expression in the query. The range expression is parsed outside of Lucene
> and used in a custom HitCollector to filter out documents that aren't in 
> the
> requested range(s). A goal was to do this without having to modify Lucene.
> Our scheme is pretty efficient, but not very general purpose in its 
> current
> form, though.
>
> Peter
>
>
> On 10/30/06, Joe Shaw <jo...@novell.com> wrote:
>>
>> Hi Peter,
>>
>> On Fri, 2006-10-27 at 15:29 -0400, Peter Keegan wrote:
>> > Numeric range search is one of Lucene's weak points (performance-wise)
>> so we
>> > have implemented this with a custom HitCollector and an extension to 
>> > the
>> > Lucene index files that stores the numeric field values for all
>> documents.
>> >
>> > It is important to point out that this has all been implemented with 
>> > the
>> > stock Lucene 2.0 library. No code changes were made to the Lucene core.
>>
>> Can you give some technical details on the extension to the Lucene index
>> files?  How did you do it without making any changes to the Lucene core?
>>
>> Thanks,
>> Joe
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Announcement: Lucene powering Monster job search index (Beta)

Posted by Peter Keegan <pe...@gmail.com>.

Paramasivam,

Take a look at Solr, in particular the DocSetHitCollector class. The
collector simply sets a bit in a BitSet, or saves the docIds in an array
(for low hit counts). Solr's BitSet was optimized (by Yonik, I believe) to
be faster than Java's BitSet, so this HitCollector is very fast. This is
essentially what we are doing for counting.

Peter

On 11/2/06, Paramasivam Srinivasan <sr...@indicus.net> wrote:
>
> Hi Peter
>
> When I use the CustomHitCollector, it affect the application performance.
> Also how you accomplish the grouping the results with out affecting
> performance. Also If possible give some code snippet for custome
> hitcollector.
>
> TIA
>
> Sri
>
> "Peter Keegan" <pe...@gmail.com> wrote in message
> news:e994873a0610300831p1229ad0k69de63cba052bd22@mail.gmail.com...
> > Joe,
> >
> > Fields with numeric values are stored in a separate file as binary
> values
> > in
> > an internal format. Lucene is unaware of this file and unaware of the
> > range
> > expression in the query. The range expression is parsed outside of
> Lucene
> > and used in a custom HitCollector to filter out documents that aren't in
> > the
> > requested range(s). A goal was to do this without having to modify
> Lucene.
> > Our scheme is pretty efficient, but not very general purpose in its
> > current
> > form, though.
> >
> > Peter
> >
> >
> > On 10/30/06, Joe Shaw <jo...@novell.com> wrote:
> >>
> >> Hi Peter,
> >>
> >> On Fri, 2006-10-27 at 15:29 -0400, Peter Keegan wrote:
> >> > Numeric range search is one of Lucene's weak points
> (performance-wise)
> >> so we
> >> > have implemented this with a custom HitCollector and an extension to
> >> > the
> >> > Lucene index files that stores the numeric field values for all
> >> documents.
> >> >
> >> > It is important to point out that this has all been implemented with
> >> > the
> >> > stock Lucene 2.0 library. No code changes were made to the Lucene
> core.
> >>
> >> Can you give some technical details on the extension to the Lucene
> index
> >> files?  How did you do it without making any changes to the Lucene
> core?
> >>
> >> Thanks,
> >> Joe
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>