Posted to java-user@lucene.apache.org by Tamer Gür <tg...@ebi.ac.uk> on 2013/04/30 16:45:39 UTC

Document boosting

Hi,

We are migrating from 3.6 to 4.2. Since the Document.setBoost() method
has been removed, we are trying to reimplement it.

Currently we use Document.setBoost() as a scalar boost factor across
our multiple different indexes.

With Lucene 4.2, setting this factor via field.setBoost() is not
feasible for us, since we have many indexes and complex fields.

So what are the possible drawbacks if I override DefaultSimilarity and
put the Document.setBoost() factor back in during norm calculation?
Or is there any other way to add a scalar factor to every document?

Thank you.
Tamer
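
Editorial sketch, not part of the original message: in 3.x the index-time norm multiplied the document boost, the field boost, and the length norm together, so one commonly suggested replacement is to multiply the old document boost into the boost of every indexed field. The plain-Java model below only illustrates that arithmetic. The class and method names are hypothetical, the length norm assumes a DefaultSimilarity-style 1/sqrt(numTerms), and the lossy single-byte encoding of the stored norm is ignored. It does not call Lucene itself.

```java
public class DocBoostSketch {
    // Length normalization in the style of DefaultSimilarity: 1 / sqrt(number of terms).
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // 3.6-style norm: document boost and field boost are both multiplied in.
    public static float norm36(float docBoost, float fieldBoost, int numTerms) {
        return docBoost * fieldBoost * lengthNorm(numTerms);
    }

    // 4.x-style norm: only a per-field boost is available.
    public static float norm4x(float fieldBoost, int numTerms) {
        return fieldBoost * lengthNorm(numTerms);
    }

    // Hypothetical helper: fold the old document boost into one field's boost.
    public static float foldDocBoost(float docBoost, float fieldBoost) {
        return docBoost * fieldBoost;
    }

    public static void main(String[] args) {
        float docBoost = 2.0f;
        float fieldBoost = 1.5f;
        int numTerms = 16;
        float before = norm36(docBoost, fieldBoost, numTerms);
        float after = norm4x(foldDocBoost(docBoost, fieldBoost), numTerms);
        // Both values should agree, since the same factors are multiplied.
        System.out.println(before + " vs " + after);
    }
}
```

In a real 4.x index this would presumably mean calling setBoost() on each indexed field that keeps norms, for example from one shared helper applied to every document before it is added, rather than changing the Similarity.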



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Block tree terms dict & index

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, Apr 30, 2013 at 7:57 PM, Beale, Jim (US-KOP) <Ji...@hibu.com> wrote:

> We've just upgraded to 4.2 from 3.6 and suffered some performance degradation in both indexing and retrieval.  We've had to eliminate compression, even supplying our own NoCompression codec since there doesn't appear to be any built-in support for this.  Hopefully we're not overlooking something with the compression.

Customizing your codec components to change or disable compression is
entirely normal... but it's curious you saw such a performance hit
from the compression.  Can you share more details?  Was it from
compressed stored fields or term vectors?  Or both?

> It did reduce the size of our indexes and thus our memory footprint but we lost more on the LZ4 decompression than we gained by having more free memory.

OK.

> DocValues didn't help us either.  We attempted to create an in-memory cache, using a separate index which we closed afterwards and performing a map reduce to speed up access, but we didn't see any significant performance gains.

What were you using DocValues for (and how did you do it in 3.6)?

> What about block tree terms?  What is the use case for that feature?  I noticed that benefits appeared in the spell correction tests but I'm still not clear about how best to employ the codec.  Has anyone had any experience with it?

Block tree terms dict should reduce the time to load the metadata for
a given term, and reduce memory required for the terms index (loaded
fully into RAM).  So term-heavy queries (PK Lookup, direct spell
checker, fuzzy, certain automaton queries) see the most gains.

Mike McCandless

http://blog.mikemccandless.com


Block tree terms dict & index

Posted by "Beale, Jim (US-KOP)" <Ji...@hibu.com>.

Hello all,

We've just upgraded to 4.2 from 3.6 and suffered some performance degradation in both indexing and retrieval. We've had to eliminate compression, even supplying our own NoCompression codec since there doesn't appear to be any built-in support for this. Hopefully we're not overlooking something with the compression. It did reduce the size of our indexes and thus our memory footprint but we lost more on the LZ4 decompression than we gained by having more free memory.

DocValues didn't help us either. We attempted to create an in-memory cache, using a separate index which we closed afterwards and performing a map reduce to speed up access, but we didn't see any significant performance gains.

What about block tree terms? What is the use case for that feature? I noticed that benefits appeared in the spell correction tests but I'm still not clear about how best to employ the codec. Has anyone had any experience with it?

Thanks for any and all insights.

Best regards,
Jim Beale


Re: Document boosting

Posted by Tamer Gür <tg...@ebi.ac.uk>.

Hi Ivan,
I was aware of that thread; I also asked in order to learn about
overriding DefaultSimilarity or some similar approach.
Thanks.

On 30/04/2013 17:24, Ivan Brusic wrote:
> There was a similar question asked a couple of months ago, with a great
> answer by Uwe Schindler:
>
> http://search-lucene.com/m/Z2GP220szmS&subj=RE+What+is+equivalent+to+Document+setBoost+from+Lucene+3+6+inLucene+4+1+
>
> I am still on Lucene 3.x, so I have not yet had a chance to mimic document
> level boosts in 4.x.
>
> Cheers,
>
> Ivan
>
>
>
>
> On Tue, Apr 30, 2013 at 7:45 AM, Tamer Gür <tg...@ebi.ac.uk> wrote:
>
>> Hi,
>>
>> We are migrating from 3.6 to 4.2. Since the Document.setBoost() method
>> has been removed, we are trying to reimplement it.
>>
>> Currently we use Document.setBoost() as a scalar boost factor across
>> our multiple different indexes.
>>
>> With Lucene 4.2, setting this factor via field.setBoost() is not
>> feasible for us, since we have many indexes and complex fields.
>>
>> So what are the possible drawbacks if I override DefaultSimilarity and
>> put the Document.setBoost() factor back in during norm calculation?
>> Or is there any other way to add a scalar factor to every document?
>>
>> Thank you.
>> Tamer
>>
>>
>>



Re: Document boosting

Posted by Ivan Brusic <iv...@brusic.com>.

There was a similar question asked a couple of months ago, with a great
answer by Uwe Schindler:

http://search-lucene.com/m/Z2GP220szmS&subj=RE+What+is+equivalent+to+Document+setBoost+from+Lucene+3+6+inLucene+4+1+

I am still on Lucene 3.x, so I have not yet had a chance to mimic document
level boosts in 4.x.

Cheers,

Ivan




On Tue, Apr 30, 2013 at 7:45 AM, Tamer Gür <tg...@ebi.ac.uk> wrote:

> Hi,
>
> We are migrating from 3.6 to 4.2. Since the Document.setBoost() method
> has been removed, we are trying to reimplement it.
>
> Currently we use Document.setBoost() as a scalar boost factor across
> our multiple different indexes.
>
> With Lucene 4.2, setting this factor via field.setBoost() is not
> feasible for us, since we have many indexes and complex fields.
>
> So what are the possible drawbacks if I override DefaultSimilarity and
> put the Document.setBoost() factor back in during norm calculation?
> Or is there any other way to add a scalar factor to every document?
>
> Thank you.
> Tamer
>
>
>