Posted to java-user@lucene.apache.org by "Beale, Jim (US-KOP)" <Ji...@hibu.com> on 2013/05/01 01:57:02 UTC

Block tree terms dict & index

Hello all,

We've just upgraded to 4.2 from 3.6 and suffered some performance degradation in both indexing and retrieval. We've had to eliminate compression, even supplying our own NoCompression codec, since there doesn't appear to be any built-in support for this. Hopefully we're not overlooking something with the compression. It did reduce the size of our indexes, and thus our memory footprint, but we lost more on the LZ4 decompression than we gained by having more free memory.

DocValues didn't help us either. We attempted to create an in-memory cache, using a separate index which we closed afterwards and performing a map reduce to speed up access, but we didn't see any significant performance gains.

What about block tree terms? What is the use case for that feature? I noticed that benefits appeared in the spell correction tests but I'm still not clear about how best to employ the codec. Has anyone had any experience with it?

Thanks for any and all insights.

Best regards,
Jim Beale

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Block tree terms dict & index

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, Apr 30, 2013 at 7:57 PM, Beale, Jim (US-KOP) <Ji...@hibu.com> wrote:

> We've just upgraded to 4.2 from 3.6 and suffered some performance degradation in both indexing and retrieval.  We've had to eliminate compression, even supplying our own NoCompression codec since there doesn't appear to be any built in support for this.  Hopefully we're not overlooking something with the compression.

Customizing your codec components to change or disable compression is
entirely normal... but it's curious you saw such a performance hit
from the compression.  Can you share more details?  Was it from
compressed stored fields or term vectors?  Or both?

> It did reduce the size of our indexes and thus our memory footprint but we lost more on the LZ4 decompression than we gained by having more free memory.

OK.

> DocValues didn't help us either.  We attempted to create an in-memory cache, using a separate index which we closed afterwards and performing a map reduce to speed up access, but we didn't see any significant performance gains.

What were you using DocValues for (and how did you do it in 3.6)?

> What about block tree terms?  What is the use case for that feature?  I noticed that benefits appeared in the spell correction tests but I'm still not clear about how best to employ the codec.  Has anyone had any experience with it?

The block tree terms dict should reduce the time to load the metadata for
a given term, and reduce the memory required for the terms index (which is
loaded fully into RAM).  So term-heavy queries (primary-key lookup, direct
spell checker, fuzzy, certain automaton queries) see the most gains.
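
As a rough illustration of why term lookups benefit, here is a toy sketch
in Java.  It is not Lucene's BlockTree code: the class name
PrefixBlockTermsDict, the fixed prefix length, and the blockScans counter
are all hypothetical and exist only for this example.  It assumes terms are
grouped into blocks by a shared prefix, with a small in-RAM index holding
just the prefixes, so a lookup for a term whose prefix is not in the index
is rejected without touching any block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only; a simplification, not Lucene's BlockTree implementation. */
final class PrefixBlockTermsDict {
    private final int prefixLen;
    // Stand-in for the terms index: one entry per prefix, kept in RAM.
    private final Map<String, List<String>> blocksByPrefix = new TreeMap<>();
    // Counts block scans, a stand-in for the cost of reading a block from disk.
    private int blockScans = 0;

    PrefixBlockTermsDict(List<String> terms, int prefixLen) {
        this.prefixLen = prefixLen;
        for (String term : terms) {
            blocksByPrefix
                .computeIfAbsent(prefixOf(term), p -> new ArrayList<>())
                .add(term);
        }
    }

    // Terms shorter than prefixLen are used whole as their own prefix.
    private String prefixOf(String term) {
        return term.length() <= prefixLen ? term : term.substring(0, prefixLen);
    }

    boolean contains(String term) {
        List<String> block = blocksByPrefix.get(prefixOf(term));
        if (block == null) {
            return false; // ruled out by the index alone, no block scanned
        }
        blockScans++;     // only now do we pay for scanning a block
        return block.contains(term);
    }

    int blockScans() { return blockScans; }

    int indexEntries() { return blocksByPrefix.size(); }

    public static void main(String[] args) {
        PrefixBlockTermsDict dict = new PrefixBlockTermsDict(
            Arrays.asList("id1001", "id1002", "id2001", "lucene", "lucid"), 3);
        System.out.println("index entries: " + dict.indexEntries());
        System.out.println("id1002 present: " + dict.contains("id1002"));
        System.out.println("zz9999 present: " + dict.contains("zz9999"));
        System.out.println("block scans so far: " + dict.blockScans());
    }
}
```

In this toy, the lookup for "zz9999" never scans a block because no indexed
prefix matches it, which is the kind of saving a primary-key lookup for a
missing key would see.  The index also holds one entry per prefix rather
than one per term, which loosely mirrors the smaller terms index.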

Mike McCandless

http://blog.mikemccandless.com
