Posted to solr-user@lucene.apache.org by Tom Burton-West <tb...@umich.edu> on 2014/09/17 00:15:53 UTC

Solr 4.10 termsIndexInterval and termsIndexDivisor not supported with default PostingsFormat?

Hello,

I think the documentation and example files for Solr 4.x need to be
updated.  If someone will let me know I'll be happy to fix the example
and perhaps someone with edit rights could fix the reference guide.

Due to dirty OCR and over 400 languages we have over 2 billion unique
terms in our index.  In Solr 3.6 we set termIndexInterval to 1024 (8
times the default of 128) to reduce the size of the in-memory index.
Previously we used termIndexDivisor for a similar purpose.
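
For scale, here is a rough sketch of the arithmetic behind that setting. It assumes the pre-4.0 behaviour, where one of every termIndexInterval terms is kept in the in-memory index and a reader-side divisor subsamples those again. The figure of exactly 2 billion terms is a round-number assumption, and the class and method names are hypothetical, not part of Lucene or Solr:

```java
public class TermIndexEstimate {
    /**
     * Approximate number of terms held in the in-memory terms index:
     * one of every (interval * divisor) terms, rounded up.
     */
    public static long indexedTerms(long uniqueTerms, int interval, int divisor) {
        long effectiveInterval = (long) interval * divisor;
        return (uniqueTerms + effectiveInterval - 1) / effectiveInterval;
    }

    public static void main(String[] args) {
        long uniqueTerms = 2_000_000_000L; // assumed round figure for illustration

        // Default interval of 128, no divisor.
        System.out.println(indexedTerms(uniqueTerms, 128, 1));  // prints 15625000

        // Interval raised to 1024 at index time.
        System.out.println(indexedTerms(uniqueTerms, 1024, 1)); // prints 1953125

        // Default interval with a reader-side divisor of 8 gives the same count.
        System.out.println(indexedTerms(uniqueTerms, 128, 8));  // prints 1953125
    }
}
```

Under these assumptions, raising the interval from 128 to 1024 cuts the number of in-memory index entries by a factor of 8, which is the saving described above.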

We suspect that in Solr 4.10 (and probably previous Solr 4.x versions)
termIndexInterval and termIndexDivisor do not apply to the default
codec and are probably unnecessary (since the default terms index now
uses a much more efficient representation).

According to the JavaDocs for IndexWriterConfig, the Lucene level
implementations of these do not apply to the default PostingsFormat
implementation.
http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/index/IndexWriterConfig.html#setReaderTermsIndexDivisor%28int%29

Despite this statement in the Lucene JavaDocs, in the
example/solrconfig.xml there is the following:

<!-- Expert: Controls how often Lucene loads terms into memory
     Default is 128 and is likely good for most everyone.
-->
<!-- <termIndexInterval>128</termIndexInterval> -->

In the 4.10 reference manual page 365 there is also an example showing
the termIndexInterval.

Can someone please confirm that these two parameter settings,
termIndexInterval and termIndexDivisor, do not apply to the default
PostingsFormat for Solr 4.10?

Tom

Re: Solr 4.10 termsIndexInterval and termsIndexDivisor not supported with default PostingsFormat?

Posted by Tom Burton-West <tb...@umich.edu>.
Thanks Hoss,

Just opened SOLR-6560 and attached a patch which removes the offending
section from the example solrconfig.xml file.

We suspect that, with the much more efficient block and FST based Solr 4
default postings format, the need to adjust these parameters in order
to reduce memory usage has gone away.  We haven't really tested this yet.

If there is still a use case for configuring the Solr default
PostingsFormat and the ability to set the parameters currently exists,
then maybe someone who understands this could put an example in the
solrconfig.xml file and documentation.  On the other hand, if the use case
still exists and Solr doesn't have the ability to configure the parameters,
maybe another issue should be opened.  It looks like all that would be needed
is a mechanism to pass a couple of ints to the Lucene postings format:

" For example, Lucene41PostingsFormat
<http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat.html> implements
the term index instead based upon how terms share prefixes. To configure
its parameters (the minimum and maximum size for a block), you would
instead use Lucene41PostingsFormat.Lucene41PostingsFormat(int, int)
<http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat.html#Lucene41PostingsFormat(int,%20int)>.
which can also be configured on a per-field basis:"

Tom

On Thu, Sep 18, 2014 at 1:42 PM, Chris Hostetter <ho...@fucit.org>
wrote:

>
> : I think the documentation and example files for Solr 4.x need to be
> : updated.  If someone will let me know I'll be happy to fix the example
> : and perhaps someone with edit rights could fix the reference guide.
>
> I think you're correct - can you open a Jira with suggested improvements
> for the configs?  (i see you commented on the ref guide page which is
> helpful - but the jira issue wil also help serve sa a reminder to audit
> *all* the pages for refrences to these options, ie: in config snippets,
> etc...)
>
> : According to the JavaDocs for IndexWriterConfig, the Lucene level
> : implementations of these do not apply to the default PostingsFormat
> : implementation.
> :
> http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/index/IndexWriterConfig.html#setReaderTermsIndexDivisor%28int%29
> :
> : Despite this statement in the Lucene JavaDocs, in the
> : example/solrconfig.xml there is the following:
>
> Yeah ... I'm not sure what (if anything?) we should say about these in the
> example configs -- the *setting* is valid and supported by
> IndexWriterConfig no matter what posting format you use, so it's not an
> error to configure this, but it can be ignored in many cases.
>
> : Can someone please confirm that these two parameter settings
> : termIndexInterval and termsIndexDivisor, do not apply to the default
> : PostingsFormat for Solr 4.10?
>
> I was taking your word for it :)
>
>
> -Hoss
> http://www.lucidworks.com/
>

Re: Solr 4.10 termsIndexInterval and termsIndexDivisor not supported with default PostingsFormat?

Posted by Chris Hostetter <ho...@fucit.org>.
: I think the documentation and example files for Solr 4.x need to be
: updated.  If someone will let me know I'll be happy to fix the example
: and perhaps someone with edit rights could fix the reference guide.

I think you're correct - can you open a Jira with suggested improvements
for the configs?  (I see you commented on the ref guide page, which is
helpful - but the Jira issue will also help serve as a reminder to audit
*all* the pages for references to these options, i.e. in config snippets,
etc.)

: According to the JavaDocs for IndexWriterConfig, the Lucene level
: implementations of these do not apply to the default PostingsFormat
: implementation.
: http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/index/IndexWriterConfig.html#setReaderTermsIndexDivisor%28int%29
: 
: Despite this statement in the Lucene JavaDocs, in the
: example/solrconfig.xml there is the following:

Yeah ... I'm not sure what (if anything?) we should say about these in the 
example configs -- the *setting* is valid and supported by 
IndexWriterConfig no matter what posting format you use, so it's not an 
error to configure this, but it can be ignored in many cases.

: Can someone please confirm that these two parameter settings
: termIndexInterval and termsIndexDivisor, do not apply to the default
: PostingsFormat for Solr 4.10?

I was taking your word for it :)


-Hoss
http://www.lucidworks.com/