Posted to solr-user@lucene.apache.org by Andrew Clegg <an...@gmail.com> on 2009/10/26 12:11:00 UTC

Solr ignoring maxFieldLength?

Morning,

Last week I had a problem where terms in large documents were visible in my
search results but did not produce query hits:

http://www.nabble.com/Result-missing-from-query%2C-but-match-shows-in-Field-Analysis-tool-td26029040.html#a26029351

Erick suggested it might be related to maxFieldLength, so I set this to
2147483647 in my solrconfig.xml and reindexed over the weekend.

Unfortunately I still have the same problem, even though Erick appears to
be right! I've narrowed it down to a single document for testing purposes.
I can get it returned by querying for a term near the beginning, but terms
near the end cause no hit, and I can even find the point part way through
the document after which none of the remaining terms cause a hit.

The document is about 32000 terms long, most of which is in a single field
called related_ids of about 31000 terms. My first thought was that the text
was being chopped up into so many tokens that it was going over the
maxFieldLength anyway, but 2147483647/32000 is roughly 67109, and it seems
very unlikely that 67109 tokens would be generated per term!
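
For reference, the arithmetic above can be reproduced with a few lines of Python. This is only a rough sanity check: the constant names are made up for illustration, and the document length is the approximate figure quoted in the post.

```python
# Rough check: how many tokens per term would a ~32,000-term document
# have to generate before it reached a maxFieldLength of 2147483647?
MAX_FIELD_LENGTH = 2_147_483_647  # value set in solrconfig.xml (Java's Integer.MAX_VALUE)
DOC_TERMS = 32_000                # approximate document length quoted in the post

tokens_per_term = MAX_FIELD_LENGTH / DOC_TERMS
print(round(tokens_per_term))  # 67109
```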

I've tried undeploying and redeploying the whole web app from Tomcat in case
the new maxFieldLength hadn't been read, but no difference. If I go to

http://localhost:8080/solr/admin/file/?file=solrconfig.xml

I can see

<maxFieldLength>2147483647</maxFieldLength>

as expected.

Does anyone have any more ideas? This could potentially be a showstopper for
us as we have quite a few long-ish documents to index. (32K words doesn't
seem that long to me, but still...)

I've tried it with today's nightly build (2009-10-26) and it makes no
difference. If this sounds like a bug, I'll open a JIRA and attach tars of
my config and data directories. Any thoughts?

Thanks,

Andrew.

-- 
View this message in context: http://www.nabble.com/Solr-ignoring-maxFieldLength--tp26057808p26057808.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr ignoring maxFieldLength?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Oct 26, 2009 at 11:43 AM, Andrew Clegg <an...@gmail.com> wrote:
> Yonik Seeley-2 wrote:
>>
>> If you could, it would be great if you could test commenting out the
>> one in mainIndex and see if it inherits correctly from
>> indexDefaults... if so, I can comment it out in the example and remove
>> one other little thing that people could get wrong.
>>
>
> Yep, it seems perfectly happy like this.

Thanks for testing!  I'll update trunk.

-Yonik
http://www.lucidimagination.com



Re: Solr ignoring maxFieldLength?

Posted by Andrew Clegg <an...@gmail.com>.

Yonik Seeley-2 wrote:
> 
> If you could, it would be great if you could test commenting out the
> one in mainIndex and see if it inherits correctly from
> indexDefaults... if so, I can comment it out in the example and remove
> one other little thing that people could get wrong.
> 

Yep, it seems perfectly happy like this.

I'm going to try commenting out all of mainIndex to see if it can
successfully inherit everything from indexDefaults -- since I have
<lockType>single</lockType> I won't need an unlockOnStartup entry, which
doesn't appear in indexDefaults (at least in any of the config files I've
seen).

So...

    <indexDefaults>
        <useCompoundFile>false</useCompoundFile>
        <mergeFactor>10</mergeFactor>
        <maxMergeDocs>2147483647</maxMergeDocs>
        <maxFieldLength>2147483647</maxFieldLength>
        <writeLockTimeout>1000</writeLockTimeout>
        <commitLockTimeout>10000</commitLockTimeout>
        <lockType>single</lockType>
    </indexDefaults>
    <mainIndex>
    </mainIndex>

If the big overnight indexing job fails with these settings, I'll let you
know.

Cheers,

Andrew.
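
The override behaviour discussed in this message can be modelled roughly as follows. This is a simplified sketch of the rule as described in the thread (a value in mainIndex wins, otherwise the value from indexDefaults is used), not Solr's actual implementation. The function name `effective_setting` and the trimmed-down sample configs are hypothetical.

```python
import xml.etree.ElementTree as ET

def effective_setting(config_xml, name):
    """Return the value of <name>, preferring mainIndex over indexDefaults.

    Simplified model of the behaviour described in the thread; returns
    None if neither section defines the setting.
    """
    root = ET.fromstring(config_xml)
    for section in ("mainIndex", "indexDefaults"):
        node = root.find(f"{section}/{name}")
        if node is not None and node.text:
            return node.text.strip()
    return None

# Hypothetical config with an empty mainIndex, as in the message above
INHERITING = """
<config>
  <indexDefaults>
    <maxFieldLength>2147483647</maxFieldLength>
    <lockType>single</lockType>
  </indexDefaults>
  <mainIndex>
  </mainIndex>
</config>
"""

# Hypothetical config where mainIndex still carries its own value
OVERRIDING = """
<config>
  <indexDefaults>
    <maxFieldLength>2147483647</maxFieldLength>
  </indexDefaults>
  <mainIndex>
    <maxFieldLength>10000</maxFieldLength>
  </mainIndex>
</config>
"""

print(effective_setting(INHERITING, "maxFieldLength"))  # 2147483647
print(effective_setting(OVERRIDING, "maxFieldLength"))  # 10000
```

With the empty mainIndex, the lookup falls through to indexDefaults, which matches what the test in this message is meant to confirm.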



Re: Solr ignoring maxFieldLength?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Oct 26, 2009 at 11:00 AM, Andrew Clegg <an...@gmail.com> wrote:
> Yonik Seeley-2 wrote:
>>
>> Sorry Andrew, this is something that's bitten people before.
>> search for maxFieldLength and you will see *2* of them in your config
>> - one for indexDefaults and one for mainIndex.
>> The one in mainIndex is set at 10000 and hence overrides the one in
>> indexDefaults.
>>
>
> Sorry -- schoolboy error. Glad I'm not the only one though. Yes, that seems
> to have fixed it...

Great!

It would be great if you could test commenting out the one in mainIndex
and see whether it inherits correctly from indexDefaults... if so, I can
comment it out in the example and remove one other little thing that
people could get wrong.

-Yonik
http://www.lucidimagination.com

Re: Solr ignoring maxFieldLength?

Posted by Andrew Clegg <an...@gmail.com>.


Yonik Seeley-2 wrote:
> 
> Sorry Andrew, this is something that's bitten people before.
> search for maxFieldLength and you will see *2* of them in your config
> - one for indexDefaults and one for mainIndex.
> The one in mainIndex is set at 10000 and hence overrides the one in
> indexDefaults.
> 

Sorry -- schoolboy error. Glad I'm not the only one though. Yes, that seems
to have fixed it...

Cheers,

Andrew.



Re: Solr ignoring maxFieldLength?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Sorry Andrew, this is something that's bitten people before.
Search your solrconfig.xml for maxFieldLength and you will see *2* of
them - one in indexDefaults and one in mainIndex.
The one in mainIndex is set to 10000 and hence overrides the one in
indexDefaults.

-Yonik
http://www.lucidimagination.com
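
One way to spot this kind of duplicate is to list every occurrence of a setting together with the section it lives in. The following is a minimal sketch using Python's standard library. The helper name `find_setting` and the trimmed-down sample config are hypothetical; a real solrconfig.xml contains many more elements.

```python
import xml.etree.ElementTree as ET

def find_setting(config_xml, name="maxFieldLength"):
    """Return a (parent_tag, value) pair for every <name> element found."""
    root = ET.fromstring(config_xml)
    hits = []
    # ElementTree has no parent pointers, so walk each element's children
    for parent in root.iter():
        for child in parent:
            if child.tag == name:
                hits.append((parent.tag, (child.text or "").strip()))
    return hits

# Hypothetical, trimmed-down config mirroring the situation in this thread
SAMPLE = """
<config>
  <indexDefaults>
    <maxFieldLength>2147483647</maxFieldLength>
  </indexDefaults>
  <mainIndex>
    <maxFieldLength>10000</maxFieldLength>
  </mainIndex>
</config>
"""

for section, value in find_setting(SAMPLE):
    print(f"{section}: maxFieldLength = {value}")
```

More than one line of output means the setting is defined in more than one section, and per the explanation above the mainIndex value is the one that takes effect.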




Re: Solr ignoring maxFieldLength?

Posted by Andrew Clegg <an...@gmail.com>.

Yep, I just re-indexed it again to make double sure -- same problem
unfortunately.

My solrconfig.xml and schema.xml are attached.

In case you want to see it in action on the same data I've got, I've tarred
up my data and conf directories here:

http://biotext.org.uk/static/solr-issue-example.tar.gz

That should be enough to reproduce it with.

Thanks!

Andrew.


Yonik Seeley-2 wrote:
> 
> Yes, please show us your solrconfig.xml, and verify that you reindexed
> the document after changing maxFieldLength and restarting solr.
> 
> I'll also see if I can reproduce a problem with maxFieldLength being
> ignored.
> 
http://www.nabble.com/file/p26060882/solrconfig.xml solrconfig.xml 
http://www.nabble.com/file/p26060882/schema.xml schema.xml 


Re: Solr ignoring maxFieldLength?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Yes, please show us your solrconfig.xml, and verify that you reindexed
the document after changing maxFieldLength and restarting solr.

I'll also see if I can reproduce a problem with maxFieldLength being ignored.

-Yonik
http://www.lucidimagination.com


