Posted to solr-user@lucene.apache.org by Tom Lord <to...@aptivate.org> on 2008/07/22 17:38:21 UTC

maximum length of string that Solr can index

Hi, we've looked for info about this issue online and in the code and are
none the wiser - help would be much appreciated.

We are indexing the full text of journals using Solr. We currently pass
in the journal text, up to maybe 130 pages, and index it in one go.

We are seeing Solr stop indexing after roughly 30 pages. That is, when we
look at the indexed text field using Luke, we can see where it gives up
collecting information from the text.

What is the maximum size of text that Solr can index? Is this a known issue or
standard behaviour, or is something else amiss?

If this is standard behaviour, what is the approved way of avoiding this
issue? Should we index on a per-page basis rather than trying to do 130
pages as a single document?

thanks in advance,
Tom.

-- 
Tom Lord | (toml@aptivate.org)

Aptivate | http://www.aptivate.org | Phone: +44 1223 760887 
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales 
with company number 04980791.


Re: maximum length of string that Solr can index

Posted by Yonik Seeley <yo...@apache.org>.
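Lucene has a maxFieldLength setting: the maximum number of tokens it will
index for a given field name. Tokens past that limit are not indexed, which
would explain the cutoff you see in Luke.
It can be configured via solrconfig.xml: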
<maxFieldLength>10000</maxFieldLength>
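
A minimal sketch of raising the limit, assuming the layout of the example
solrconfig.xml shipped with Solr releases of this period, where
maxFieldLength appears under <indexDefaults> and may be repeated under
<mainIndex>. The element placement is an assumption about your config, so
check both sections in your own file. 2147483647 is Integer.MAX_VALUE, which
effectively removes the cap:

```xml
<!-- Assumed placement; other settings in these sections are omitted here. -->
<indexDefaults>
  <!-- Index up to Integer.MAX_VALUE tokens per field (effectively no cap). -->
  <maxFieldLength>2147483647</maxFieldLength>
</indexDefaults>

<mainIndex>
  <!-- If this element is present it takes precedence, so keep both in sync. -->
  <maxFieldLength>2147483647</maxFieldLength>
</mainIndex>
```

Documents indexed under the old limit stay truncated, so they need to be
re-indexed once Solr has been restarted with the new value.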

-Yonik
