Posted to java-user@lucene.apache.org by d rj <dr...@gmail.com> on 2006/10/19 00:40:52 UTC

question regarding usage of IndexWriter.setMaxFieldLength()

Hello-

I was wondering about the usage of IndexWriter.setMaxFieldLength().
By default it is limited to 10,000 terms per field.  Can anyone tell me if
this is a "per field" limit or a "per uniquely named field" limit?
I.e. in the following snippet I add many words to different Fields, all w/
the same name.  Will all the words be indexed w/ no problem, allowing me to
search the "text" field for any word occurring in any of these
long strings?

String longString1 = <~9k words in string>;
String longString2 = <~9k words in string>;
String longString3 = <~9k words in string>;

Document doc = new Document();
doc.add(new Field("text", longString1, Field.Store.YES,
Field.Index.UN_TOKENIZED));
doc.add(new Field("text", longString2, Field.Store.YES,
Field.Index.UN_TOKENIZED));
doc.add(new Field("text", longString3, Field.Store.YES,
Field.Index.UN_TOKENIZED));


thanks.
-david

Re: question regarding usage of IndexWriter.setMaxFieldLength()

Posted by Erick Erickson <er...@gmail.com>.
I had a similar question a while ago and the answer is "you can't cheat".
According to what the guys said, this

doc.add("field", <a 10,000 word string>)
doc.add("field", <a 10,000 word string>)
doc.add("field", <a 10,000 word string>)

is just the same as this

doc.add("field", <a 30,000 word string>)

But go ahead and increase the limit with IndexWriter.setMaxFieldLength(). I'm
successfully indexing (unstored) a 7,500 page book with all the text as a
single field. I think I set the max field length at something like 10,000,000.

Had to bump the max memory in the JVM to do it, but it worked.
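
To make the "you can't cheat" point concrete, here is a small plain-Java model of
the behavior described above. It is only a sketch and does not call Lucene: the
class MaxFieldLengthModel and the helpers indexedTerms() and words() are
hypothetical names made up for illustration, and whitespace splitting stands in
for a real analyzer. The assumption it encodes is the one stated in this thread,
namely that all values added under the same field name in one document count
against a single term budget.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, NOT Lucene code: it mimics a per-field-name term limit.
class MaxFieldLengthModel {

    // Returns the terms that would be kept when every value shares one budget
    // of maxFieldLength terms (assumed behavior, as described in the thread).
    static List<String> indexedTerms(int maxFieldLength, String... values) {
        List<String> terms = new ArrayList<>();
        for (String value : values) {
            // Whitespace splitting is a stand-in for a real analyzer.
            for (String token : value.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // an empty value yields a single empty token
                }
                if (terms.size() >= maxFieldLength) {
                    return terms; // budget used up: the rest is dropped
                }
                terms.add(token);
            }
        }
        return terms;
    }

    // Builds a string of n distinct words such as "a0 a1 a2 ...".
    static String words(String prefix, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(prefix).append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s1 = words("a", 9000);
        String s2 = words("b", 9000);
        String s3 = words("c", 9000);

        // Default-sized limit: three 9k-word values are cut off at 10,000 terms.
        System.out.println(indexedTerms(10000, s1, s2, s3).size()); // 10000

        // Raised limit: all 27,000 terms survive.
        System.out.println(indexedTerms(30000, s1, s2, s3).size()); // 27000
    }
}
```

Under this model, splitting the text across three same-named fields gains
nothing; only raising the limit (in Lucene, via IndexWriter.setMaxFieldLength())
lets the later words through.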

Erick

