You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by "Rob Staveley (Tom)" <rs...@seseit.com> on 2009/12/10 09:15:16 UTC

IndexWriter.MaxFieldLength.UNLIMITED at what price?

I was wondering where I might read about the cost of using
IndexWriter.MaxFieldLength.UNLIMITED versus
IndexWriter.MaxFieldLength.LIMITED.

 

Are thee any consequences over and above the obvious one that you are going
to analyse more content in your IndexWriter when you have more than 10,000
characters in a StringBuffer?


RE: IndexWriter.MaxFieldLength.UNLIMITED at what price?

Posted by "Rob Staveley (Tom)" <rs...@seseit.com>.
Many thanks, Mike. UNLIMITED is right for me then. Happily, it is a
reasonably controlled environment.

-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com] 
Sent: 10 December 2009 10:03
To: java-user@lucene.apache.org
Subject: Re: IndexWriter.MaxFieldLength.UNLIMITED at what price?

LIMITED is basically an insurance policy, protecting you from
accidentally indexing an immense document, leading to OOME.

It also protects you in case your analyzer is accidentally letting in
bogus terms (say, if you indexed a large exe file, or there was a
large base64-encoded attachment on an email message that you didn't
decode).

But, LIMITED is very bad for the user experience.  Users will
eventually catch on that your search engine is "buggy", and lose
trust.

I'd recommend always using UNLIMITED unless you're in a domain where
there are risks of getting massive docs.  And even then I'd first try
to create other mechanisms to try to not index such documents...

Mike

On Thu, Dec 10, 2009 at 3:15 AM, Rob Staveley (Tom)
<rs...@seseit.com> wrote:
> I was wondering where I might read about the cost of using
> IndexWriter.MaxFieldLength.UNLIMITED versus
> IndexWriter.MaxFieldLength.LIMITED.
>
>
>
> Are thee any consequences over and above the obvious one that you are
going
> to analyse more content in your IndexWriter when you have more than 10,000
> characters in a StringBuffer?
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: IndexWriter.MaxFieldLength.UNLIMITED at what price?

Posted by Michael McCandless <lu...@mikemccandless.com>.
LIMITED is basically an insurance policy, protecting you from
accidentally indexing an immense document, leading to OOME.

It also protects you in case your analyzer is accidentally letting in
bogus terms (say, if you indexed a large exe file, or there was a
large base64-encoded attachment on an email message that you didn't
decode).

But, LIMITED is very bad for the user experience.  Users will
eventually catch on that your search engine is "buggy", and lose
trust.

I'd recommend always using UNLIMITED unless you're in a domain where
there are risks of getting massive docs.  And even then I'd first try
to create other mechanisms to try to not index such documents...

Mike

On Thu, Dec 10, 2009 at 3:15 AM, Rob Staveley (Tom)
<rs...@seseit.com> wrote:
> I was wondering where I might read about the cost of using
> IndexWriter.MaxFieldLength.UNLIMITED versus
> IndexWriter.MaxFieldLength.LIMITED.
>
>
>
> Are thee any consequences over and above the obvious one that you are going
> to analyse more content in your IndexWriter when you have more than 10,000
> characters in a StringBuffer?
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org