Posted to java-user@lucene.apache.org by roz dev <ro...@gmail.com> on 2011/08/28 03:45:09 UTC

Question about MaxFieldLength

Hi All

I have a question regarding MaxFieldLength. Is it a limit on the number of
tokens in one field per document, or across the entire index?

Example:

If MaxFieldLength is set to 100 and I add a document that has 105 tokens
in one field, then I expect 5 tokens to be ignored.
But if I add another document that has 95 tokens in the same field, then
all 95 tokens should be added.

Please advise.

Thanks
Saroj

RE: Question about MaxFieldLength

Posted by Uwe Schindler <uw...@thetaphi.de>.
In Lucene 3.x this is already deprecated; you should not limit tokens using
MaxFieldLength. There is already a TokenFilter and a wrapping Analyzer
available that do the limiting. You can add them to your analyzer and apply
them per field with PerFieldAnalyzerWrapper:

http://lucene.apache.org/java/3_3_0/api/all/org/apache/lucene/analysis/LimitTokenCountAnalyzer.html
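Uwe's suggestion combines two pieces: a limiting analyzer that stops after N tokens of a field value, and a per-field wrapper that decides which limit applies to which field. The sketch below models that combination in plain Java so it runs without Lucene. It is not Lucene's API: `PerFieldLimitModel`, `setLimit`, `analyze` and `tokens` are hypothetical names, and tokenization is assumed to be simple whitespace splitting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of per-field token limits (NOT Lucene's API; all names are hypothetical).
// Each field name can have its own maximum; every analyze() call counts from zero,
// so a limit applies to one field value of one document at a time.
class PerFieldLimitModel {
    private final Map<String, Integer> limits = new HashMap<>();
    private final int defaultLimit;

    PerFieldLimitModel(int defaultLimit) {
        this.defaultLimit = defaultLimit;
    }

    // Rough analogue of registering a limiting analyzer for one field name.
    void setLimit(String field, int maxTokens) {
        limits.put(field, maxTokens);
    }

    // Splits the value on whitespace and keeps at most the field's limit.
    List<String> analyze(String field, String value) {
        int max = limits.getOrDefault(field, defaultLimit);
        List<String> kept = new ArrayList<>();
        for (String token : value.trim().split("\\s+")) {
            if (token.isEmpty()) continue; // an empty value splits into one "" token
            if (kept.size() >= max) break; // limit reached: drop the remaining tokens
            kept.add(token);
        }
        return kept;
    }

    // Test helper: builds a value with n tokens "t0 t1 ... t(n-1)".
    static String tokens(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(' ');
            sb.append('t').append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PerFieldLimitModel model = new PerFieldLimitModel(Integer.MAX_VALUE);
        model.setLimit("body", 100);
        System.out.println(model.analyze("body", tokens(105)).size());  // 100
        System.out.println(model.analyze("body", tokens(95)).size());   // 95
        System.out.println(model.analyze("title", tokens(105)).size()); // 105 (no limit set)
    }
}
```

With a limit of 100 on a field, a 105-token value keeps its first 100 tokens and a 95-token value keeps all 95; because each call starts a fresh count, nothing carries over from one document to the next. In real Lucene 3.x code, the corresponding pieces are LimitTokenCountAnalyzer wrapped around your analyzer and registered for the field in PerFieldAnalyzerWrapper; see the linked javadoc for the exact constructors.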

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Li Li [mailto:fancyerii@gmail.com]
> Sent: Sunday, August 28, 2011 4:28 AM
> To: java-user@lucene.apache.org
> Subject: Re: Question about MaxFieldLength

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Question about MaxFieldLength

Posted by roz dev <ro...@gmail.com>.
Thanks Li, it makes sense.


Re: Question about MaxFieldLength

Posted by Li Li <fa...@gmail.com>.
   It will affect the entire index because it's a parameter of IndexWriter,
but you can change it any time you like before calling IndexWriter.addDocument.
If you want to truncate different fields to different maxLength values, you
have to avoid race conditions between threads.
   Maybe you can add a TokenFilter to the end of the analyzer chain:

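  // field1Counter is shared with the indexing code and nothing in this
  // filter resets it, so reset it before each IndexWriter.addDocument call;
  // otherwise later documents lose all tokens of this field.
  AtomicInteger field1Counter;

  @Override
  public final boolean incrementToken() throws IOException {
      // stop emitting tokens once maxLength of them have been passed through
      if (field1Counter.get() >= maxLength) return false;
      if (input.incrementToken()) {
          field1Counter.incrementAndGet();
          return true;
      }
      return false;
  }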
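Because field1Counter lives outside the filter and is never reset inside incrementToken(), the count carries over across documents unless the calling code resets it before each IndexWriter.addDocument call. The minimal plain-Java model below replays the 105-token and 95-token documents from the question to show the difference. It uses no Lucene classes: `CountingLimitFilter`, `run` and `tokens` are hypothetical names, and an `Iterator<String>` stands in for the upstream TokenStream.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java model of the counting filter above (no Lucene classes; names are hypothetical).
// An Iterator<String> stands in for the upstream TokenStream ("input").
class CountingLimitFilter {
    private final Iterator<String> input;
    private final AtomicInteger counter; // shared with the caller, as in the original filter
    private final int maxLength;
    private String current;

    CountingLimitFilter(Iterator<String> input, AtomicInteger counter, int maxLength) {
        this.input = input;
        this.counter = counter;
        this.maxLength = maxLength;
    }

    // Same logic as incrementToken(): refuse once the shared count reaches maxLength.
    boolean incrementToken() {
        if (counter.get() >= maxLength) return false;
        if (input.hasNext()) {
            current = input.next();
            counter.incrementAndGet();
            return true;
        }
        return false;
    }

    String term() {
        return current;
    }

    // Runs the filter over one field value and collects the tokens it lets through.
    static List<String> run(List<String> tokens, AtomicInteger counter, int maxLength) {
        CountingLimitFilter filter = new CountingLimitFilter(tokens.iterator(), counter, maxLength);
        List<String> out = new ArrayList<>();
        while (filter.incrementToken()) {
            out.add(filter.term());
        }
        return out;
    }

    // Test helper: n tokens "t0" .. "t(n-1)".
    static List<String> tokens(int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) out.add("t" + i);
        return out;
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger();
        System.out.println(run(tokens(105), counter, 100).size()); // 100
        System.out.println(run(tokens(95), counter, 100).size());  // 0: count carried over
        counter.set(0);                                            // reset before the next document
        System.out.println(run(tokens(95), counter, 100).size());  // 95
    }
}
```

Resetting the shared counter per document (or giving each document its own counter) restores the per-document behaviour the question expects; sharing one counter across indexing threads would additionally mix their counts.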