Posted to java-user@lucene.apache.org by Flavio Eduardo de Cordova <fl...@datasul.com.br> on 2003/07/08 01:26:16 UTC
Problems with StandardTokenizer
People...
I've created a custom analyzer that uses the StandardTokenizer class
to get the tokens from the reader.
It seemed to work fine, but I just noticed that some large documents
are not having all of their content indexed, only the starting
part of them.
After some debugging I found that StandardTokenizer reads only up
to 10001 tokens from the reader.
Has anybody run into something like this before? What should I
do as a workaround?
Thanks!
Flavio
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: Problems with StandardTokenizer
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Please check the Lucene FAQ on jGuru; your question is answered there.
Otis
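
A note on the 10001 figure, since the reply does not spell out the answer: as far as I know, the Lucene releases of that era had an IndexWriter.maxFieldLength setting that defaulted to 10,000 terms per field. A consumer that stops pulling tokens once that cap is exceeded would read exactly one token past the limit, which matches the count reported above. The sketch below is plain Java with no Lucene dependency. The class FieldLengthCapSketch and the methods tokensRead and numberedTokens are hypothetical names made up for this illustration, and the loop is an assumption about the mechanism, not Lucene's actual code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class FieldLengthCapSketch {
    // Assumed default, mirroring the 10,000-term cap described above.
    static final int DEFAULT_MAX_FIELD_LENGTH = 10000;

    /**
     * Hypothetical consumer: pulls tokens from the stream, keeps them in
     * 'indexed', and stops as soon as the cap is exceeded.
     * Returns how many tokens were read from the stream.
     */
    static int tokensRead(Iterator<String> tokens, int maxFieldLength, List<String> indexed) {
        int read = 0;
        while (tokens.hasNext()) {
            String token = tokens.next();
            read++;
            if (read > maxFieldLength) {
                break; // this token was read but is not kept
            }
            indexed.add(token);
        }
        return read;
    }

    /** Hypothetical token source standing in for a tokenizer: tok0, tok1, ... */
    static Iterator<String> numberedTokens(final int count) {
        return new Iterator<String>() {
            private int next = 0;
            public boolean hasNext() { return next < count; }
            public String next() { return "tok" + (next++); }
        };
    }

    public static void main(String[] args) {
        // A "large document" of 25,000 tokens with the assumed default cap.
        List<String> capped = new ArrayList<String>();
        int read = tokensRead(numberedTokens(25000), DEFAULT_MAX_FIELD_LENGTH, capped);
        System.out.println("read=" + read + " indexed=" + capped.size());

        // Raising the cap lets the whole document through.
        List<String> all = new ArrayList<String>();
        int readAll = tokensRead(numberedTokens(25000), Integer.MAX_VALUE, all);
        System.out.println("read=" + readAll + " indexed=" + all.size());
    }
}
```

If this is indeed the cause, the workaround would be to raise the cap on the writer before adding large documents, not to change the tokenizer.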