Posted to dev@lucene.apache.org by "Allison, Timothy B." <ta...@mitre.org> on 2013/11/01 14:30:37 UTC

dangers of limiting tokenizers/disabling assertions in MockTokenizer?

All,
  I realize that we should be consuming all tokens from a stream.  I'd like to wrap a client's Analyzer with LimitTokenCountAnalyzer with consumeAllTokens=false. For the analyzers that I've used, this has caused no problems.  When I use MockTokenizer, I run into this assertion error: "end() called before incrementToken()".  The comment in MockTokenizer reads:

    // some tokenizers, such as limiting tokenizers, call end() before incrementToken() returns false.
    // these tests should disable this check (in general you should consume the entire stream)

 Disabling assertions gives me pause, as does disobeying the workflow (http://lucene.apache.org/core/4_5_1/core/index.html).  I assume from the warnings that some Analyzers and use cases will fail unless the stream is entirely consumed.
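The consumer workflow referred to above (reset(), call incrementToken() until it returns false, then end(), then close()) and the kind of check MockTokenizer applies can be modeled in a few lines. This is a minimal sketch, assuming hypothetical stand-in classes (WorkflowDemo, StrictTokenStream, consumeAll are illustrative names, not Lucene's API); it throws IllegalStateException instead of using assert so the check fires even when assertions are disabled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the TokenStream consumer workflow; these classes are
// illustrative stand-ins, NOT Lucene's API.
public class WorkflowDemo {

    // Token source that expects reset() -> incrementToken()* -> end() -> close()
    // and, like the MockTokenizer check quoted above, rejects end() before
    // incrementToken() has returned false. Uses exceptions, not `assert`,
    // so the check is active even without -ea.
    static final class StrictTokenStream {
        private final List<String> tokens;
        private Iterator<String> it;
        private String current;
        private boolean exhausted;

        StrictTokenStream(String... tokens) {
            this.tokens = Arrays.asList(tokens);
        }

        void reset() {
            it = tokens.iterator();
            current = null;
            exhausted = false;
        }

        boolean incrementToken() {
            if (it == null) {
                throw new IllegalStateException("incrementToken() called before reset()");
            }
            if (it.hasNext()) {
                current = it.next();
                return true;
            }
            current = null;
            exhausted = true;
            return false;
        }

        String term() {
            return current;
        }

        void end() {
            if (!exhausted) {
                throw new IllegalStateException("end() called before incrementToken() returned false");
            }
        }

        void close() {
            it = null;
        }
    }

    // Consumer following the documented order: reset, loop until
    // incrementToken() returns false, call end(), and always close().
    static List<String> consumeAll(StrictTokenStream ts) {
        List<String> terms = new ArrayList<>();
        ts.reset();
        try {
            while (ts.incrementToken()) {
                terms.add(ts.term());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(consumeAll(new StrictTokenStream("a", "b", "c"))); // prints [a, b, c]
    }
}
```

A consumer that stops early and calls end() after, say, one token would get the IllegalStateException, which is the situation a limiting wrapper with consumeAllTokens=false creates.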

  Is there a safe way to wrap a client Analyzer and read only x tokens?  Should I let the client decide whether to consume the entire stream?

  Thank you!

             Best,

                  Tim


Re: dangers of limiting tokenizers/disabling assertions in MockTokenizer?

Posted by Robert Muir <rc...@gmail.com>.
On Fri, Nov 1, 2013 at 9:30 AM, Allison, Timothy B. <ta...@mitre.org> wrote:
>
>  Disabling assertions gives me pause as does disobeying the workflow
> (http://lucene.apache.org/core/4_5_1/core/index.html).  I assume from the
> warnings that there are Analyzers and use cases that will fail unless the
> stream is entirely consumed.

The option has to be there: if this check were disabled by default,
it would make everything far too lenient overall, and lots of other
useful checks wouldn't work either.

Users also already have an option on the limiter, 'consumeAllTokens',
if their analyzer has bugs here.
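The effect of a 'consumeAllTokens'-style flag on a limiter can be sketched as follows. This is a hypothetical model, not Lucene's LimitTokenCountFilter or LimitTokenCountAnalyzer: the names LimitDemo, StrictSource, Limiter and read are illustrative, and reset()/close() are omitted to keep it short. With the flag on, the limiter drains and discards the remaining tokens so the wrapped source is fully consumed before end(); with it off, the source sees end() early and a strict check rejects it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of a token-count limiter; class names are illustrative,
// NOT Lucene's LimitTokenCountFilter or LimitTokenCountAnalyzer.
public class LimitDemo {

    // Minimal source that rejects end() before incrementToken() returned false,
    // standing in for MockTokenizer's workflow check.
    static final class StrictSource {
        private final Iterator<String> it;
        private String current;
        private boolean exhausted;

        StrictSource(String... tokens) {
            it = Arrays.asList(tokens).iterator();
        }

        boolean incrementToken() {
            if (it.hasNext()) {
                current = it.next();
                return true;
            }
            exhausted = true;
            return false;
        }

        String term() {
            return current;
        }

        void end() {
            if (!exhausted) {
                throw new IllegalStateException("end() called before incrementToken() returned false");
            }
        }
    }

    // Emits at most maxTokens tokens. With consumeAllTokens=true it drains and
    // discards the rest of the input before reporting the end of the stream;
    // with false it stops reading the source as soon as the limit is hit.
    static final class Limiter {
        private final StrictSource in;
        private final int maxTokens;
        private final boolean consumeAllTokens;
        private int count;
        private boolean exhausted;

        Limiter(StrictSource in, int maxTokens, boolean consumeAllTokens) {
            this.in = in;
            this.maxTokens = maxTokens;
            this.consumeAllTokens = consumeAllTokens;
        }

        boolean incrementToken() {
            if (exhausted) {
                return false;
            }
            if (count < maxTokens && in.incrementToken()) {
                count++;
                return true;
            }
            if (consumeAllTokens) {
                while (in.incrementToken()) {
                    // discard tokens past the limit
                }
            }
            exhausted = true;
            return false;
        }

        String term() {
            return in.term();
        }

        void end() {
            in.end();
        }
    }

    // Reads until the limiter reports the end, then calls end().
    static List<String> read(Limiter ts) {
        List<String> terms = new ArrayList<>();
        while (ts.incrementToken()) {
            terms.add(ts.term());
        }
        ts.end();
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(read(new Limiter(new StrictSource("a", "b", "c", "d"), 2, true))); // prints [a, b]
        try {
            read(new Limiter(new StrictSource("a", "b", "c", "d"), 2, false));
        } catch (IllegalStateException e) {
            System.out.println("consumeAllTokens=false: " + e.getMessage());
        }
    }
}
```

The trade-off is the one raised in the question: draining means the whole input is still tokenized, which is the cost that stopping early avoids, but it keeps the wrapped analyzer's stream within the documented workflow.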
