Posted to dev@lucene.apache.org by "Steve Rowe (JIRA)" <ji...@apache.org> on 2015/07/17 16:28:04 UTC

[jira] [Created] (LUCENE-6682) StandardTokenizer performance bug: buffer is unnecessarily copied even when maxTokenLength doesn't change

Steve Rowe created LUCENE-6682:
----------------------------------

             Summary: StandardTokenizer performance bug: buffer is unnecessarily copied even when maxTokenLength doesn't change
                 Key: LUCENE-6682
                 URL: https://issues.apache.org/jira/browse/LUCENE-6682
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Steve Rowe


From Piotr Idzikowski on the java-user mailing list [http://markmail.org/message/af26kr7fermt2tfh]:

{quote}
I am developing my own analyzer based on StandardAnalyzer.
I noticed that tokenizer.setMaxTokenLength is called many times.

{code:java}
protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
  final StandardTokenizer src = new StandardTokenizer(getVersion(), reader);
  src.setMaxTokenLength(maxTokenLength);
  TokenStream tok = new StandardFilter(getVersion(), src);
  tok = new LowerCaseFilter(getVersion(), tok);
  tok = new StopFilter(getVersion(), tok, stopwords);
  return new TokenStreamComponents(src, tok) {
    @Override
    protected void setReader(final Reader reader) throws IOException {
      src.setMaxTokenLength(StandardAnalyzer.this.maxTokenLength);
      super.setReader(reader);
    }
  };
}
{code}

Does this make sense when the length stays the same? I see that it eventually calls this
method (in StandardTokenizerImpl):

{code:java}
public final void setBufferSize(int numChars) {
  ZZ_BUFFERSIZE = numChars;
  char[] newZzBuffer = new char[ZZ_BUFFERSIZE];
  System.arraycopy(zzBuffer, 0, newZzBuffer, 0, Math.min(zzBuffer.length, ZZ_BUFFERSIZE));
  zzBuffer = newZzBuffer;
}
{code}

So it just copies the old array's contents into the new one.
{quote}
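
One possible way to avoid the redundant copy is to return early when the requested size equals the current one. The following is only a minimal, self-contained sketch of that idea, not the actual patch: the class name BufferResizeSketch, its fields, and the allocationCount() helper are hypothetical stand-ins for the generated StandardTokenizerImpl scanner and exist purely for illustration.

```java
// Hypothetical stand-in for the scanner's buffer handling; not Lucene code.
final class BufferResizeSketch {
    private int bufferSize = 255;                  // assumed initial size, for illustration only
    private char[] buffer = new char[bufferSize];
    private int allocations = 0;                   // counts reallocations so the effect is observable

    void setBufferSize(int numChars) {
        // Skip the allocation and copy entirely when the size does not change.
        if (numChars == bufferSize) {
            return;
        }
        bufferSize = numChars;
        char[] newBuffer = new char[bufferSize];
        // Preserve as much of the old content as fits in the new buffer.
        System.arraycopy(buffer, 0, newBuffer, 0, Math.min(buffer.length, bufferSize));
        buffer = newBuffer;
        allocations++;
    }

    char[] buffer() {
        return buffer;
    }

    int allocationCount() {
        return allocations;
    }
}
```

With a guard like this, repeated calls with an unchanged length (as happens on every setReader in the analyzer quoted above) would leave the existing array in place, and only a genuine size change would allocate and copy.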


