Posted to dev@lucene.apache.org by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2008/05/14 08:07:55 UTC

[jira] Commented: (LUCENE-1227) NGramTokenizer to handle more than 1024 chars

    [ https://issues.apache.org/jira/browse/LUCENE-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596637#action_12596637 ] 

Otis Gospodnetic commented on LUCENE-1227:
------------------------------------------

Thanks for the test and for addressing this!

Could you add some examples for NO_OPTIMIZE and QUERY_OPTIMIZE?  I can't tell from the patch what those are for.  Also, note that the existing variables use camelCaseLikeThis.  It would be good to stick to that pattern (instead of bufflen, buffpos, etc.), as well as to the existing code style (e.g. a space between "if" and the open paren, spaces around == and =, etc.).

I'll commit as soon as you make these changes, assuming you can make them.  Thank you.


> NGramTokenizer to handle more than 1024 chars
> ---------------------------------------------
>
>                 Key: LUCENE-1227
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1227
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>            Reporter: Hiroaki Kawai
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: LUCENE-1227.patch, NGramTokenizer.patch, NGramTokenizer.patch
>
>
> The current NGramTokenizer can't handle a character stream longer than 1024 chars. This is too short for non-whitespace-separated languages.
> I created a patch for this issue.
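The buffer limitation described above can be illustrated with a short sketch. This is not the actual NGramTokenizer source; it is a minimal, self-contained approximation of the pre-patch behavior, assuming a single fixed-size read buffer (the 1024-char limit from the issue) and a simple min/max n-gram loop:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // Emits all n-grams of length minGram..maxGram, but only from the
    // first bufferSize chars of the stream -- mimicking the pre-patch
    // limitation where input past the buffer is silently dropped.
    static List<String> ngrams(Reader input, int minGram, int maxGram,
                               int bufferSize) throws IOException {
        char[] buffer = new char[bufferSize];
        int len = 0, read;
        // Fill the buffer once; any remaining input is never tokenized.
        while (len < bufferSize
               && (read = input.read(buffer, len, bufferSize - len)) != -1) {
            len += read;
        }
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int pos = 0; pos + n <= len; pos++) {
                out.add(new String(buffer, pos, n));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // 2000 chars with no whitespace, like text in a
        // non-whitespace-separated language.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) sb.append('a');

        // With a 1024-char buffer, bigram generation stops at
        // position 1022, even though the input is 2000 chars long.
        List<String> grams = ngrams(new StringReader(sb.toString()), 2, 2, 1024);
        System.out.println(grams.size());  // 1023 rather than the full 1999
    }
}
```

A fix along the lines of the attached patch would refill the buffer as the stream is consumed instead of reading it only once, so n-gram output continues past the first 1024 characters.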

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org