Posted to dev@lucene.apache.org by "Steven Rowe (JIRA)" <ji...@apache.org> on 2009/06/19 17:54:07 UTC

[jira] Commented: (LUCENE-1702) Thai token type() bug

    [ https://issues.apache.org/jira/browse/LUCENE-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721833#action_12721833 ] 

Steven Rowe commented on LUCENE-1702:
-------------------------------------

+1 (I was involved in perpetuating the Thai grammar hack)

FWIW, JFlex 1.5, which hopefully will be released in the next few months, will have better Unicode support, including general category, script, and block property support, as well as the ability to select the Unicode version.  This will simplify the grammar.  (Note that JFlex 1.5-generated scanners will require Java 1.5, so we won't be using it in Lucene until after Lucene 3.0 has been released.)


> Thai token type() bug
> ---------------------
>
>                 Key: LUCENE-1702
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1702
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Priority: Minor
>
> While adding tests for offsets and type to ThaiAnalyzer, I discovered that it does not type Thai numeric digits correctly.
> ThaiAnalyzer uses StandardTokenizer, and this is really an issue with the grammar, which adds the entire [:Thai:] block to ALPHANUM.
> I propose that ALPHANUM be described a little differently in the grammar.
> Instead of including the whole block, [:letter:] should be allowed to have diacritics/signs/combining marks attached to it.
> This would allow the [:thai:] hack to be removed completely, would allow StandardTokenizer to parse complex writing systems such as Indian languages, and would fix LUCENE-1545.
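
To illustrate the proposal in the description above, here is a minimal sketch using java.util.regex Unicode general categories. It is not the StandardTokenizer grammar and not JFlex syntax; the class name, the two patterns, and the "<OTHER>" fallback are hypothetical and exist only for illustration. The idea is that a letter followed by any number of combining marks counts as alphanumeric, while a run made up only of decimal digits is typed as a number, so no script-specific block is needed.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: approximates "letter plus attached marks" typing with
// java.util.regex. This is NOT the actual StandardTokenizer grammar.
final class ThaiTypeSketch {
    // A run consisting only of decimal digits (general category Nd).
    static final Pattern NUM = Pattern.compile("\\p{Nd}+");

    // Letters that may carry combining marks (category M), mixed with digits.
    static final Pattern ALPHANUM = Pattern.compile("(?:\\p{L}\\p{M}*|\\p{Nd})+");

    // Returns an illustrative type label for a single, already-split token.
    static String type(String token) {
        if (NUM.matcher(token).matches()) {
            return "<NUM>";
        }
        if (ALPHANUM.matcher(token).matches()) {
            return "<ALPHANUM>";
        }
        return "<OTHER>"; // assumed fallback label, for illustration only
    }

    public static void main(String[] args) {
        // Thai digits U+0E51 U+0E52 U+0E53 are all category Nd.
        System.out.println(type("\u0E51\u0E52\u0E53")); // prints <NUM>
        // Thai letter U+0E01, vowel sign U+0E34 (category Mn), letter U+0E19.
        System.out.println(type("\u0E01\u0E34\u0E19")); // prints <ALPHANUM>
        // Devanagari word with Mc and Mn marks attached to letters.
        System.out.println(type("\u0939\u093F\u0928\u094D\u0926\u0940")); // prints <ALPHANUM>
    }
}
```

Because the patterns rely only on general categories, the same rule covers Thai and Devanagari without naming either script. A real grammar change would also have to handle the other token types that StandardTokenizer produces.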
