Posted to dev@lucene.apache.org by "Paul taylor (JIRA)" <ji...@apache.org> on 2009/08/21 18:48:14 UTC

[jira] Updated: (LUCENE-1787) Standard Tokenizer doesn't recognise I.B.M as Acronym, it requires it ends with a dot i.e I.B.M.

     [ https://issues.apache.org/jira/browse/LUCENE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul taylor updated LUCENE-1787:
--------------------------------

    Attachment: Patch1.txt

Fix so that acronyms without a trailing dot are parsed as acronyms; amended the related acronym test for the analyzer.

(Sources were generated with JFlex and compiled using the ant build; I assume this uses the correct Java version for JFlex file generation.)

> Standard Tokenizer doesn't recognise I.B.M as Acronym, it requires it ends with a dot i.e I.B.M.
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1787
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1787
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 2.9
>            Reporter: Paul taylor
>         Attachments: Patch1.txt
>
>
> StandardTokenizer doesn't recognise I.B.M as an acronym; it requires that it end with a dot, i.e. I.B.M. followed by a dot. This is particularly problematic if the dotted form I.B.M. is added to the index: with the StandardAnalyzer it gets added as IBM, but a search for I.B.M will not match because I.B.M is left as is. I would expect a match in this scenario.
> I think it could be fixed by modifying the ACRONYM_DEP grammar rule in StandardTokenizerImpl.jflex so that it also supports
> {ALPHANUM} ("." {ALPHANUM})+
> i.e. a dot is only required between characters (I'm not familiar with JFlex syntax).
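
To make the difference between the two forms concrete, here is a minimal sketch in plain Java using java.util.regex. It is not the JFlex grammar and not the attached patch: the class name AcronymSketch, the two patterns, and the stripDots helper are all hypothetical, and the "dot between" pattern assumes each {ALPHANUM} stands for a single character, as the description above intends.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; the real rules live in StandardTokenizerImpl.jflex.
class AcronymSketch {
    // Assumed stand-in for the existing behaviour: every letter is followed by a dot ("I.B.M.").
    static final Pattern TRAILING_DOT = Pattern.compile("[A-Za-z]\\.([A-Za-z]\\.)+");

    // Assumed stand-in for the proposed form: dots only between single characters ("I.B.M").
    static final Pattern DOT_BETWEEN = Pattern.compile("[A-Za-z0-9](\\.[A-Za-z0-9])+");

    // True if the whole text matches either form.
    static boolean looksLikeAcronym(String text) {
        return TRAILING_DOT.matcher(text).matches()
            || DOT_BETWEEN.matcher(text).matches();
    }

    // Removes the dots, so that both forms reduce to the same term ("IBM").
    static String stripDots(String text) {
        return text.replace(".", "");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"I.B.M.", "I.B.M", "IBM"}) {
            System.out.println(s
                + " trailingDot=" + TRAILING_DOT.matcher(s).matches()
                + " dotBetween=" + DOT_BETWEEN.matcher(s).matches()
                + " stripped=" + stripDots(s));
        }
    }
}
```

If both forms were treated as acronyms and had their dots removed, a query for I.B.M and an indexed I.B.M. would reduce to the same term. Note that if {ALPHANUM} is allowed to span several characters, a pattern of this shape would also match dotted names such as host names, so how the rule interacts with the rest of the grammar would need checking.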
