Posted to dev@lucene.apache.org by "Luca Cavanna (JIRA)" <ji...@apache.org> on 2012/05/07 12:57:48 UTC

[jira] [Commented] (LUCENE-4019) Parsing Hunspell affix rules without regexp condition

    [ https://issues.apache.org/jira/browse/LUCENE-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269516#comment-13269516 ] 

Luca Cavanna commented on LUCENE-4019:
--------------------------------------

Thank you Robert for the explanation!
In this specific case it's hard to compare the behaviour of hunspell and Lucene, since Lucene fails before it has even finished parsing the affix file.
I've been in contact with the authors of those Dutch dictionaries, as well as with the hunspell author. It turned out that those affix rules are wrong and hunspell actually ignores them. I think it's better to ignore them in Lucene too, rather than throwing an exception, which makes it impossible to use those dictionaries at all.
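
To make the proposal concrete, here is a minimal sketch of the lenient behaviour, assuming whitespace-separated fields and a header line that announces the number of rules. The class LenientAffixBlock, its Rule holder and the parse method are hypothetical names for illustration only; this is not Lucene's readAffix code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene's implementation.
final class LenientAffixBlock {

    /** Hypothetical holder for one parsed rule line. */
    static final class Rule {
        public final String flag;
        public final String strip;
        public final String affix;
        public final String condition;

        Rule(String flag, String strip, String affix, String condition) {
            this.flag = flag;
            this.strip = strip;
            this.affix = affix;
            this.condition = condition;
        }
    }

    /** In affix files a literal "0" stands for the empty string. */
    private static String zeroToEmpty(String field) {
        return "0".equals(field) ? "" : field;
    }

    /**
     * Parses one affix block: a header line such as "SFX Na N 1" followed by
     * the number of rule lines the header announces. Rule lines without the
     * 5th field (the condition) are skipped instead of raising an exception.
     */
    static List<Rule> parse(List<String> lines) {
        String[] header = lines.get(0).trim().split("\\s+");
        int declared = Integer.parseInt(header[3]);

        List<Rule> rules = new ArrayList<>();
        for (int i = 1; i <= declared && i < lines.size(); i++) {
            String[] parts = lines.get(i).trim().split("\\s+");
            if (parts.length < 5) {
                continue; // malformed rule with no condition: ignore it
            }
            rules.add(new Rule(parts[1], zeroToEmpty(parts[2]),
                               zeroToEmpty(parts[3]), parts[4]));
        }
        return rules;
    }
}
```

With the two lines from the issue description as input, parse returns an empty list rather than failing, so the rest of the dictionary would still load. Replacing the skip with a default condition of '.' would be the alternative raised in the original description.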
                
> Parsing Hunspell affix rules without regexp condition
> -----------------------------------------------------
>
>                 Key: LUCENE-4019
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4019
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 3.6
>            Reporter: Luca Cavanna
>
> We found out that some recent Dutch hunspell dictionaries contain suffix or prefix rules like the following:
> {code} 
> SFX Na N 1
> SFX Na 0 ste
> {code}
> The rule on the second line doesn't contain the 5th field, which should be the condition (usually a regexp). The condition is commonly '.', meaning always (any character matches). As explained in LUCENE-3976, the readAffix method throws an error in this case. I wonder if we should treat the missing value as a kind of default, like '.'. On the other hand, I haven't found any information about this in the spec. Any thoughts?
