Posted to dev@lucene.apache.org by "Chris Male (Created) (JIRA)" <ji...@apache.org> on 2012/04/12 11:13:28 UTC

[jira] [Created] (LUCENE-3976) Improve error messages for unsupported Hunspell formats

Improve error messages for unsupported Hunspell formats
-------------------------------------------------------

                 Key: LUCENE-3976
                 URL: https://issues.apache.org/jira/browse/LUCENE-3976
             Project: Lucene - Java
          Issue Type: Improvement
          Components: modules/analysis
            Reporter: Chris Male


Our Hunspell implementation is never going to support the huge variety of formats that are out there, especially since it is based on papers written on the topic rather than being a pure port.

Recently we ran into the following suffix rule:

{noformat}SFX CA 0 /CaCp{noformat}

Because the rule is missing its regex condition, an ArrayIndexOutOfBoundsException (AOE) was thrown, which made the problem difficult to diagnose.

We should instead try to provide better error messages showing what we were unable to parse.
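
To illustrate the failure mode, here is a minimal sketch. It is not the actual Lucene parser: it assumes a rule line is split on whitespace and that the caller then reads a fifth field (the condition). The class name, the method name and the "complete" rule used for contrast are all made up for the example.

```java
// Hypothetical demo, not Lucene's parser code.
class AffixRuleFieldCount {

    // Splits a rule line on runs of whitespace, as a simple parser might.
    static String[] fields(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String[] complete = fields("SFX CA 0 s .");     // made-up rule with a condition
        String[] truncated = fields("SFX CA 0 /CaCp");  // the rule from this issue

        System.out.println(complete.length);   // 5
        System.out.println(truncated.length);  // 4

        try {
            // There is no fifth field, so this access fails.
            String condition = truncated[4];
            System.out.println(condition);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The exception itself does not say which rule was at fault.
            System.out.println("caught: " + e.getClass().getSimpleName());
        }
    }
}
```

The exception carries only an index, which is why the offending line is hard to track down in a large affix file.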

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3976) Improve error messages for unsupported Hunspell formats

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287286#comment-13287286 ] 

Chris Male commented on LUCENE-3976:
------------------------------------

Hi Luca,

I think I'm going to close this and instead we can tackle this on a per-error basis.
                



[jira] [Commented] (LUCENE-3976) Improve error messages for unsupported Hunspell formats

Posted by "Luca Cavanna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287260#comment-13287260 ] 

Luca Cavanna commented on LUCENE-3976:
--------------------------------------

Hi Chris, 
I agree with you. On the other hand, before LUCENE-4019 the affix rule mentioned caused an AOE, so the additional catch would have been useful just to throw a nicer error message like "Error while parsing the affix file". That one has been solved at its source. For now I don't see any other possible errors, but I'm sure there are some, maybe plenty, since we support only a subset of the formats and features.
It was just a way to introduce a generic error message, but I totally agree that the right approach would be fixing everything at the source.
                



[jira] [Updated] (LUCENE-3976) Improve error messages for unsupported Hunspell formats

Posted by "Luca Cavanna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luca Cavanna updated LUCENE-3976:
---------------------------------

    Attachment: LUCENE-3976.patch

First draft patch: I added a check for that specific problem that fails with an understandable error. Since we are going to read the first 5 elements from an array, it is better to check that there are at least 5 elements.
Not sure how we can improve generic error handling. Let me know your thoughts.
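
The attached patch is not reproduced in this thread, so the following is only a sketch of the kind of check described above. The class name, the exception type and the message wording are assumptions made for illustration.

```java
// Hypothetical sketch of a field-count check; not the attached patch.
class AffixRuleCheck {

    static final int EXPECTED_FIELDS = 5;

    // Splits a rule line and fails early, quoting the line, when fields are missing.
    static String[] splitRule(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < EXPECTED_FIELDS) {
            throw new IllegalArgumentException(
                "Affix rule has " + parts.length + " elements, expected at least "
                    + EXPECTED_FIELDS + ": '" + line + "'");
        }
        return parts;
    }

    public static void main(String[] args) {
        try {
            splitRule("SFX CA 0 /CaCp");
        } catch (IllegalArgumentException e) {
            // The message now names the rule that could not be parsed.
            System.out.println(e.getMessage());
        }
    }
}
```

Checking the length before indexing turns an opaque array error into a message that quotes the rule itself.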
                



[jira] [Updated] (LUCENE-3976) Improve error messages for unsupported Hunspell formats

Posted by "Luca Cavanna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luca Cavanna updated LUCENE-3976:
---------------------------------

    Attachment: LUCENE-3976.patch

The patch tries to address unexpected errors while parsing affix files and dictionaries. I just added an outer try/catch that throws a generic "Error while parsing the affix/dictionary file", which in my opinion is better than possibly throwing some unchecked exception. Let me know if there's something else we can improve in the meantime.
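
As a rough sketch of such an outer try/catch (the patch itself is not shown in this thread), the example below wraps a stand-in line parser and rethrows any unchecked exception with a generic message. The class and method names, and the choice of IOException as the wrapper type, are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical sketch of a generic wrapper; not the attached patch.
class AffixFileLoader {

    // Stand-in for real per-line parsing: reads the fifth field unconditionally.
    static String parseLine(String line) {
        String[] parts = line.trim().split("\\s+");
        return parts[4]; // throws ArrayIndexOutOfBoundsException on short rules
    }

    // Parses every non-blank line and returns how many were parsed.
    static int load(BufferedReader reader) throws IOException {
        int count = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                parseLine(line);
                count++;
            }
        } catch (RuntimeException e) {
            // Keep the original exception as the cause so no detail is lost.
            throw new IOException("Error while parsing the affix file", e);
        }
        return count;
    }

    public static void main(String[] args) {
        try {
            load(new BufferedReader(new StringReader("SFX CA 0 /CaCp\n")));
        } catch (IOException e) {
            System.out.println(e.getMessage() + " (cause: " + e.getCause() + ")");
        }
    }
}
```

The wrapper says that parsing failed, but not where or why; that information is only available from the cause.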
                



[jira] [Issue Comment Edited] (LUCENE-3976) Improve error messages for unsupported Hunspell formats

Posted by "Luca Cavanna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269520#comment-13269520 ] 

Luca Cavanna edited comment on LUCENE-3976 at 5/7/12 11:03 AM:
---------------------------------------------------------------

The specific case of affix rule with less than 5 elements has been addressed in LUCENE-4019. Please ignore my first patch here since it's related to that specific case which is now handled in a different way in LUCENE-4019.
I'm looking into improving error messages anyway, possibly in a more generic way.
                
      was (Author: lucacavanna):
    The specific case of affix rule with less than 5 elements has been addressed in LUCENE-4019. Please ignore my first patch here since it's related to that specific case which is now handled in a different way in LUCENE-4019.
I'm looking into improve error messages anyway, in a more generic way if possible.
                  



[jira] [Issue Comment Edited] (LUCENE-3976) Improve error messages for unsupported Hunspell formats

Posted by "Luca Cavanna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269520#comment-13269520 ] 

Luca Cavanna edited comment on LUCENE-3976 at 5/7/12 11:04 AM:
---------------------------------------------------------------

The specific case of affix rule with less than 5 elements has been addressed in LUCENE-4019. Please ignore my first patch here since it's related to that specific case which is now handled in a different way in LUCENE-4019.
I'm looking into improving error messages anyway, possibly in a generic way.
                
      was (Author: lucacavanna):
    The specific case of affix rule with less than 5 elements has been addressed in LUCENE-4019. Please ignore my first patch here since it's related to that specific case which is now handled in a different way in LUCENE-4019.
I'm looking into improving error messages anyway, possibly in a more generic way.
                  



[jira] [Commented] (LUCENE-3976) Improve error messages for unsupported Hunspell formats

Posted by "Luca Cavanna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269520#comment-13269520 ] 

Luca Cavanna commented on LUCENE-3976:
--------------------------------------

The specific case of affix rule with less than 5 elements has been addressed in LUCENE-4019. Please ignore my first patch here since it's related to that specific case which is now handled in a different way in LUCENE-4019.
I'm looking into improve error messages anyway, in a more generic way if possible.
                



[jira] [Commented] (LUCENE-3976) Improve error messages for unsupported Hunspell formats

Posted by "Luca Cavanna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260490#comment-13260490 ] 

Luca Cavanna commented on LUCENE-3976:
--------------------------------------

We found out that some recent Dutch dictionaries contain rules like the one mentioned (starting from version 2.00, if I'm correct). I'm going to look at that specific problem and see how we can parse those affix rules.
                



[jira] [Resolved] (LUCENE-3976) Improve error messages for unsupported Hunspell formats

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male resolved LUCENE-3976.
--------------------------------

    Resolution: Won't Fix
      Assignee: Chris Male

We will tackle error messages on a per-error basis. Thanks for your help nonetheless, Luca.
                



[jira] [Commented] (LUCENE-3976) Improve error messages for unsupported Hunspell formats

Posted by "Chris Male (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287255#comment-13287255 ] 

Chris Male commented on LUCENE-3976:
------------------------------------

Hi Luca,

I'm unsure about this approach. What kinds of exceptions can be thrown other than IOExceptions? I think we should explore what those possible errors are and fix them at their source, to provide targeted exceptions. If there is a problem parsing, then we should throw a ParseException with the number of the line causing the problem.
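
A targeted exception of this kind could look roughly like the sketch below. It uses java.io.LineNumberReader to track the current line and java.text.ParseException to report it. Passing the line number as the exception's errorOffset is an assumption of this example, as are the class name and the message text. For simplicity the input is assumed to contain only rule lines, without the header line that normally introduces each group of rules in an affix file.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.text.ParseException;

// Hypothetical sketch; assumes every non-blank line is an affix rule (no headers).
class AffixLineParser {

    // Returns the number of rules read, or fails with the offending line number.
    static int parse(Reader in) throws IOException, ParseException {
        LineNumberReader reader = new LineNumberReader(in);
        int rules = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 5) {
                // getLineNumber() is the 1-based number of the line just read.
                throw new ParseException(
                    "Affix rule with fewer than five elements: '" + line + "'",
                    reader.getLineNumber());
            }
            rules++;
        }
        return rules;
    }

    public static void main(String[] args) throws IOException {
        String affix = "SFX A 0 s .\nSFX CA 0 /CaCp\n"; // first rule is made up
        try {
            parse(new StringReader(affix));
        } catch (ParseException e) {
            System.out.println("line " + e.getErrorOffset() + ": " + e.getMessage());
        }
    }
}
```

Compared with a generic wrapper, the caller learns both which line failed and what was wrong with it.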
                



[jira] [Commented] (LUCENE-3976) Improve error messages for unsupported Hunspell formats

Posted by "Luca Cavanna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287290#comment-13287290 ] 

Luca Cavanna commented on LUCENE-3976:
--------------------------------------

Ok, that's fine!
                
