Posted to commits@stanbol.apache.org by "Fabian Christ (JIRA)" <ji...@apache.org> on 2012/12/12 20:40:20 UTC

[jira] [Updated] (STANBOL-813) codification problem with the removeNonUtf8CompliantCharacters method used by the opennlp-ner engine

     [ https://issues.apache.org/jira/browse/STANBOL-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fabian Christ updated STANBOL-813:
----------------------------------

    Component/s: Engine - OpenNLP NER
    
> codification problem with the removeNonUtf8CompliantCharacters method used by the opennlp-ner engine
> ----------------------------------------------------------------------------------------------------
>
>                 Key: STANBOL-813
>                 URL: https://issues.apache.org/jira/browse/STANBOL-813
>             Project: Stanbol
>          Issue Type: Bug
>          Components: Engine - OpenNLP NER
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>
> As reported by Jairo Sarabia, the OpenNLP-based NER engine strips some encoded (non-ASCII) characters from the text.
> For example, for the request
> curl -v -X POST -H "Accept: text/plain" -H "Content-type: text/html; charset=utf-8" -H "Accept-language:es-es;en" --data "<html><body><p>The Stanbol enhancer puede detectar personas famosas como Mariano Rajoy y ciudades como París.</p></body></html>" "http://ec2-50-16-118-169.compute-1.amazonaws.com:8080/enhancer/chain/notedlinks"
> The character 'í' in 'París' is replaced with a ' ' (space), which causes an enhancement to be created for 'Par'
> [..]
> "enhancer:selected-text": {
>         "@language": "es",
>         "@literal": "Par"
>       },
>       "enhancer:selection-context": {
>         "@language": "es",
>         "@literal": "The Stanbol enhancer puede detectar personas famosas como Mariano Rajoy y ciudades como Par  s"
>       },
> [..]
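
The selection context above shows two spaces where the single character 'í' used to be, and 'í' occupies two bytes in UTF-8 (0xC3 0xAD). One way such output could arise, purely as an assumption and not confirmed by this issue, is a filter that inspects individual bytes rather than whole characters. The sketch below is in Python for brevity; both function names are hypothetical and neither is the actual removeNonUtf8CompliantCharacters implementation.

```python
def strip_high_bytes(text: str) -> str:
    """Hypothetical byte-wise filter (assumed cause, not Stanbol code).

    Every byte >= 0x80 is replaced with a space, so a two-byte UTF-8
    character such as 'í' turns into two spaces.
    """
    raw = text.encode("utf-8")
    cleaned = bytes(b if b < 0x80 else 0x20 for b in raw)
    return cleaned.decode("ascii")


def replace_unencodable(text: str) -> str:
    """Hypothetical character-wise filter.

    Only characters that cannot be encoded as UTF-8 at all (for example
    lone surrogates) are replaced with a space; valid characters such
    as 'í' pass through unchanged.
    """
    out = []
    for ch in text:
        try:
            ch.encode("utf-8")
            out.append(ch)
        except UnicodeEncodeError:
            out.append(" ")
    return "".join(out)


if __name__ == "__main__":
    sample = "ciudades como París"
    print(repr(strip_high_bytes(sample)))
    print(repr(replace_unencodable(sample)))
```

Under that assumption, the byte-wise variant reproduces the "Par  s" seen in the selection context, while the character-wise variant leaves "París" intact and would let the engine select the full word.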
