Posted to commits@stanbol.apache.org by "Fabian Christ (JIRA)" <ji...@apache.org> on 2012/12/12 20:40:20 UTC
[jira] [Updated] (STANBOL-813) codification problem with the
removeNonUtf8CompliantCharacters method used by the opennlp-ner engine
[ https://issues.apache.org/jira/browse/STANBOL-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fabian Christ updated STANBOL-813:
----------------------------------
Component/s: Engine - OpenNLP NER
> codification problem with the removeNonUtf8CompliantCharacters method used by the opennlp-ner engine
> ----------------------------------------------------------------------------------------------------
>
> Key: STANBOL-813
> URL: https://issues.apache.org/jira/browse/STANBOL-813
> Project: Stanbol
> Issue Type: Bug
> Components: Engine - OpenNLP NER
> Reporter: Rupert Westenthaler
> Assignee: Rupert Westenthaler
> Priority: Minor
>
> As reported by Jairo Sarabia, the OpenNLP-based NER engine removes some validly encoded non-ASCII characters from the text.
> e.g. for the request
> curl -v -X POST -H "Accept: text/plain" -H "Content-type: text/html; charset=utf-8" -H "Accept-language:es-es;en" --data "<html><body><p>The Stanbol enhancer puede detectar personas famosas como Mariano Rajoy y ciudades como París.</p></body></html>" "http://ec2-50-16-118-169.compute-1.amazonaws.com:8080/enhancer/chain/notedlinks"
> The character 'í' in 'París' is replaced with a ' ' (space), which causes an enhancement for 'Par' instead of 'París':
> [..]
> "enhancer:selected-text": {
> "@language": "es",
> "@literal": "Par"
> },
> "enhancer:selection-context": {
> "@language": "es",
> "@literal": "The Stanbol enhancer puede detectar personas famosas como Mariano Rajoy y ciudades como Par s"
> },
> [..]
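To illustrate the behaviour one would expect here, the following is a minimal sketch of a sanitizing step that leaves accented letters such as 'í' untouched. It is not the actual removeNonUtf8CompliantCharacters implementation, which is not shown in this report. The class name Utf8Sanitizer, the method name sanitize, and the choice of which characters count as illegal (control characters other than tab, LF and CR, plus unpaired surrogates) are all assumptions made for this example.

```java
// Hypothetical helper, NOT the Stanbol implementation: the names and the
// definition of "illegal" characters are assumptions for illustration only.
final class Utf8Sanitizer {

    private Utf8Sanitizer() {}

    /**
     * Replaces control characters (except tab, LF, CR) and unpaired
     * surrogates with a space. Every other character, including accented
     * letters, is copied unchanged. The result has the same length as the
     * input, so character offsets computed on it still match the original.
     */
    static String sanitize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            // A high surrogate followed by a low surrogate is a valid pair: keep both.
            if (Character.isHighSurrogate(c) && i + 1 < text.length()
                    && Character.isLowSurrogate(text.charAt(i + 1))) {
                out.append(c).append(text.charAt(i + 1));
                i += 2;
                continue;
            }
            // Any surrogate reaching this point is unpaired.
            boolean illegal = Character.isSurrogate(c)
                    || (Character.isISOControl(c) && c != '\t' && c != '\n' && c != '\r');
            out.append(illegal ? ' ' : c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Par\u00eds" is "Paris" written with an i-acute, as in the request above.
        String input = "ciudades como Par\u00eds";
        String cleaned = sanitize(input);
        System.out.println("unchanged: " + cleaned.equals(input));
    }
}
```

Replacing rather than deleting an illegal character keeps the string length stable. That matters when the start and end offsets of a detected entity are later mapped back onto the original content.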
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira