Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2012/11/23 06:04:58 UTC
[jira] [Created] (STANBOL-813) codification problem with the removeNonUtf8CompliantCharacters method used by the opennlp-ner engine
Rupert Westenthaler created STANBOL-813:
-------------------------------------------
Summary: codification problem with the removeNonUtf8CompliantCharacters method used by the opennlp-ner engine
Key: STANBOL-813
URL: https://issues.apache.org/jira/browse/STANBOL-813
Project: Stanbol
Issue Type: Bug
Reporter: Rupert Westenthaler
Assignee: Rupert Westenthaler
Priority: Minor
As reported by Jairo Sarabia, the OpenNLP-based NER engine mangles some non-ASCII characters.
For example, for the request
curl -v -X POST -H "Accept: text/plain" -H "Content-type: text/html; charset=utf-8" -H "Accept-language:es-es;en" --data "<html><body><p>The Stanbol enhancer puede detectar personas famosas como Mariano Rajoy y ciudades como París.</p></body></html>" "http://ec2-50-16-118-169.compute-1.amazonaws.com:8080/enhancer/chain/notedlinks"
the character 'í' in 'París' is replaced with a ' ' (space), which causes an enhancement for 'Par' instead of 'París':
[..]
"enhancer:selected-text": {
"@language": "es",
"@literal": "Par"
},
"enhancer:selection-context": {
"@language": "es",
"@literal": "The Stanbol enhancer puede detectar personas famosas como Mariano Rajoy y ciudades como Par s"
},
[..]
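To illustrate the kind of bug described above, here is a minimal Java sketch. It is not the actual Stanbol code: the class name Utf8FilterSketch and both method names are hypothetical, and the byte-based filter is only an assumed example of how a cleanup step could damage a multi-byte character such as 'í'. The second method shows one possible alternative that works on code points and replaces only control characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch for illustration; not the Stanbol implementation.
class Utf8FilterSketch {

    // Assumed faulty variant: filters the UTF-8 *bytes* of the text.
    // Java bytes are signed, so every byte of a multi-byte sequence is
    // negative, fails the "> 31" check, and is overwritten with a space.
    static String filterBytes(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i++) {
            byte b = bytes[i];
            if (!(b > 31 || b == '\t' || b == '\n' || b == '\r')) {
                bytes[i] = ' ';
            }
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Code-point variant: replaces only ISO control characters
    // (except tab, LF and CR) and leaves letters such as 'í' untouched.
    static String filterCodePoints(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            boolean keep = !Character.isISOControl(cp)
                    || cp == '\t' || cp == '\n' || cp == '\r';
            sb.appendCodePoint(keep ? cp : ' ');
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Par\u00eds"; // "París", escaped to avoid source-encoding issues
        System.out.println("[" + filterBytes(text) + "]");
        System.out.println("[" + filterCodePoints(text) + "]");
    }
}
```

In this sketch the byte-based filter writes one space per UTF-8 byte, so the exact whitespace may differ from the enhancer output quoted above; the point is only that the letter is lost and the word is split.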
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (STANBOL-813) codification problem with the removeNonUtf8CompliantCharacters method used by the opennlp-ner engine
Posted by "Rupert Westenthaler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/STANBOL-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rupert Westenthaler resolved STANBOL-813.
-----------------------------------------
Resolution: Fixed
Fixed in revision 1412756: http://svn.apache.org/viewvc?rev=1412756&view=rev