Posted to commits@stanbol.apache.org by "Olivier Grisel (Updated) (JIRA)" <ji...@apache.org> on 2011/09/27 15:39:11 UTC

[jira] [Updated] (STANBOL-331) The default SolrYard configuration should have support for i18n analyzers (stemming and accents removal)

     [ https://issues.apache.org/jira/browse/STANBOL-331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olivier Grisel updated STANBOL-331:
-----------------------------------

    Description: 
For instance, some French newspapers spell foreign names with accents (e.g. Benyamin Nétanyahou) while DBpedia uses a non-accented variant in the rdfs:label field (with the @fr literal), e.g. Benyamin Netanyahou.

I think Solr and Lucene provide a variety of analyzers to deal with such language-specific variability of the tokens. However, they are currently not enabled in the default configuration of the SolrYard, hence the recall can be very bad for enhancers able to deal with i18n input.

  was:
For instance some news papers use a french spelling of foreign names with accents (e.g. Benyamin Nétanyahou) while DBpedia uses a non accentued version in the rdfs:label field (with the @fr literal), e.g. Benyamin Netanyahou.

I think Solr and Lucene provide a variety of analyzers to deal with such language specific variability of the tokens. However they are currently not enabled in the default configuration of the SolrYard hence the recall is can be very bad for enhancers able to deal with i18n input.

    
> The default SolrYard configuration should have support for i18n analyzers (stemming and accents removal)
> --------------------------------------------------------------------------------------------------------
>
>                 Key: STANBOL-331
>                 URL: https://issues.apache.org/jira/browse/STANBOL-331
>             Project: Stanbol
>          Issue Type: Bug
>            Reporter: Olivier Grisel
>
> For instance, some French newspapers spell foreign names with accents (e.g. Benyamin Nétanyahou) while DBpedia uses a non-accented variant in the rdfs:label field (with the @fr literal), e.g. Benyamin Netanyahou.
> I think Solr and Lucene provide a variety of analyzers to deal with such language-specific variability of the tokens. However, they are currently not enabled in the default configuration of the SolrYard, hence the recall can be very bad for enhancers able to deal with i18n input.
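
To illustrate the mismatch described above, here is a minimal sketch of accent folding using only the Java standard library. It is not the SolrYard code and not Lucene's implementation: the class name AccentFoldingSketch and the helpers foldAccents and labelsMatch are hypothetical names chosen for this example. In a Solr schema the same effect would more likely come from an analysis filter such as solr.ASCIIFoldingFilterFactory applied at both index and query time. Stemming is not covered here.

```java
import java.text.Normalizer;
import java.util.Locale;

public class AccentFoldingSketch {

    // Hypothetical helper (not a Stanbol, Solr or Lucene API):
    // strips diacritics so accented and plain spellings compare equal.
    static String foldAccents(String text) {
        // NFD decomposition splits an accented letter into its base letter
        // followed by one or more combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{M} matches combining marks; removing them keeps the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Hypothetical helper: compares two labels after folding and lowercasing.
    static boolean labelsMatch(String a, String b) {
        String left = foldAccents(a).toLowerCase(Locale.ROOT);
        String right = foldAccents(b).toLowerCase(Locale.ROOT);
        return left.equals(right);
    }

    public static void main(String[] args) {
        // \u00e9 is "é", written as an escape to avoid source-encoding issues.
        String newspaperSpelling = "Benyamin N\u00e9tanyahou";
        String dbpediaLabel = "Benyamin Netanyahou";

        // Exact comparison: prints false
        System.out.println(newspaperSpelling.equals(dbpediaLabel));
        // Comparison after accent folding: prints true
        System.out.println(labelsMatch(newspaperSpelling, dbpediaLabel));
    }
}
```

The folding has to be applied to both the indexed label and the incoming query text; folding only one side would leave the two spellings unequal whenever the accented form is on the unfolded side.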
