Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2012/12/30 15:48:12 UTC

[jira] [Created] (STANBOL-869) Entities with unicode escaped chars ('\u????') in the URI are not indexed

Rupert Westenthaler created STANBOL-869:
-------------------------------------------

             Summary: Entities with unicode escaped chars ('\u????') in the URI are not indexed 
                 Key: STANBOL-869
                 URL: https://issues.apache.org/jira/browse/STANBOL-869
             Project: Stanbol
          Issue Type: Bug
          Components: Entityhub
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


When the Entityhub Indexing Tool reads entity URIs through an EntityIterator implementation, Unicode-escaped characters are not converted to the characters they represent. As a result, entities with such URIs might not be found by the EntityDataProvider implementation.

This is the case for the JenaTDB indexing source. Consequently, any DBpedia entity that uses a Unicode-escaped character in its URI is currently not indexed.

The EntityDataIterable implementation is not affected. Given the current default configuration, this therefore mainly affects the DBpedia indexing tool configuration, not users of the generic RDF indexing tool configuration.
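
To illustrate the conversion that is missing, here is a minimal sketch of how escaped sequences in a URI string could be turned back into the characters they denote before the URI is used for a lookup. This is not the actual Stanbol fix: the class name UnicodeEscapes, the method decode and the example URI are hypothetical, and the sketch assumes N-Triples style escapes (a backslash followed by 'u' and four hex digits, or by 'U' and eight hex digits).

```java
// Hypothetical helper, not part of Stanbol: decodes N-Triples style
// backslash-u (4 hex digits) and backslash-U (8 hex digits) escapes.
final class UnicodeEscapes {

    private UnicodeEscapes() {}

    /** Returns the input with every well-formed escape replaced by its character. */
    static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char kind = s.charAt(i + 1);
                int digits = kind == 'u' ? 4 : kind == 'U' ? 8 : 0;
                if (digits > 0 && i + 2 + digits <= s.length()) {
                    String hex = s.substring(i + 2, i + 2 + digits);
                    if (isHex(hex)) {
                        // Parse as long so eight hex digits cannot overflow.
                        long cp = Long.parseLong(hex, 16);
                        if (cp <= Character.MAX_CODE_POINT) {
                            out.appendCodePoint((int) cp);
                            i += 2 + digits;
                            continue;
                        }
                    }
                }
            }
            // Not a well-formed escape: copy the character unchanged.
            out.append(c);
            i++;
        }
        return out.toString();
    }

    /** True if every character is an ASCII hexadecimal digit. */
    private static boolean isHex(String hex) {
        for (int k = 0; k < hex.length(); k++) {
            char h = hex.charAt(k);
            boolean ok = (h >= '0' && h <= '9')
                    || (h >= 'a' && h <= 'f')
                    || (h >= 'A' && h <= 'F');
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative input only; the URI is made up for this example.
        String escaped = "http://example.org/resource/Bj\\u00F6rk";
        System.out.println(decode(escaped));
    }
}
```

Malformed or truncated escapes are copied through unchanged rather than rejected, so a URI that legitimately contains a backslash is not corrupted. Whether decoding is the right step at all depends on how the indexing source stored the URIs in the first place.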
