Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2012/11/26 11:58:58 UTC

[jira] [Resolved] (STANBOL-809) Pass ContentItem URI to the Tika content type detector

     [ https://issues.apache.org/jira/browse/STANBOL-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler resolved STANBOL-809.
-----------------------------------------

    Resolution: Implemented
      Assignee: Rupert Westenthaler

Implemented in trunk with http://svn.apache.org/viewvc?rev=1413551&view=rev
                
> Pass ContentItem URI to the Tika content type detector
> -------------------------------------------------------
>
>                 Key: STANBOL-809
>                 URL: https://issues.apache.org/jira/browse/STANBOL-809
>             Project: Stanbol
>          Issue Type: Bug
>          Components: Engine - Tika
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>
> Content type detection could be improved by using the URI of the processed ContentItem, because the Tika API lets you pass the file name (or URI) of a resource explicitly as an input parameter to content type detection (see https://tika.apache.org/1.2/detection.html#Resource_Name_Based_Detection):
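>     // Pass the ContentItem URI as the resource name so that Tika can
>     // also apply file name (extension) based detection
>     Metadata m = new Metadata();
>     m.add(Metadata.RESOURCE_NAME_KEY,
>         contentItem.getUri().getUnicodeString());
>     MediaType type = detector.detect(is, m);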
> As a result, file name pattern based recognition would also work
> when the ContentItem URI is set explicitly in the request to the Stanbol Enhancer, e.g.:
>      curl -X POST -H "Accept: text/turtle" -T test.docx \
>          "http://dev.iks-project.eu:8080/enhancer/engine/tika?id=http://www.example.com/test.docx"
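The issue relies on Tika deriving a file name from `Metadata.RESOURCE_NAME_KEY`. The following stdlib-only sketch shows why passing the ContentItem URI helps. It uses a tiny, hypothetical extension table. It is not Tika's implementation: Tika matches the glob patterns in its own MIME type registry. The class and method names are illustrative only.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Map;

public class ResourceNameDetection {

    // Hypothetical, deliberately small extension-to-media-type table
    // (Tika's real registry is far larger and pattern based).
    private static final Map<String, String> BY_EXTENSION = Map.of(
        "docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "pdf", "application/pdf",
        "html", "text/html",
        "txt", "text/plain");

    /**
     * Guesses a media type from the file name at the end of a resource
     * name (a URI or a plain file name); returns null if nothing matches.
     */
    public static String guessFromResourceName(String resourceName) {
        String name;
        try {
            // For URIs only the last path segment matters; query and fragment are ignored.
            String path = new URI(resourceName).getPath();
            if (path == null) {
                return null; // opaque URI (e.g. a URN): no file name to inspect
            }
            name = path.substring(path.lastIndexOf('/') + 1);
        } catch (URISyntaxException e) {
            name = resourceName; // not a valid URI: treat it as a plain file name
        }
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return null; // no extension
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.get(ext);
    }

    public static void main(String[] args) {
        System.out.println(guessFromResourceName("http://www.example.com/test.docx"));
        System.out.println(guessFromResourceName("urn:content-item-1234"));
    }
}
```

With the URI from the curl example, `http://www.example.com/test.docx` maps to the DOCX media type. An opaque URN carries no file name, so detection would have to rely on the content bytes alone.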
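For Java clients, the same Enhancer request could be built with the JDK's `java.net.http` client (Java 11 or later). This is a minimal sketch: `EnhancerRequest.build` is a hypothetical helper, not part of Stanbol. It reuses the endpoint and the `id` query parameter from the curl example. It also URL-encodes the ContentItem URI, on the assumption that the server decodes query parameters in the usual way.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class EnhancerRequest {

    // Endpoint and "id" parameter as used in the curl example of the issue.
    static final String ENDPOINT = "http://dev.iks-project.eu:8080/enhancer/engine/tika";

    /** Builds the POST request, passing the URL-encoded ContentItem URI as "id". */
    public static HttpRequest build(byte[] content, String contentItemUri) {
        String query = "id=" + URLEncoder.encode(contentItemUri, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(ENDPOINT + "?" + query))
            .header("Accept", "text/turtle") // ask for the enhancements as Turtle
            .POST(HttpRequest.BodyPublishers.ofByteArray(content))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest req = build(new byte[0], "http://www.example.com/test.docx");
        System.out.println(req.method() + " " + req.uri());
    }
}
```

To send the request, you could read the file with `Files.readAllBytes(Path.of("test.docx"))` and pass the request to `HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())`.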
