Posted to dev@tika.apache.org by "David Pilato (JIRA)" <ji...@apache.org> on 2016/12/22 20:35:58 UTC

[jira] [Commented] (TIKA-2227) Replacement of MSOffice#KEYWORDS for RTF and ODT docs

    [ https://issues.apache.org/jira/browse/TIKA-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771026#comment-15771026 ] 

David Pilato commented on TIKA-2227:
------------------------------------

Sorry. Answer is {{TikaCoreProperties.KEYWORDS}}.

Don't know how I missed it... :)
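
For illustration only: following the pattern used in the issue description, the fix in calling code would presumably be {{metadata.get(TikaCoreProperties.KEYWORDS)}}. The sketch below does not use Tika at all. It models the metadata as a plain {{Map}} and shows the kind of name-based fallback that would cover the two metadata names mentioned in the report ({{"meta:keyword"}} and {{"Keyword"}}). The class name, helper name, and sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeywordLookup {
    // The two metadata names mentioned in this issue, tried in this order.
    static final List<String> CANDIDATE_NAMES = Arrays.asList("meta:keyword", "Keyword");

    // Returns the first non-empty value found under a candidate name, or null.
    // The Map is a stand-in for parsed metadata, not Tika's Metadata class.
    static String findKeywords(Map<String, String> metadata) {
        for (String name : CANDIDATE_NAMES) {
            String value = metadata.get(name);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical metadata as it might look for the RTF/ODT case in the report.
        Map<String, String> rtfLike = new HashMap<>();
        rtfLike.put("Keyword", "keyword1, keyword2");
        System.out.println(findKeywords(rtfLike));
    }
}
```

With real Tika metadata, reading the single property named in the answer above should make a fallback like this unnecessary.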

> Replacement of MSOffice#KEYWORDS for RTF and ODT docs
> -----------------------------------------------------
>
>                 Key: TIKA-2227
>                 URL: https://issues.apache.org/jira/browse/TIKA-2227
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.14
>            Reporter: David Pilato
>            Priority: Minor
>
> I'm trying to extract metadata from different types of documents.
> For that I was using {{metadata.get(MSOffice.KEYWORDS)}}, but it's marked as {{Deprecated}} in favor of the {{Office}} class.
> So I changed my code to use {{metadata.get(Office.KEYWORDS)}} instead.
> It does not work for 2 types of docs:
> * RTF: https://github.com/dadoonet/fscrawler/blob/master/src/test/resources/documents/test.rtf
> * ODT: https://github.com/dadoonet/fscrawler/blob/master/src/test/resources/documents/test.odt
> It seems that RTF and ODT keywords are extracted to a {{"Keyword"}} metadata name, although they should probably be stored under {{"meta:keyword"}}.
> If needed, you can reuse the documents linked above as test cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)