Posted to dev@tika.apache.org by "David Pilato (JIRA)" <ji...@apache.org> on 2016/12/22 20:35:58 UTC
[jira] [Commented] (TIKA-2227) Replacement of MSOffice#KEYWORDS for RTF and ODT docs
[ https://issues.apache.org/jira/browse/TIKA-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771026#comment-15771026 ]
David Pilato commented on TIKA-2227:
------------------------------------
Sorry. The answer is {{TikaCoreProperties.KEYWORDS}}.
Don't know how I missed it... :)
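A minimal sketch of reading keywords through {{TikaCoreProperties.KEYWORDS}} instead of {{MSOffice.KEYWORDS}} or {{Office.KEYWORDS}}, assuming the Tika 1.x API (tika-core plus tika-parsers on the classpath, as in the 1.14 release the issue targets). The class and method names ({{KeywordsExample}}, {{readKeywords}}, {{extractKeywords}}) are hypothetical, chosen for illustration only.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.TikaCoreProperties;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical helper class; assumes the Tika 1.x API.
public class KeywordsExample {

    // Reads keywords through the core property rather than a
    // format-specific one such as MSOffice.KEYWORDS or Office.KEYWORDS.
    static String[] readKeywords(Metadata metadata) {
        return metadata.getValues(TikaCoreProperties.KEYWORDS);
    }

    // Parses one file with format auto-detection and returns its keywords.
    static String[] extractKeywords(Path file) throws Exception {
        AutoDetectParser parser = new AutoDetectParser();
        Metadata metadata = new Metadata();
        // -1 disables the default write limit on the extracted body text
        BodyContentHandler handler = new BodyContentHandler(-1);
        try (InputStream stream = Files.newInputStream(file)) {
            parser.parse(stream, handler, metadata, new ParseContext());
        }
        return readKeywords(metadata);
    }

    public static void main(String[] args) throws Exception {
        // e.g. pass the test.rtf and test.odt files linked in the issue
        for (String arg : args) {
            String[] keywords = extractKeywords(Paths.get(arg));
            System.out.println(arg + " -> " + String.join(", ", keywords));
        }
    }
}
```

Calling {{readKeywords}} on the {{Metadata}} object avoids depending on which raw metadata name (for example {{"Keyword"}} or {{"meta:keyword"}}) a particular parser happens to populate.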
> Replacement of MSOffice#KEYWORDS for RTF and ODT docs
> -----------------------------------------------------
>
> Key: TIKA-2227
> URL: https://issues.apache.org/jira/browse/TIKA-2227
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.14
> Reporter: David Pilato
> Priority: Minor
>
> I'm trying to extract metadata from different types of documents.
> For that I'm using {{metadata.get(MSOffice.KEYWORDS)}}, but it's marked as {{Deprecated}} in favor of the {{Office}} class.
> So I changed my code to use {{metadata.get(Office.KEYWORDS)}} instead.
> It does not work for 2 types of docs:
> * RTF: https://github.com/dadoonet/fscrawler/blob/master/src/test/resources/documents/test.rtf
> * ODT: https://github.com/dadoonet/fscrawler/blob/master/src/test/resources/documents/test.odt
> It seems that RTF and ODT keywords are extracted under the {{"Keyword"}} metadata name, although they should probably be mapped to {{"meta:keyword"}}.
> If needed, you can reuse the documents I linked here as test cases.