Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2015/10/07 20:53:26 UTC
[jira] [Commented] (TIKA-1765) Some doc and docx store multiple authors as semi-colon delimited list
[ https://issues.apache.org/jira/browse/TIKA-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947361#comment-14947361 ]
Tim Allison commented on TIKA-1765:
-----------------------------------
Would anyone mind if I changed {{OfficeOpenXMLExtended.MANAGER}} from {{Property.externalText}} to {{Property.externalTextBag}}?
I'd like to make manager multi-valued because MS Office Word, Excel and PowerPoint let you store multiple managers the same way they store multiple authors: as a semicolon-delimited list.
I looked in ECMA for any reference to a standard way of handling multiple authors, but every example I found shows a single author. There's even less documentation for "manager".
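
To make the semicolon handling concrete, here is a minimal sketch in plain Java of splitting a delimited author or manager value into separate entries. The class and method names ({{SemicolonSplitSketch}}, {{splitValues}}) are hypothetical and not part of Tika. The sketch assumes, per the observation in the issue, that there is no escaping, so a semicolon inside a name also produces two entries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Tika.
class SemicolonSplitSketch {

    // Splits a raw field value on ';', trims each piece, and drops empty pieces.
    // Assumes no escaping: a ';' inside a name is treated as a delimiter.
    static List<String> splitValues(String raw) {
        List<String> values = new ArrayList<>();
        if (raw == null) {
            return values;
        }
        for (String part : raw.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // prints [Alice Smith, Bob Jones]
        System.out.println(splitValues("Alice Smith; Bob Jones"));
    }
}
```

Each resulting entry would then be added as its own value of the multi-valued property; that wiring is left out here because it depends on how the parsers populate the metadata.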
> Some doc and docx store multiple authors as semi-colon delimited list
> ---------------------------------------------------------------------
>
> Key: TIKA-1765
> URL: https://issues.apache.org/jira/browse/TIKA-1765
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Trivial
>
> It looks like doc and docx are storing multiple authors in a single author field delimited by semi-colons. We should parse this value and add multiple authors where appropriate.
> Notes: when I tried to add an author with a semicolon in the name, the result was two authors...doesn't look like there is any escaping going on.
> We should check to see what's going on in the other MS formats and with other metadata items that are allowed to be multivalued in Dublin Core.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)