Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2016/10/16 09:54:20 UTC

[jira] [Commented] (TIKA-2122) Extract all email headers from Outlook .msg files into Metadata

    [ https://issues.apache.org/jira/browse/TIKA-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15579692#comment-15579692 ] 

Nick Burch commented on TIKA-2122:
----------------------------------

I'm not sure we want to be dumping these raw into the Tika metadata, though it might be fine if we do it with a prefix. (We'd probably want to sync that up with the RFC822 and MBox parsers for consistency.)

Also note that HMEF doesn't currently pull out all the possible properties at the MSG level (support for fixed-length properties is incomplete and in need of volunteer energy), so there may be more metadata we could get from the MSG file "properly", which may remove some of the need for this. (Pending suitable POI work!)

> Extract all email headers from Outlook .msg files into Metadata
> ---------------------------------------------------------------
>
>                 Key: TIKA-2122
>                 URL: https://issues.apache.org/jira/browse/TIKA-2122
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.13
>            Reporter: Chris Knott
>            Priority: Minor
>             Fix For: 2.0, 1.14
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently most email headers are not added to the Metadata when extracting Outlook .msg files.
> http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
> The headers - {{msg.getHeaders()}} - are already being looped through as a way to estimate the date.
> All headers should be added to Metadata, using the name of the header with a prefix such as {{"raw-header:"}}
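
For illustration only, here is a rough sketch of the prefixing idea from the issue description. It is not the OutlookExtractor code. The class RawHeaderSketch and its collect method are hypothetical names, a plain Map stands in for Tika's Metadata so the example has no dependencies, and it assumes the headers arrive as an array of strings with one "Name: value" line per element. Treating lines that start with whitespace as folded continuations is also an assumption about the input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Tika or POI.
class RawHeaderSketch {

    // Collects "Name: value" lines into a multi-valued map, prefixing each name.
    // A plain Map stands in for Tika's Metadata here (assumption).
    static Map<String, List<String>> collect(String[] headerLines, String prefix) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        List<String> lastValues = null; // values of the most recent header, for folded lines
        for (String line : headerLines) {
            if (line == null || line.isEmpty()) {
                continue;
            }
            char first = line.charAt(0);
            if ((first == ' ' || first == '\t') && lastValues != null) {
                // Folded continuation line: append to the previous header's last value.
                int i = lastValues.size() - 1;
                lastValues.set(i, lastValues.get(i) + " " + line.trim());
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                lastValues = null; // not a header line, so skip it
                continue;
            }
            String name = prefix + line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            // Repeated headers (e.g. Received) keep every value under one key.
            lastValues = out.computeIfAbsent(name, k -> new ArrayList<>());
            lastValues.add(value);
        }
        return out;
    }
}
```

Calling RawHeaderSketch.collect(lines, "raw-header:") on the header lines would give keys such as "raw-header:Subject". In a real parser the map entries would presumably be added to the Metadata object instead, and the prefix would need agreeing with the RFC822 and MBox parsers.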



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)