Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/02/26 14:18:18 UTC

[jira] [Comment Edited] (TIKA-1865) Save sender email address in Outlook MSG metadata

    [ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168945#comment-15168945 ] 

Tim Allison edited comment on TIKA-1865 at 2/26/16 1:17 PM:
------------------------------------------------------------

With the handful of MSG files in our "test-documents", I get this:

{noformat}
test-outlook2003.msg
emailFromChunk:olteam@microsoft.com
header_from:null

testMSG.msg
emailFromChunk:jukka.zitting@gmail.com
header_from:From: Jukka Zitting <ju...@gmail.com>

testMSG_att_doc.msg
emailFromChunk:nicolas1.23456@free.fr
header_from:null

testMSG_att_msg.msg
emailFromChunk:/O=PHILLIPS ORMONDE AND FITZPATRICK/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=NICK.BOOTH
header_from:From: Nick Booth <ni...@pof.com.au>

testMSG_chinese.msg
emailFromChunk:/O=FT GROUP/OU=FT/CN=RECIPIENTS/CN=LYDIACHANG
header_from:null

testMSG_forwarded.msg
emailFromChunk:/O=OEXCH018/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=PAUL_METAJURE
header_from:From: Paul Allan Hill <pa...@metajure.com>
{noformat}

Perhaps the strategy should be to try emailFromChunk first and, if that doesn't yield a regular address, back off to a regex on the {{From}} header when it's present?  That would get a "regular" email address for every file above except {{testMSG_chinese.msg}}.  Or is the Exchange info also useful to you if that's all we can get?
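
To make the proposal concrete, here is a rough sketch of that fallback. It is not Tika's code: the class name, the method name, and the email pattern are all made up for illustration, and the pattern is deliberately loose rather than a full RFC 5322 validator. Returning the Exchange string as a last resort is just one possible answer to the open question above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Tika.
class SenderAddressFallback {

    // Assumption: a loose "looks like an email address" pattern.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    /**
     * emailFromChunk: the value read from the MSG chunk (may be null or an
     * Exchange-style "/O=.../CN=..." string).
     * headerFrom: the raw From header line, or null if the message has none.
     */
    static String resolveSender(String emailFromChunk, String headerFrom) {
        // 1. Prefer the chunk value when it is already a regular address.
        if (emailFromChunk != null && EMAIL.matcher(emailFromChunk).matches()) {
            return emailFromChunk;
        }
        // 2. Otherwise look for an address inside the From header.
        if (headerFrom != null) {
            Matcher m = EMAIL.matcher(headerFrom);
            if (m.find()) {
                return m.group();
            }
        }
        // 3. Last resort (assumption): hand back whatever the chunk held,
        //    which may be the Exchange string or null.
        return emailFromChunk;
    }
}
```

With this, a file like {{test-outlook2003.msg}} would keep its chunk address, a file with an Exchange string plus a From header would get the header address, and {{testMSG_chinese.msg}} would still come back with only the Exchange string.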





was (Author: tallison@mitre.org):
With the handful of MSG files in our "test-documents", I get this:

{noformat}
test-outlook2003.msg : olteam@microsoft.com
testMSG.msg : jukka.zitting@gmail.com
testMSG_att_doc.msg : nicolas1.23456@free.fr
testMSG_att_msg.msg : /O=PHILLIPS ORMONDE AND FITZPATRICK/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=NICK.BOOTH
testMSG_chinese.msg : /O=FT GROUP/OU=FT/CN=RECIPIENTS/CN=LYDIACHANG
testMSG_forwarded.msg : /O=OEXCH018/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=PAUL_METAJURE
{noformat}




> Save sender email address in Outlook MSG metadata
> -------------------------------------------------
>
>                 Key: TIKA-1865
>                 URL: https://issues.apache.org/jira/browse/TIKA-1865
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.11
>         Environment: Windows 7 x64, jre 1.8.0_60 x64
>            Reporter: Luis Filipe Nassif
>
> The sender email address is lost when extracting metadata from Outlook MSG files. Currently only the sender name is extracted. The address is important information for search engines.


