Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/02/26 14:18:18 UTC
[jira] [Comment Edited] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168945#comment-15168945 ]
Tim Allison edited comment on TIKA-1865 at 2/26/16 1:17 PM:
------------------------------------------------------------
With the handful of MSG files in our "test-documents", I get this:
{noformat}
test-outlook2003.msg
emailFromChunk:olteam@microsoft.com
header_from:null
testMSG.msg
emailFromChunk:jukka.zitting@gmail.com
header_from:From: Jukka Zitting <ju...@gmail.com>
testMSG_att_doc.msg
emailFromChunk:nicolas1.23456@free.fr
header_from:null
testMSG_att_msg.msg
emailFromChunk:/O=PHILLIPS ORMONDE AND FITZPATRICK/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=NICK.BOOTH
header_from:From: Nick Booth <ni...@pof.com.au>
testMSG_chinese.msg
emailFromChunk:/O=FT GROUP/OU=FT/CN=RECIPIENTS/CN=LYDIACHANG
header_from:null
testMSG_forwarded.msg
emailFromChunk:/O=OEXCH018/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=PAUL_METAJURE
header_from:From: Paul Allan Hill <pa...@metajure.com>
{noformat}
Perhaps a strategy of trying emailFromChunk first and then falling back to a regex on the {{From}} header, if there is one? That would yield a "regular" email address for every file above except {{testMSG_chinese.msg}}. Or is the Exchange info also useful to you when that's all we can get?
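A minimal Java sketch of that fallback, assuming the two raw values (emailFromChunk and the header {{From}} line) have already been pulled out of the MSG file. The class and method names ({{SenderEmailFallback}}, {{chooseSenderEmail}}) are hypothetical, not Tika or POI API. Treating any value that starts with "/" as an Exchange DN rather than an SMTP address is a heuristic based only on the files listed above. When neither source yields an SMTP address, the sketch returns the Exchange DN, which is just one possible answer to the open question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika or POI.
final class SenderEmailFallback {

    // Prefer an address inside angle brackets ("Name <user@example.com>");
    // otherwise accept a bare address. ':' is excluded from the bare form so
    // that a "From:" prefix is never glued onto the match.
    private static final Pattern ADDRESS =
            Pattern.compile("<([^<>\\s]+@[^<>\\s]+)>|([^<>\\s:]+@[^<>\\s]+)");

    // Heuristic: Exchange DNs look like "/O=.../CN=...", SMTP addresses contain '@'.
    static boolean looksLikeSmtp(String s) {
        return s != null && s.contains("@") && !s.startsWith("/");
    }

    // Extracts the first address from a raw "From: ..." header line, or null.
    static String fromHeader(String headerFrom) {
        if (headerFrom == null) {
            return null;
        }
        Matcher m = ADDRESS.matcher(headerFrom);
        if (!m.find()) {
            return null;
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    // 1) emailFromChunk if it is already an SMTP address,
    // 2) else the address parsed from the From header,
    // 3) else whatever emailFromChunk holds (an Exchange DN, or null).
    static String chooseSenderEmail(String emailFromChunk, String headerFrom) {
        if (looksLikeSmtp(emailFromChunk)) {
            return emailFromChunk;
        }
        String parsed = fromHeader(headerFrom);
        if (parsed != null) {
            return parsed;
        }
        return emailFromChunk;
    }

    public static void main(String[] args) {
        // Inputs mimic the shapes in the listing above; the addresses are made up.
        String dn = "/O=EXAMPLE/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=SOMEONE";
        System.out.println(chooseSenderEmail("someone@example.com", null)); // someone@example.com
        System.out.println(chooseSenderEmail(dn, "From: Some One <someone@example.com>")); // someone@example.com
        System.out.println(chooseSenderEmail(dn, null)); // prints the DN unchanged
    }
}
```

The third case corresponds to {{testMSG_chinese.msg}}: with no {{From}} header, only the Exchange DN is available, so whether to store it (or drop it) is a policy decision rather than something the regex can fix.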
was (Author: tallison@mitre.org):
With the handful of MSG files in our "test-documents", I get this:
{noformat}
test-outlook2003.msg : olteam@microsoft.com
testMSG.msg : jukka.zitting@gmail.com
testMSG_att_doc.msg : nicolas1.23456@free.fr
testMSG_att_msg.msg : /O=PHILLIPS ORMONDE AND FITZPATRICK/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=NICK.BOOTH
testMSG_chinese.msg : /O=FT GROUP/OU=FT/CN=RECIPIENTS/CN=LYDIACHANG
testMSG_forwarded.msg : /O=OEXCH018/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=PAUL_METAJURE
{noformat}
> Save sender email address in Outlook MSG metadata
> -------------------------------------------------
>
> Key: TIKA-1865
> URL: https://issues.apache.org/jira/browse/TIKA-1865
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.11
> Environment: Windows 7 x64, jre 1.8.0_60 x64
> Reporter: Luis Filipe Nassif
>
> The sender email address is lost when extracting metadata from Outlook MSG files; currently only the sender name is extracted. The address is important information for search engines to extract.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)