Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2018/07/26 14:57:00 UTC

[jira] [Commented] (TIKA-2694) "From" headers is not always extracted correctly on msg mails

    [ https://issues.apache.org/jira/browse/TIKA-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558384#comment-16558384 ] 

Tim Allison commented on TIKA-2694:
-----------------------------------

I'm pretty sure this is one of the ways that "addresses" can be stored in Outlook.  I've seen actual email addresses in .msg files, but these Outlook Exchange addresses are quite common, and very annoying if you're expecting actual email addresses.  If you find that the actual email address is stored somewhere in the MAPIMessage object for this file, let us know.
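
In the meantime, a caller who expects real email addresses could at least detect these Exchange-style values before using them. What follows is only a minimal sketch under stated assumptions: `SenderAddressCheck` and its methods are hypothetical names, not part of Tika or POI, and the pattern is inferred solely from the single example value in this issue (`/O=.../OU=.../CN=RECIPIENTS/CN=...`). It does not resolve the value to an SMTP address; it only flags it and pulls out the last `CN` component.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of Tika or POI. */
final class SenderAddressCheck {

    // Assumed shape, based on the one example in this issue:
    // /O=<org>, zero or more /OU=<unit>, one or more /CN=<name>.
    // Group 1 captures the last CN component.
    private static final Pattern EXCHANGE_DN = Pattern.compile(
            "^/O=[^/]+(?:/OU=[^/]+)*(?:/CN=[^/]+)*/CN=([^/]+)$",
            Pattern.CASE_INSENSITIVE);

    private SenderAddressCheck() {
    }

    /** Returns true if the value matches the assumed Exchange-style pattern. */
    static boolean looksLikeExchangeDn(String value) {
        return value != null && EXCHANGE_DN.matcher(value.trim()).matches();
    }

    /** Returns the last CN component if the value matches, otherwise empty. */
    static Optional<String> lastCommonName(String value) {
        if (value == null) {
            return Optional.empty();
        }
        Matcher m = EXCHANGE_DN.matcher(value.trim());
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

A caller might run `SenderAddressCheck.looksLikeExchangeDn(from)` on the extracted "From" value and, when it returns true, treat the value as an unresolved Exchange address rather than as an email address.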

> "From" headers is not always extracted correctly on msg mails
> -------------------------------------------------------------
>
>                 Key: TIKA-2694
>                 URL: https://issues.apache.org/jira/browse/TIKA-2694
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>         Environment: CentOS 7, Windows 10
>            Reporter: Celpan Valeria
>            Priority: Major
>         Attachments: Fw Anime User Analysis.msg
>
>
> For some emails, the "From" field contains, instead of an email address, a value that looks like `/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER`.
>  The issue seems to be connected to the library `org.apache.poi:poi-scratchpad:3.17`: when running `org.apache.tika.parser.microsoft.OutlookExtractor::OutlookExtractor(DirectoryNode, ParserContext)` we get `this.msg.mainChunks.allChunks.SenderEmailAddress = "/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER"`.
>  See the attachment to reproduce this defect.


