Posted to dev@tika.apache.org by "Celpan Valeria (JIRA)" <ji...@apache.org> on 2018/07/26 14:16:00 UTC

[jira] [Created] (TIKA-2694) "From" header is not always extracted correctly on msg mails

Celpan Valeria created TIKA-2694:
------------------------------------

             Summary: "From" header is not always extracted correctly on msg mails
                 Key: TIKA-2694
                 URL: https://issues.apache.org/jira/browse/TIKA-2694
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.17
         Environment: CentOS 7, Windows 10
            Reporter: Celpan Valeria
         Attachments: Fw Anime User Analysis.msg

For some emails, the "From" field does not contain the sender's email address but a value that looks like `/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER`. The issue seems to be connected to the library `org.apache.poi:poi-scratchpad:3.17`: when running `org.apache.tika.parser.microsoft.OutlookExtractor::OutlookExtractor(DirectoryNode, ParserContext)`, we get `this.msg.mainChunks.allChunks.SenderEmailAddress = "/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER"`.
See the attached message to reproduce this defect.
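
To make the symptom easier to spot in extracted metadata, here is a minimal sketch of a check that separates the reported kind of value from an ordinary address. The value has the shape of an Exchange-style X.500 address (`/O=.../OU=.../CN=...`) rather than `user@domain`. The class `SenderAddressCheck` and both of its methods are hypothetical names used only for illustration; they are not part of Tika or POI, and the heuristics are deliberately loose.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of Tika or POI. */
public class SenderAddressCheck {

    /**
     * Returns true if the value has the shape of an Exchange-style X.500
     * address, i.e. it starts with "/O=" and contains at least one "/CN=" part.
     * This is an assumed heuristic, not an official format check.
     */
    public static boolean looksLikeExchangeDn(String value) {
        if (value == null) {
            return false;
        }
        String v = value.trim().toUpperCase(Locale.ROOT);
        return v.startsWith("/O=") && v.contains("/CN=");
    }

    /**
     * Very loose check for "local@domain": exactly one '@' that is neither
     * the first nor the last character, and no '/' anywhere in the value.
     * This is not a full RFC 5322 validator.
     */
    public static boolean looksLikeSmtpAddress(String value) {
        if (value == null) {
            return false;
        }
        String v = value.trim();
        int at = v.indexOf('@');
        return at > 0
                && at == v.lastIndexOf('@')
                && at < v.length() - 1
                && !v.contains("/");
    }

    public static void main(String[] args) {
        // The value quoted in this report.
        String reported =
                "/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER";
        System.out.println("Exchange-style DN: " + looksLikeExchangeDn(reported));
        System.out.println("SMTP-style address: " + looksLikeSmtpAddress(reported));
    }
}
```

A check like this only flags the affected messages; it does not recover the real sender address, which would still have to come from another property of the message.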



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)