Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2017/03/07 20:24:38 UTC

[jira] [Commented] (TIKA-1879) Extract recipient information in MSG files with more granularity

    [ https://issues.apache.org/jira/browse/TIKA-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900088#comment-15900088 ] 

Tim Allison commented on TIKA-1879:
-----------------------------------

For "from", I assumed a single sender (which isn't always the case with "on behalf of" and/or "via"), and I created separate fields for the components of Exchange-format email addresses, e.g.
"/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=polyspot1.onmicrosoft.com-50609-Some-One"

was mapped to:
message_from_o=ExchangeLabs
message_from_ou=Exchange Administrative Group (FY...)
message_from_cn=polyspot1....
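
A minimal sketch of that kind of split, for illustration only. The class and method names (ExchangeAddressSketch, parse) are hypothetical and not part of Tika, and the sketch assumes that "/" never occurs inside a component value. Because the example address carries two cn components, the sketch keeps every value per key in order; the mapping above shows the last cn value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ExchangeAddressSketch {

    // Hypothetical helper (not a Tika API): splits an Exchange-style address
    // of the form "/o=.../ou=.../cn=..." into key -> values, preserving order.
    // Assumption: component values never contain '/'.
    public static Map<String, List<String>> parse(String address) {
        Map<String, List<String>> parts = new LinkedHashMap<>();
        for (String segment : address.split("/")) {
            int eq = segment.indexOf('=');
            if (eq <= 0) {
                continue; // empty leading segment, or no key before '='
            }
            String key = segment.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = segment.substring(eq + 1).trim();
            parts.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return parts;
    }

    public static void main(String[] args) {
        String from = "/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)"
                + "/cn=Recipients/cn=polyspot1.onmicrosoft.com-50609-Some-One";
        Map<String, List<String>> parts = parse(from);
        // Print one line per component key, e.g. to inspect what a
        // message_from_o / message_from_ou / message_from_cn field could hold.
        for (Map.Entry<String, List<String>> e : parts.entrySet()) {
            System.out.println("message_from_" + e.getKey() + "=" + e.getValue());
        }
    }
}
```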

However, this won't map neatly onto the "to" fields, which can hold multiple recipients. One unsatisfactory option is to keep three parallel arrays of names, smtpemails and exchangeemails, leaving an empty cell in smtpemails when the address is Exchange-formatted and vice versa. A cleaner option would be a single pair of parallel arrays, name[] and email[], where email[] holds the literal email value, whether it is SMTP or Exchange; users would then have to parse an Exchange address themselves if they wanted to differentiate _o, _ou and _cn.
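
The second option could look roughly like the sketch below. It is only an illustration of the data shape, not a proposed Tika API: the class RecipientArraysSketch, its methods, and the recipient names and SMTP address in main are all made up, and the check that an Exchange-format address starts with "/o=" is an assumption based on the single example above.

```java
import java.util.ArrayList;
import java.util.List;

public class RecipientArraysSketch {

    // Parallel lists: index i in both lists refers to the same recipient.
    private final List<String> names = new ArrayList<>();
    private final List<String> emails = new ArrayList<>();

    // Stores the literal address, SMTP or Exchange, without interpreting it.
    // Missing values become empty strings so the two lists stay aligned.
    public void add(String name, String email) {
        names.add(name == null ? "" : name);
        emails.add(email == null ? "" : email);
    }

    public int size() {
        return names.size();
    }

    public String name(int i) {
        return names.get(i);
    }

    public String email(int i) {
        return emails.get(i);
    }

    // Assumption: an Exchange-format address starts with "/o=" (any case).
    // Anything else is treated as not Exchange-formatted.
    public static boolean looksLikeExchange(String email) {
        return email.regionMatches(true, 0, "/o=", 0, 3);
    }

    public static void main(String[] args) {
        RecipientArraysSketch to = new RecipientArraysSketch();
        // Hypothetical recipients, for illustration only.
        to.add("Some One", "/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)"
                + "/cn=Recipients/cn=polyspot1.onmicrosoft.com-50609-Some-One");
        to.add("Other Person", "other.person@example.com");

        for (int i = 0; i < to.size(); i++) {
            String kind = looksLikeExchange(to.email(i)) ? "exchange" : "other";
            System.out.println(to.name(i) + " <" + to.email(i) + "> [" + kind + "]");
        }
    }
}
```

With this shape there are no empty cells to skip, at the cost of leaving any _o/_ou/_cn breakdown to the consumer.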

[~mcaruanagalizia] and [~lfcnassif], any recommendations?

> Extract recipient information in MSG files with more granularity
> ----------------------------------------------------------------
>
>                 Key: TIKA-1879
>                 URL: https://issues.apache.org/jira/browse/TIKA-1879
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Tim Allison
>            Priority: Minor
>
> As proposed in the parent task, it might be nice to have a parallel array for recipient name/recipient email for TO, CC and BCC.  


