Posted to dev@nifi.apache.org by DAVID SMITH <da...@btinternet.com> on 2017/01/14 21:30:05 UTC

Help with ExtractText processor and Mime format files

Hi
I have received some MIME format files and I want to extract certain parts of them to use as attributes, such as the 'To' and 'From' fields.
I have tried using the ExtractText processor, but I can never get the regex to give me a match.
My questions are:
1) Am I using the best processor to decode MIME format messages?
2) If the ExtractText processor is the best way of decoding a MIME format message, can someone give me an example of a regex which would give me either the sender or the recipient?

The following is an example of a file I have received:
Date: Fri, 13 Jan 2017 10:26:12 -0400 (EDT)
From: David Smith <da...@home.example.com>
To: John Doe <jo...@example.com>
Subject: =?iso-8859-1?Q?Test Email?=
Message-ID: <Pi...@home.example.com>
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="-123456789-987654321-958746372=:6982"

  This message is in MIME format.  The first part should be plain text. 

---123456789-987654321-958746372=:6982
Content-Type: TEXT/PLAIN; charset=US-ASCII


Many thanks
Dave


Re: Help with ExtractText processor and Mime format files

Posted by Koji Kawamura <ij...@gmail.com>.
Hi David,

If you'd like to extract attributes such as 'To' and 'From', then
ExtractEmailHeaders will work for you.
ExtractEmailHeaders has been available since NiFi 1.0.0.

Thanks,
Koji
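
On the second question (a regex for the sender or recipient), the following is a minimal sketch rather than a tested NiFi configuration. It uses Python's re module only so the patterns can be run and checked; ExtractText itself evaluates Java regular expressions, and the inline (?m) multiline flag used here is understood by both engines. The names SAMPLE, FROM_PATTERN, TO_PATTERN and extract_header are hypothetical, and the truncated addresses are kept exactly as they appear in the sample message above. One likely reason a pattern anchored with ^ never matches is that multiline mode is off, so ^ only matches at the very start of the content.

```python
import re

# Header block copied from the sample message in this thread.
# The addresses are truncated in the archive and are kept as-is.
SAMPLE = """\
Date: Fri, 13 Jan 2017 10:26:12 -0400 (EDT)
From: David Smith <da...@home.example.com>
To: John Doe <jo...@example.com>
Subject: =?iso-8859-1?Q?Test Email?=
MIME-Version: 1.0

  This message is in MIME format.  The first part should be plain text.
"""

# (?m) enables multiline mode so ^ and $ match at each line boundary.
# Capture group 1 holds everything after the header name on that line.
FROM_PATTERN = re.compile(r"(?m)^From:[ \t]*(.*)$")
TO_PATTERN = re.compile(r"(?m)^To:[ \t]*(.*)$")


def extract_header(pattern, text):
    """Return the first captured header value, or None if there is no match."""
    match = pattern.search(text)
    return match.group(1).strip() if match else None


sender = extract_header(FROM_PATTERN, SAMPLE)
recipient = extract_header(TO_PATTERN, SAMPLE)
print(sender)
print(recipient)
```

Because the patterns are anchored to the start of a line, "To:" does not match inside a header such as "Reply-To:". The sketch does not handle header values folded across several lines or encoded words, which is one reason a dedicated processor such as ExtractEmailHeaders is the safer choice for real mail.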

