Posted to dev@nifi.apache.org by djmdata <da...@gmail.com> on 2016/07/17 00:53:56 UTC

Re: ListenSMTP processor

What is the JIRA #?

I have a production system that reads email from a custom SMTP listener and
places the SMTP payload into Kafka. A Storm topology reads messages from
Kafka and parses the emails (Java code using JavaMail API) into useful info
(subject, text, attachments, body, etc...).

I'm looking at plugging NiFi into this to replace the custom SMTP listener.
If you had a processor that could act as a reliable (we can't lose emails)
and performant SMTP listener alternative we would use it.

Your "email parser processor" is an interesting idea - but beware of the
mess you'll find in the wild with email. In our case, we try to parse
Exchange (full of non-standard wonders like "TNEF" attachments) as well as
email from virtually anywhere (GMail, Yahoo, Joe's email client...). If you
can crack that you'll be on to something. We have even more complexity in
that we read "Microsoft Journals" which wrap the standard SMTP layout in a
Microsoft layer (you'll see this at large Exchange shops doing this kind of
thing for use cases like compliance).



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/ListenSMTP-processor-tp10510p12827.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: ListenSMTP processor

Posted by Joe Witt <jo...@gmail.com>.
@djmdata please consider subscribing to the dev list by sending an
email to dev-subscribe@nifi.apache.org

The JIRA for this work is here https://issues.apache.org/jira/browse/NIFI-1899

Feel free to help drive the conversation on there. It might not make the
1.0 release, but there is good effort going on, so it should be in a
release soon. As you point out, parsing mail in the wild is... wild.
We can just iterate on it and improve it. If you're interested in
helping with coding some of it up too, please feel free to jump in
there. If not, that is totally fine as well. Just having good
requirements/perspective is valuable.

Thanks
Joe


Re: ListenSMTP processor

Posted by Joe Percivall <jo...@yahoo.com.INVALID>.
Hello,

The JIRA for ListenSMTP is NIFI-1899 [1]. A PR [2] has been submitted and is currently being reviewed. It should be adequately reliable and performant (if it's not, that more than likely should be fixed). If you're able, it would be great to have another pair of eyes on it; even just help with testing is always appreciated.


The prospective implementation of the ExtractEmailAttachments processor uses the Apache Commons Email MimeMessageParser to see if the message has any attachments. If it does, it creates new FlowFiles from those attachments by streaming the bytes to each new FlowFile's content. I believe this method should allow for schema-less attachments.

[1] https://issues.apache.org/jira/browse/NIFI-1899
[2] https://github.com/apache/nifi/pull/483

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com




Re: ListenSMTP processor

Posted by Toivo Adams <to...@gmail.com>.
I support Oleg's opinion.
Do one thing and do it well.

Thanks
Toivo



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/ListenSMTP-processor-tp10510p12891.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: ListenSMTP processor

Posted by Oleg Zhurakousky <oz...@hortonworks.com>.
Andre

Unless I am missing something, I would stay on the side of "keeping it simple and modularized", where any extraction, transformation, or modification that is required is the job of another component. In fact, I faced a very similar question a few days ago about attachments from IMAP/POP3, etc.
As you already mentioned, using MimeMessageParser is straightforward and allows one to restore an InputStream back into a javax.mail.Message, from which you can extract and get to pretty much anything. You are already doing that in the ExtractEmailAttachments processor, so I would continue on that path.

That is of course my opinion, so it would be nice to see what others think.
Cheers
Oleg

> On Jul 24, 2016, at 8:38 AM, Andre <an...@fucs.org> wrote:
> 
> I have raised NIFI-2380 to track this improvement.
> 
> While raising the ticket I was wondering:
> 
> are you happy to give the user the option to choose whether to extract the
> winmail.dat or not?
> 
> I mean something like:
> 
> - PROPERTY: "Extract Attachments within a TNEF (i.e. winmail.dat)": true /
> false
> 
> If yes, then every time decoding occurs we test the name (or something
> better, if that is possible) and then extract it. An attachment extracted
> from a TNEF file would have an attribute email.attachment.tnefdecoded (or
> whatever name we decide) set to yes.
> 
> If no, processing continues as it is today (i.e. purely based on Apache
> Commons MimeMessageParser).
> 
> 
> Another possible solution would be an additional processor, but IMNSHO this
> would be overkill and counterproductive.
> 
> Keen to hear your thoughts
> 


Re: ListenSMTP processor

Posted by Andre <an...@fucs.org>.
I have raised NIFI-2380 to track this improvement.

While raising the ticket I was wondering:

are you happy to give the user the option to choose whether to extract the
winmail.dat or not?

I mean something like:

- PROPERTY: "Extract Attachments within a TNEF (i.e. winmail.dat)": true /
false

If yes, then every time decoding occurs we test the name (or something
better, if that is possible) and then extract it. An attachment extracted
from a TNEF file would have an attribute email.attachment.tnefdecoded (or
whatever name we decide) set to yes.

If no, processing continues as it is today (i.e. purely based on Apache
Commons MimeMessageParser).
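
As a rough illustration of the check described above, the sketch below tests
an attachment either by its file name or by the TNEF signature at the start
of its content, which is one candidate for "something better" than a name
test. It is plain Java with no NiFi or Commons Email dependencies. The class
name TnefAttachmentCheck, its method names, and the boolean flag standing in
for the proposed property are all hypothetical, and the actual TNEF decoding
step is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration; not part of NiFi or Commons Email. */
final class TnefAttachmentCheck {

    // A TNEF stream starts with the 32-bit signature 0x223E9F78,
    // stored little-endian, so the first bytes are 78 9F 3E 22.
    private static final byte[] TNEF_SIGNATURE = {0x78, (byte) 0x9F, 0x3E, 0x22};

    /** Name-based test: is this the conventional "winmail.dat" attachment? */
    static boolean nameLooksLikeTnef(String fileName) {
        return fileName != null
                && fileName.trim().toLowerCase(Locale.ROOT).equals("winmail.dat");
    }

    /** Content-based test: do the first four bytes match the TNEF signature? */
    static boolean contentLooksLikeTnef(byte[] content) {
        if (content == null || content.length < TNEF_SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < TNEF_SIGNATURE.length; i++) {
            if (content[i] != TNEF_SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Combines the (assumed) boolean property with both tests. When the
     * property is false nothing is decoded, matching the "If no" branch.
     */
    static boolean shouldDecode(boolean extractTnef, String fileName, byte[] content) {
        return extractTnef
                && (nameLooksLikeTnef(fileName) || contentLooksLikeTnef(content));
    }

    /** Attributes to attach to a child extracted from a TNEF container. */
    static Map<String, String> attributesForDecodedChild() {
        Map<String, String> attributes = new LinkedHashMap<>();
        // Attribute name taken from the proposal; the final name is undecided.
        attributes.put("email.attachment.tnefdecoded", "yes");
        return attributes;
    }

    public static void main(String[] args) {
        byte[] sample = {0x78, (byte) 0x9F, 0x3E, 0x22, 0x00, 0x00};
        System.out.println(shouldDecode(true, "winmail.dat", sample));
        System.out.println(shouldDecode(false, "winmail.dat", sample));
        System.out.println(attributesForDecodedChild());
    }
}
```

Checking the signature as well as the name would catch TNEF payloads that a
mail client has renamed, at the cost of reading the first four bytes of
every attachment.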


Another possible solution would be an additional processor, but IMNSHO this
would be overkill and counterproductive.

Keen to hear your thoughts


Re: ListenSMTP processor

Posted by Andre <an...@fucs.org>.
Dan,

Ingesting Microsoft Journals seems like a great suggestion for a new
processor (ParseExchangeJournal?).

Regarding TNEF: As far as I know, Apache Commons Email does not parse
"winmail.dat" type attachments. As far as I understand, the only
ASL-compatible implementation of a TNEF extractor is Apache POI's, and even
that implementation is not part of POI's main release.

If TNEF support is required, we will either have to code it from scratch or
perhaps use https://github.com/koodaamo/tnefparse together with
ExecuteScript (although since tnefparse is LGPL, this solution cannot be
packaged as part of NiFi).

Cheers
