Posted to users@spamassassin.apache.org by Tyler Nally <tn...@technally.com> on 2006/10/11 23:24:46 UTC

Parsing Email

Hello,

I have a project I need to solve.  A client's fax machines have been
replaced with the phone company's fax server, which e-mails incoming
fax images (.tif) to a specific e-mail address at the client's place
of business.

As it happens, that e-mail passes through a mail server that scans
it for viruses and runs it through SpamAssassin before forwarding it
on to the client's machine.  I administer that mail server, the one
that pre-processes the client's e-mail.

What I'd like to do is capture the contents of these fax e-mails as
they pass through the machine I administer, and:

  1- copy the fax images (detach the images from the e-mail messages)
      and store them on that server (either as files or as blobs
      in a database)
  2- create a database record that catalogs each incoming fax and
      associates it with its fax image file (or DB blob ID)

      A- and also search a database of known originating fax numbers
          so that the fax can be associated with the company that
          sent it.  In this case, the DB is a MySQL database that
          also lives on this machine.


Now, what I need help understanding is this: assuming I can handle
each e-mail separately as it comes through, how do I parse it (the
way SpamAssassin does) so I can pull out its component parts (From:,
Subject:, and the MIME-encapsulated fax image) and use those pieces
in the customer care module?

I'm well versed in PHP, and I used to do a lot of Perl (many moons
ago).  I'd like to make this work without too much pain.

Ultimately, I'll probably let the normal copy of the e-mail go on to
the customer's destination.  I'd add an extra Cc: to a specific
e-mail account on the server, and anything delivered to that account
would be fed through this parsing program, which would split the
e-mail into its pieces and store them so I can work with them later
in the process.

Can anyone point me in the right direction?

Thanks a lot....

Tyler Nally

Re: Parsing Email

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Oct 11, 2006 at 02:48:28PM -0700, Vincent Li wrote:
> my $fh;
> open $fh, "<", shift;
> my @message = <$fh>;
> 
> use Mail::SpamAssassin::Message;
> my $msg = Mail::SpamAssassin::Message->new(
>     {
>       'message' => \@message,
>     }

FYI, new() accepts a file handle, an array reference, a scalar, or undef
(which causes it to use \*STDIN).  So you don't need to slurp the message
data in first. :)
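To illustrate Theo's point, here is a minimal sketch of the same idea that hands the open file handle straight to new() instead of reading it into @message first. It assumes Mail::SpamAssassin is installed; parse_file() is a hypothetical helper name, and the Subject: lookup uses get_header() from Mail::SpamAssassin::Message::Node.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Mail::SpamAssassin::Message;

# parse_file() is a hypothetical helper: open the file and pass the
# file handle to new(), which reads the whole message itself.
sub parse_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't read $path: $!\n";
    my $msg = Mail::SpamAssassin::Message->new({ 'message' => $fh })
        or die "Message error?\n";
    close $fh;
    return $msg;
}

# Usage: perl parse.pl message-file
if (@ARGV) {
    my $msg = parse_file($ARGV[0]);
    my $subject = $msg->get_header('Subject');
    $subject = '(none)' unless defined $subject && length $subject;
    chomp $subject;
    print "Subject: $subject\n";
    $msg->finish();    # release the parsed message tree
}
```

Since new() does the reading, the script never holds a second copy of the message in @message.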

-- 
Randomly Selected Tagline:
All in a days work for "Confuse-a-Cat".

Re: Parsing Email

Posted by Vincent Li <vl...@vcn.bc.ca>.
On Wed, 11 Oct 2006, Theo Van Dinter wrote:

> On Wed, Oct 11, 2006 at 05:24:46PM -0400, Tyler Nally wrote:
>> Now.. what I need help in understanding... is ... assuming that
>> I can handle each e-mail separately as it comes through, how do I
>> parse the e-mail (like the way Spamassassin does) to have the
>> ability to pull the component parts from the e-mail (from:,
>> subject:, and MIME-encapsulated fax image) in order to be able
>> to use these pieces (somehow) for the customer care module.
>
> :)  I answered this kind of question for someone on IRC a week or two ago,
> here's a quick example of how to use Mail::SpamAssassin::Message:

Yeah, I learned to use Message.pm from felicity :)

>
> use Mail::SpamAssassin::Message;
> my $msg = Mail::SpamAssassin::Message->new() || die "Message error?";
> my $count = 0;
> foreach my $p ($msg->find_parts(qr/^image\b/i, 1)) {
>  my $file = "message." . $count++;
>  open(my $out, ">", $file) || die "can't write file $file: $!";
>  binmode $out;
>  print $out $p->decode();
>  close($out);
> }
>
>
> So that parses a message from STDIN, goes through and finds all image parts,
> and writes them out to files called message.#.

I used the code below to extract spam forwarded as attachments from
SquirrelMail so I could feed it to sa-learn:
-------

#!/usr/bin/perl

use strict;
use warnings;

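use Mail::SpamAssassin::Message;

# Read the message named on the command line into an array of lines.
my $file = shift or die "usage: $0 message-file\n";
open my $fh, "<", $file or die "Can't read $file: $!\n";
my @message = <$fh>;
close $fh;

my $msg = Mail::SpamAssassin::Message->new(
     {
       'message' => \@message,
     }
) || die "Message error?";

# Forwarded messages arrive as message/* parts; the 0 tells
# find_parts() to return non-leaf parts as well as leaves.
my $count = 0;
#foreach my $p ($msg->find_parts(qr/^(text|image|application)\b/i, 1)) {
foreach my $p ($msg->find_parts(qr/^message\b/i, 0)) {
     eval {
            my $type = $p->{'type'};
            # Use the part's own name with any directory components
            # stripped, or fall back to message.N if it has none.
            my $attachname = defined $p->{'name'} ? $p->{'name'} : '';
            $attachname =~ s{.*[/\\]}{};
            $attachname = "message.$count" unless length $attachname;
            $count++;
            print "Content type is: $type\n";
            print "write file name: $attachname\n";
            # "or" (not "||") so die fires when open() itself fails.
            open my $out, ">", $attachname
                or die "Can't write file $attachname: $!\n";
            binmode $out;
            print $out $p->decode();
            close $out or die "Can't close $attachname: $!\n";
     };
     warn $@ if $@;
}
__END__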
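The script above writes each forwarded message to a file but stops short of the sa-learn step. A minimal sketch of that last step, assuming sa-learn is on the PATH and the extracted files are passed as arguments; learn_spam() is a hypothetical helper, and only sa-learn's documented --spam option is used.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# learn_spam() is a hypothetical helper: pass the extracted message
# files to sa-learn in a single run.  List-form system() runs
# sa-learn directly, without a shell, so spaces or shell
# metacharacters in file names are not interpreted.
sub learn_spam {
    my @files = @_;
    return 1 unless @files;    # nothing to learn
    my $rc = system('sa-learn', '--spam', @files);
    if ($rc == -1) {
        warn "could not run sa-learn: $!\n";
        return 0;
    }
    if ($rc != 0) {
        warn "sa-learn failed (wait status $rc)\n";
        return 0;
    }
    return 1;
}

# Usage: perl learn.pl message.0 message.1 ...
if (@ARGV) {
    exit(learn_spam(@ARGV) ? 0 : 1);
}
```

Learning all files in one sa-learn run avoids reloading the Bayes database once per message.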

>
> Use "perldoc Mail::SpamAssassin::Message" and "perldoc
> Mail::SpamAssassin::Message::Node" for more information about functions and
> such. :)
>
> -- 
> Randomly Selected Tagline:
> "Zero equals Zero"               - Prof. Farr
>

Vincent Li    	http://pingpongit.homelinux.com
Opensource	.Implementation. .Consulting.
Platform	.Fedora. .Debian. .Mac OS X.
Blog		http://bl0g.blogdns.com

Re: Parsing Email

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Oct 11, 2006 at 05:24:46PM -0400, Tyler Nally wrote:
> Now.. what I need help in understanding... is ... assuming that
> I can handle each e-mail separately as it comes through, how do I
> parse the e-mail (like the way Spamassassin does) to have the
> ability to pull the component parts from the e-mail (from:,
> subject:, and MIME-encapsulated fax image) in order to be able
> to use these pieces (somehow) for the customer care module.

:)  I answered this kind of question for someone on IRC a week or two ago,
here's a quick example of how to use Mail::SpamAssassin::Message:

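use strict;
use warnings;
use Mail::SpamAssassin::Message;

# With no arguments, new() reads the message from STDIN.
my $msg = Mail::SpamAssassin::Message->new() || die "Message error?";
my $count = 0;
# The 1 asks find_parts() for leaf parts only (the attachments themselves).
foreach my $p ($msg->find_parts(qr/^image\b/i, 1)) {
  my $file = "message." . $count++;
  open(my $out, ">", $file) || die "can't write file $file: $!";
  binmode $out;
  print $out $p->decode();
  close($out);
}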


So that parses a message from STDIN, goes through and finds all image parts,
and writes them out to files called message.#.
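Tyler also asked for the From: and Subject: headers, which the loop above doesn't touch. A minimal sketch combining both, assuming Mail::SpamAssassin is installed and that get_header() returns the decoded header text. header_value(), extract_fax() and the fax.N file names are hypothetical; the returned hash is just a convenient shape for a later catalog insert (for example with DBI into the MySQL database), which is not shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Mail::SpamAssassin::Message;

# Return a header as a plain string: undef becomes '', newline removed.
sub header_value {
    my ($msg, $name) = @_;
    my $value = $msg->get_header($name);
    return '' unless defined $value;
    chomp $value;
    return $value;
}

# Write every image/* leaf part of $msg into $outdir as fax.0, fax.1, ...
# and return From:, Subject: and a description of each file written.
sub extract_fax {
    my ($msg, $outdir) = @_;
    my @parts;
    my $count = 0;
    foreach my $p ($msg->find_parts(qr/^image\b/i, 1)) {
        my $file = "$outdir/fax." . $count++;
        open my $out, '>', $file or die "can't write file $file: $!\n";
        binmode $out;
        print {$out} $p->decode();
        close $out or die "can't close $file: $!\n";
        push @parts, { file => $file, type => $p->{'type'}, name => $p->{'name'} };
    }
    return {
        from    => header_value($msg, 'From'),
        subject => header_value($msg, 'Subject'),
        parts   => \@parts,
    };
}

# Usage: perl fax-extract.pl OUTDIR < message   (message read from STDIN)
if (@ARGV) {
    my $msg = Mail::SpamAssassin::Message->new() or die "Message error?\n";
    my $info = extract_fax($msg, $ARGV[0]);
    print "From: $info->{from}\nSubject: $info->{subject}\n";
    print "Wrote $_->{file} ($_->{type})\n" for @{ $info->{parts} };
    $msg->finish();
}
```

Keeping the extraction in a subroutine that returns data, rather than printing, makes it easy to call from the Cc:-account delivery program Tyler describes.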

Use "perldoc Mail::SpamAssassin::Message" and "perldoc
Mail::SpamAssassin::Message::Node" for more information about functions and
such. :)

-- 
Randomly Selected Tagline:
"Zero equals Zero"               - Prof. Farr

Re: Parsing Email

Posted by Kelson <ke...@speed.net>.
Tyler Nally wrote:
>   1- copy the fax images (detach the images from e-mail messages)
>       and store these images on that server (whether as a file
>       or put into a database as a blob)

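If you're running Sendmail, you can use MIMEDefang <www.mimedefang.org>
for this.  It has a built-in function, action_replace_with_url, that
does this: it stores the attachment as a file under a directory you
choose and replaces it in the message with a URL pointing to that file.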
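A minimal sketch of what that might look like in /etc/mail/mimedefang-filter, assuming the faxes arrive as image/tiff parts. The directory and base URL are hypothetical, and the argument order ($entity, $doc_root, $base_url, $msg) should be checked against the mimedefang-filter(5) man page. MIMEDefang loads this file itself, so it is not a standalone script.

```perl
# Fragment for /etc/mail/mimedefang-filter; MIMEDefang calls filter()
# once per MIME part.  Paths and URL below are hypothetical.
sub filter {
    my ($entity, $fname, $ext, $type) = @_;

    # Only TIFF images are stored; every other part passes through.
    if ($type =~ m{^image/tiff$}i) {
        return action_replace_with_url($entity,
            '/var/www/html/faxes',             # where the file is saved
            'http://server.example/faxes',     # URL prefix for that directory
            "The fax image has been stored at _URL_\n");
    }
    return action_accept();
}

1;
```

Because the part is replaced, the customer's copy would also receive the link instead of the .tif attachment, unlike Tyler's plan of letting the normal copy through untouched.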

-- 
Kelson Vibber
SpeedGate Communications <www.speed.net>