Posted to users@spamassassin.apache.org by Sean Tout <se...@gmail.com> on 2012/12/28 09:45:03 UTC

Spamassassin not parsing email messages

Hello,

I wrote a short Perl program that reads email from an existing mbox-formatted
file, passes each individual email to SpamAssassin for parsing and scoring,
then prints a report for each email. The strange thing is that I keep
getting the same report score for all messages. I did confirm that I'm
reading each message by printing it after reading it. I tried the code below
on many different emails (spam and ham), yet I get the same report score for
all of them. What am I doing wrong?

SpamAssassin Version: 3.3.3
OS: Debian

===Begin Perl Script===
#!/usr/bin/perl

  use Mail::Mbox::MessageParser;
  use Mail::SpamAssassin;
  

  my $file_name = 'tshort.spam';
  my $file_handle = new FileHandle($file_name);
  my $reportfile_name = '>>reportfile_out.txt';
  open (RFILE, $reportfile_name);
  # Set up cache. (Not necessary if enable_cache is false.)
  Mail::Mbox::MessageParser::SETUP_CACHE(
    { 'file_name' => '/tmp/cache' } );

  my $folder_reader =
    new Mail::Mbox::MessageParser( {
      'file_name' => $file_name,
      'file_handle' => $file_handle,
      'enable_cache' => 1,
      'enable_grep' => 1,
    } );

  die $folder_reader unless ref $folder_reader;

  # Any newlines or such before the start of the first email
  my $prologue = $folder_reader->prologue;
  print $prologue;

  my $spamtest = Mail::SpamAssassin->new();

  # This is the main loop. It's executed once for each email
  while(!$folder_reader->end_of_file())
  {
    $email = $folder_reader->read_next_email();
    $mail = $spamtest->parse($email);
    $status = $spamtest->check($mail);
    print RFILE $status->get_report();
    print RFILE "\n";
  }

close(RFILE);

===End Perl Script===

Report file below was run on 3 messages. 
===Begin Report File===
Spam detection software, running on the system "stout-lnx", has
identified this incoming email as possible spam.  The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email.  If you have any questions, see
@@CONTACT_ADDRESS@@ for details.

Content preview:  [...] 

Content analysis details:   (6.9 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
 1.2 MISSING_HEADERS        Missing To: header
 0.1 MISSING_MID            Missing Message-Id: header
 1.8 MISSING_SUBJECT        Missing Subject: header
 2.3 EMPTY_MESSAGE          Message appears to have no textual parts and no
                            Subject: text
-0.0 NO_RECEIVED            Informational: message has no Received headers
 1.4 MISSING_DATE           Missing Date: header
 0.0 NO_HEADERS_MESSAGE     Message appears to be missing most RFC-822
                            headers


Spam detection software, running on the system "stout-lnx", has
identified this incoming email as possible spam.  The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email.  If you have any questions, see
@@CONTACT_ADDRESS@@ for details.

Content preview:  [...] 

Content analysis details:   (6.9 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
 1.2 MISSING_HEADERS        Missing To: header
 0.1 MISSING_MID            Missing Message-Id: header
 1.8 MISSING_SUBJECT        Missing Subject: header
 2.3 EMPTY_MESSAGE          Message appears to have no textual parts and no
                            Subject: text
-0.0 NO_RECEIVED            Informational: message has no Received headers
 1.4 MISSING_DATE           Missing Date: header
 0.0 NO_HEADERS_MESSAGE     Message appears to be missing most RFC-822
                            headers


Spam detection software, running on the system "stout-lnx", has
identified this incoming email as possible spam.  The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email.  If you have any questions, see
@@CONTACT_ADDRESS@@ for details.

Content preview:  [...] 

Content analysis details:   (6.9 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
 1.2 MISSING_HEADERS        Missing To: header
 0.1 MISSING_MID            Missing Message-Id: header
 1.8 MISSING_SUBJECT        Missing Subject: header
 2.3 EMPTY_MESSAGE          Message appears to have no textual parts and no
                            Subject: text
-0.0 NO_RECEIVED            Informational: message has no Received headers
 1.4 MISSING_DATE           Missing Date: header
 0.0 NO_HEADERS_MESSAGE     Message appears to be missing most RFC-822
                            headers

===End Report File===

All of the messages below have headers, a body, and a subject, yet the report
states that they do not! Am I reading the messages the wrong way?
===Begin Sample Email Messages Used as Input===

>From aw-confirm@ebay.com  Tue Jun 14 19:52:09 2005
Return-Path: <aw...@ebay.com>
X-Original-To: username@login.domain.com
Delivered-To: username@login.domain.com
Received: from mail1.domain.com (mail1.domain.com [10.0.2.3])
	by naughty.domain.com (Postfix) with ESMTP id A48B3536E6B
	for <us...@login.domain.com>; Tue, 14 Jun 2005 19:52:09 -0400 (EDT)
Received: from calculator (unknown [195.245.214.83])
	by mail1.domain.com (Postfix) with ESMTP id 3D12685ACE0
	for <us...@domain.com>; Tue, 14 Jun 2005 19:52:07 -0400 (EDT)
Received: from 216.231.36.64 by ; Fri, 17 Jun 2005 19:52:22 -0500
Message-ID: <WI...@yahoo.com>
From: "aw-confirm@ebay.com" <aw...@ebay.com>
Reply-To: "aw-confirm@ebay.com" <aw...@ebay.com>
To: username@domain.com
Subject: TKO Notice: ***Urgent Safeharbor Department Notice*** 
Date: Sat, 18 Jun 2005 01:57:22 +0100
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="--815502203823033306"
X-IP: 185.49.182.244
X-Priority: 3
Status: RO
X-Status: 
X-Keywords:                 
X-UID: 1

----815502203823033306
Content-Type: text/html;
Content-Transfer-Encoding: quoted-printable

<html>
<head>


<META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; charset=3DISO-8859=
-1">
<title>eBay Suspension</title>
</head>
<body bgcolor=3D"#ffffff">
<3D"http://pics.ebaystatic.com/aw/pics/spacer.gif"> 

<img
src=3D"http://pics.ebaystatic.com/aw/pics/register/HeaderRegister_387x40.g=
if"
alt=3D"From collectibles to cars, buy and sell all kinds of items on eBay"=

border=3D"0"> <3D"http://pages.ebay.com/">  



<3D"http://pics.ebaystatic.com/aw/pics/spacer.gi=> 


<3D"http://pics.ebaystatic.com/aw/pics/spacer.gif"> 


<3D"http://pics.ebaystatic.com/aw/pics/sitewide/leftLine_16x3.gif"> 




*eBay Su=
spension*

<img
src=3D"http://pics.ebaystatic.com/aw/pics/listings/questionMark_14x14.gif"=

width=3D"14" HEIGHT=3D"14" border=3D"0">
<3D"http://pages.ebay.com/help/new/signin.html">  
<3D"http://pics.ebaystatic.com/aw/pics/spacer.gif">  Need Help?
<3D"http://pages.ebay.com/help/new/signin.html">  
<3D"http://pics.ebaystatic.com/aw/pics/spacer.gif"> 





<3D"http://pics.ebaystatic.com/aw/pics/spacer.gif"> 





<3D"http://pics.ebaystatic.com/aw/pics/spacer.g=> 








<3D"http://pics.ebaystatic.com/aw/pics/spacer.gif"> 




<3D"http://pics.ebaystatic.com/aw/pics/spacer.gif"> 

</td=
>

<br>Dear valued eBay member,
<br>
<br>During our regularly scheduled account maintenance and verification pr=
ocedures, we 
have detected a slight error in your billing information.
<br>
<br>This might be due to either of the following reasons:
<br>
<br>1. A recent change in your personal information ( i.e.change of addres=
s).
<br>2. Submiting invalid information during the initial sign up process.
<br>3. An inability to accurately verify your selected option of payment d=
ue to an 
internal error within our processors.
<br>
<br>Once you have updated your account records your eBay session will not =
be
interrupted and will 
<br>continue as normal.
<br>
<br>To update your eBay records click on the following link:
<br> http://cgi1.ebay.com/aw-cgi/ebayISAPI.dll?UPdate
<3D"http://70.96.188.24/~backfoul/secure/aw-cgi/DllUpdate/ws2/ISAPIDll=>  
<br>
<br>If your account information is not updated within 48 hours then your a=
bility to use 
eBay will become restricted.
<br><br>
<br>Regards,
<br>
<br>Safeharbor Department
<br>eBay, Inc.




<3D"http://pics.ebaystatic.com/aw/pics/spacer.gif=> 

<SCR=
IPT
SRC=3D&quot;http://include.ebaystatic.com/aw/pics/js/stats/ss2.js&quot;></SCRIPT><p>=



Copyright =
=A9
1995-2005 eBay Inc. All Rights Reserved.<br>Designated trademarks and bran=
ds
are the property of their respective owners.<br>Use of this Web site
constitutes acceptance of the eBay  User
Agreement <3D"http://pages.ebay.com/help/policies/user-agreement.html">  
and  Privacy
Policy <3D"http://pages.ebay.com/help/policies/privacy-policy.html">  .<br>

<img
src=3D"http://pics.ebaystatic.com/aw/pics/truste_button.gif" align=3D"midd=
le"
width=3D"116" height=3D"31" ALT=3D"TrustE" border=3D"0">
<3D"http://pages.ebay.com/help/policies/privacy-policy.html">  



</p>
</body>
</html>

----815502203823033306--

>From online-banking@lasallebank.com  Wed Jun 15 16:45:29 2005
Return-Path: <on...@lasallebank.com>
X-Original-To: username@login.domain.com
Delivered-To: username@login.domain.com
Received: from mail1.domain.com (mail1.domain.com [10.0.2.3])
	by naughty.domain.com (Postfix) with ESMTP
	id 876C1536E9E; Wed, 15 Jun 2005 16:45:29 -0400 (EDT)
Received: from pcp04337700pcs.stclco01.mi.comcast.net
(pcp04337700pcs.stclco01.mi.comcast.net [68.60.190.245])
	by mail1.domain.com (Postfix) with SMTP id 93B7C85AAD7;
	Wed, 15 Jun 2005 16:45:28 -0400 (EDT)
Received: from [67.38.206.72] by pcp04337700pcs.stclco01.mi.comcast.net SMTP
id ylPJMCmIr29065 for <jo...@domain.com>; Wed, 15 Jun 2005 17:36:17 -0400
Message-ID: <7x...@m6zklvchq>
From: "" <on...@lasallebank.com>
Reply-To: "" <on...@lasallebank.com>
To: joewee@domain.com
Subject: ATM cards attention. conformation code? yftue  fvkg ts r
Date: Wed, 15 Jun 05 17:36:17 GMT
X-Mailer: guest v.1.0
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary=".D..___72A497_9"
X-Priority: 3
X-MSMail-Priority: Normal
Status: RO
X-Status: 
X-Keywords:                 
X-UID: 2


--.D..___72A497_9
Content-Type: text/html;
Content-Transfer-Encoding: quoted-printable

&nbsp;<body bgcolor=3D#ffffff><div align=3D"left">


<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">
Dear LaSalle Member,</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">&nbsp;</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">As part of our=
 continuing commitment to protect your account and to reduce
the instance of fraud on our website, we are undertaking a period review o=
f our member accounts.</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">&nbsp;</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">
You are requested to visit our  site, and fill in the required information=
<br>Click the link below: <br> =
https://www.lasallebank-online.com/ <3D"http://lasallebank-online.com/"> 
.<br>This site is our  verify site,=
encrypted with 128bits encryption</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">&nbsp;
</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">

</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">&nbsp;
</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">This is requir=
ed for us to continue to offer you a safe and risk free
environment to send and receive money online and maintain the experience.Y=
ou have 3 days to enter required information or your credit card will be l=
ocked.</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">&nbsp;</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">
Thank you,</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">Sincerely,LaSa=
lle Online Banking Customer Service</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">&nbsp;</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">As outlined in=
 our User Agreement, 
LaSalle will periodically send you
information about site changes and enhancements. Visit our Privacy Policy =
and User Agreement if you have any questions.</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">&nbsp;</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">--------------=
-------------------------------</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">Thank you for =
using 
LaSalle!</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">--------------=
-------------------------------</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">Do not reply t=
o this email.</p>


</div>
</body>
</html>

--.D..___72A497_9--

>From support-team@lasallebank.com  Wed Jun 15 18:56:01 2005
Return-Path: <su...@lasallebank.com>
X-Original-To: username@login.domain.com
Delivered-To: username@login.domain.com
Received: from mail1.domain.com (mail1.domain.com [10.0.2.3])
	by naughty.domain.com (Postfix) with ESMTP id 171B3536E89
	for <us...@login.domain.com>; Wed, 15 Jun 2005 18:56:01 -0400 (EDT)
Received: from c-67-165-254-110.hsd1.co.comcast.net
(c-67-165-254-110.hsd1.co.comcast.net [67.165.254.110])
	by mail1.domain.com (Postfix) with SMTP id D5BA285ADC3
	for <us...@domain.com>; Wed, 15 Jun 2005 18:55:59 -0400 (EDT)
Received: from [193.23.217.29]
	by c-67-165-254-110.hsd1.co.comcast.net with SMTP
	for <us...@domain.com>; Thu, 16 Jun 2005 05:49:48 +0600
Message-ID: <l9...@yms.gjp7>
From: "" <su...@lasallebank.com>
Reply-To: "" <su...@lasallebank.com>
To: username@domain.com
Subject: Imporant information for LaSalle Bank Customers conformation code?
rouhevzrkjd mp
Date: Thu, 16 Jun 05 05:49:48 GMT
X-Mailer: fiNiSh v.0.0.1
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="B709193_F79A_D8_"
X-Priority: 3
X-MSMail-Priority: Normal
Status: RO
X-Status: 
X-Keywords:                 
X-UID: 3


--B709193_F79A_D8_
Content-Type: text/html;
Content-Transfer-Encoding: quoted-printable

&nbsp;<body bgcolor=3D#ffffff><div align=3D"left">


<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">
Dear LaSalle Member,</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">&nbsp;</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">As part of our=
 continuing commitment to protect your account and to reduce
the instance of fraud on our website, we are undertaking a period review o=
f our member accounts.</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">&nbsp;</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">
You are requested to visit our  site, and fill in the required information=
<br>Click the link below: <br> =
https://www.lasallebank-online.com/ <3D"http://lasallebank-online.com/"> 
.<br>This site is our  verify site,=
encrypted with 128bits encryption</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">&nbsp;
</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">

</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">&nbsp;
</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">This is requir=
ed for us to continue to offer you a safe and risk free
environment to send and receive money online and maintain the experience.Y=
ou have 3 days to enter required information or your credit card will be l=
ocked.</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">&nbsp;</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">
Thank you,</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">Sincerely,LaSa=
lle Online Banking Customer Service</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">&nbsp;</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">As outlined in=
 our User Agreement, 
LaSalle will periodically send you
information about site changes and enhancements. Visit our Privacy Policy =
and User Agreement if you have any questions.</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">&nbsp;</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">--------------=
-------------------------------</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">Thank you for =
using 
LaSalle!</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">--------------=
-------------------------------</p>
<p style=3D"margin-top: 0; margin-bottom: 0" align=3D"left">Do not reply t=
o this email.</p>


</div>
</body>
</html>

--B709193_F79A_D8_--

===End Sample Email Messages Used as Input===




--
View this message in context: http://spamassassin.1065346.n5.nabble.com/Spamassassin-not-parsing-email-messages-tp102770.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.

Re: Spamassassin not parsing email messages

Posted by Martin Gregorie <ma...@gregorie.org>.
On Fri, 2012-12-28 at 21:48 -0800, Sean Tout wrote:

> I have practically given up on the original
> perl code since I'm unable to find out the issue. With spamc, I can get a
> decent performance.
> 
IMO, unless you need the extra facilities of amavisd-new or one of the
other smart wrappers for SA and ClamAV, you're almost always better off
using spamc/spamd for the reasons already given. FYI amavisd-new is
written in Perl and works by loading the SA code so it can directly pass
messages to SA and read its responses.

However, don't mistake using spamc/spamd for 'not using the original
Perl code' - it isn't. Although spamc is a simple, purpose-built, fast C
program which adds minimal runtime overhead, spamd is little more than
a simple daemon launcher wrapped round the standard SA code. Look at it
with less and you'll see what I mean...


Martin




Re: Spamassassin not parsing email messages

Posted by RW <rw...@googlemail.com>.
On Fri, 28 Dec 2012 21:48:25 -0800 (PST)
Sean Tout wrote:

> Hi Martin,
> 
> You certainly did not miss anything... but I did! Being new to
> SpamAssassin, I was only familiar with the spamassassin command, which
> was awfully slow for a large number of emails. But now that I use
> spamc, I'm getting 5+ messages per second.
> 
> Thank you very much for the advice. I have practically given up on the
> original Perl code since I'm unable to find the issue. With
> spamc, I can get decent performance.
> 


Using spamc avoids repeated initialisation, but if I want it to be
really fast I do it something like this:


   for m in /home/sean/code/spam/spfiles/*
   do
      spamc <$m  ... &
      [ "$(( n=(n+1) % 20 ))" -eq 0 ] && spamc -K >/dev/null
   done

It puts spamc processes into the background in parallel. Occasionally
running spamc -K in the foreground prevents unnecessary timeouts by
limiting the number of spamc processes waiting to be assigned to a spamd
child process.

At the very least there's a speed-up from using all CPU cores, but with slow
or unreliable network tests the speed-up can be enormous. You need to
set --max-children in spamd appropriately.
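
For anyone who would rather stay in Perl, the same idea can be sketched
roughly as follows. This is only an illustration under assumptions:
run_throttled and its arguments are names made up here, and instead of the
spamc -K trick it keeps a fixed number of child processes alive and reaps
one before starting the next. The limit of 20 mirrors the loop above and
presumably should not exceed whatever --max-children spamd was started with.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run one command per file with the file on STDIN,
# keeping at most $max_jobs children alive at any time.
# Returns the number of children that exited with a non-zero status.
sub run_throttled {
    my ($files, $max_jobs, @cmd) = @_;
    die "max_jobs must be at least 1\n" unless $max_jobs >= 1;
    my ($running, $failed) = (0, 0);

    # Block until one child exits, then update the counters.
    my $reap = sub {
        my $pid = wait();
        if ($pid == -1) { $running = 0; return; }   # no children left
        $running--;
        $failed++ if $? != 0;
    };

    for my $file (@$files) {
        $reap->() while $running >= $max_jobs;      # pool full: wait for a slot

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: put the message on STDIN and become the scanner.
            open(STDIN, '<', $file) or exit 1;
            exec { $cmd[0] } @cmd;
            exit 127;                               # only reached if exec failed
        }
        $running++;
    }

    $reap->() while $running > 0;                   # drain the remaining children
    return $failed;
}

# Example: perl scan.pl /home/sean/code/spam/spfiles/*
if (@ARGV) {
    my $failed = run_throttled([@ARGV], 20, 'spamc');
    warn "$failed spamc run(s) exited non-zero\n" if $failed;
}
```

As written, every child writes to the script's own STDOUT, so the output of
parallel runs can interleave. In practice each child would probably want its
own output file.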


Re: Spamassassin not parsing email messages

Posted by Sean Tout <se...@gmail.com>.
Hi Martin,

You certainly did not miss anything... but I did! Being new to SpamAssassin,
I was only familiar with the spamassassin command, which was awfully slow for a
large number of emails. But now that I use spamc, I'm getting 5+ messages
per second.

Thank you very much for the advice. I have practically given up on the original
Perl code since I'm unable to find the issue. With spamc, I can get
decent performance.

Regards,

-Sean.





Re: Spamassassin not parsing email messages

Posted by Martin Gregorie <ma...@gregorie.org>.
On Fri, 2012-12-28 at 16:51 -0800, Sean Tout wrote:
> Hi John,
> 
> Thank you very much for the help. I have been trying to avoid executing
> spamassassin shell commands from Perl since it takes a significant amount of
> time (about 12 seconds) for each email. I have tried the script below, which
> works, but of course not at a favorable speed, especially for processing the
> 20,000+ emails in the spfiles folder.
> 
> @files = </home/sean/code/spam/spfiles/*>;
> my $outfile = '>>mailrep_out.txt';
> open (MYFILE, $outfile);
> foreach $file (@files) {
>    $cmd = "spamassassin --test-mode < ".$file." >>mail_out.txt";
>    system ($cmd);
> }
> close(MYFILE);
> 
> Regards,
> 
> -Sean.
> 
As, from this, it seems that you have already got the messages held as
individual files in the /home/sean/code/spam/spfiles/ directory, why not
feed them directly to spamd with a small bash script:

for m in /home/sean/code/spam/spfiles/*
do
	spamc <$m | pipeline to analyse and store spamd replies
done

which should run a lot faster than calling spamassassin directly because
spamd will only need to be loaded once at the start of the run.

... or did I miss something obvious?


Martin
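
If the "pipeline to analyse and store spamd replies" part ends up being
written in Perl anyway, one possible shape is sketched below. It relies on
spamc -c, which prints a single "score/threshold" line instead of the
rewritten message. The helper names first_output_line and parse_score are
invented for this sketch, and the tab-separated output is just one arbitrary
choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run @cmd with $file on STDIN and return the first
# line of its output, chomped (undef if there was no output).
sub first_output_line {
    my ($file, @cmd) = @_;

    my $pid = open(my $out, '-|');       # fork; the child's STDOUT feeds $out
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        open(STDIN, '<', $file) or exit 1;
        exec { $cmd[0] } @cmd;
        exit 127;                        # only reached if exec failed
    }

    my $line = <$out>;
    close($out);                         # also waits for the child
    chomp $line if defined $line;
    return $line;
}

# Split a "score/threshold" line into two numbers, or return nothing
# if the line does not have that shape.
sub parse_score {
    my ($line) = @_;
    return unless defined $line
        && $line =~ m{\A(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)\z};
    return ($1, $2);
}

# Example: perl scores.pl /home/sean/code/spam/spfiles/*
for my $file (@ARGV) {
    my ($score, $required) =
        parse_score(first_output_line($file, 'spamc', '-c'));
    printf "%s\t%s\t%s\n", $file, $score // '?', $required // '?';
}
```

Only spamc is started once per message here; the rules stay loaded in spamd,
which is where the speed-up described above comes from.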




Re: Spamassassin not parsing email messages

Posted by Sean Tout <se...@gmail.com>.
Hi John,

Thank you very much for the help. I have been trying to avoid executing
spamassassin shell commands from Perl since it takes a significant amount of
time (about 12 seconds) for each email. I have tried the script below, which
works, but of course not at a favorable speed, especially for processing the
20,000+ emails in the spfiles folder.

@files = </home/sean/code/spam/spfiles/*>;
my $outfile = '>>mailrep_out.txt';
open (MYFILE, $outfile);
foreach $file (@files) {
   $cmd = "spamassassin --test-mode < ".$file." >>mail_out.txt";
   system ($cmd);
}
close(MYFILE);

Regards,

-Sean.





Re: Spamassassin not parsing email messages

Posted by John Hardin <jh...@impsec.org>.
On Fri, 28 Dec 2012, Sean Tout wrote:

> Hi John,
>
> Per your response below, here is what I did to confirm it's not a content
> problem.
> open (RFILE, $reportfile_name);
> while(!$folder_reader->end_of_file())
>  {
>    $email = $folder_reader->read_next_email();
>    chomp($email);
>    $mail = $spamtest->parse($email);
>    $status = $spamtest->check($mail);
>    print RFILE $$email;
> }
>
> then issued the following command:
> spamassassin --test-mode < /home/stout/spam/reportfile_in.txt
>
> the above worked just fine. the contents of reportfile_in.txt are created by
> "print RFILE $$email".
>
> Thoughts!

Unfortunately that's all I can recommend. I am not familiar with using the 
SpamAssassin libraries directly from Perl. If I were in your situation I'd 
do something hackish like system("spamc $RFILE") or an equally ugly shell 
script... :)

Sorry.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Justice is justice, whereas "social justice" is code for one set
   of rules for the rich, another for the poor; one set for whites,
   another set for minorities; one set for straight men, another for
   women and gays. In short, it's the opposite of actual justice.
                                                     -- Burt Prelutsky
-----------------------------------------------------------------------
  211 days since the first successful private support mission to ISS (SpaceX)

Re: Spamassassin not parsing email messages

Posted by Sean Tout <se...@gmail.com>.
Hi John,

Per your response below, here is what I did to confirm it's not a content
problem. 
open (RFILE, $reportfile_name);
while(!$folder_reader->end_of_file())
  {
    $email = $folder_reader->read_next_email();
    chomp($email);
    $mail = $spamtest->parse($email);
    $status = $spamtest->check($mail);
    print RFILE $$email;
}

Then I issued the following command:
spamassassin --test-mode < /home/stout/spam/reportfile_in.txt

That worked just fine. The contents of reportfile_in.txt are created by
"print RFILE $$email".

Thoughts?

Regards,

-Sean.






Re: Spamassassin not parsing email messages

Posted by John Hardin <jh...@impsec.org>.
On Fri, 28 Dec 2012, Sean Tout wrote:

> Hi John,
>
> I wrote every email read to an output file. The output file is identical to
> the input file I'm reading the emails from according to diff!

The concern is the format of the single mail object being sent to 
SpamAssassin for scanning. Having the very first line of that object be a 
blank line would explain the "misformatted message" rule hits you've 
reported.

Capturing the entire mailbox and running a diff is certainly suggestive, 
but to be *sure* you want to look at the messages individually.

If you capture that one mail object to a file, and it is a 
properly-formatted RFC-822 message with no leading blank lines, and you 
can successfully pipe that file through SA and get a sensible score, then 
the problem is not in the data, it's how it's being fed to SpamAssassin 
within that script.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The more you believe you can create heaven on earth the more
   likely you are to set up guillotines in the public square to
   hasten the process.                                 -- James Lileks
-----------------------------------------------------------------------
  211 days since the first successful private support mission to ISS (SpaceX)
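
On "how it's being fed": the fact that "print RFILE $$email" produces the
right text is a strong hint. Mail::Mbox::MessageParser documents
read_next_email() as returning a reference to a scalar rather than the text
itself, so the original loop was probably handing SpamAssassin a reference.
A sketch of the loop with the reference dereferenced follows. The names
message_text and score_mbox are invented here, and it is an assumption (not
confirmed anywhere in this thread) that parse() in 3.3.3 does not
dereference a scalar reference on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: accept either message text or a reference to it
# and always return plain text.
sub message_text {
    my ($email) = @_;
    return ref($email) eq 'SCALAR' ? $$email : $email;
}

# Hypothetical driver: score every message in an mbox file and print
# one report per message to $out (a filehandle).
sub score_mbox {
    my ($mbox_path, $out) = @_;

    # Loaded here so the helper above can be used without these modules.
    require Mail::Mbox::MessageParser;
    require Mail::SpamAssassin;

    open(my $fh, '<', $mbox_path) or die "cannot open $mbox_path: $!";
    my $reader = Mail::Mbox::MessageParser->new({
        file_name    => $mbox_path,
        file_handle  => $fh,
        enable_cache => 0,
        enable_grep  => 0,
    });
    die $reader unless ref $reader;

    my $spamtest = Mail::SpamAssassin->new();   # one scanner for all messages

    until ($reader->end_of_file()) {
        my $email  = $reader->read_next_email();            # scalar reference
        my $mail   = $spamtest->parse(message_text($email)); # plain text in
        my $status = $spamtest->check($mail);
        print {$out} $status->get_report(), "\n";
        $status->finish();                      # release per-message state
        $mail->finish();
    }
    close($fh);
}

# Example: perl score.pl tshort.spam >> reportfile_out.txt
score_mbox($ARGV[0], \*STDOUT) if @ARGV;
```

The cache and grep options are switched off only to keep the sketch free of
extra setup; they should not affect what is passed to parse().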

Re: Spamassassin not parsing email messages

Posted by Sean Tout <se...@gmail.com>.
Hi John,

I wrote every email read to an output file. The output file is identical to
the input file I'm reading the emails from according to diff! 

Regards,

-Sean.





Re: Spamassassin not parsing email messages

Posted by John Hardin <jh...@impsec.org>.
On Fri, 28 Dec 2012, Sean Tout wrote:

> That's most likely the case. But I'm not sure what's going in there and how
> to get rid of it. I tried with and without chomp() but got the same results.
> below is a snippet with chomp, which I applied before parsing the email with
> spamassassin.
>
> my $spamtest = Mail::SpamAssassin->new();
>
>  # This is the main loop. It's executed once for each email
>  while(!$folder_reader->end_of_file())
>  {
>    $email = $folder_reader->read_next_email();

Write $email to a file here and take a look at it.

>    chomp($email);
>    $mail = $spamtest->parse($email);
>    $status = $spamtest->check($mail);
>  #rest of code per above.
>  }

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   How can you reason with someone who thinks we're on a glidepath to
   a police state and yet their solution is to grant the government a
   monopoly on force? They are insane.
-----------------------------------------------------------------------
  211 days since the first successful private support mission to ISS (SpaceX)

Re: Spamassassin not parsing email messages

Posted by Sean Tout <se...@gmail.com>.
Hi Dave,

That's most likely the case. But I'm not sure what's going on in there or how
to get rid of it. I tried with and without chomp() but got the same results.
Below is a snippet with chomp, which I applied before parsing the email with
SpamAssassin.

my $spamtest = Mail::SpamAssassin->new();

  # This is the main loop. It's executed once for each email
  while(!$folder_reader->end_of_file())
  {
    $email = $folder_reader->read_next_email();
    chomp($email);
    $mail = $spamtest->parse($email);
    $status = $spamtest->check($mail);
  #rest of code per above.
  }

Regards,

-Sean.





Re: Spamassassin not parsing email messages

Posted by Dave Funk <db...@engineering.uiowa.edu>.
That implies that whatever mechanism you're using in the original process
is adding a blank line (or bare 'nl' or 'cr') to the beginning of the
message that you're then handing to SA.

Idiot question, are you doing (or not) a "chomp" in the initial read 
process?


On Fri, 28 Dec 2012, Sean Tout wrote:

> Hi Henrik & Jeff,
>
> One more input that might shed more light. I copied one of the emails from
> the above 3 emails into its own file and ran spamassassin from the command
> line in test mode against it and it worked fine. the command is
> spamassassin --test-mode < /spamemails/singleemail.spam
>
> where singleemail.spam contains a single spam email.
>
> Regards,
>
> -Sean.

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: Spamassassin not parsing email messages

Posted by Sean Tout <se...@gmail.com>.
Hi Henrik & Jeff,

One more input that might shed more light. I copied one of the 3 emails
above into its own file and ran spamassassin from the command
line in test mode against it, and it worked fine. The command is
spamassassin --test-mode < /spamemails/singleemail.spam

where singleemail.spam contains a single spam email.

Regards,

-Sean.





Re: Spamassassin not parsing email messages

Posted by Sean Tout <se...@gmail.com>.
Hi Jeff,

You are correct. It's clear SpamAssassin is unable to parse the email, so
there is something in the email that's causing this, and I'm trying to find
out what it is and why!
I have tried multiple sources of emails, many of which are from known spam
corpora and from my own email client, all of which are in mbox format. In
fact, Mail::Mbox::MessageParser is working just fine with those emails, as I'm
having no problem parsing them.

Would greatly appreciate any clues.

Regards,

-Sean.






Re: Spamassassin not parsing email messages

Posted by Jeff Mincy <je...@delphioutpost.com>.
   From: Sean Tout <se...@gmail.com>
   Date: Fri, 28 Dec 2012 01:10:02 -0800 (PST)
   
   Hi Henrik,
   
   Thank you much for the prompt response and points. I ran the Perl script
   with the code you pasted below, but still got the same report scores for all
   emails! by the way, when I also tried to print contents of the emails using
   $status->get_content_preview(), I got [...] I'm unable to print any portions
   of the email messages using $status = $spamtest->check($mail), however I can
   print any portions using $folder_reader->read_next_email().
   
   Regards,
   
   Sean.
   
Based on the tests that are hit
   --------------------------------------------------
   -0.0 NO_RELAYS              Informational: message was not relayed via SMTP
    1.2 MISSING_HEADERS        Missing To: header
    0.1 MISSING_MID            Missing Message-Id: header
    1.8 MISSING_SUBJECT        Missing Subject: header
    2.3 EMPTY_MESSAGE          Message appears to have no textual parts and no
                               Subject: text
   -0.0 NO_RECEIVED            Informational: message has no Received headers
    1.4 MISSING_DATE           Missing Date: header
    0.0 NO_HEADERS_MESSAGE     Message appears to be missing most RFC-822

you are passing malformed email messages into SpamAssassin.
SpamAssassin cannot find any of the headers.  I'd guess that you
have extraneous junk at the beginning of each message.

-jeff
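
One way to test the "extraneous junk" guess is to look at the very first
line of whatever is about to be handed to SpamAssassin. The check below is
a rough sketch with an invented name (leading_problem). It only looks at the
first line and does not validate the message as a whole.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical check: describe why $input does not look like the start
# of an RFC 822 message, or return undef if the first line looks fine.
sub leading_problem {
    my ($input) = @_;
    return 'empty input' unless defined $input && length $input;
    return 'a reference, not message text (' . ref($input) . ')'
        if ref $input;

    my ($first) = $input =~ /\A([^\r\n]*)/;
    return 'leading blank line' if $first eq '';
    return if $first =~ /\AFrom /;            # mbox separator line
    return if $first =~ /\A[!-9;-~]+:/;       # header field name plus colon
    return "first line is not a header: '$first'";
}

my $text = "Subject: test\n\nbody\n";
print leading_problem(\$text)   // 'ok', "\n";   # a reference was passed
print leading_problem("\n$text") // 'ok', "\n";  # blank line before headers
print leading_problem($text)    // 'ok', "\n";   # well-formed start
```

Calling this on the value returned by the mbox reader, just before parse(),
would show whether the scanner is receiving message text at all.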

Re: Spamassassin not parsing email messages

Posted by Sean Tout <se...@gmail.com>.
Hi Henrik,

Thank you very much for the prompt response and pointers. I ran the Perl script
with the code you pasted below, but still got the same report scores for all
emails! By the way, when I also tried to print the contents of the emails using
$status->get_content_preview(), I got [...]. I'm unable to print any portion
of the email messages after $status = $spamtest->check($mail); however, I can
print any portion using $folder_reader->read_next_email().

Regards,

Sean.





Re: Spamassassin not parsing email messages

Posted by Henrik K <he...@hege.li>.
On Fri, Dec 28, 2012 at 12:45:03AM -0800, Sean Tout wrote:
> Hello,
> 
> I wrote a short Perl program that reads email from an existing mbox
> formatted file, passes each individual email to Spamassassin for parse and
> score, then prints a report for each email. The strange thing is that I keep
> getting the same report score for all messages. I did confirm that I'm
> reading each message by printing it after reading it. I tried the below code
> on many different emails (spam and ham) yet I get the same report score for
> all of them. What am I doing wrong? 

You need to completely destroy SpamAssassin after usage.

Change this:

>   my $spamtest = Mail::SpamAssassin->new();
> 
>   # This is the main loop. It's executed once for each email
>   while(!$folder_reader->end_of_file())
>   {
>     $email = $folder_reader->read_next_email();
>     $mail = $spamtest->parse($email);
>     $status = $spamtest->check($mail);
>     print RFILE $status->get_report();
>     print RFILE "\n";
>   }

To something like this:

while(!$folder_reader->end_of_file())
{
  my $email = $folder_reader->read_next_email();
  my $spamtest = Mail::SpamAssassin->new();
  my $mail = $spamtest->parse($email);
  my $status = $spamtest->check($mail);
  print RFILE $status->get_report();
  print RFILE "\n";
  $status->finish(); # important
  $mail->finish(); # important
  $spamtest->finish(); # important
}

I can't remember off the top of my head if $spamtest can be reused after
finish(), but at least this should work 100%.