Posted to dev@spamassassin.apache.org by Theo Van Dinter <fe...@apache.org> on 2006/09/14 06:35:14 UTC

Received header parsing

So I'm trying to go through all of my UNPARSEABLE_RELAY hits and see what I
can do.  The question is: how much data has to come out of a header to make
it worth parsing?  Conversely, when should we just ignore a header?

For example:

from (1.2.3.4) by host.example.com via smtp id
03d4_20d285fe_3c6f_11db_828f_0013725b2d50; Mon, 04 Sep 2006 19:43:12 -0400

from  ([172.16.1.78]) by email2.codeworksonline.com with Microsoft
SMTPSVC(5.0.2195.6713); Wed, 6 Sep 2006 21:14:29 -0400

Judging by the "with HTTP" case, ip and by are enough.
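
As a rough illustration of pulling just ip and by out of the two headers
above, here is a minimal Python sketch. The pattern and the name parse_ip_by
are hypothetical, written only for these two samples; this is not how
SpamAssassin's own parser does it.

```python
import re

# Hypothetical pattern for "from (IP) by HOST ...", where the IP may or may
# not be wrapped in square brackets. Only fitted to the two samples above.
IP_BY_RE = re.compile(
    r"^from \(\[?(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\]?\) by (?P<by>\S+) "
)

def parse_ip_by(header):
    """Return {'ip': ..., 'by': ...}, or None if the header does not match."""
    # Received headers are usually folded; collapse all whitespace runs first.
    flat = " ".join(header.split())
    m = IP_BY_RE.match(flat)
    return m.groupdict() if m else None

sample = """from  ([172.16.1.78]) by email2.codeworksonline.com with Microsoft
SMTPSVC(5.0.2195.6713); Wed, 6 Sep 2006 21:14:29 -0400"""
print(parse_ip_by(sample))
```

Collapsing whitespace before matching means the doubled space after "from"
and the folded line need no special handling in the pattern itself.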


(from someone@example.com)  by m06.lax.untd.com (jqueuemail) id LRVB3JAJ; Fri,
02 Jun 2006 08:15:21 PDT

from CNNIMAIL12.CNN.COM by CNNIMAIL12.CNN.COM (LISTSERV-TCP/IP release 1.8d)
with spool id 35469828 for TEXTBREAKINGNEWS@CNNIMAIL12.CNN.COM; Tue, 23 May
2006 11:01:27 -0400

from EXCL.hq.corp.pbs.org (mail1.hq.corp.pbs.org) by listserv.pbs.org (LSMTP
for Windows NT v1.1b) with SMTP id <0....@listserv.pbs.org>; Mon, 22 May
2006 9:43:25 -0400

from PRODWEB02LA by fireball.treehousei.com (Merak 8.5.0-2) with SMTP id
FTM10106 for <us...@example.com>; Tue, 27 Jun 2006 06:11:06 -0700

from mailer  by www.ennmagazine.com with HTTP (Mail); Fri, 1 Sep 2006 10:50:01
-0400

Without an IP I think these are useless...?
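
One way to express that "no IP, not worth parsing" triage is a quick
pre-check, sketched below in Python. The function name worth_parsing and the
rule itself are assumptions for illustration only. It looks for something
shaped like an IPv4 address and does not validate octet ranges.

```python
import re

# An IPv4-shaped literal that is not part of a longer dotted or numeric run,
# so version strings such as "8.5.0-2" or "5.0.2195.6713" do not count.
IPV4_RE = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?![\d.])")

def worth_parsing(header):
    """Hypothetical triage rule: keep only headers that expose an IP."""
    return IPV4_RE.search(header) is not None

merak = ("from PRODWEB02LA by fireball.treehousei.com (Merak 8.5.0-2) with "
         "SMTP id FTM10106 for <us...@example.com>; "
         "Tue, 27 Jun 2006 06:11:06 -0700")
print(worth_parsing(merak))
```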


from FNCLISTSRV (10.6.147.53:1028) by listserv.foxnews.com (LSMTP for Windows
NT v1.1b) with SMTP id <23...@listserv.foxnews.com>; Tue, 8 Aug 2006
13:45:26 -0400

This is actually parseable.  I think it's helo, ip, by, and id.
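
A minimal Python sketch of extracting helo, ip, by, and id from that LSMTP
header follows. The regex and the name parse_lsmtp are hypothetical and
fitted to this one sample (including the ":port" suffix after the IP), so
treat it as an illustration rather than a tested format definition.

```python
import re

# Hypothetical pattern for the LSMTP layout shown above:
#   from HELO (IP:PORT) by HOST (LSMTP for Windows NT ...) with SMTP id <ID>;
LSMTP_RE = re.compile(
    r"^from (?P<helo>\S+) "
    r"\((?P<ip>\d{1,3}(?:\.\d{1,3}){3}):\d+\) "
    r"by (?P<by>\S+) "
    r"\(LSMTP for Windows NT [^)]*\) "
    r"with SMTP id <(?P<id>[^>]+)>;"
)

def parse_lsmtp(header):
    """Return a dict with helo, ip, by and id, or None on no match."""
    flat = " ".join(header.split())  # unfold the header onto one line
    m = LSMTP_RE.match(flat)
    return m.groupdict() if m else None

fox = """from FNCLISTSRV (10.6.147.53:1028) by listserv.foxnews.com (LSMTP for Windows
NT v1.1b) with SMTP id <23...@listserv.foxnews.com>; Tue, 8 Aug 2006
13:45:26 -0400"""
print(parse_lsmtp(fox))
```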



Thoughts?  (happily, I was able to shrink the list of 3500 unparsed relays
down to ~50 unique formats)

-- 
Randomly Selected Tagline:
"Communist revolutionaries taking over the server room and demanding
 all the computers in the building or they shoot the sysadmin."
         - Today's BOFH Excuse

Re: Received header parsing

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Sep 13, 2006 at 11:51:58PM -0500, Michael Parker wrote:
> Not many, but I will toss out an idea that I like to promote every now
> and then.  It sure would be nice if we could come up with some sort of
> syntax in which we could define various Received header layouts.  Then they
> could just be part of the ruleset and updated whenever a new one is found.

Hrm.  Basically rules for Received headers.  Have to think about that.  It's
doable, but I don't know if it'd be worth doing.  It would likely be slower
than the current method, but more flexible.

And if we actually do keep up on our maintenance release schedules, I'm not
sure the trade-off is worth it.
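
To make the "rules for Received headers" idea concrete, here is a small
Python sketch of a data-driven layout table. Everything in it (the LAYOUTS
list, the layout names, parse_received) is hypothetical. It only shows the
shape of the approach, where new layouts are added as data instead of code.

```python
import re

# Hypothetical rule table: (layout name, regex with named groups).
# Under the proposal, entries like these would ship with the ruleset.
LAYOUTS = [
    ("lsmtp",
     r"^from (?P<helo>\S+) \((?P<ip>[\d.]+):\d+\) by (?P<by>\S+) "
     r"\(LSMTP [^)]*\) with SMTP id <(?P<id>[^>]+)>"),
    ("bare_ip",
     r"^from \(\[?(?P<ip>[\d.]+)\]?\) by (?P<by>\S+) "),
]
COMPILED = [(name, re.compile(rx)) for name, rx in LAYOUTS]

def parse_received(header):
    """Try each layout in order; return (name, fields) or None."""
    flat = " ".join(header.split())  # unfold the header onto one line
    for name, rx in COMPILED:
        m = rx.match(flat)
        if m:
            return name, m.groupdict()
    return None

print(parse_received("from (1.2.3.4) by host.example.com via smtp id x; "
                     "Mon, 04 Sep 2006 19:43:12 -0400"))
```

Each header is tried against every layout in turn until one matches, which
is one plausible source of the performance cost mentioned above.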

Hrm...

-- 
Randomly Selected Tagline:
"We must respect the other fellow's religion, but only in the sense and
 to the extent that we respect his opinion that his wife is beautiful
 and his children, smart."               - H.L. Mencken

Re: Received header parsing

Posted by Michael Parker <pa...@pobox.com>.
Theo Van Dinter wrote:
> Thoughts?  (happily, I was able to shrink the list of 3500 unparsed relays
> down to ~50 unique formats)
> 

Not many, but I will toss out an idea that I like to promote every now
and then.  It sure would be nice if we could come up with some sort of
syntax in which we could define various Received header layouts.  Then they
could just be part of the ruleset and updated whenever a new one is found.

Possibly not doable, but interesting nonetheless.

Michael