Posted to dev@spamassassin.apache.org by Theo Van Dinter <fe...@apache.org> on 2006/09/14 06:35:14 UTC
Received header parsing
So I'm trying to go through all of my UNPARSEABLE_RELAY hits and see what I
can do. The question is: how much data has to come out of a header to make
it worth parsing? Conversely, when should we just ignore a header?
For example:
from (1.2.3.4) by host.example.com via smtp id
03d4_20d285fe_3c6f_11db_828f_0013725b2d50; Mon, 04 Sep 2006 19:43:12 -0400
from ([172.16.1.78]) by email2.codeworksonline.com with Microsoft
SMTPSVC(5.0.2195.6713); Wed, 6 Sep 2006 21:14:29 -0400
Judging by "with HTTP", ip and by are enough.
(from someone@example.com) by m06.lax.untd.com (jqueuemail) id LRVB3JAJ; Fri,
02 Jun 2006 08:15:21 PDT
from CNNIMAIL12.CNN.COM by CNNIMAIL12.CNN.COM (LISTSERV-TCP/IP release 1.8d)
with spool id 35469828 for TEXTBREAKINGNEWS@CNNIMAIL12.CNN.COM; Tue, 23 May
2006 11:01:27 -0400
from EXCL.hq.corp.pbs.org (mail1.hq.corp.pbs.org) by listserv.pbs.org (LSMTP
for Windows NT v1.1b) with SMTP id <0....@listserv.pbs.org>; Mon, 22 May
2006 9:43:25 -0400
from PRODWEB02LA by fireball.treehousei.com (Merak 8.5.0-2) with SMTP id
FTM10106 for <us...@example.com>; Tue, 27 Jun 2006 06:11:06 -0700
from mailer by www.ennmagazine.com with HTTP (Mail); Fri, 1 Sep 2006 10:50:01
-0400
Without an IP I think these are useless...?
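The rule of thumb suggested so far (an IP plus a "by" host is enough, and a header without an IP is not worth parsing) could be sketched roughly as below. This is only an illustration, not SpamAssassin code: the names IP_RE, BY_RE and worth_parsing are made up for this example, and the IP pattern only covers the bracketing styles seen in the samples above.

```python
import re

# Hypothetical triage helper, not taken from SpamAssassin.
# An IPv4 address in (...), [...] or ([...]), optionally followed by :port.
IP_RE = re.compile(r"[(\[]\[?(\d{1,3}(?:\.\d{1,3}){3})(?::\d+)?\]?[)\]]")
# The host named after the "by" keyword.
BY_RE = re.compile(r"\bby\s+(\S+)")

def worth_parsing(header: str) -> bool:
    """Return True only if the header exposes both a relay IP and a by host."""
    flat = " ".join(header.split())  # undo line folding
    return bool(IP_RE.search(flat)) and bool(BY_RE.search(flat))
```

Under this assumption the "via smtp" and SMTPSVC samples pass, while the "with HTTP (Mail)" and jqueuemail samples are skipped because they carry no IP.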
from FNCLISTSRV (10.6.147.53:1028) by listserv.foxnews.com (LSMTP for Windows
NT v1.1b) with SMTP id <23...@listserv.foxnews.com>; Tue, 8 Aug 2006
13:45:26 -0400
This is actually parseable. I think it's helo, ip, by, and id.
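As a rough sketch of how the helo, ip, by and id fields might be pulled out of that LSMTP layout, assuming the format is exactly as shown in the sample (the regex and the parse_lsmtp name are illustrative, not the actual parser):

```python
import re

# Hypothetical pattern for the "LSMTP for Windows NT" layout shown above.
LSMTP_RE = re.compile(
    r"^from\s+(?P<helo>\S+)\s+"                         # HELO name
    r"\((?P<ip>\d{1,3}(?:\.\d{1,3}){3})(?::\d+)?\)\s+"  # (ip:port), port dropped
    r"by\s+(?P<by>\S+)\s+"                              # receiving host
    r"\(LSMTP[^)]*\)\s+"                                # software comment
    r"with\s+SMTP\s+id\s+<?(?P<id>[^>;\s]+)>?"          # message id
)

def parse_lsmtp(header: str):
    """Return a dict with helo, ip, by and id, or None if the layout differs."""
    flat = " ".join(header.split())  # undo line folding
    m = LSMTP_RE.match(flat)
    return m.groupdict() if m else None
```

The source port after the IP is matched but discarded here, on the assumption that only the address matters for relay checks.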
Thoughts? (happily, I was able to shrink the list of 3500 unparsed relays
down to ~50 unique formats)
--
Randomly Selected Tagline:
"Communist revolutionaries taking over the server room and demanding
all the computers in the building or they shoot the sysadmin."
- Today's BOFH Excuse
Re: Received header parsing
Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Sep 13, 2006 at 11:51:58PM -0500, Michael Parker wrote:
> Not many but I will toss out an idea that I like to promote every now
> and then. It sure would be nice if we could come up with some sort of
> syntax in which we could define various received header layouts. Then they
> could just be part of the ruleset and updated whenever a new one is found.
Hrm. Basically rules for Received headers. Have to think about that. It's
doable, but I don't know if it'd be worth doing. It would likely be slower
than the current method, but more flexible.
And if we actually do keep up on our maintenance release schedules, I'm not
sure the tradeoff is worth it.
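For illustration, one way such "rules for Received headers" might look is a list of named regexes loaded from ruleset text and tried in order. Everything here is an assumption made for the sketch: the received_layout keyword, the two layout names, and the functions load_layouts and parse_received do not exist in SpamAssassin.

```python
import re

# Hypothetical ruleset syntax: "received_layout NAME REGEX", one per line.
RULES_TEXT = r"""
received_layout via_smtp ^from \((?P<ip>[\d.]+)\) by (?P<by>\S+) via smtp id (?P<id>[^;\s]+)
received_layout ms_smtpsvc ^from \(\[(?P<ip>[\d.]+)\]\) by (?P<by>\S+) with Microsoft SMTPSVC
"""

def load_layouts(text):
    """Compile every received_layout line into a (name, regex) pair."""
    layouts = []
    for line in text.splitlines():
        parts = line.strip().split(None, 2)
        if len(parts) == 3 and parts[0] == "received_layout":
            layouts.append((parts[1], re.compile(parts[2])))
    return layouts

LAYOUTS = load_layouts(RULES_TEXT)

def parse_received(header, layouts=LAYOUTS):
    """Try each layout in order; return its named fields or None."""
    flat = " ".join(header.split())  # undo line folding
    for name, rx in layouts:
        m = rx.match(flat)
        if m:
            return {"layout": name, **m.groupdict()}
    return None
```

Trying every layout in turn is where the extra cost would come from. Compiling the patterns once at load time, as above, limits it but does not remove it.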
Hrm...
--
Randomly Selected Tagline:
"We must respect the other fellow's religion, but only in the sense and
to the extent that we respect his opinion that his wife is beautiful
and his children, smart." - H.L. Mencken
Re: Received header parsing
Posted by Michael Parker <pa...@pobox.com>.
Theo Van Dinter wrote:
> Thoughts? (happily, I was able to shrink the list of 3500 unparsed relays
> down to ~50 unique formats)
>
Not many but I will toss out an idea that I like to promote every now
and then. It sure would be nice if we could come up with some sort of
syntax in which we could define various received header layouts. Then they
could just be part of the ruleset and updated whenever a new one is found.
Possibly not doable, but interesting nonetheless.
Michael