Posted to users@spamassassin.apache.org by Jesse Norell <je...@kci.net> on 2014/09/03 23:17:08 UTC

save resolved IP in bayes

Hello,

  Looking at recent botnet spam, comparing messages from one day to the
next, I see new URLs being advertised that resolve to the same IP
address as ones in the past.  E.g. some at http://pastie.org/9525224

The first of those was already on URIBL/RBL lists when it hit, but the
others were not - they all resolve to the same IP address.  The messages
are hitting BAYES_50, on fairly well trained databases.  I dug around
some and as best I can tell, SpamAssassin does not resolve the IP
addresses of URLs and add them to Bayes when training; is that correct?
Would it not make sense to do so?

I could write a program to extract URLs and add an X-URL-IP header or
something which Bayes could use, but would this not be useful enough to
be a normal part of training?
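
A minimal sketch of such a program, in Python, might look like the
following. Everything here is an assumption for illustration: the
X-URL-IP header name is only the one suggested above, the URL regex is
deliberately crude, the raw text is scanned without decoding MIME parts,
and nothing here says whether Bayes would actually tokenize the header
in a useful way.

```python
import re
import socket
from email import message_from_string
from urllib.parse import urlsplit

# Crude http(s) URL matcher; real mail needs MIME decoding and a better parser.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def system_resolve(host):
    """Return the sorted IPv4 addresses for host, or [] if the lookup fails."""
    try:
        return sorted(socket.gethostbyname_ex(host)[2])
    except OSError:
        return []


def url_hosts(text):
    """Return the unique hostnames of http(s) URLs found in text, in order."""
    hosts = []
    for url in URL_RE.findall(text):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # skip URLs the parser rejects
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def add_url_ip_header(raw_message, resolve=system_resolve):
    """Return raw_message with one X-URL-IP header per distinct resolved IP."""
    msg = message_from_string(raw_message)
    seen = []
    for host in url_hosts(raw_message):
        for ip in resolve(host):
            if ip not in seen:
                seen.append(ip)
                msg["X-URL-IP"] = ip  # hypothetical header name; appends, not replaces
    return msg.as_string()
```

The resolver is passed in as a parameter so the header logic can be
exercised without touching the network; a filter run before training
would call add_url_ip_header(raw) with the default resolver.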

Also in the discussion, am I correct that a SpamAssassin "rule" couldn't
do that, and you would have to write a plugin?

Thanks,
Jesse

-- 
Jesse Norell
Kentec Communications, Inc.
970-522-8107  -  www.kci.net


Re: save resolved IP in bayes

Posted by RW <rw...@googlemail.com>.
On Wed, 03 Sep 2014 15:47:05 -0600
Jesse Norell wrote:


> Hmm, ok.  Without "hapaxes" enabled, how many hits on a token do you
> need for it to start being useful?

Don't disable hapaxes without good empirical evidence that it provides
you with a benefit. Typically it reduces accuracy without any benefit
at all. 

The documentation used to say it greatly reduces the size of the
database, but that's not true.  

> I actually meant to clarify that a plugin is what would need to
> perform the IP lookup and add it as a bayes token.  You can't say
> "increment URL_IP:x.x.x.x spam token count when training" in the rule
> language. (I've written some rules, never delved into plugins.)

DNS requests are normally sent out as early as possible; Bayes and other
text-based rules are then processed in parallel with the network
round-trips. If you try this, don't be surprised if it's unreliable or
doesn't work without deferring Bayes.
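
To illustrate the ordering problem, here is a toy sketch in Python. It
is not SpamAssassin's code; scan, bayes_tokens and the defer flag are
hypothetical names, and the URL_IP: token prefix is just the one from
the quoted text. Lookups are started first, and the tokenizer either
takes whatever answers have already arrived or waits for all of them.

```python
from concurrent.futures import ThreadPoolExecutor


def bayes_tokens(body_tokens, lookups, defer):
    """Collect tokens for a toy classifier.

    lookups maps hostname -> Future returning a list of IP strings.
    With defer=False only lookups that have already finished contribute;
    with defer=True this blocks until every lookup has returned.
    """
    tokens = list(body_tokens)
    for host, future in lookups.items():
        if defer or future.done():
            tokens += [f"URL_IP:{ip}" for ip in future.result()]
    return tokens


def scan(hosts, body_tokens, resolve, defer):
    """Start all lookups first, then tokenize while they are in flight."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        lookups = {h: pool.submit(resolve, h) for h in hosts}
        return bayes_tokens(body_tokens, lookups, defer)
```

Without deferring, whether an IP token shows up depends on how fast the
resolver answers, so the same message can produce different token sets
from one run to the next.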

Re: save resolved IP in bayes

Posted by Jesse Norell <je...@kci.net>.
On Wed, 2014-09-03 at 23:36 +0200, Axb wrote:
> On 09/03/2014 11:17 PM, Jesse Norell wrote:
> > Hello,
> >
> >    Looking at recent botnet spam, comparing messages from one day to the
> > next, I see new URLs being advertised that resolve to the same IP
> > address as ones in the past.  E.g. some at http://pastie.org/9525224
> >
> > The first of those was already on URIBL/RBL lists when it hit, but the
> > others were not - they all resolve to the same IP address.  The messages
> > are hitting BAYES_50, on fairly well trained databases.  I dug around
> > some and as best I can tell, SpamAssassin does not resolve the IP
> > addresses of URLs and add them to Bayes when training; is that correct?
> > Would it not make sense to do so?
> 
> SA does query blocklists for the IP of a domain's A record.
> There are not many public lists which make a point of listing these;
> the SBL lookups are probably the most efficient:
> URIBL_SBL_A for the A record's IP and
> URIBL_SBL for the NS record's IP.
> 
> > I could write a program to extract URLs and add an X-URL-IP header or
> > something which Bayes could use, but would this not be useful enough to
> > be a normal part of training?
> 
> IMO, unless you have hundreds of these within a couple of minutes, it
> won't make much of a difference.

Hmm, ok.  Without "hapaxes" enabled, how many hits on a token do you
need for it to start being useful?



> > Also in the discussion, am I correct that a SpamAssassin "rule" couldn't
> > do that, and you would have to write a plugin?
> 
> IIRC, there isn't a _URI_ template tag for add_header "rules".
> You could open a bug and request that such a feature be added.
> (https://issues.apache.org/SpamAssassin/)

I actually meant to clarify that a plugin is what would need to perform
the IP lookup and add it as a bayes token.  You can't say "increment
URL_IP:x.x.x.x spam token count when training" in the rule language.
(I've written some rules, never delved into plugins.)


-- 
Jesse Norell
Kentec Communications, Inc.
970-522-8107  -  www.kci.net


Re: save resolved IP in bayes

Posted by Axb <ax...@gmail.com>.
On 09/03/2014 11:17 PM, Jesse Norell wrote:
> Hello,
>
>    Looking at recent botnet spam, comparing messages from one day to the
> next, I see new URLs being advertised that resolve to the same IP
> address as ones in the past.  E.g. some at http://pastie.org/9525224
>
> The first of those was already on URIBL/RBL lists when it hit, but the
> others were not - they all resolve to the same IP address.  The messages
> are hitting BAYES_50, on fairly well trained databases.  I dug around
> some and as best I can tell, SpamAssassin does not resolve the IP
> addresses of URLs and add them to Bayes when training; is that correct?
> Would it not make sense to do so?

SA does query blocklists for the IP of a domain's A record.
There are not many public lists which make a point of listing these;
the SBL lookups are probably the most efficient:
URIBL_SBL_A for the A record's IP and
URIBL_SBL for the NS record's IP.
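
For reference, an IP-based DNSBL lookup conventionally reverses the
IPv4 octets and appends the list's zone; an A-record answer in
127.0.0.0/8 means "listed" and NXDOMAIN means "not listed". A small
Python sketch of building that query name follows. The helper name and
the dnsbl.example zone are placeholders, not anything taken from SA's
configuration.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name for an IPv4 address: reversed octets + zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone.strip('.')}"
```

For example, dnsbl_query_name("192.0.2.10", "dnsbl.example") gives
"10.2.0.192.dnsbl.example", which is the name a resolver would then be
asked for.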

> I could write a program to extract URLs and add an X-URL-IP header or
> something which Bayes could use, but would this not be useful enough to
> be a normal part of training?

IMO, unless you have hundreds of these within a couple of minutes, it
won't make much of a difference.

> Also in the discussion, am I correct that a SpamAssassin "rule" couldn't
> do that, and you would have to write a plugin?

IIRC, there isn't a _URI_ template tag for add_header "rules".
You could open a bug and request that such a feature be added.
(https://issues.apache.org/SpamAssassin/)