Posted to users@spamassassin.apache.org by Mark Martinec <Ma...@ijs.si> on 2006/12/10 22:24:05 UTC

Re: p0f and IP distance (why?)

Peter,

> Can someone explain the concept behind penalizing SMTP clients who
> connect many hops away?

I only slightly favour nearby hosts; I do not penalize distant ones.

> I have OS detection running and I am very pleased so far.  I'm now
> wondering about this second aspect of p0f.

Take a look at the following diagram:

  http://www.ijs.si/software/amavisd/fig1.gif

It is a scattergram of a few weeks' worth of mail, plotting one dot
per message: spam score on the y-axis, IP distance on the x-axis.
Since the distance is an integer, I added white noise of +-0.5
to each IP distance so that the dots do not pile up on one another.
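
The jitter step can be sketched in a few lines of Python. This is only
an illustration, not the script used for the figure: the function name,
the sample data and the choice of uniformly distributed noise are all
assumptions.

```python
import random

def jitter_distances(distances, spread=0.5, seed=None):
    """Return the distances with noise in [-spread, +spread] added.

    Uniform noise is an assumption; any small zero-mean noise would
    separate the dots equally well.
    """
    rng = random.Random(seed)
    return [d + rng.uniform(-spread, spread) for d in distances]

# Hypothetical sample: (IP distance in hops, spam score) per message.
messages = [(8, 0.3), (8, 1.1), (12, 4.0), (17, 9.5), (17, 12.2)]

xs = jitter_distances([d for d, _ in messages], seed=1)
ys = [score for _, score in messages]
# xs and ys can now be handed to any plotting tool as a scatter plot.
```

Because the noise never exceeds half a hop, each dot still rounds back
to its true integer distance.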

Each site would see a different result from its own data. It depends on
the site's connectivity, its chain of upstream providers, its profile
(commercial, academic, ...), the kind of users it has and who they
usually correspond with, ...

In my particular case, the diagram shows a pronounced gap near IP distance 12.
Hosts nearer than that mostly belong to European academic networks (Geant) or
Slovenian service providers. Hosts further away are in North America, and
further still to the right in Asia etc. Of course this is just a
simplification; there are exceptions all over.

Nevertheless, in my particular case, it seems that nearby hosts produce
significantly less spam than the rest, and include a majority of the
correspondents of our users. So it makes sense to deduct about 1 score point
for nearby hosts, which is what my example set of rules does.
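
As a rough illustration of the idea (not the actual rule set, which is
not reproduced here), the adjustment amounts to something like the
following Python sketch. The function name, the cut-off of 12 hops
taken from the gap in my diagram, and the exact -1.0 bonus are
assumptions for this example.

```python
def distance_adjustment(ip_distance, near_threshold=12, bonus=-1.0):
    """Score contribution from IP distance.

    Hosts closer than near_threshold hops get a small negative
    (favourable) contribution; distant or unknown hosts get nothing,
    so nobody is penalized for being far away.
    """
    if ip_distance is None:  # distance could not be determined
        return 0.0
    return bonus if ip_distance < near_threshold else 0.0

# Hypothetical usage: apply the contribution to a message's score.
base_score = 4.6
adjusted = base_score + distance_adjustment(8)
```

Both the threshold and the size of the bonus would have to come from
each site's own data.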

Each site has its own unique profile. Collect data, see whether any
regularities show up, and if so, the spam score may get a contribution from
IP distance data. Whether that contribution is positive or negative
depends on the site.
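
One simple way to look for such regularities, sketched in Python under
stated assumptions: the records are (IP distance, spam score) pairs
pulled from your own logs, and a message counts as spam at a score of
5.0 or more. The function name, the sample records and the threshold
are illustrative only.

```python
from collections import defaultdict

def spam_fraction_by_distance(records, spam_threshold=5.0):
    """Map each IP distance to the fraction of its messages that
    scored at or above spam_threshold."""
    counts = defaultdict(lambda: [0, 0])  # distance -> [spam, total]
    for distance, score in records:
        counts[distance][1] += 1
        if score >= spam_threshold:
            counts[distance][0] += 1
    return {d: spam / total for d, (spam, total) in sorted(counts.items())}

# Hypothetical records; real ones would come from a site's mail logs.
records = [(8, 0.3), (8, 6.0), (17, 9.5)]
fractions = spam_fraction_by_distance(records)
```

If the fractions differ clearly between distance ranges, a small score
contribution for those ranges may be worth trying.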

  Mark