Posted to users@spamassassin.apache.org by Adam Katz <an...@khopis.com> on 2009/04/30 00:24:10 UTC

Re: [SA] 419 emailBL?

Mike Cardwell wrote:
>>> For listing both emails and uri's it would be useful if you could add
>>> regular expressions. [...]

Steve Freegard responded:
>> Yuck; if you want to do stuff using regexp then:
>>
>> uri RULE_NAME /<regexp>/
>> score RULE_NAME nn.nnn
>>
>> Is the best way to do this - not via DNS.

Mike Cardwell defended:
> Depends what you're trying to achieve. I thought the objective was a
> block list of email addresses that could be queried via the DNS by any
> application... Your suggestion doesn't really capture the requirements.
> 
> In this particular example, the list should be used for preventing your
> users sending emails *to* those addresses. Many organisations rightly or
> wrongly don't perform spam filtering on their outgoing relays so
> spamassassin is a bit over the top when you can just use another dns
> based bl.

If by "any application" you mean "any application that can handle
full-blown perl regular expressions" ... your regex examples are
nontrivial, so you're already pretty much catering to SA anyway.

There's also the question of handling quotes and other forbidden
characters in the TXT field, plus its length limit.  Once that's all
solved, the question of feasibility and efficiency still looms.
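Some detail on the TXT constraints. RFC 1035 limits each character-string in a TXT record to 255 octets, and a zone file needs quotes and backslashes inside a string escaped. Below is a minimal Python sketch of publishing a raw regex that way. It assumes a hand-written zone file, and the owner name `example-entry` and the helper names are hypothetical.

```python
# Hypothetical helpers: encode an arbitrary string (e.g. a regex) as the
# RDATA of a zone-file TXT record. RFC 1035 limits each character-string
# to 255 octets, so longer values are split into several strings.

MAX_CHUNK = 255  # octets per TXT character-string (RFC 1035)


def escape_zone(chunk: bytes) -> str:
    """Escape one chunk for use inside a double-quoted zone-file string."""
    out = []
    for b in chunk:
        ch = chr(b)
        if ch in ('"', "\\"):
            out.append("\\" + ch)          # quote and backslash get a backslash
        elif 0x20 <= b < 0x7F:
            out.append(ch)                 # printable ASCII passes through
        else:
            out.append("\\%03d" % b)       # anything else as \DDD (decimal)
    return "".join(out)


def txt_rdata(value: str) -> str:
    """Split a value into <=255-octet strings, then quote and escape each."""
    raw = value.encode("utf-8")
    # Chunk the stored bytes first; escaping is only zone-file presentation.
    chunks = [raw[i:i + MAX_CHUNK] for i in range(0, len(raw), MAX_CHUNK)] or [b""]
    return " ".join('"%s"' % escape_zone(c) for c in chunks)


if __name__ == "__main__":
    print("example-entry IN TXT " + txt_rdata(r'^foo-\d+@example\.com$'))
```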

Given the options of putting that kind of thing in (A) DNS or (B)
sa-channels, I'd lean towards (B) on the way to (C) something else:

I'm sure Justin Mason (for his sought channel) has thought long and
hard about this.  The mechanism for sa-update is brilliant, but
doesn't lend itself to enormous indices of frequently-changing
rulesets.  Even if it were revised to enable a diff/patch system (hint
hint), it would still fail to distribute the remaining load.

Justin:  Perhaps sa-update could support [version].torrent in addition
to [version].tar.gz on each mirror?  (This doesn't touch the current
DNS-based version/announce system.)  Channels hosted for versions of
SA after the supporting release (e.g. 0.4.3.[channel] and "higher")
would be allowed to host only the torrent file.
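This sketch shows how a client might try the proposed torrent first and fall back to the tarball. The lookup name follows sa-update's reversed-version convention, which is what the `0.4.3.[channel]` example refers to (SA 3.4.0). The `.torrent` file exists only as this thread's proposal, and the function names, mirror URL, and revision number are hypothetical.

```python
from __future__ import annotations

# Hypothetical client-side step for the torrent proposal. Only the
# reversed-version lookup name mirrors existing sa-update practice; the
# ".torrent" file is the proposal from this thread, not a real feature.


def version_query_name(sa_version: str, channel: str) -> str:
    """DNS name whose TXT record announces the channel's current revision."""
    parts = sa_version.split(".")[:3]
    return ".".join(reversed(parts)) + "." + channel


def candidate_urls(mirror: str, revision: str, prefer_torrent: bool) -> list[str]:
    """Files to try on one mirror, in order."""
    base = mirror.rstrip("/") + "/" + revision
    urls = [base + ".tar.gz"]
    if prefer_torrent:
        urls.insert(0, base + ".torrent")  # torrent first, tarball as fallback
    return urls
```

For example, `version_query_name("3.4.0", "updates.spamassassin.org")` gives `0.4.3.updates.spamassassin.org`. A channel hosting only the torrent would simply have the second URL fail.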

Either the self-healing nature of BT would implement the diffing
portion for free, or SA's BT client would merely choose which files in
the torrent to download (assuming there are perl-based clients that
support that... libtorrent does, but that's C-based), as it would
contain full.cf, [n-1].diff, [n-2].diff, [n-3].diff, and [last release
yesterday].diff (or the like).
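The file-selection variant could look like the following. It assumes each `[rev].diff` in the torrent upgrades revision `rev` directly to the newest revision, so a client needs at most one diff. The message doesn't say whether the diffs chain instead. All names are hypothetical.

```python
from typing import List, Optional

# Hypothetical file selection for the proposed torrent layout.
# Assumption: "<rev>.diff" upgrades revision <rev> straight to the latest
# revision; anything without a matching diff falls back to full.cf.


def files_to_fetch(torrent_files: List[str], installed_rev: Optional[int],
                   latest_rev: int) -> List[str]:
    if installed_rev == latest_rev:
        return []                       # already current: download nothing
    if installed_rev is not None:
        wanted = f"{installed_rev}.diff"
        if wanted in torrent_files:
            return [wanted]             # one small patch to the latest revision
    return ["full.cf"]                  # unknown or too-old revision: full ruleset
```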

... this is similar to my proposal for a distributed Blue Frog rehash,
http://khopesh.com/wiki/Ending_spam

-- 
Adam Katz
khopesh on irc://irc.freenode.net/#spamassassin
http://khopesh.com/Anti-spam

Re: [SA] 419 emailBL?

Posted by Adam Katz <an...@khopis.com>.
>> And if bandwidth at the server is a problem, would publishing the ruleset
>> updates via the Coral Cache network work?
> 
> Unfortunately, no.  In fact, they kind of suck as a CDN.  We
> originally were putting updates through there and would regularly have
> issues w/ 404s, corrupt or incomplete downloads, etc.
> 
> It may have improved since the 2005 or so timeframe when we started w/
> updates, but ...  Haven't checked in a while.

Still has the same issues.  I'll be removing them from my sa-update
channels mirror files very soon.

Re: 419 emailBL?

Posted by John Hardin <jh...@impsec.org>.
On Wed, 29 Apr 2009, Theo Van Dinter wrote:

> On Wed, Apr 29, 2009 at 8:06 PM, John Hardin <jh...@impsec.org> wrote:
>>> And 135k doesn't add up to a lot of bandwidth?
>> And if bandwidth at the server is a problem, would publishing the ruleset
>> updates via the Coral Cache network work?
>
> Unfortunately, no.  In fact, they kind of suck as a CDN.  We
> originally were putting updates through there and would regularly have
> issues w/ 404s, corrupt or incomplete downloads, etc.
>
> It may have improved since the 2005 or so timeframe when we started w/
> updates, but ...  Haven't checked in a while.

I've edited my MIRRORED.BY; we'll see how it goes...

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The real opiate of the masses isn't religion; it's the belief that
   somewhere there is a benefit that can be delivered without a
   corresponding cost.                       -- Tom of "Radio Free NJ"
-----------------------------------------------------------------------
  9 days until the 64th anniversary of VE day

Re: 419 emailBL?

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Apr 29, 2009 at 8:06 PM, John Hardin <jh...@impsec.org> wrote:
>> And 135k doesn't add up to a lot of bandwidth?
>
> ...so don't look for updates more than once every day or two.

Yeah, but I think the point was that a frequently changing ruleset
would be downloaded frequently.

> And if bandwidth at the server is a problem, would publishing the ruleset
> updates via the Coral Cache network work?

Unfortunately, no.  In fact, they kind of suck as a CDN.  We
originally were putting updates through there and would regularly have
issues w/ 404s, corrupt or incomplete downloads, etc.

It may have improved since the 2005 or so timeframe when we started w/
updates, but ...  Haven't checked in a while.

Re: 419 emailBL?

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Apr 29, 2009 at 7:56 PM, Adam Katz <an...@khopis.com> wrote:
>> I guess it depends what you mean by "enormous".  A sought rule update is 135k.
>
> And 135k doesn't add up to a lot of bandwidth?  I suppose it depends
> on the number of users, and I'm figuring worst-case scenario, e.g.
> when/if it ships enabled in the default SA install.

Well, it depends what you're measuring.  :)

The update itself isn't large: at 135k, it's the part that isn't
"enormous".  135k in and of itself is a pretty tiny file, but I'm
not sure what "enormous" means in this context -- megs?  gigs?

The aggregate bandwidth could very well be large, depending on update
publish frequency, client update frequency, number of clients, client
bandwidth, etc.  From what I've seen, the standard SA updates, at the
same ~130k size and with the current number of users, don't amount to
a lot of bandwidth.
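Theo's list of factors can be turned into a back-of-the-envelope estimate. This sketch assumes every update is a full tarball of the same size and that a client downloads at most once per new revision. Under that assumption, per-client downloads per day are roughly min(checks, publishes). The 100,000-client figure is purely hypothetical.

```python
# Back-of-the-envelope aggregate update bandwidth.
# Assumptions (hypothetical inputs): every update is a full tarball of the
# same size, and a client downloads at most once per new revision, so
# downloads/day per client ~= min(client checks/day, publishes/day).


def daily_bytes(update_bytes: int, clients: int,
                checks_per_day: float, publishes_per_day: float) -> float:
    downloads_per_client = min(checks_per_day, publishes_per_day)
    return update_bytes * clients * downloads_per_client


if __name__ == "__main__":
    size = 135 * 1000  # roughly the 135k sought update mentioned in the thread
    for label, publishes in (("daily", 1), ("hourly", 24), ("every 15m", 96)):
        gb = daily_bytes(size, clients=100_000, checks_per_day=96,
                         publishes_per_day=publishes) / 1e9
        print(f"{label:>10}: {gb:.1f} GB/day")
```

With those inputs, a daily publish comes to 13.5 GB/day. Publishing every 15 minutes to clients that also check every 15 minutes multiplies that by 96.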

There are some pretty standard ways to deal with this issue though, such as:

a) have lots of mirrors, same idea as your P2P idea though less
dynamic  (oh, that was another thought I had ... stop short of using
torrents, since they're resource heavy, and instead make our own P2P
protocol doing a dynamic http/mirrored.by system)

b) split the channel into a frequent / not frequent channel (or stable
/ testing, or split based on content, or ...).  For patterns which don't
change often, there's no reason to keep sending them out.  Same idea I
mentioned before.

c) shrink or hold update size steady in the face of updates.  Hard.

d) make updates less frequently.  Defeats the purpose?  Clearly every
15m is different from every day, which is different from weekly ...
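Option (a) is roughly what a MIRRORED.BY file already enables: pick one mirror at random, weighted. Below is a minimal sketch. It assumes lines of the form `URL [weight=N]` with `#` comments and a default weight of 1. Treat the format handling as an assumption, not a description of sa-update's own parser.

```python
import random
from typing import List, Tuple

# Sketch of option (a): spread load by choosing a mirror at random in
# proportion to its weight. Assumed line format: "<url> [weight=N]",
# '#' starts a comment, a missing weight defaults to 1.


def parse_mirrored_by(text: str) -> List[Tuple[str, int]]:
    mirrors = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comments and blanks
        if not line:
            continue
        url, *opts = line.split()
        weight = 1
        for opt in opts:
            if opt.startswith("weight="):
                weight = int(opt[len("weight="):])
        if weight > 0:                          # weight 0 disables a mirror
            mirrors.append((url, weight))
    return mirrors


def pick_mirror(mirrors: List[Tuple[str, int]], rng=random) -> str:
    """Weighted random choice; raises IndexError if the list is empty."""
    urls = [u for u, _ in mirrors]
    weights = [w for _, w in mirrors]
    return rng.choices(urls, weights=weights, k=1)[0]
```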


To be perfectly honest, I really don't worry about the "omg, update
bandwidth" issue right now.  I worry that there aren't enough updates
right now.  The only auto-generated one, sought, is daily, and the
manual ones now are more than weekly on average.  I don't know if
sought could even be produced faster; you need a certain amount of
incoming ham and spam to sample and produce test rules, and enough
diversity of mails to test against to avoid "obvious" bad rules...

Re: 419 emailBL?

Posted by John Hardin <jh...@impsec.org>.
On Wed, 29 Apr 2009, Adam Katz wrote:

> Theo Van Dinter wrote:
>> On Wed, Apr 29, 2009 at 6:24 PM, Adam Katz <an...@khopis.com> wrote:
>>> The mechanism for sa-update is brilliant, but
>>> doesn't lend itself to enormous indices of frequently-changing rulesets.
>>
>> I guess it depends what you mean by "enormous".  A sought rule update is 135k.
>
> And 135k doesn't add up to a lot of bandwidth?

...so don't look for updates more than once every day or two.

And if bandwidth at the server is a problem, would publishing the ruleset 
updates via the Coral Cache network work?

-- 
-----------------------------------------------------------------------
   A superior gunman is one who uses his superior judgment to keep
   himself out of situations that would require the use of his
   superior skills.
-----------------------------------------------------------------------

Re: 419 emailBL?

Posted by Adam Katz <an...@khopis.com>.
Theo Van Dinter wrote:
> On Wed, Apr 29, 2009 at 6:24 PM, Adam Katz <an...@khopis.com> wrote:
>> The mechanism for sa-update is brilliant, but
>> doesn't lend itself to enormous indices of frequently-changing rulesets.
> 
> I guess it depends what you mean by "enormous".  A sought rule update is 135k.

And 135k doesn't add up to a lot of bandwidth?  I suppose it depends
on the number of users, and I'm figuring worst-case scenario, e.g.
when/if it ships enabled in the default SA install.

> The likelihood is, imo, that you would probably split up your updates
> into multiple channels before they really got out of control in size.
> For example, you could do something like a weekly, daily, and
> sub-daily channel, and move rules appropriately between them.  Yes, a
> little more of a PITA for clients, but how much churn do you really
> expect?

How about hierarchical channel support, e.g. a channel's MIRRORED.BY
file is merely itself a sa-update-channels file.
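Here is one reading of the hierarchical idea, as a sketch. A listing entry is either a mirror URL or the name of another channel, and the client expands sub-channels recursively while guarding against cycles. `fetch_listing` is a hypothetical stand-in for the real DNS/HTTP retrieval. Nothing here is existing sa-update behavior.

```python
from typing import Callable, Dict, List

# Hypothetical hierarchical channels: a channel's listing may contain mirror
# URLs or names of further channels. fetch_listing stands in for whatever
# actually retrieves a listing; here it is just a callable.


def expand_channel(root: str,
                   fetch_listing: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Return {channel: [mirror URLs]} for every channel reachable from root."""
    result: Dict[str, List[str]] = {}
    seen = set()

    def visit(channel: str) -> None:
        if channel in seen:
            return                          # already expanded, or a cycle: stop
        seen.add(channel)
        mirrors = []
        for entry in fetch_listing(channel):
            if entry.startswith(("http://", "https://")):
                mirrors.append(entry)       # ordinary mirror line
            else:
                visit(entry)                # entry names a sub-channel: recurse
        if mirrors:
            result[channel] = mirrors

    visit(root)
    return result
```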

>> Justin:  Perhaps sa-update could support [version].torrent in addition
>> to [version].tar.gz on each mirror?  (This doesn't touch the current
>> DNS-based version/announce system.)  Channels hosted for versions of
>> SA after the supporting release (e.g. 0.4.3.[channel] and "higher")
>> would be allowed to host only the torrent file.
> 
> I had actually thought about doing a P2P sa-update so as to better
> withstand DoS issues, skip the need for a mirrored.by file, etc.  But
> the main issue is that most channel updates are rather small, and so
> therefore the downloads are rather fast.  Compared to doing a torrent,
> which takes a relatively long time to get set up, and just as you
> start, you're done.  Also, it means clients are serving data, which
> makes the "quick sa-update and move on" more of a procedure and you
> have to worry about remote connectivity, etc, etc.
> 
> In the end it didn't seem worthwhile beyond the security aspect, so I
> didn't move beyond the "thinking about" stage.
> 
> (and yes, I know I'm not Justin. ;))

You're close enough on the SA development order.  For BT, I was
actually envisioning much larger rulesets with sought merely heralding
a future with lots of large auto-generated rulesets, but perhaps it
doesn't scale at the right point.  I think I'm trying to squeeze too
much :-p


Re: [SA] 419 emailBL?

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Apr 29, 2009 at 6:24 PM, Adam Katz <an...@khopis.com> wrote:
> The mechanism for sa-update is brilliant, but
> doesn't lend itself to enormous indices of frequently-changing rulesets.

I guess it depends what you mean by "enormous".  A sought rule update is 135k.

The likelihood is, imo, that you would probably split up your updates
into multiple channels before they really got out of control in size.
For example, you could do something like a weekly, daily, and
sub-daily channel, and move rules appropriately between them.  Yes, a
little more of a PITA for clients, but how much churn do you really
expect?
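Here is a minimal sketch of the weekly / daily / sub-daily split. It assumes per-rule change counts over the last seven days are available, and the thresholds are illustrative, not tuned. Re-running it on each build is what moves rules between channels.

```python
from collections import defaultdict
from typing import Dict, List

# Sketch: assign rules to weekly / daily / sub-daily channels by churn.
# Assumption: we know how many times each rule changed in the last 7 days;
# the thresholds are illustrative only.


def assign_tiers(changes_last_7_days: Dict[str, int]) -> Dict[str, List[str]]:
    tiers: Dict[str, List[str]] = defaultdict(list)
    for rule, changes in sorted(changes_last_7_days.items()):
        if changes >= 7:          # changed about daily or more: sub-daily channel
            tier = "sub-daily"
        elif changes >= 1:        # changed this week: daily channel
            tier = "daily"
        else:                     # stable: weekly channel is enough
            tier = "weekly"
        tiers[tier].append(rule)
    return dict(tiers)
```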

> Justin:  Perhaps sa-update could support [version].torrent in addition
> to [version].tar.gz on each mirror?  (This doesn't touch the current
> DNS-based version/announce system.)  Channels hosted for versions of
> SA after the supporting release (e.g. 0.4.3.[channel] and "higher")
> would be allowed to host only the torrent file.

I had actually thought about doing a P2P sa-update so as to better
withstand DoS issues, skip the need for a mirrored.by file, etc.  But
the main issue is that most channel updates are rather small, and so
therefore the downloads are rather fast.  Compared to doing a torrent,
which takes a relatively long time to get set up, and just as you
start, you're done.  Also, it means clients are serving data, which
makes the "quick sa-update and move on" more of a procedure and you
have to worry about remote connectivity, etc, etc.

In the end it didn't seem worthwhile beyond the security aspect, so I
didn't move beyond the "thinking about" stage.


(and yes, I know I'm not Justin. ;))

Re: [SA] 419 emailBL?

Posted by Mike Cardwell <sp...@lists.grepular.com>.
Adam Katz wrote:

>>>> For listing both emails and uri's it would be useful if you could add
>>>> regular expressions. [...]
> 
> Steve Freegard responded:
>>> Yuck; if you want to do stuff using regexp then:
>>>
>>> uri RULE_NAME /<regexp>/
>>> score RULE_NAME nn.nnn
>>>
>>> Is the best way to do this - not via DNS.
> 
> Mike Cardwell defended:
>> Depends what you're trying to achieve. I thought the objective was a
>> block list of email addresses that could be queried via the DNS by any
>> application... Your suggestion doesn't really capture the requirements.
>>
>> In this particular example, the list should be used for preventing your
>> users sending emails *to* those addresses. Many organisations rightly or
>> wrongly don't perform spam filtering on their outgoing relays so
>> spamassassin is a bit over the top when you can just use another dns
>> based bl.
> 
> If by "any application" you mean "any application that can handle
> full-blown perl regular expressions" ... your regex examples are
> nontrivial, so you're already pretty much catering to SA anyway.

You completely misunderstood what I was suggesting. On the server side I 
shove this in my list:

^foo-\d+@example\.com$

Then when the client looks up foo-5@example.com I return a positive 
result. The client needs no regex capability.
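Mike's point can be sketched in Python. The patterns stay on the list server, and the client only sends an address and reads back a yes/no answer. Two assumptions apply. First, the address has already been decoded from the query name, since the thread doesn't specify that encoding. Second, `127.0.0.2` is used as the "listed" answer, a common DNSBL convention rather than anything this list defined. Python's `re` is not identical to Perl regex, though the two agree on this pattern.

```python
import re
from typing import Iterable, Optional

# Sketch of a server-side regex email blocklist: only the server evaluates
# patterns. Assumptions: the address is already decoded from the DNS query
# name, and 127.0.0.2 is the "listed" answer (a common DNSBL convention).

LISTED = "127.0.0.2"


class RegexEmailBL:
    def __init__(self, patterns: Iterable[str]) -> None:
        # Compile once at load time so each lookup just runs the matchers.
        self._patterns = [re.compile(p) for p in patterns]

    def answer(self, address: str) -> Optional[str]:
        """Return the A-record answer for a listed address, else None (NXDOMAIN)."""
        for pat in self._patterns:
            if pat.fullmatch(address):
                return LISTED
        return None


if __name__ == "__main__":
    bl = RegexEmailBL([r"^foo-\d+@example\.com$"])
    print(bl.answer("foo-5@example.com"), bl.answer("bar@example.com"))
```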

-- 
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)