Posted to users@spamassassin.apache.org by Adam Katz <an...@khopis.com> on 2011/04/26 21:22:46 UTC

Re: Amazon S3 triggering FPs with SPOOF_COM* rules

On 03/24/2011 05:44 PM, Jason Haar wrote:
> Apparently when you use sharethis.com (who use S3 for hosting services)
> to send out links, the links look like
> 
> hXXp://img.sharethis.com *DOT* s3.amazonaws.com
> 
> I imagine from this that ANY .com domain using Amazon S3 services would
> create similar URLs?
> 
> This causes SPOOF_COM* rules to trigger
> 
>         *  3.0 SPOOF_COM2OTH URI: URI contains ".com" in middle
>         *  1.6 SPOOF_COM2COM URI: URI contains ".com" in middle and end
> 
> Owch. So there's a big class of FPs happening there, and I'd say there's
> redundancy in those rules? i.e. is 4.6 really an appropriate score for
> *one* img link?

It's not a perfect fix, but I've checked in r1096851, which
specifically excludes S3 from these rules.  Note that most CDNs are under
.net (like Coral CDN, e.g. www.spamassassin.org.nyud.net) and therefore
won't hit SPOOF_COM2COM.  Coral also doesn't tack on enough subdomain
levels to trigger SPOOF_COM2OTH.

There's still the question of whether these rules should be mutually
exclusive.  Maybe SPOOF_COM2OTH, which is currently (with my edit):

m{^https?://(?:\w+\.)+?com\.(?!s3\.amazonaws\.com)(?:\w+\.){2}}i

should become this:

m{^https?://(?:\w+\.)+?com\.(?:\w+\.){2,}?(?!com\b)}i

(Oops, the other rules should use com\b too.  Checked in as r1096857.)
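
A quick way to compare the two SPOOF_COM2OTH patterns above is to run
them against a few sample URLs.  The sketch below is only an
illustration, not part of the checked-in rules: it transcribes the Perl
m{...}i patterns into Python's re module (assuming the syntax used here
behaves the same in both engines), and every sample URL except the
sharethis/S3 host is made up.  The names CURRENT_COM2OTH,
PROPOSED_COM2OTH, SAMPLES, hits and report are hypothetical.

```python
import re

# Patterns copied from the message; Perl's /i flag becomes re.IGNORECASE.
CURRENT_COM2OTH = re.compile(
    r"^https?://(?:\w+\.)+?com\.(?!s3\.amazonaws\.com)(?:\w+\.){2}",
    re.IGNORECASE,
)
PROPOSED_COM2OTH = re.compile(
    r"^https?://(?:\w+\.)+?com\.(?:\w+\.){2,}?(?!com\b)",
    re.IGNORECASE,
)

# Sample URLs for illustration only (all but the first are invented).
SAMPLES = [
    "http://img.sharethis.com.s3.amazonaws.com/logo.png",  # the reported FP
    "http://www.example.com.nyud.net/",                    # Coral-style host
    "http://www.example.com.evil.example.net/login",       # .com in middle only
    "http://www.example.com.evil.example.com/login",       # .com in middle and end
    "http://www.example.com.a.b.example.com/login",        # longer host ending in .com
]


def hits(pattern, url):
    """Return True if the pattern matches the URL."""
    return pattern.search(url) is not None


def report(samples=SAMPLES):
    """Map each URL to (current pattern hits, proposed pattern hits)."""
    return {
        url: (hits(CURRENT_COM2OTH, url), hits(PROPOSED_COM2OTH, url))
        for url in samples
    }


if __name__ == "__main__":
    for url, (current, proposed) in report().items():
        print(f"current={current!s:5} proposed={proposed!s:5} {url}")
```

With these samples, neither pattern hits the S3 or Coral-style hosts,
both hit the .net spoof, and only the current pattern hits the host with
exactly two labels before a final .com.  The last sample still hits the
proposed pattern, because the lookahead only inspects the label right
after the two extra ones, so the exclusion appears to cover only the
shortest case.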