Posted to users@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2008/02/18 11:50:18 UTC

Re: SVN notifications killing spamassassin

Eric A. Hall writes:
> I sometimes get SVN notifications that contain lists of files and their
> status. The filenames will often get picked up by the URI matching
> algorithm, each of which end up being processed through numerous lookups
> (URICOUNTRY, my LDAP filter, etc). Sometimes I get very large messages
> with hundreds of file lists, which in turn causes spamassassin to go into
> never-never land while it thinks about the hundreds of "URI" matches.
> 
> For example,
> 
>   A    fpo/reports/perl/nagios_notifications1.pl.bak
>   A    foo/reports/perl/nagios_outages1.pl
>   A    foo/reports/perl/GWIR.pm
> 
> nagios_outages1.pl will be determined as a URI for .pl domain and GWIR.pm
> will be determined as a URI for .pm domain, and so forth. The only way to
> get these messages through is to disable spamassassin...
> 
> I've updated to 3.2.4 just now and it still has the same problem
> 
> I'm guessing the URI analyzer needs to be smarter.

The URI analyzer already is smarter ;)

Changing the URICountry plugin is the way to fix this.

The Mail/SpamAssassin/Plugin/URIDetail plugin is a good example of how
plugins can get metadata about the URIs via the get_uri_detail_list() API.
Looking at the POD documentation and source for that in
Mail/SpamAssassin/PerMsgStatus, I see that a "types" value of "parsed"
should mean that the URI was inferred, instead of being found in a link
or image. URICountry should ignore URIs of that type.
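
A rough sketch of that filtering idea follows. It is written in Python
purely for illustration; the actual plugins are Perl modules that call
get_uri_detail_list() on the per-message status object. The dictionary
layout below (each URI mapping to a set of "types" and a set of "domains")
is an assumption loosely modelled on that API's return value, and
explicit_domains() is a hypothetical helper name. It skips a URI only when
"parsed" is its sole type, so a URI that also appeared in a real link is
still checked.

```python
from typing import Dict, List, Set

# Assumed, simplified shape of the per-URI metadata: each URI maps to
# {"types": {...}, "domains": {...}}. The real Perl API returns hashrefs.
UriInfo = Dict[str, Set[str]]


def explicit_domains(uri_details: Dict[str, UriInfo]) -> List[str]:
    """Hypothetical helper: collect domains worth looking up, ignoring URIs
    that were only inferred from plain text (types == {"parsed"})."""
    domains: Set[str] = set()
    for info in uri_details.values():
        types = info.get("types", set())
        if types == {"parsed"}:
            # Only seen as bare text, e.g. a filename like GWIR.pm: skip it.
            continue
        domains.update(info.get("domains", set()))
    return sorted(domains)


# Made-up sample data mimicking the SVN notification in this thread.
sample_details: Dict[str, UriInfo] = {
    "nagios_outages1.pl": {"types": {"parsed"},
                           "domains": {"nagios_outages1.pl"}},
    "GWIR.pm": {"types": {"parsed"}, "domains": {"gwir.pm"}},
    "http://www.example.com/": {"types": {"a", "parsed"},
                                "domains": {"example.com"}},
}

if __name__ == "__main__":
    print(explicit_domains(sample_details))  # ['example.com']
```

With the sample data only example.com survives; the two filenames would
not be handed to any country, DNS, or LDAP lookup.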

--j.

Re: SVN notifications killing spamassassin

Posted by "Eric A. Hall" <eh...@ehsco.com>.
On 2/18/2008 5:50 AM, Justin Mason wrote:
> The URI analyzer already is smarter ;)
> 
> Changing the URICountry plugin is the way to fix this.

It doesn't appear to be URICountry that's dying. Either way, though, I bet
all of the plugins will perform a lot better once they are no longer being
passed filenames.


-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/