Posted to users@spamassassin.apache.org by Steve Freegard <st...@stevefreegard.com> on 2010/09/17 15:11:41 UTC

New plugin: DecodeShortURLs

Hi All,

Recently I've been getting a bit of filter-bleed from a bunch of spams 
injected via Hotmail/Yahoo that contain shortened URLs e.g. bit.ly/foo 
that upon closer inspection would have been rejected with a high score 
if the real URL had been used.

To that end - it annoyed me enough to write a plug-in that decodes the 
shortened URL using an HTTP HEAD request to extract the location header 
sent by the shortening service and to put this into the list of 
extracted URIs for other plug-ins to find (such as URIDNSBL).
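In outline, the approach looks like this; an illustrative sketch only (the helper names are made up), not the plugin's actual code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Only accept an absolute http(s) URL as the decoded target.
sub absolute_target {
    my ($location) = @_;
    return undef unless defined $location;
    return $location =~ m{^https?://}i ? $location : undef;
}

# Ask the shortening service where the short URL points, without
# following the redirect ourselves.
sub decode_short_url {
    my ($url) = @_;
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new(
        timeout      => 5,
        max_redirect => 0,  # we want the Location header, not the target page
    );
    my $res = $ua->head($url);
    return absolute_target($res->header('Location'));
}

# decode_short_url('http://bit.ly/foo') would return the real URL,
# which can then be added to the message's extracted URI list.
```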

On the messages I tested it with - it raised the scores from <5 to >10 
based on URIDNSBL hits which is just what I wanted.

Hopefully it will be useful to others; you can grab it from:

http://www.fsl.com/support/DecodeShortURLs.pm
http://www.fsl.com/support/DecodeShortURLs.cf

Kind regards,
Steve.


Re: New plugin: DecodeShortURLs

Posted by Steve Freegard <st...@stevefreegard.com>.
On 22/09/10 13:44, Michael Scheidell wrote:
> one more: if # url_shortener_cache /tmp/DecodeShortURLs.sq3
> you should not try to load SQLite.pm.
>
> ent host [79.98.90.156] blocked using zen.spamhaus.org;
> http://www.spamhaus.org/query/bl?ip=79.98.90.156;
> from=<ma...@rossatogroup.com> to=<he...@mcclancy.com>
> proto=ESMTP helo=<MEDMAVVR>
> Sep 22 08:38:40 sns amavis[77402]: (!)_DIE: Can't locate DBD/SQLite.pm
> in @INC (@INC contains: lib /usr/local/lib/perl5/5.8.9/BSDPAN
> /usr/local/lib/perl5/site_perl/5.8.9/mach
> /usr/local/lib/perl5/site_perl/5.8.9 /usr/local/lib/perl5/5.8.9/mach
> /usr/local/lib/perl5/5.8.9) at
> /usr/local/etc/mail/spamassassin/DecodeShortURLs.pm line 84.

There are lots of plug-ins that use exactly the same code to test whether 
modules are installed or not; that's why I did it that way.


> diff -bBru DecodeShortURLs.pm /tmp
> --- DecodeShortURLs.pm 2010-09-22 08:41:55.000000000 -0400
> +++ /tmp/DecodeShortURLs.pm 2010-09-20 11:13:21.000000000 -0400
> @@ -81,7 +81,7 @@
>
> use constant HAS_LWP_USERAGENT => eval { require LWP::UserAgent; };
> use constant HAS_FCNTL => eval { require Fcntl; };
> -use constant HAS_SQLITE => eval { require DBD::SQLite; } if
> url_shortener_cache;
> +use constant HAS_SQLITE => eval { require DBD::SQLite; };

That's of no use at all (you got the diff arguments backwards BTW), as 
you can't know whether the option is enabled in the .cf file - it hasn't 
been read at that point...

That's why it's testing the return of the require in the eval{} block to 
set the constant for later testing.
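In general form (the module names here are just for demonstration), the pattern being discussed is:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Probe for optional modules at compile time; the eval{} traps the die
# from a failed require, so the constant simply ends up false.
use constant HAS_FCNTL  => eval { require Fcntl; };
use constant HAS_NOSUCH => eval { require No::Such::Module; };

# Note: anything hooking $SIG{__DIE__} (e.g. amavis debug logging) may
# still report the trapped die from the failed require - that noise is
# harmless.

print 'Fcntl:  ', (HAS_FCNTL  ? "available\n" : "missing\n");
print 'NoSuch: ', (HAS_NOSUCH ? "available\n" : "missing\n");
```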

I suggest you check your amavis debug/log settings, as it looks like 
amavis is setting something like $SIG{__DIE__} and reporting it to your 
logs in the signal handler.  That's fine for debugging - but you're 
going to get other noise from things like eval{} blocks such as this 
one; it is not a bug.

Regards,
Steve.


Re: New plugin: DecodeShortURLs

Posted by Michael Scheidell <mi...@secnap.com>.
  On 9/20/10 11:33 AM, Steve Freegard wrote:
> On 20/09/10 15:28, Bowie Bailey wrote:
>>
>> You can get rid of the 'backslashitis' by using a different delimiter.
>>
>> uri  URI_BITLY_BLOCKED  m~^http://bit\.ly/a/warning~i
>>
>> You still need to escape the period, but since the tilde (~) is now the
>> delimiter rather than the slash, you don't need to escape all the
>> slashes.  This is very useful for URI patterns!  Just remember that you
>> will now need to escape the new delimiter if it appears in the regex.
>>
one more:  if # url_shortener_cache /tmp/DecodeShortURLs.sq3
you should not try to load SQLite.pm.

ent host [79.98.90.156] blocked using zen.spamhaus.org; 
http://www.spamhaus.org/query/bl?ip=79.98.90.156; 
from=<ma...@rossatogroup.com> to=<he...@mcclancy.com> 
proto=ESMTP helo=<MEDMAVVR>
Sep 22 08:38:40 sns amavis[77402]: (!)_DIE: Can't locate DBD/SQLite.pm 
in @INC (@INC contains: lib /usr/local/lib/perl5/5.8.9/BSDPAN 
/usr/local/lib/perl5/site_perl/5.8.9/mach 
/usr/local/lib/perl5/site_perl/5.8.9 /usr/local/lib/perl5/5.8.9/mach 
/usr/local/lib/perl5/5.8.9) at 
/usr/local/etc/mail/spamassassin/DecodeShortURLs.pm line 84.

  diff -bBru DecodeShortURLs.pm /tmp
--- DecodeShortURLs.pm    2010-09-22 08:41:55.000000000 -0400
+++ /tmp/DecodeShortURLs.pm    2010-09-20 11:13:21.000000000 -0400
@@ -81,7 +81,7 @@

  use constant HAS_LWP_USERAGENT => eval { require LWP::UserAgent; };
  use constant HAS_FCNTL => eval { require Fcntl; };
-use constant HAS_SQLITE => eval { require DBD::SQLite; } if 
url_shortener_cache;
+use constant HAS_SQLITE => eval { require DBD::SQLite; };

  sub dbg {
    my $msg = shift;

>
> Thanks for the tip; I did know about using different delimiters - but 
> using / is force of habit ;-)
>
> I'll try and remember to use something different for uri rules.
>
> Cheers,
> Steve.
>


-- 
Michael Scheidell, CTO
o: 561-999-5000
d: 561-948-2259
ISN: 1259*1300
SECNAP Network Security Corporation

    * Certified SNORT Integrator
    * 2008-9 Hot Company Award Winner, World Executive Alliance
    * Five-Star Partner Program 2009, VARBusiness
    * Best in Email Security,2010: Network Products Guide
    * King of Spam Filters, SC Magazine 2008


______________________________________________________________________
This email has been scanned and certified safe by SpammerTrap(r). 
For Information please see http://www.secnap.com/products/spammertrap/
______________________________________________________________________  

Re: New plugin: DecodeShortURLs

Posted by Steve Freegard <st...@stevefreegard.com>.
On 20/09/10 15:28, Bowie Bailey wrote:
>
> You can get rid of the 'backslashitis' by using a different delimiter.
>
> uri  URI_BITLY_BLOCKED  m~^http://bit\.ly/a/warning~i
>
> You still need to escape the period, but since the tilde (~) is now the
> delimiter rather than the slash, you don't need to escape all the
> slashes.  This is very useful for URI patterns!  Just remember that you
> will now need to escape the new delimiter if it appears in the regex.
>

Thanks for the tip; I did know about using different delimiters - but 
using / is force of habit ;-)

I'll try and remember to use something different for uri rules.

Cheers,
Steve.


Re: New plugin: DecodeShortURLs

Posted by Bowie Bailey <Bo...@BUC.com>.
 On 9/20/2010 8:15 AM, Steve Freegard wrote:
> On 17/09/10 14:48, RW wrote:
>>
>> I think it might be better to take the "blocked page" handling out of
>> the perl and turn it into an ordinary uri rule.
>>
>
> Yeah; really don't know why I did it like that in the first place.
>
> I've just uploaded version 0.2 which does it this way instead and adds
> the following rule:
>
> uri  URI_BITLY_BLOCKED  /^http:\/\/bit\.ly\/a\/warning/i

You can get rid of the 'backslashitis' by using a different delimiter.

uri  URI_BITLY_BLOCKED  m~^http://bit\.ly/a/warning~i

You still need to escape the period, but since the tilde (~) is now the
delimiter rather than the slash, you don't need to escape all the
slashes.  This is very useful for URI patterns!  Just remember that you
will now need to escape the new delimiter if it appears in the regex.
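For instance, if the pattern itself contains a tilde, escape it just as you would previously have escaped the slashes (URI_HOME_PAGE is a made-up rule name):

```
uri  URI_HOME_PAGE  m~^http://example\.com/\~user/index\.html~i
```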

-- 
Bowie

Re: New plugin: DecodeShortURLs

Posted by Steve Freegard <st...@stevefreegard.com>.
On 20/09/10 16:17, Michael Scheidell wrote:
> On 9/20/10 8:15 AM, Steve Freegard wrote:
>> Caching; if desired it will now cache URLs to a SQLite database for
>> additional speed-up and to prevent DoS of the shortener services.

> any anticipated write lock problems with this due to sqlite not handling
> multi-threaded reads/writes?

No; the module handles locking just fine.  If you get two writes at 
once, the first will block until the other completes.  As this is a 
very simple database, that's unlikely to cause any problems.

Also, if you're really concerned about this, or have reason to believe 
you'll have high enough concurrency on the local machine for this to be 
a problem, then see http://www.sqlite.org/wal.html which is new in 
SQLite 3.7.

> most (many?) SA installs already have db4. I guess maybe, hey, it's open
> source, get out your flowchart guys and write the db4 module :-)

I didn't use db4 as I'm not familiar enough with it.  I also needed more 
than just a key->value store, so that expiry of old data can be handled 
efficiently using indexed deletes, and I've no idea how to do that in 
db4.  So patches are welcome if someone really wants this.
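A cache shaped the way Steve describes might look like the following; the table and column names are hypothetical, not necessarily what DecodeShortURLs.pm actually creates:

```sql
-- Hypothetical schema; check DecodeShortURLs.pm for the real one.
CREATE TABLE short_url_cache (
  short_url   TEXT PRIMARY KEY,     -- e.g. http://bit.ly/foo
  decoded_url TEXT NOT NULL,        -- the Location header target
  created     INTEGER NOT NULL      -- unix time the entry was cached
);

-- The index is what makes expiry an indexed delete, not a table scan:
CREATE INDEX short_url_cache_created ON short_url_cache (created);

-- Expire entries older than 30 days (2592000 seconds):
DELETE FROM short_url_cache
 WHERE created < CAST(strftime('%s','now') AS INTEGER) - 2592000;

-- And, per the WAL suggestion above, on SQLite >= 3.7:
PRAGMA journal_mode=WAL;
```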

Regards,
Steve.


Re: New plugin: DecodeShortURLs

Posted by Michael Scheidell <mi...@secnap.com>.
  On 9/20/10 8:15 AM, Steve Freegard wrote:
> Caching; if desired it will now cache URLs to a SQLite database for 
> additional speed-up and to prevent DoS of the shortener services.
any anticipated write lock problems with this due to sqlite not handling 
multi-threaded reads/writes?
most (many?) SA installs already have db4.  I guess maybe, hey, it's open 
source, get out your flowchart guys and write the db4 module :-)

-- 
Michael Scheidell, CTO
o: 561-999-5000
d: 561-948-2259
ISN: 1259*1300
SECNAP Network Security Corporation

Re: New plugin: DecodeShortURLs

Posted by Steve Freegard <st...@stevefreegard.com>.
On 17/09/10 14:48, RW wrote:
>
> I think it might be better to take the "blocked page" handling out of
> the perl and turn it into an ordinary uri rule.
>

Yeah; really don't know why I did it like that in the first place.

I've just uploaded version 0.2 which does it this way instead and adds 
the following rule:

uri  URI_BITLY_BLOCKED  /^http:\/\/bit\.ly\/a\/warning/i
describe URI_BITLY_BLOCKED Message contains a bit.ly URL that has been 
disabled due to abuse
score URI_BITLY_BLOCKED 10.0

This new version also contains the following:

1)  Improved the URI handling, as I noticed it was checking some URIs 
that it shouldn't have been, due to those URIs embedding a shortener in 
a GET variable.

2)  Log file; if desired it will output the unix time, short url and 
decoded URL to a log file for further analysis.

3)  Caching; if desired it will now cache URLs to a SQLite database for 
additional speed-up and to prevent DoS of the shortener services.
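For reference, these features are switched on from the .cf.  The cache option name below appears elsewhere in this thread; the log option name is a guess on my part, so check the shipped DecodeShortURLs.cf for the authoritative names and defaults:

```
# SQLite cache of decoded URLs (path is just an example):
url_shortener_cache  /tmp/DecodeShortURLs.sq3

# Hypothetical name for the decode-log option - check DecodeShortURLs.cf:
url_shortener_log    /var/log/shorturl.log
```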

Cheers,
Steve.


Re: New plugin: DecodeShortURLs

Posted by RW <rw...@googlemail.com>.
On Fri, 17 Sep 2010 14:11:41 +0100
Steve Freegard <st...@stevefreegard.com> wrote:

> Hi All,
> 
> Recently I've been getting a bit of filter-bleed from a bunch of
> spams injected via Hotmail/Yahoo that contain shortened URLs e.g.
> bit.ly/foo that upon closer inspection would have been rejected with
> a high score if the real URL had been used.
> 
> To that end - it annoyed me enough to write a plug-in that decodes
> the shortened URL using an HTTP HEAD request to extract the location
> header sent by the shortening service and to put this into the list
> of extracted URIs for other plug-ins to find (such as URIDNSBL).
> 

I think it might be better to take the "blocked page" handling out of
the perl and turn it into an ordinary uri rule. 

Re: New plugin: DecodeShortURLs

Posted by Eduardo Casarero <ec...@gmail.com>.
2010/9/17 Steve Freegard <st...@stevefreegard.com>

> Hi All,
>
> Recently I've been getting a bit of filter-bleed from a bunch of spams
> injected via Hotmail/Yahoo that contain shortened URLs e.g. bit.ly/foo
> that upon closer inspection would have been rejected with a high score if
> the real URL had been used.
>
> To that end - it annoyed me enough to write a plug-in that decodes the
> shortened URL using an HTTP HEAD request to extract the location header sent
> by the shortening service and to put this into the list of extracted URIs
> for other plug-ins to find (such as URIDNSBL).
>
> On the messages I tested it with - it raised the scores from <5 to >10
> based on URIDNSBL hits which is just what I wanted.
>
> Hopefully it will be useful to others; you can grab it from:
>
> http://www.fsl.com/support/DecodeShortURLs.pm
> http://www.fsl.com/support/DecodeShortURLs.cf
>
> Kind regards,
> Steve.
>
Thanks Steve! I will test it later!

Re: New plugin: DecodeShortURLs

Posted by Steve Freegard <st...@stevefreegard.com>.
On 17/09/10 14:33, Jari Fredriksson wrote:
>
>
> It has a typo.
>
> describe URIBL_SHORT	...
>
> The rule name is wrong, should be SHORT_URIBL
>
> Didn't you --lint it? ;)
>

Doh! - fixed.

Regards,
Steve.


Re: New plugin: DecodeShortURLs

Posted by Jari Fredriksson <ja...@iki.fi>.
On 17.9.2010 16:11, Steve Freegard wrote:
> Hi All,
> 
> Recently I've been getting a bit of filter-bleed from a bunch of spams
> injected via Hotmail/Yahoo that contain shortened URLs e.g. bit.ly/foo
> that upon closer inspection would have been rejected with a high score
> if the real URL had been used.
> 
> To that end - it annoyed me enough to write a plug-in that decodes the
> shortened URL using an HTTP HEAD request to extract the location header
> sent by the shortening service and to put this into the list of
> extracted URIs for other plug-ins to find (such as URIDNSBL).
> 
> On the messages I tested it with - it raised the scores from <5 to >10
> based on URIDNSBL hits which is just what I wanted.
> 
> Hopefully it will be useful to others; you can grab it from:
> 
> http://www.fsl.com/support/DecodeShortURLs.pm
> http://www.fsl.com/support/DecodeShortURLs.cf


It has a typo.

describe URIBL_SHORT	...

The rule name is wrong, should be SHORT_URIBL

Didn't you --lint it? ;)
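i.e. the describe (and any score) line has to carry the same name as the rule it annotates:

```
describe SHORT_URIBL  ...
score    SHORT_URIBL  ...
```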




-- 

You will not be elected to public office this year.

Re: New plugin: DecodeShortURLs

Posted by David Touzeau <da...@touzeau.eu>.
Many thanks

Added in the Artica Open Source web interface!

http://www.artica.fr/index.php/menudocmessaging/39-manage-filters-anti-spam-content-filters/391--shorturls-spam-checking-plugin-with-spamassassin



On 17/09/2010 15:11, Steve Freegard wrote:
> Hi All,
>
> Recently I've been getting a bit of filter-bleed from a bunch of spams
> injected via Hotmail/Yahoo that contain shortened URLs e.g. bit.ly/foo
> that upon closer inspection would have been rejected with a high score
> if the real URL had been used.
>
> To that end - it annoyed me enough to write a plug-in that decodes the
> shortened URL using an HTTP HEAD request to extract the location header
> sent by the shortening service and to put this into the list of
> extracted URIs for other plug-ins to find (such as URIDNSBL).
>
> On the messages I tested it with - it raised the scores from <5 to >10
> based on URIDNSBL hits which is just what I wanted.
>
> Hopefully it will be useful to others; you can grab it from:
>
> http://www.fsl.com/support/DecodeShortURLs.pm
> http://www.fsl.com/support/DecodeShortURLs.cf
>
> Kind regards,
> Steve.
>

Re: New plugin: DecodeShortURLs

Posted by John Horne <jo...@plymouth.ac.uk>.
On Mon, 2010-10-04 at 22:55 +0100, John Horne wrote:
>
> I grabbed a copy of the above plugin and tried it this afternoon (on a
> CentOS 5.5 system). We log all our spamd messages to /var/log/maillog
> via syslog. For the plugin I disabled all the options except
> 'url_shortener_syslog' which was set to 1.
> 
> After restarting SpamAssassin we started to get some messages from spamd
> sent to /var/log/mailog and some sent to /var/log/messages.
>
Hello,

Well, I suspect the problem is with the Sys::Syslog perl module.  On our
CentOS 5.5 system we have perl 5.8 with version 0.13 of the module (which
is quite old).  My Fedora 13 PC uses perl 5.10 with version 0.27 of the
module (the latest version).  However, it seems there is a bug in that
version which causes it to ignore the facility (fix here:
http://rt.cpan.org/Public/Bug/Display.html?id=55151).

I have left the plugin enabled, but without using the syslog options.

I have had a look at the (0.13) syslog module, but can't really see
where the problem is. If I get more time, then I may try and debug it
further.




John.

-- 
John Horne, University of Plymouth, UK
Tel: +44 (0)1752 587287    Fax: +44 (0)1752 587001


Re: New plugin: DecodeShortURLs

Posted by Jason Bertoch <ja...@i6ix.com>.
On 2010/10/04 6:35 PM, Martin Gregorie wrote:
> Just a data point for you.
>
> I'm running DecodeShortURLs with the as-issued .cf file
> (log,cache,syslog options commented out).

I initially tried running the plugin with these options commented out, 
but it just doesn't work.  It needs those defined.  Once I uncommented 
them, everything began working beautifully.

-- 
/Jason


Re: New plugin: DecodeShortURLs

Posted by Martin Gregorie <ma...@gregorie.org>.
On Mon, 2010-10-04 at 22:55 +0100, John Horne wrote:
> I grabbed a copy of the [DecodeShortURLs] plugin and tried it this afternoon (on a
> CentOS 5.5 system). We log all our spamd messages to /var/log/maillog
> via syslog. For the plugin I disabled all the options except
> 'url_shortener_syslog' which was set to 1.
> 
Just a data point for you.

I'm running DecodeShortURLs with the as-issued .cf file
(log,cache,syslog options commented out). It has found a couple of
shortened URLs too, but all its log messages have been sent
to /var/log/maillog. None were sent to /var/log/messages.


Martin


Re: New plugin: DecodeShortURLs

Posted by John Horne <jo...@plymouth.ac.uk>.
On Thu, 2010-09-23 at 11:30 +0100, Steve Freegard wrote:
> >
> > Hopefully it will be useful to others; you can grab it from:
> >
> > http://www.fsl.com/support/DecodeShortURLs.pm
> > http://www.fsl.com/support/DecodeShortURLs.cf
> >
> 
...
> 
> - Added option to allow logging to syslog (mail.info).
> 
Hello,

I grabbed a copy of the above plugin and tried it this afternoon (on a
CentOS 5.5 system). We log all our spamd messages to /var/log/maillog
via syslog. For the plugin I disabled all the options except
'url_shortener_syslog' which was set to 1.

After restarting SpamAssassin we started to get some messages from spamd
sent to /var/log/maillog and some sent to /var/log/messages. Not messages 
from the plugin, but any messages from spamd. For example
(from /var/log/messages):

   Oct  4 22:28:50 pat sauser[31061]: spamd: checking message
   <79...@www.facebook.com> for sauser:10001
   Oct  4 22:28:56 pat sauser[31061]: spamd: clean message (-0.1/8.0)
   for sauser:10001 in 5.6 seconds, 7896 bytes. 
   Oct  4 22:28:56 pat sauser[31061]: spamd: result: . 0 -
   BAYES_00,DCC_CHECK,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,
     HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS
     scantime=5.6,size=7896,user=sauser,uid=10001,required_score=8.0,
     rhost=localhost.localdomain,raddr=127.0.0.1,rport=38700,
     mid=<79...@www.facebook.com>,
     bayes=0.000000,autolearn=no

The messages are not just being duplicated in both files, there are
different messages in each file. Our syslog.conf specifies:

    *.info;mail.none       /var/log/messages
    mail.*                 -/var/log/maillog

I tried changing DecodeShortURLs.pm calls to syslog to use 'info|mail'
and that made no difference. I also tried commenting out the 'syslog'
calls, and used backtick calls to '/usr/bin/logger' instead. The same
problem happened. If I take the plugin out, then all messages from spamd
go to /var/log/maillog as before.

Anyone any ideas as to what is going on?



Thanks,

John.

-- 
John Horne, University of Plymouth, UK
Tel: +44 (0)1752 587287    Fax: +44 (0)1752 587001


Re: New plugin: DecodeShortURLs

Posted by Steve Freegard <st...@stevefreegard.com>.
  On 01/01/11 12:02, Warren Togami Jr. wrote:
> http://www.surbl.org/faqs#redirect
> BTW, this page mentions SpamCopURI and urirhdbl as existing tools that 
> handle redirection to some degree.  Have you confirmed that you are 
> not needlessly reinventing the wheel?  It is entirely possible that 
> your design with suggestions here could be better than the existing 
> tools, but it might be worthwhile to look at the existing tools to see 
> if they have useful ideas to borrow.

You've got this rather confused as to how it works and what it does; from 
Mail::SpamAssassin::Conf:

        redirector_pattern  /pattern/modifiers
            A regex pattern that matches both the redirector site portion,
            and the target site portion of a URI.

            Note: The target URI portion must be surrounded in parentheses
                  and no other part of the pattern may create a backreference.

            Example: http://chkpt.zdnet.com/chkpt/whatever/spammer.domain/yo/dude

It's designed to get URIs from CGI redirectors and add them to the 
lookup lists for the URIBL plugin.   It's *nothing* like what I'm doing 
with the shorteners.
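For comparison, a redirector_pattern entry in a .cf looks like this (modeled on the chkpt.zdnet.com example in the Conf docs; the exact stock patterns vary by SpamAssassin version):

```
redirector_pattern  /^https?:\/\/chkpt\.zdnet\.com\/chkpt\/\w+\/(.*)$/i
```

The single capture group is the target URI that gets fed to the URIBL lookups.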

Regards,
Steve.

Re: New plugin: DecodeShortURLs

Posted by "Warren Togami Jr." <wt...@gmail.com>.
http://www.surbl.org/faqs#redirect
BTW, this page mentions SpamCopURI and urirhdbl as existing tools that
handle redirection to some degree.  Have you confirmed that you are not
needlessly reinventing the wheel?  It is entirely possible that your design
with suggestions here could be better than the existing tools, but it might
be worthwhile to look at the existing tools to see if they have useful ideas
to borrow.

Warren

Re: New plugin: DecodeShortURLs

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On Thu, Jan 6, 2011 at 7:23 AM, Henrik K <he...@hege.li> wrote:

>
> There are lots of plugins out there that aren't part of the core for one
> reason or another. If you ask me, this is one of them. It just asks for
> trouble when widely used. It's not the only way to solve the problem anyway. And the
> problem itself is somewhat "temporary" in nature, just like image spam was
> etc.
>
>
I don't disagree, but I am wondering how is this "temporary"?

Warren

Re: New plugin: DecodeShortURLs

Posted by Henrik K <he...@hege.li>.
On Thu, Jan 06, 2011 at 07:05:05AM -1000, Warren Togami Jr. wrote:
> On Wed, Jan 5, 2011 at 2:41 AM, Warren Togami Jr. <wt...@gmail.com> wrote:
> 
> > The only trouble here is HTTP's TCP handshake and teardown is significantly
> > slower than DNSBL and URIBL lookups already used in spamassassin.  My
> > average scan time is less than one second.  A plugin that catches the 1% of
> > URL shortening spam is only worthwhile if it doesn't slow down your mail
> > scanning considerably.  Doing the HTTP query asynchronously would help, but
> > I fear that this could easily add several seconds per mail.
> >
> > Warren
> >
> 
> Another problem... spammers could intentionally max out the number of
> shortener URLs per spam.  The URLs don't even have to be real.  Any random
> garbage after the domain name will trigger an HTTP GET, and render the local
> cache useless.  HTTP GETs could happen dozens or hundreds of times a minute
> until the shortening service decides to block the SpamAssassin IP.

There are lots of plugins out there that aren't part of the core for one
reason or another. If you ask me, this is one of them. It just asks for
trouble when widely used. It's not the only way to solve the problem anyway. And the
problem itself is somewhat "temporary" in nature, just like image spam was
etc.


Re: New plugin: DecodeShortURLs

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On Sat, Jan 1, 2011 at 7:19 AM, Steve Freegard <st...@stevefreegard.com> wrote:

> 7) How fast are typical URL shortening responses?  What is the timeout?  We
> want to avoid degrading the scan time and delivery performance of
> spamassassin, but in a way that cannot be abused by the spammer to evade
> detection.
>
>
> This could be a problem with your huge list of shortening services.  If you
> blindly include all possible shortening services, spammers could
> purposefully use only the slowest in order timeout spamassasin.  Web
> browsers are more forgiving in timeouts, so a slow redirector is the ideal
> way to evade your plugin.
>
> It is possible that you may want to include only the most reputable
> shortening services by default, because you don't know what will happen
> during the multiple years of your plugin being deployed on arbitrary
> servers.  Other less reputable shortening services might be hijacked, domain
> ownership changed, or simply neglected and become slow.  Such services may
> need to be blacklisted entirely.  For the non-default shortening services,
> it may be safe only if it can be updated via sa-update.
>
>
> The timeout is set to 5 seconds and with a default of 10 short URIs scanned
> it would take 50 seconds before it timed out the lookups.  Thinking about it
> I could possibly mitigate this by tracking timeouts by shortener domain; so
> if the 1st lookup to that shortener service timed-out then it wouldn't
> attempt the rest.
>
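The mitigation described above (skip a shortener domain after its first timeout) is easy to express; this is an illustrative sketch, not the plugin's code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remember which shortener domains have already timed out during this
# scan, and skip any further lookups to them.
my %timed_out;

sub domain_of {
    my ($url) = @_;
    my ($dom) = $url =~ m{^https?://([^/]+)}i;
    return lc($dom // '');
}

# $lookup is a code ref that returns the decoded URL, or undef on timeout.
sub decode_with_backoff {
    my ($url, $lookup) = @_;
    my $dom = domain_of($url);
    return undef if $timed_out{$dom};    # this shortener already timed out
    my $decoded = $lookup->($url);
    $timed_out{$dom} = 1 unless defined $decoded;
    return $decoded;
}
```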

Everything else about this sounds very good, but this part is a bit
worrisome.  Looking through my logs, my average scantime is under 1 second.
During debugging a timeout of 5 seconds would be fine in order to help
determine how fast the shorteners typically respond.  But changes are needed
to avoid severely impacting delivery times.

* Consecutive timeouts won't work.  The combined timeout of all short-URL lookups
when this plugin goes into production must be under maybe 3-5 seconds.
* I know this would be difficult, but would it be possible to make
asynchronous and concurrent queries to the shorteners instead of
one-after-another?  Kind of like how the URIDNSBL plugin currently works.
There might be some complications here, like most HTTP servers will only
respond to the first two concurrent connections from an IP address while
further connections are serviced only after the first two have disconnected.

Rule Ideas
========
SHORT_URL_MULTI10
SHORT_URL_TOOMANY
Rules triggering on suspicious behavior even if your plugin didn't have time
to query it all.

SHORT_URL_TIMEOUT
The plugin could print out which URL timed out.  Something like:

X-Spam-Report:
	*  0.5 SHORT_URL_TIMEOUT Shortened URL Timedout
	*      [3 second timeout for http://example.com/298fauu]

Warren

Re: New plugin: DecodeShortURLs

Posted by Raymond Dijkxhoorn <ra...@prolocation.net>.
Warren,

> It appears that under 1% of spam is abusing shortening redirectors.  
> ~40% of the shortening redirector spam has local-only spamassassin 
> scores below the 5 point threshold.  We'll see next
> Saturday how it scores with all network rules.

Could you please quote the old messages, rather than posting replies inside a 
thread with the old message completely removed?

http://lipas.uwasa.fi/~ts/http/quote.html

Thanks,
Raymond.

Re: New plugin: DecodeShortURLs

Posted by "Warren Togami Jr." <wt...@gmail.com>.
http://ruleqa.spamassassin.org/20110102-r1054364-n/T_URL_SHORTENER/detail
I inserted a giant uri regex into the nightly masscheck in order to get a
rough measure of the true extent of the URL shortener problem.

It appears that under 1% of spam is abusing shortening redirectors.  ~40% of
the shortening redirector spam has local-only spamassassin scores below the
5 point threshold.  We'll see next Saturday how it scores with all network
rules.

Warren

Re: New plugin: DecodeShortURLs

Posted by Jason Haar <Ja...@trimble.co.nz>.
On 01/02/2011 07:52 AM, Michael Scheidell wrote:
>> Currently the default used by the LWP module.  Could easily set it to
>> use an identical string to Firefox or IE.
>
> and, on occasion, our IPS will tarpit, or delay, or totally block
> anything that hits the web servers more than a couple of times with
> LWP (or java lib), assuming it's a spammer using LWP to harvest web
> sites for email addresses, so, changing it would be good.
>

...and in our case we prefer to set User-Agent to a known "secret sauce"
so that our NIDS will ignore outbound HTTP requests from our DMZ mail
servers. Having User-Agent as a config setting would be useful to us
(this is really more of a general SA question than DecodeShortURL)

-- 
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1


Re: New plugin: DecodeShortURLs

Posted by Michael Scheidell <mi...@secnap.com>.
On 1/1/11 12:19 PM, Steve Freegard wrote:
>> 8) What UserAgent is used in the HTTP request?  If they can easily 
>> detect that the request is not a real browser, then they can avoid 
>> detection by using a safe looking fake response, while browser-based 
>> redirects go to the intended spam target.
>
> Currently the default used by the LWP module.  Could easily set it to 
> use an identical string to Firefox or IE.
and, on occasion, our IPS will tarpit, or delay, or totally block 
anything that hits the web servers more than a couple of times with LWP 
(or java lib), assuming it's a spammer using LWP to harvest web sites for 
email addresses, so, changing it would be good.

-- 
Michael Scheidell, CTO
o: 561-999-5000
d: 561-948-2259
ISN: 1259*1300
SECNAP Network Security Corporation


Re: New plugin: DecodeShortURLs

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On Wed, Jan 5, 2011 at 2:41 AM, Warren Togami Jr. <wt...@gmail.com> wrote:

> The only trouble here is HTTP's TCP handshake and teardown is significantly
> slower than DNSBL and URIBL lookups already used in spamassassin.  My
> average scan time is less than one second.  A plugin that catches the 1% of
> URL shortening spam is only worthwhile if it doesn't slow down your mail
> scanning considerably.  Doing the HTTP query asynchronously would help, but
> I fear that this could easily add several seconds per mail.
>
> Warren
>

Another problem... spammers could intentionally max out the number of
shortener URLs per spam.  The URLs don't even have to be real.  Any random
garbage after the domain name will trigger an HTTP GET, and render the local
cache useless.  HTTP GETs could happen dozens or hundreds of times a minute
until the shortening service decides to block the SpamAssassin IP.

Re: New plugin: DecodeShortURLs

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On Sat, Jan 1, 2011 at 7:19 AM, Steve Freegard <st...@stevefreegard.com> wrote:

>  On 01/01/11 11:51, Warren Togami Jr. wrote:
>
>  I'll help you start the process with a Bugzilla ticket.  I also hope you
> could get it into some sort of public source control mechanism soon so we
> can see the changes that go into it before inclusion in upstream.  I feel
> uncomfortable using something that is only available from a URL without
> being able to see its change history.
>
> Know how to use git?  github.com is pretty good for something small like
> this.
>
>
>
> Sure. No problem.
>

Setup a git repository?  I'd like to collaborate on development on this
plugin.


> 2) How widespread is URL shortening abuse now?  I can figure this out very
> easily by adding a non-network URI rule to the nightly masscheck.  Could you
> please send me privately your updated list of shorteners so that I may write
> such a rule?
>
>
> Based on the reports I get - quite prevalent at times, and when these are
> used it's effectively a free pass through the URIBL plug-in, which often
> results in a false negative.
>
> As soon as I've sorted out the list - I'll send it to you.
>

According to yesterday's masschecks, it appears that roughly 1% of spam and
1% of ham contains a URL shortener.  Of the spam in the corpus, ~49% of the
messages containing a URL shortener scored 5 points or fewer.  A score this low
probably means they are successful in avoiding positive URIBL hits.  If you
look at the borderline scores all the way up to 7, then you're looking at
64% of URL shortening spam.  Higher scores are almost always a sign that the
URL shortener domain itself is listed in a URIBL, probably because the service
didn't police itself and was abused too much.  But the spam bias of URL
shorteners is definitely weighted heavily on the lower end of spamassassin
scoring, meaning this is a worthwhile approach to develop.

The only trouble here is that HTTP's TCP handshake and teardown are
significantly slower than the DNSBL and URIBL lookups already used in
spamassassin.  My
average scan time is less than one second.  A plugin that catches the 1% of
URL shortening spam is only worthwhile if it doesn't slow down your mail
scanning considerably.  Doing the HTTP query asynchronously would help, but
I fear that this could easily add several seconds per mail.

Warren

Re: New plugin: DecodeShortURLs

Posted by Steve Freegard <st...@stevefreegard.com>.
  On 01/01/11 11:51, Warren Togami Jr. wrote:
> I'll help you start the process with a Bugzilla ticket.  I also hope 
> you could get it into some sort of public source control mechanism 
> soon so we can see the changes that go into it before inclusion in 
> upstream.  I feel uncomfortable using something that is only available 
> from a URL without being able to see its change history.
>
> Know how to use git? github.com <http://github.com> is pretty good for 
> something small like this.

Sure. No problem.
> More questions:
>
> 1) Is it really necessary to follow a chain deeper than 2?  My mind 
> thinks that a chain of 2 is never legitimate, and it consumes time and 
> resources to query further.
>

No - I'll make this configurable and default it to 2, after a bit of 
testing to make sure there are no obvious issues doing this.

> 2) How widespread is URL shortening abuse now?  I can figure this out 
> very easily by adding a non-network URI rule to the nightly 
> masscheck.  Could you please send me privately your updated list of 
> shorteners so that I may write such a rule?
>

Based on the reports I get - quite prevalent at times and when these are 
used it's effectively a free-pass through the URIBL plug-in which often 
results in a false-negative.

As soon as I've sorted out the list - I'll send it to you.

> 3) SHORT_URL_NOTSHORT
> If the expanded address is not much longer than the original address, 
> then they are likely obfuscating with ill intent.  What should the 
> threshold be?  Exact length?  Original length + 4 characters?  This 
> should be a good new rule.
>

Hmmm; not sure on that - I'll see if I can add this.

> 4)
> url_shortener_log /tmp/DecodeShortURLs.txt
> url_shortener_cache /tmp/DecodeShortURLs.sq3
> Is there a variable to the spamassassin homedir's path so you don't 
> need to hardcode an absolute path in the default config?
>

Not that I know of.

> 5) Do you currently make any distinction between reputable and 
> non-reputable shortening services?  Questions below are related to this.
>

No - I have no information on this.  However the URIBLs do and often 
blacklist the rogues if they are used for a lot of abuse.

> 6) If your plugin expands http://example.com/foobar to 
> http://somethingsafe.com, is http://example.com hidden from URIBL 
> lookups?  This might matter if a shortening service goes rogue.
>

No - the expanded URIs are added to the list gathered by SA; it doesn't 
overwrite them.

> 7) How fast are typical URL shortening responses?  What is the 
> timeout?  We want to avoid degrading the scan time and delivery 
> performance of spamassassin, but in a way that cannot be abused by the 
> spammer to evade detection.
>
> This could be a problem with your huge list of shortening services.  
> If you blindly include all possible shortening services, spammers 
> could purposefully use only the slowest in order to time out spamassassin.  
> Web browsers are more forgiving with timeouts, so a slow redirector is 
> the ideal way to evade your plugin.
>
> It is possible that you may want to include only the most reputable 
> shortening services by default, because you don't know what will 
> happen during the multiple years of your plugin being deployed on 
> arbitrary servers.  Other less reputable shortening services might be 
> hijacked, domain ownership changed, or simply neglected and become 
> slow.  Such services may need to be blacklisted entirely.  For the 
> non-default shortening services, it may be safe only if it can be 
> updated via sa-update.
>

The timeout is set to 5 seconds, so with a default of 10 short URIs 
scanned it could take up to 50 seconds before all the lookups timed out.  
Thinking about it, I could possibly mitigate this by tracking timeouts by 
shortener domain; if the first lookup to a given shortener service 
timed out, the rest wouldn't be attempted.
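That per-domain idea could be sketched like this (a Python illustration of the logic only; the plugin itself is Perl, and `resolve` here is a hypothetical stand-in for the HTTP HEAD lookup):

```python
# Sketch: once one lookup to a shortener domain times out, skip the rest
# of the URLs on that domain for this scan.  Illustrative, not plugin code.
from urllib.parse import urlparse

def expand_urls(short_urls, resolve):
    """resolve(url) returns the expanded URL, or raises TimeoutError."""
    dead_domains = set()   # shortener domains that already timed out
    expanded = {}
    for url in short_urls:
        domain = urlparse(url).netloc
        if domain in dead_domains:
            continue       # don't waste another 5s on a known-slow service
        try:
            expanded[url] = resolve(url)
        except TimeoutError:
            dead_domains.add(domain)
    return expanded
```

With a 5-second timeout, a message stuffed with URLs from one slow service then costs at most one timeout instead of ten.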

> 8) What UserAgent is used in the HTTP request?  If they can easily 
> detect that the request is not a real browser, then they can avoid 
> detection by using a safe looking fake response, while browser-based 
> redirects go to the intended spam target.

Currently it's the default used by the LWP module.  I could easily set it 
to use an identical string to Firefox or IE.
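As a sketch of that idea (Python for illustration; the plugin would do the equivalent with LWP::UserAgent's agent() method, and the UA string below is just an example, not anything the plugin ships):

```python
# Sketch: issue the HEAD request with a browser-like User-Agent so the
# shortener can't trivially fingerprint the scanner.  Illustrative only.
import urllib.request

# Example Firefox-style UA string (an assumption, not the plugin's default)
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/115.0"

def build_head_request(url):
    """Build a HEAD request whose Location response header would give
    the expanded URL, without following the redirect."""
    req = urllib.request.Request(url, method="HEAD")
    req.add_header("User-Agent", BROWSER_UA)
    return req
```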

Regards,
Steve.

Re: New plugin: DecodeShortURLs

Posted by "Warren Togami Jr." <wt...@gmail.com>.
On Fri, Dec 31, 2010 at 11:46 PM, Steve Freegard <st...@stevefreegard.com>wrote:

>
>  I notice that there is no Bugzilla ticket for this plugin.  Do you intend
>> on submitting it for inclusion in future spamassassin upstream?
>>
>>
>
> I hadn't really thought about it TBH and wasn't sure what the procedure was
> for this.
>
> It's been working well for me and for others based on some feedback that
> people have sent me - however it could do with being tested in the network
> mass-checks to see actually how effective it is compared to the other rules.
>
> But I'd also feel a bit more comfortable if one of the core devs looked
> over the code and made sure I haven't done anything obviously stupid.


I'll help you start the process with a Bugzilla ticket.  I also hope you
could get it into some sort of public source control mechanism soon so we
can see the changes that go into it before inclusion in upstream.  I feel
uncomfortable using something that is only available from a URL without
being able to see its change history.

Know how to use git?  github.com is pretty good for something small like
this.


>
>
>  Would a DoS happen if the scanned e-mail contains 10,000 short URLs, and
>> your mail server is hit by many such mails?  (Either spamassassin becomes very
>> slow, or you piss off the short URL provider by hitting them too quickly and
>> often.)
>>
>>
> No - it's got a hard-coded limit of 10 short URLs that will be checked at
> maximum; anything after the limit of 10 are skipped.  You can also
> optionally enable a cache (requires DBD::SQLite) to prevent multiple
> messages with the same short link from generating additional queries.
>
> On reflection whilst typing this - I could probably handle this a bit
> better; currently the short URLs are stored in a Perl hash (to effectively
> de-dup them); I should possibly turn the hash into an array, randomize it
> and remove the first 10 entries from it so it's not so predictable.


Sounds like a good plan.  I'll see how it is in practice.


>
>
>  Could the plugin detect when there are intentionally too many short URL's?
>>  If so, what should it do in such cases?  Are there ever legit reasons for
>> an e-mail to have a large number of short URL's?
>>
>>
> For now - I guess I could add an additional rule (e.g. scored at 0.001 to
> see how many times it hits the current limit); but the age old issue is 'how
> many is too many?'.
>
> I'll see about pushing out a new version with the updated list of
> shorteners and those changes shortly.
>
> Kind regards,
> Steve.
>

More questions:

1) Is it really necessary to follow a chain deeper than 2?  My mind thinks
that a chain of 2 is never legitimate, and it consumes time and resources to
query further.

2) How widespread is URL shortening abuse now?  I can figure this out very
easily by adding a non-network URI rule to the nightly masscheck.  Could you
please send me privately your updated list of shorteners so that I may write
such a rule?

3) SHORT_URL_NOTSHORT
If the expanded address is not much longer than the original address, then
they are likely obfuscating with ill intent.  What should the threshold be?
Exact length?  Original length + 4 characters?  This should be a good new
rule.
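The proposed rule could be prototyped like this (a hedged Python sketch; the 4-character slack is only the suggested value above, and `is_not_short` is a hypothetical helper name):

```python
# Sketch: fire a NOTSHORT-style rule when "expanding" barely lengthened
# the URL, i.e. the shortener was used to obfuscate rather than shorten.
def is_not_short(original, expanded, slack=4):
    """True if the expanded URL is no more than `slack` chars longer
    than the short one.  slack=4 is the threshold suggested above."""
    return len(expanded) <= len(original) + slack
```

Tuning `slack` against ham corpora would show whether exact length or a small allowance produces fewer false positives.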

4)

url_shortener_log /tmp/DecodeShortURLs.txt
url_shortener_cache /tmp/DecodeShortURLs.sq3

Is there a variable for the spamassassin homedir path so you don't need to
hardcode an absolute path in the default config?

5) Do you currently make any distinction between reputable and non-reputable
shortening services?  Questions below are related to this.

6) If your plugin expands http://example.com/foobar to
http://somethingsafe.com, is http://example.com hidden from URIBL lookups?
This might matter if a shortening service goes rogue.

7) How fast are typical URL shortening responses?  What is the timeout?  We
want to avoid degrading the scan time and delivery performance of
spamassassin, but in a way that cannot be abused by the spammer to evade
detection.

This could be a problem with your huge list of shortening services.  If you
blindly include all possible shortening services, spammers could
purposefully use only the slowest in order to time out spamassassin.  Web
browsers are more forgiving with timeouts, so a slow redirector is the ideal
way to evade your plugin.

It is possible that you may want to include only the most reputable
shortening services by default, because you don't know what will happen
during the multiple years of your plugin being deployed on arbitrary
servers.  Other less reputable shortening services might be hijacked, domain
ownership changed, or simply neglected and become slow.  Such services may
need to be blacklisted entirely.  For the non-default shortening services,
it may be safe only if it can be updated via sa-update.

8) What UserAgent is used in the HTTP request?  If they can easily detect
that the request is not a real browser, then they can avoid detection by
using a safe looking fake response, while browser-based redirects go to the
intended spam target.

Warren Togami
warren@togami.com

Re: New plugin: DecodeShortURLs

Posted by Steve Freegard <st...@stevefreegard.com>.
  Hi Warren,

On 01/01/11 09:17, Warren Togami Jr. wrote:
> What is the status of this plugin?
>

As far as I'm concerned - I'm actively maintaining it and have been 
using it in production on several sites; I've been planning to push out 
an update as someone recently contributed a massive list of additional 
shorteners (however I need to double-check them all).

> I notice that there is no Bugzilla ticket for this plugin.  Do you 
> intend on submitting it for inclusion in future spamassassin upstream?
>

I hadn't really thought about it TBH and wasn't sure what the procedure 
was for this.

It's been working well for me and for others based on some feedback that 
people have sent me - however it could do with being tested in the 
network mass-checks to see actually how effective it is compared to the 
other rules.

But I'd also feel a bit more comfortable if one of the core devs looked 
over the code and made sure I haven't done anything obviously stupid.

> Would a DoS happen if the scanned e-mail contains 10,000 short URLs, 
> and your mail server is hit by many such mails?  (Either spamassassin 
> becomes very slow, or you piss off the short URL provider by hitting 
> them too quickly and often.)
>

No - it's got a hard-coded limit of 10 short URLs that will be checked 
at maximum; anything after the limit of 10 is skipped.  You can also 
optionally enable a cache (requires DBD::SQLite) to prevent multiple 
messages with the same short link from generating additional queries.
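The caching idea can be pictured with this Python/sqlite3 sketch (the plugin itself uses Perl's DBD::SQLite; the table name and schema here are illustrative assumptions, not the plugin's actual schema):

```python
# Sketch: cache short-URL -> expanded-URL mappings so repeated spam runs
# don't re-query the shortening service.  Schema is illustrative only.
import sqlite3

def open_cache(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS short_url_cache "
               "(short TEXT PRIMARY KEY, expanded TEXT)")
    return db

def lookup(db, short):
    row = db.execute("SELECT expanded FROM short_url_cache WHERE short = ?",
                     (short,)).fetchone()
    return row[0] if row else None

def store(db, short, expanded):
    db.execute("INSERT OR REPLACE INTO short_url_cache VALUES (?, ?)",
               (short, expanded))
    db.commit()
```

A cache hit replaces a multi-second HTTP round trip with a local read, which matters when the same campaign reuses one short link across thousands of messages.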

On reflection whilst typing this - I could probably handle this a bit 
better; currently the short URLs are stored in a Perl hash (to 
effectively de-dup them); I should possibly turn the hash into an array, 
randomize it and remove the first 10 entries from it so it's not so 
predictable.
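That de-dup-and-randomize idea might look like this (a Python sketch of the logic described above; the limit of 10 mirrors the plugin's hard-coded cap):

```python
# Sketch: de-duplicate the short URLs, then choose the ones to look up at
# random so a spammer can't predict which 10 of many URLs get checked.
import random

def pick_urls_to_check(short_urls, limit=10):
    unique = list(set(short_urls))   # de-dup, as the Perl hash already does
    random.shuffle(unique)           # make the selection unpredictable
    return unique[:limit]
```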

> Could the plugin detect when there are intentionally too many short 
> URL's?  If so, what should it do in such cases?  Are there ever legit 
> reasons for an e-mail to have a large number of short URL's?
>

For now - I guess I could add an additional rule (e.g. scored at 0.001 
to see how many times it hits the current limit); but the age-old issue 
is 'how many is too many?'.

I'll see about pushing out a new version with the updated list of 
shorteners and those changes shortly.

Kind regards,
Steve.

Re: New plugin: DecodeShortURLs

Posted by "Warren Togami Jr." <wt...@gmail.com>.
What is the status of this plugin?

I notice that there is no Bugzilla ticket for this plugin.  Do you intend on
submitting it for inclusion in future spamassassin upstream?

Would a DoS happen if the scanned e-mail contains 10,000 short URLs, and
your mail server is hit by many such mails?  (Either spamassassin becomes
very slow, or you piss off the short URL provider by hitting them too
quickly and often.)

Could the plugin detect when there are intentionally too many short URLs?
If so, what should it do in such cases?  Are there ever legit reasons for an
e-mail to have a large number of short URLs?

Warren Togami
warren@togami.com

Re: New plugin: DecodeShortURLs

Posted by Brent Gardner <br...@gmail.com>.
René Berber wrote:
> On 10/5/2010 3:42 PM, Yet Another Ninja wrote:
>
>   
>> On 2010-10-05 22:35, Brent Gardner wrote:
>>     
>
> [snip]
>   
>>> Using URLs like these:
>>>
>>> http://goo.gl/foo
>>> http://bit.ly/foo
>>> http://2chap.it/foo
>>>
>>> I consistently hit on these rules:
>>>
>>> HAS_SHORT_URL
>>> SHORT_URL_404
>>> SHORT_URL_CHAINED
>>> SHORT_URL_LOOP
>>> SHORT_URL_MAXCHAIN
>>>
>>>
>>> I can understand hitting on HAS_SHORT_URL and SHORT_URL_404, but why
>>> is -every- test hitting SHORT_URL_CHAINED, SHORT_URL_LOOP,
>>> SHORT_URL_MAXCHAIN?
>>>       
>> I bet *none* of the /foo targets exist.
>> Could that be confusing the plugin when /foo redirects back to "home"
>> Steve?
>>     
>
> Brent can see in /tmp/DecodeShortURLs.txt if that was the case (i.e. the
> file shows the mapping found between the short link and the long one).
> Of course this is only if he didn't change the original .cf's
> url_shortener_log .
>   
Here's the contents of /tmp/DecodeShortURLs.txt so far:

[1286308657] http://2chap.it/foo => http://2chap.it
[1286308914] http://bit.ly/foo => sdadsa
[1286309776] http://goo.gl/l6MS => 
http://googleblog.blogspot.com/2009/12/making-urls-shorter-for-google-toolbar.html
[1286309866] http://tinyurl.com/2vw3t8j => 
http://www.google.com/search?q=android+url+shortener&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
[1286309979] http://bit.ly/3hDSUb => http://www.example.com/
[1286310537] http://bit.ly/3hDSUb => http://www.example.com/


Of course, I didn't expect the /foo URLs to exist, but I didn't have any 
live data to test with.  I found the other URLs listed by googling.  
They all act the same, hitting on all 5 rules listed above.


Brent Gardner



Re: New plugin: DecodeShortURLs

Posted by René Berber <r....@computer.org>.
On 10/5/2010 3:42 PM, Yet Another Ninja wrote:

> On 2010-10-05 22:35, Brent Gardner wrote:

[snip]
>> Using URLs like these:
>>
>> http://goo.gl/foo
>> http://bit.ly/foo
>> http://2chap.it/foo
>>
>> I consistently hit on these rules:
>>
>> HAS_SHORT_URL
>> SHORT_URL_404
>> SHORT_URL_CHAINED
>> SHORT_URL_LOOP
>> SHORT_URL_MAXCHAIN
>>
>>
>> I can understand hitting on HAS_SHORT_URL and SHORT_URL_404, but why
>> is -every- test hitting SHORT_URL_CHAINED, SHORT_URL_LOOP,
>> SHORT_URL_MAXCHAIN?
> 
> I bet *none* of the /foo targets exist.
> Could that be confusing the plugin when /foo redirects back to "home"
> Steve?

Brent can see in /tmp/DecodeShortURLs.txt if that was the case (i.e. the
file shows the mapping found between the short link and the long one).
Of course this is only if he didn't change the original .cf's
url_shortener_log .
-- 
René Berber



Re: New plugin: DecodeShortURLs

Posted by Yet Another Ninja <sa...@alexb.ch>.
On 2010-10-05 22:35, Brent Gardner wrote:
> Steve Freegard wrote:
>> Hi All,
>>
>> On 17/09/10 14:11, Steve Freegard wrote:
>>> Hi All,
>>>
>>> Recently I've been getting a bit of filter-bleed from a bunch of spams
>>> injected via Hotmail/Yahoo that contain shortened URLs e.g. bit.ly/foo
>>> that upon closer inspection would have been rejected with a high score
>>> if the real URL had been used.
>>>
>>> To that end - it annoyed me enough to write a plug-in that decodes the
>>> shortened URL using an HTTP HEAD request to extract the location header
>>> sent by the shortening service and to put this into the list of
>>> extracted URIs for other plug-ins to find (such as URIDNSBL).
>>>
>>> On the messages I tested it with - it raised the scores from <5 to >10
>>> based on URIDNSBL hits which is just what I wanted.
>>>
>>> Hopefully it will be useful to others; you can grab it from:
>>>
>>> http://www.fsl.com/support/DecodeShortURLs.pm
>>> http://www.fsl.com/support/DecodeShortURLs.cf
>>>
>>
>> I've just put up a new version at the above URLs (v0.3) which adds the 
>> following new features:
>>
>> - Now follows 'chained' short URLs  (e.g. shortURL -> shortURL -> real)
>>
>> When chained URLs are detected the rule 'SHORT_URL_CHAINED' is fired.
>> If a chained loop is detected the rule 'SHORT_URL_LOOP' is fired.
>> If more than 10 chained URLs are found 'SHORT_URL_MAXCHAIN' is fired 
>> and no further redirections are checked.
>>
>> - If the shortener returns 404 (e.g. not found) for the short URL then 
>> 'SHORT_URL_404' is fired.
>>
>> - Prevent amavis from die'ing on eval block tests by adding "local 
>> $SIG{'__DIE__'}" to each block.
>>
>> - Added option to allow logging to syslog (mail.info).
>>
>> Kind regards,
>> Steve.
>>
> I've been testing this plugin, version 0.5.  I'm running SpamAssassin 
> v3.2.5 on CentOS v5.5 32-bit, Perl v5.8.8.  I've been testing using a 
> test message and changing out the URLs it contains.
> 
> Using URLs like these:
> 
> http://goo.gl/foo
> http://bit.ly/foo
> http://2chap.it/foo
> 
> I consistently hit on these rules:
> 
> HAS_SHORT_URL
> SHORT_URL_404
> SHORT_URL_CHAINED
> SHORT_URL_LOOP
> SHORT_URL_MAXCHAIN
> 
> 
> I can understand hitting on HAS_SHORT_URL and SHORT_URL_404, but why is 
> -every- test hitting SHORT_URL_CHAINED, SHORT_URL_LOOP, SHORT_URL_MAXCHAIN?

I bet *none* of the /foo targets exist.
Could that be confusing the plugin when /foo redirects back to "home"
Steve?


Re: New plugin: DecodeShortURLs

Posted by Brent Gardner <br...@gmail.com>.
Steve Freegard wrote:
> Hi All,
>
> On 17/09/10 14:11, Steve Freegard wrote:
>> Hi All,
>>
>> Recently I've been getting a bit of filter-bleed from a bunch of spams
>> injected via Hotmail/Yahoo that contain shortened URLs e.g. bit.ly/foo
>> that upon closer inspection would have been rejected with a high score
>> if the real URL had been used.
>>
>> To that end - it annoyed me enough to write a plug-in that decodes the
>> shortened URL using an HTTP HEAD request to extract the location header
>> sent by the shortening service and to put this into the list of
>> extracted URIs for other plug-ins to find (such as URIDNSBL).
>>
>> On the messages I tested it with - it raised the scores from <5 to >10
>> based on URIDNSBL hits which is just what I wanted.
>>
>> Hopefully it will be useful to others; you can grab it from:
>>
>> http://www.fsl.com/support/DecodeShortURLs.pm
>> http://www.fsl.com/support/DecodeShortURLs.cf
>>
>
> I've just put up a new version at the above URLs (v0.3) which adds the 
> following new features:
>
> - Now follows 'chained' short URLs  (e.g. shortURL -> shortURL -> real)
>
> When chained URLs are detected the rule 'SHORT_URL_CHAINED' is fired.
> If a chained loop is detected the rule 'SHORT_URL_LOOP' is fired.
> If more than 10 chained URLs are found 'SHORT_URL_MAXCHAIN' is fired 
> and no further redirections are checked.
>
> - If the shortener returns 404 (e.g. not found) for the short URL then 
> 'SHORT_URL_404' is fired.
>
> - Prevent amavis from die'ing on eval block tests by adding "local 
> $SIG{'__DIE__'}" to each block.
>
> - Added option to allow logging to syslog (mail.info).
>
> Kind regards,
> Steve.
>
I've been testing this plugin, version 0.5.  I'm running SpamAssassin 
v3.2.5 on CentOS v5.5 32-bit, Perl v5.8.8.  I've been testing using a 
test message and changing out the URLs it contains.

Using URLs like these:

http://goo.gl/foo
http://bit.ly/foo
http://2chap.it/foo

I consistently hit on these rules:

HAS_SHORT_URL
SHORT_URL_404
SHORT_URL_CHAINED
SHORT_URL_LOOP
SHORT_URL_MAXCHAIN


I can understand hitting on HAS_SHORT_URL and SHORT_URL_404, but why is 
-every- test hitting SHORT_URL_CHAINED, SHORT_URL_LOOP, SHORT_URL_MAXCHAIN?


Brent Gardner




Re: New plugin: DecodeShortURLs

Posted by Steve Freegard <st...@stevefreegard.com>.
Hi All,

On 17/09/10 14:11, Steve Freegard wrote:
> Hi All,
>
> Recently I've been getting a bit of filter-bleed from a bunch of spams
> injected via Hotmail/Yahoo that contain shortened URLs e.g. bit.ly/foo
> that upon closer inspection would have been rejected with a high score
> if the real URL had been used.
>
> To that end - it annoyed me enough to write a plug-in that decodes the
> shortened URL using an HTTP HEAD request to extract the location header
> sent by the shortening service and to put this into the list of
> extracted URIs for other plug-ins to find (such as URIDNSBL).
>
> On the messages I tested it with - it raised the scores from <5 to >10
> based on URIDNSBL hits which is just what I wanted.
>
> Hopefully it will be useful to others; you can grab it from:
>
> http://www.fsl.com/support/DecodeShortURLs.pm
> http://www.fsl.com/support/DecodeShortURLs.cf
>

I've just put up a new version at the above URLs (v0.3) which adds the 
following new features:

- Now follows 'chained' short URLs  (e.g. shortURL -> shortURL -> real)

When chained URLs are detected the rule 'SHORT_URL_CHAINED' is fired.
If a chained loop is detected the rule 'SHORT_URL_LOOP' is fired.
If more than 10 chained URLs are found 'SHORT_URL_MAXCHAIN' is fired and 
no further redirections are checked.

- If the shortener returns 404 (e.g. not found) for the short URL then 
'SHORT_URL_404' is fired.

- Prevent amavis from die'ing on eval block tests by adding "local 
$SIG{'__DIE__'}" to each block.

- Added option to allow logging to syslog (mail.info).
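The chain handling described above can be modelled like this (a hypothetical Python sketch; `redirects` stands in for the real HTTP HEAD lookups, and the rule semantics are approximated from the description, not taken from the plugin's Perl code):

```python
# Sketch: follow a chain of short URLs, flagging chains, loops, and
# over-long chains roughly as the v0.3 rules do.  `redirects` maps a URL
# to its Location header (absent key = no redirect); it models HEAD lookups.
def follow_chain(url, redirects, max_chain=10):
    """Return (final_url, rules_hit)."""
    hits, seen = set(), [url]
    while (target := redirects.get(url)) is not None:
        if len(seen) > max_chain:        # too many redirections
            hits.add("SHORT_URL_MAXCHAIN")
            break
        if target in seen:               # redirect cycle detected
            hits.add("SHORT_URL_LOOP")
            break
        if len(seen) >= 2:               # shortURL -> shortURL -> ...
            hits.add("SHORT_URL_CHAINED")
        seen.append(target)
        url = target
    return url, hits
```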

Kind regards,
Steve.


Re: New plugin: DecodeShortURLs

Posted by "Chip M." <sa...@IowaHoneypot.com>.
Steve Freegard wrote:
>Hopefully it will be useful to others; you can grab it from:

Thanks Steve!

Suggestions (for future enhancements):

1. Consider splitting the list of shorteners between those that
are well established and KNOWN to be reasonably diligent, and
"all others" (e.g. the anti-pattern ably described last week in:
http://www.xkcd.com/792/ ).
Split them in such a way as to make it easy for users to test only
ONE set (probably the better known ones), and (perhaps) add an
option to score the rest without doing a DNS call.

2. Investigate BitLy's API.
I've been experimenting with it for a few months, and am very
pleased with the options and data it provides.  I still need to add
ham shortener links to my standard/automated testing (preliminary
results are excellent).
The only "issue" I had (at the very beginning) was signing up with
a mixed case API key, then lower casing it when I used it.  My BIM.

3. Please collect and share performance data.  Thanks in advance! :)


I still haven't deployed anything real-time (have had VERY limited
quality-dev-time this year - Grrr!Argh!).  Since these first became
a problem, I've been auto-quarantining (except for a very short list
of manually excluded newsletters and select validated Senders), then
we handle the DNS tests as part of our desktop-based FP pipeline.

The occurrence of shorteners in ham is low enough that that's been
acceptable to our userbase, largely because they run the actual
tests, so they have Complete Control.  It's been my experience that
not-stupid endusers who are given control are happy users.  They're
full participants in the process. :)
	- "Chip"