Posted to users@spamassassin.apache.org by Simon Standley <si...@yacc.co.uk> on 2006/08/14 20:14:13 UTC

The arms race continues

Hi Gang,

I've had the latest FuzzyOcr on test for the past day or so - very nice work. Congrats to all involved.

Thought you may be interested in the attached GIF. It was only a matter of time before something like this came along ...

Si.

 <<forgiving26.gif>> 

.


Re: The arms race continues

Posted by Matthias Keller <li...@matthias-keller.ch>.
decoder wrote:
> Michel Vaillancourt wrote:
>   
>> Simon Standley wrote:
>>     
>>> Hi Gang,
>>>
>>> I've had the latest FuzzyOcr on test for the past day or so -
>>> very nice work. Congrats to all involved.
>>>
>>> Thought you may be interested in the attached GIF. It was only a
>>> matter of time before something like this came along ...
>>>
>>> Si.
>>>
>>> <<forgiving26.gif>>
>>>
>>> .
>>>       
>> I've seen three of these this morning alone...  and FuzzyOCR isn't
>> trapping them.
>>
>> --Michel Wolfstar Systems
>>
>>     
>
> gocr features a nice parameter called -d. It is able to remove smaller
> particles before scanning, compare these results:
>   
Very interesting...
This time my gocr is better than yours.
Mine catches the text even better WITHOUT the -d option. It gets a bit worse
(though not much) with -d 2...

# gocr -i forgiving26.gif
giftopnm: Reading Image Sequence 0
Visit RX{MUNGED}GOOD.COM
(don ' t LIILk _ust type In browser)
and SAVE 5_o,_o on your Phar{MUNGED}macy!.
VIA{MUNGED}GRA from $3,33
CIA{MUNGED}LIC_ from $3,75
VAL{MUNGED}IUM mom $l,21
_ave a nii_e da)/!,


--
Using SuSE 10.1's gocr-0.40
(Sorry, I had to use {MUNGED} because the list server rejected my mail
otherwise.)

Matt



Re: The arms race continues

Posted by John Rudd <jr...@ucsc.edu>.
On Aug 14, 2006, at 12:01 PM, decoder wrote:

> Theo Van Dinter wrote:
>> On Mon, Aug 14, 2006 at 08:46:51PM +0200, decoder wrote:
>>> gocr features a nice parameter called -d. It is able to remove
>>> smaller particles before scanning, compare these results:
>>
>> So my problem with the OCR idea is that it inevitably gets to the
>> point where we'd need to programmatically solve the same graphics as
>> used in CAPTCHAs, and then I don't think we're really focused on
>> addressing the core issue any longer.
>>
>> It's mostly the same way in non-graphic spams -- catching the text
>> may or may not be difficult with all the obfuscation and such that
>> goes on. However, catching the fact that there's obfuscation is a
>> good indication of spam.
>>
>> Just a thought.
>>
> You are absolutely right, this COULD get to a point where it gets
> really pointless to scan for text in an image. But for an image it is
> even harder to detect an obfuscation, than with text.
>
> For text, I had the idea earlier to utilize a method to detect
> obfuscations with approximate matching and then scoring the
> obfuscation itself and not the content. But this can lead easily to
> false positives, so one must pay attention on what he puts on the
> wordlist.
>
> For images, this is even harder, how would one try to recognize an
> attempt to mislead OCR?
>

Exactly: how do you know if the OCR software didn't find text because 
it wasn't there, or because it was sufficiently obfuscated?


I don't mind an arms race in this area of spam fighting.  It's a race 
the spammers will lose, because at some point the image will become so 
unclear that it resembles a captcha, at which point: who will 
bother trying to read the image?  In essence, when it comes to this 
little part of the spam arms race, we are the Plains Indians and they 
are the buffalo.  All we have to do is keep herding them toward the 
cliff of "images so obfuscated as to be unreadable by humans".

Their only way out of this particular race is to just stop.  It's a 
lose-lose proposition for them.


Re: The arms race continues

Posted by decoder <de...@own-hero.net>.

Theo Van Dinter wrote:
> On Mon, Aug 14, 2006 at 08:46:51PM +0200, decoder wrote:
>> gocr features a nice parameter called -d. It is able to remove
>> smaller particles before scanning, compare these results:
>
> So my problem with the OCR idea is that it inevitably gets to the
> point where we'd need to programmatically solve the same graphics as
> used in CAPTCHAs, and then I don't think we're really focused on
> addressing the core issue any longer.
>
> It's mostly the same way in non-graphic spams -- catching the text
> may or may not be difficult with all the obfuscation and such that
> goes on. However, catching the fact that there's obfuscation is a
> good indication of spam.
>
> Just a thought.
>
You are absolutely right, this COULD get to a point where it becomes
pointless to scan for text in an image. But in an image it is even
harder to detect obfuscation than in text.

For text, I had the idea earlier to use approximate matching to detect
obfuscations and then score the obfuscation itself rather than the
content. But this can easily lead to false positives, so one must pay
attention to what goes on the word list.

For images, this is even harder: how would one try to recognize an
attempt to mislead OCR?
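
A minimal sketch of the approximate-matching idea for text, written in Python with the standard library's difflib. This is not how FuzzyOcr or String::Approx work internally; the word list, the 0.75 similarity cutoff, and the function name are assumptions made up for illustration. A token is flagged only when it is close to a listed word without being spelled exactly like it, so it is the obfuscation that gets scored, not the content.

```python
import difflib
import re

# Hypothetical word list; a real one needs care to avoid false positives.
WORDLIST = ["viagra", "cialis", "valium", "pharmacy"]

def find_obfuscations(text, wordlist=WORDLIST, cutoff=0.75):
    """Return (token, word) pairs where token nearly, but not exactly, matches word."""
    hits = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token in wordlist:
            continue  # plain spelling: a content match, not an obfuscation
        close = difflib.get_close_matches(token, wordlist, n=1, cutoff=cutoff)
        if close:
            hits.append((token, close[0]))
    return hits

# "pharmacy" is spelled normally and is skipped; the digit-for-letter
# substitutions are the tokens that get reported.
print(find_obfuscations("Buy v1agra and cia1is from our pharmacy"))
```

A lower cutoff catches heavier obfuscation but starts matching ordinary words, which is the false-positive risk mentioned above.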


Chris


Re: The arms race continues

Posted by Theo Van Dinter <fe...@apache.org>.
On Mon, Aug 14, 2006 at 08:46:51PM +0200, decoder wrote:
> gocr features a nice parameter called -d. It is able to remove smaller
> particles before scanning, compare these results:

So my problem with the OCR idea is that it inevitably gets to the point
where we'd need to programmatically solve the same graphics as used in
CAPTCHAs, and then I don't think we're really focused on addressing the
core issue any longer.

It's mostly the same way in non-graphic spams -- catching the text may
or may not be difficult with all the obfuscation and such that goes on.
However, catching the fact that there's obfuscation is a good indication
of spam.

Just a thought.

-- 
Randomly Generated Tagline:
Capital Punishment means never having to say "YOU AGAIN?"

RE: Checking my own users mail

Posted by Thomas Lindell <tl...@adlmail.com>.
I do have amavis running; the problem is identifying the message. Ideally, I
guess, I would like it to pop up an error in Outlook like it does when they
try to send a file attachment that's too large.

I suppose I could implement some sort of rate limiting, but that's just
irritating. I am trying to stay out of their way as much as possible and yet
still protect the internet from spam generated by the odd customer.

Tom

-----Original Message-----
From: Evan Platt [mailto:evan@espphotography.com] 
Sent: Monday, August 14, 2006 2:00 PM
To: users@spamassassin.apache.org
Subject: Re: Checking my own users mail

At 12:00 PM 8/14/2006, you wrote:
>Every now and again one of my bonehead customers gets a trojan that 
>starts shooting out spam messages like crazy.  I usually catch it within 
>a few hours, but I am wondering if there's a way for me to scan messages 
>my customers send and drop them or bounce them back if they're detected as
>spam.

There probably is. Not with SpamAssassin, though. SpamAssassin cannot drop or
reject mail. But depending on how you call SpamAssassin, e.g. via procmail, you
may be able to do something.

But keep in mind, a trojan sending out 1000 messages an hour may not
be classified as spam. A better option may be something on your mail server, or an
anti-virus program on your mail server.


Re: Checking my own users mail

Posted by Evan Platt <ev...@espphotography.com>.
At 12:00 PM 8/14/2006, you wrote:
>Every now and again one of my bonehead customers gets a trojan that starts
>shooting out spam messages like crazy.  I usually catch it within a few hours,
>but I am wondering if there's a way for me to scan messages my customers
>send and drop them or bounce them back if they're detected as spam.

There probably is. Not with SpamAssassin, though. SpamAssassin cannot 
drop or reject mail. But depending on how you call SpamAssassin, e.g. 
via procmail, you may be able to do something.

But keep in mind, a trojan sending out 1000 messages an hour may not 
be classified as spam. A better option may be something on your mail 
server, or an anti-virus program on your mail server.


Re: Checking my own users mail

Posted by "Michele Neylon:: Blacknight.ie" <mi...@blacknight.ie>.
Thomas Lindell wrote:
> Every now and again one of my bonehead customers gets a trojan that starts
> shooting out spam messages like crazy.  I usually catch it within a few hours,
> but I am wondering if there's a way for me to scan messages my customers
> send and drop them or bounce them back if they're detected as spam.
> 
> 
> Thanks
> 
> Tom
Short answer ..

If they are using your SMTP - yes

If they aren't ......


-- 
Mr Michele Neylon
Blacknight Solutions
Quality Business Hosting & Colocation
http://www.blacknight.ie/
Tel. 1850 927 280
Intl. +353 (0) 59  9183072
Direct Dial: +353 (0)59 9183090
Fax. +353 (0) 59  9164239

Re: Checking my own users mail

Posted by Logan Shaw <ls...@emitinc.com>.
On Mon, 14 Aug 2006, Thomas Lindell wrote:
> Every now and again one of my bonehead customers gets a trojan that starts
> shooting out spam messages like crazy.  I usually catch it within a few hours,
> but I am wondering if there's a way for me to scan messages my customers
> send and drop them or bounce them back if they're detected as spam.

What about enabling some sort of connection rate throttling
(keyed by IP address) in your MTA?  I believe sendmail has
such a feature.  Then, scan the log messages and alert the
on-call person (you?) if some client machine starts connecting
to send outgoing messages more than seems reasonable.  If it's
only every now and then, it might not be that bad to have to
respond to it manually.  You could check the logs to see if the
traffic is really malicious (rather than someone using e-mail
as an instant-messenger substitute), and if so, cut them off.

Of course, this only works for certain classes of customers.
If you're an ISP and your customers each have one desktop
computer, it works great.  If your customers have 100 users
and their own mail server, it doesn't work as well...

   - Logan
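
The log-scanning half of this suggestion could look roughly like the Python sketch below. It assumes the MTA log has already been parsed into (timestamp, client IP) pairs; the parsing step, the limit of 100 messages per hour, and the function name are all hypothetical and would need tuning for a real site.

```python
from collections import defaultdict, deque

def find_heavy_senders(events, max_msgs=100, window_secs=3600):
    """events: iterable of (unix_timestamp, client_ip), sorted by time.
    Returns the set of IPs that exceeded max_msgs within any sliding window."""
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for ts, ip in events:
        q = recent[ip]
        q.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while q and q[0] <= ts - window_secs:
            q.popleft()
        if len(q) > max_msgs:
            flagged.add(ip)
    return flagged

# Made-up data: one client sends every 2 seconds, another every 10 minutes.
events = [(t, "192.0.2.10") for t in range(0, 600, 2)]
events += [(t, "192.0.2.20") for t in range(0, 6000, 600)]
events.sort()
print(find_heavy_senders(events))
```

The flagged set is only a list of candidates for a manual look at the logs, in line with the caveat above that a busy but legitimate client can look the same.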

Checking my own users mail

Posted by Thomas Lindell <tl...@adlmail.com>.
Every now and again one of my bonehead customers gets a trojan that starts
shooting out spam messages like crazy.  I usually catch it within a few hours,
but I am wondering if there's a way for me to scan messages my customers
send and drop them or bounce them back if they're detected as spam.


Thanks

Tom


Re: The arms race continues

Posted by decoder <de...@own-hero.net>.

Michel Vaillancourt wrote:
> Simon Standley wrote:
>> Hi Gang,
>>
>> I've had the latest FuzzyOcr on test for the past day or so -
>> very nice work. Congrats to all involved.
>>
>> Thought you may be interested in the attached GIF. It was only a
>> matter of time before something like this came along ...
>>
>> Si.
>>
>> <<forgiving26.gif>>
>>
>> .
> I've seen three of these this morning alone...  and FuzzyOCR isn't
> trapping them.
>
> --Michel Wolfstar Systems
>

gocr features a nice parameter called -d. It can remove small
particles before scanning; compare these results:


Original:

decoder@mordor ~/Uni/SysOP-Paul/spamassassin $ gocr -i forgiving26.gif
' ''v''ìgt _' 'CÒ'O'' '0' '':CO'.M.'''_.'..'_'__'_i.'''._''
_.'''.''.'.'...'.','_
;'_ _'. 1don '.. t. 'cn.c'k. _. s._. t'y,_' e. m'.' bro. 'w_'er).''. _
.'_ '.'.ì. .,. _ ._.
_. ä'nìd.....'SA'.. V..'E... .j.Oq.'o.. .'.òn,'.m.. ù. ì.'m''. ._ìm.
.'.'_i.._'_'' !..'. '
'.''VI'A'' i_' ' À ììàm'' ._.$' '3' _,''3 ''3 ' '_ ' _' i_ .'  :ì.'ì ';.'.
ì CIAL_I_' fr.om ..$3, 75 _' _. ' ' __ ..' ''''.' ' _. '_.
_.. K. ._. .'_.ì'UM' ' _ m..Q.m. '._.$. 1 ;2.. .'.ì ..'.._. _. ._._.'
'..'... _..'..'.. ' .ì '..
M.. .i'a.v...e.'.g...''m.iì''e.'. .d..a.._.'...'!,.',.'_ ;_'.'.'..
.'._... ,'_..',i_.'...._.'. ' .','...i..'..'_.'.ì'.'..'...'_.'.''._
''.'.._


With -d 2:

decoder@mordor ~/spamassassin $ gocr -d 2 -i forgiving26.gif
t
v:gt _CO00.COM
,_  1don t cnck_s_ty,_e m' brow_'er)   , _
ànd.SAVE 50q.o.o. n mur marm_cy!.
VIAGRA fram $3' ,33      _
CIALI_ from $3,75
K__ì_ mQm'_$l,2l
Mav,e g nIce da_'I    . ,



The second one surely gets detected because at least two listed words
are recognized (viagra and cialis). In the next version I will make
-d 2 the default and make the parameter configurable via the cf
file. Until then, simply put -d 2 into the gocr arguments.

This works for this one sample, but there are plenty of other methods
to avoid OCR.

If you get more mails like that with different methods of obfuscation,
please tell me.
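
A rough Python sketch of the comparison above: run gocr with and without -d and keep the setting whose output contains the most word-list hits. Only the -i and -d options shown in this thread are used. The word list, the 0.7 cutoff, and all function names are assumptions for illustration, and difflib stands in for whatever approximate matching FuzzyOcr actually performs.

```python
import difflib
import re
import subprocess

WORDS = ["viagra", "cialis", "pharmacy"]  # small illustrative word list

def count_hits(ocr_text, words=WORDS, cutoff=0.7):
    """Count list words that approximately match some token in the OCR text."""
    tokens = re.findall(r"[a-z_]+", ocr_text.lower())
    return sum(1 for w in words
               if difflib.get_close_matches(w, tokens, n=1, cutoff=cutoff))

def gocr_command(image_path, dust_size=None):
    """Build the gocr argument list; -d is added only when a dust size is given."""
    cmd = ["gocr"]
    if dust_size is not None:
        cmd += ["-d", str(dust_size)]
    return cmd + ["-i", image_path]

def best_dust_size(image_path, sizes=(None, 2)):
    """Run gocr once per setting and return the one with the most hits.
    Requires gocr to be installed; not exercised in the demo below."""
    def hits(size):
        out = subprocess.run(gocr_command(image_path, size),
                             capture_output=True, text=True).stdout
        return count_hits(out)
    return max(sizes, key=hits)

# Two lines taken from the "-d 2" output above.
sample = "VIAGRA fram $3' ,33\nCIALI_ from $3,75"
print(count_hits(sample))
```

As noted above, this only helps against this one kind of noise; other obfuscation methods would need other preprocessing.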



Chris


Re: The arms race continues

Posted by decoder <de...@own-hero.net>.

Michel Vaillancourt wrote:
> Simon Standley wrote:
>> Hi Gang,
>>
>> I've had the latest FuzzyOcr on test for the past day or so - very
>> nice work. Congrats to all involved.
>>
>> Thought you may be interested in the attached GIF. It was only a
>> matter of time before something like this came along ...
>>
>> Si.
>>
>>  <<forgiving26.gif>>
>>
>> .
> I've seen three of these this morning alone...  and FuzzyOCR isn't
> trapping them.
>
> --Michel
> Wolfstar Systems
>

I will have a look at it and if possible, adjust FuzzyOcr to catch
those as well.

It will always be an endless fight I guess... but surrendering is no
option ;)

Chris


Re: The arms race continues

Posted by Agent Smith <ne...@yahoo.com>.
I installed it this morning too, following the
documentation, but it isn't trapping any mail.

Not sure why that is.

Can someone post a working config? Mine is
below.

I untarred the tarball and installed the Perl module
String-Approx-3.26.

I created a file with text2gif and sent it via email,
but it went through. Yet when I run the same message through
spamassassin -t < sample.eml, it says it will filter it.

# cat FuzzyOcr.cf | grep -v ^#
loadplugin FuzzyOcr FuzzyOcr.pm
body FUZZY_OCR eval:check_fuzzy_ocr()
describe FUZZY_OCR Mail contains an image with common spam text inside
body FUZZY_OCR_WRONG_CTYPE eval:dummy_check()
describe FUZZY_OCR_WRONG_CTYPE Mail contains an image with wrong content-type set
body FUZZY_OCR_CORRUPT_IMG eval:dummy_check()
describe FUZZY_OCR_CORRUPT_IMG Mail contains a corrupted image


focr_word stock
focr_word investor
focr_word international
focr_word company
focr_word money
focr_word million
focr_word thousand
focr_word buy
focr_word price
focr_word trade
focr_word banking
focr_word service
focr_word kunde
focr_word volksbank
focr_word sparkasse
focr_word software
focr_word viagra
focr_word cialis
focr_word levitra
focr_word medicine
focr_word legal
focr_word medication
focr_word click here
focr_word penis
focr_word growth
focr_word drugs
focr_word pharmacy

focr_bin_giffix /usr/local/bin/giffix
focr_bin_giftopnm /usr/bin/giftopnm
focr_bin_jpegtopnm /usr/bin/jpegtopnm
focr_bin_pngtopnm /usr/bin/pngtopnm
focr_bin_gocr /usr/local/bin/gocr

focr_threshold 0.3
focr_base_score 4
focr_add_score 1
focr_wrongctype_score 1.5
focr_corrupt_score 3.5
focr_autodisable_score 50
focr_counts_required 2
focr_verbose 1
focr_tmp_path /tmp


--- Michel Vaillancourt <mi...@wolfstar.ca> wrote:

> Simon Standley wrote:
> > Hi Gang,
> > 
> > I've had the latest FuzzyOcr on test for the past day or so - very
> > nice work. Congrats to all involved.
> > 
> > Thought you may be interested in the attached GIF. It was only a
> > matter of time before something like this came along ...
> > 
> > Si.
> > 
> >  <<forgiving26.gif>> 
> > 
> > .
> I've seen three of these this morning alone...  and FuzzyOCR isn't
> trapping them.
> 
> --Michel
> Wolfstar Systems
> 



Re: The arms race continues

Posted by Michel Vaillancourt <mi...@wolfstar.ca>.
Simon Standley wrote:
> Hi Gang,
> 
> I've had the latest FuzzyOcr on test for the past day or so - very nice work. Congrats to all involved.
> 
> Thought you may be interested in the attached GIF. It was only a matter of time before something like this came along ...
> 
> Si.
> 
>  <<forgiving26.gif>> 
> 
> .
	I've seen three of these this morning alone...  and FuzzyOCR isn't trapping them.  

	--Michel
	Wolfstar Systems