Posted to users@spamassassin.apache.org by Igor Chudov <ig...@chudov.com> on 2009/04/24 22:12:51 UTC

Another bad kind of spams, for Pfizer knockoffs with image

I get plenty of these also, and cannot get them to score well. 

These advertise knockoffs of bestselling Pfizer products. The text is
meaningless garbage text. The sales message is contained in a PNG
image, but it could be other image types like jpeg. 

       http://igor.chudov.com/tmp/spam008.txt

Any ideas what I can do?

i

Re: Another bad kind of spams, for Pfizer knockoffs with image

Posted by James Wilkinson <sa...@aprilcottage.co.uk>.
Charles Gregory wrote:
> I've been scoring the attachment name pattern with a 'full' test.
> But this will only work until they figure ways to randomize the 
> attachment names....

The mimeheader plugin can do that and is much cheaper.

The
    <STYLE>
    Abody
    Ahead
    </STYLE>
part of the HTML seems to be a good spam sign, too. I can’t come up with
a test (other than a full test) that will actually match all of that
with 3.2.x: the rawbody rule matches one line at a time. A meta on both
Abody and Ahead in the rawbody seems to do a pretty good job.
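
To illustrate the meta idea outside of SpamAssassin, here is a minimal Python sketch. It is not SA rule syntax; the pattern names and the sample body are made up for illustration. Each pattern is tried against one line at a time, and the result is only true when both have hit somewhere in the raw body:

```python
import re

# Hypothetical sub-patterns; each one is tested against a single line,
# mirroring the line-at-a-time rawbody behaviour described above.
ABODY = re.compile(r"^\s*Abody\s*$")
AHEAD = re.compile(r"^\s*Ahead\s*$")


def style_meta_hits(raw_body: str) -> bool:
    """Return True only when both sub-patterns hit somewhere in the body."""
    lines = raw_body.splitlines()
    abody_hit = any(ABODY.search(line) for line in lines)
    ahead_hit = any(AHEAD.search(line) for line in lines)
    # The "meta" step: neither hit alone is enough.
    return abody_hit and ahead_hit


sample = "<HTML>\n<STYLE>\nAbody\nAhead\n</STYLE>\n<BODY>garbage</BODY>\n</HTML>"
print(style_meta_hits(sample))
```

Note that this does not check that the two lines sit inside the same STYLE block, which is the same limitation the meta approach has.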

To what extent should Windows Mail be counted as a variant of
Outlook/Outlook Express? It’s not caught in __ANY_OUTLOOK_MUA: should it
be?

Hope this helps,

James.

-- 
E-mail:     james@ | ... a sign carefully conveying in pictograms the fact
aprilcottage.co.uk | that you should not leave wheelchairs on a certain river
                   | bank as they would roll down the hill and the crocs would
                   | eat the passenger.                                -- Skud

Re: SMTP-callbacks (aka Sender Verify, Sender callouts, SAV)

Posted by John Rudd <jr...@ucsc.edu>.
On Sun, Apr 26, 2009 at 14:01, Adam Katz <an...@khopis.com> wrote:
> Charles Gregory wrote:
>> On Fri, 24 Apr 2009, Adam Katz wrote:

>
> The more pressing point (since fixing the one you mentioned is pretty
> simple) is that when you use a call to a sender's MX record and either
> use SMTP's VRFY command or pretend to begin a message, you're wasting
> their bandwidth and even acting like a spammer yourself.
>
> In extreme cases, this is also an accidental DDoS attack.  A spammer
> aware of such mechanisms can use SAV-enabled servers LIKE YOURS to
> purposefully launch DDoS attacks against whomever they're forging.
>

Yup, SMTP callbacks and challenge-response mechanisms are both major
blights upon the internet.  They're rude, they're prone to abuse,
they're pushing your spam problem onto someone else's servers... and
on and on.  There's no excuse for using them.

(and, frankly, whenever I get a stray challenge-response, I answer it
... and I'm not the only one, so that also means that challenge
response mechanisms aren't reliable, exactly because you're pushing
your spam solution onto someone else, and you have no idea what that
someone else might do about it)

Re: SMTP-callbacks (aka Sender Verify, Sender callouts, SAV)

Posted by Adam Katz <an...@khopis.com>.
Charles Gregory wrote:
> On Fri, 24 Apr 2009, Adam Katz wrote:
>> I read recently that that's a Bad Thing (and I'm leaning toward agreeing):
>> http://www.backscatterer.org/?target=sendercallouts
> 
> The most compelling argument on that site is one that almost slips by
> un-noticed. A spammer could very well forge a honeypot as a sender
> address, causing my system to 'send mail' (a verify) to a honeypot, and
> possibly get blacklisted. And this would also open up a way for spammers
> to 'poison' honey pots by having them blacklist so many legitimate
> servers that the blacklists have to be thrown out.... Ouch.

Actually, that's referring to backscatter itself.  You should never send
bounce messages, challenge-response, vacation messages, or other
automated responses to external accounts via email.  It should be done
with SMTP codes during the initial transaction.  See:
http://www.spamcop.net/fom-serve/cache/329.html
http://en.wikipedia.org/wiki/Backscatter_spam
and of course, the rest of the www.backscatterer.org site.

The more pressing point (since fixing the one you mentioned is pretty
simple) is that when you use a call to a sender's MX record and either
use SMTP's VRFY command or pretend to begin a message, you're wasting
their bandwidth and even acting like a spammer yourself.

In extreme cases, this is also an accidental DDoS attack.  A spammer
aware of such mechanisms can use SAV-enabled servers LIKE YOURS to
purposefully launch DDoS attacks against whomever they're forging.

Re: SMTP-callbacks (aka Sender Verify, Sender callouts, SAV)

Posted by Charles Gregory <cg...@hwcn.org>.
On Fri, 24 Apr 2009, Adam Katz wrote:
> I read recently that that's a Bad Thing (and I'm leaning toward agreeing):
> http://www.backscatterer.org/?target=sendercallouts

The most compelling argument on that site is one that almost slips by 
un-noticed. A spammer could very well forge a honeypot as a sender 
address, causing my system to 'send mail' (a verify) to a honeypot, and 
possibly get blacklisted. And this would also open up a way for spammers 
to 'poison' honey pots by having them blacklist so many legitimate 
servers that the blacklists have to be thrown out.... Ouch.

Mind you, I receive mail on a different IP address than my outgoing mail.
So even if the incoming server was blacklisted for verifies, this wouldn't 
impede my legitimate outgoing mail. Or would it....? Hmmmm......

- Charles

SMTP-callbacks (aka Sender Verify, Sender callouts, SAV)

Posted by Adam Katz <an...@khopis.com>.
Charles Gregory wrote:
> On my system I also have SMTP-callbacks, so if the envelope sender is
> not deliverable ...

I read recently that that's a Bad Thing (and I'm leaning toward agreeing):
http://www.backscatterer.org/?target=sendercallouts

Sure, you can justify it with CAN-SPAM mentality (you're required to
facilitate one transaction for the opt-out, etc), but it's an
interesting point nonetheless.

I had (once upon a time) thought about implementing a system that
uses a series of fail-overs: try DKIM, then SPF, then SAV
(Sender Address Verify, a.k.a. Sender callouts, a.k.a.
SMTP-callbacks).  This means that SAV would not be used for any domain
that already has DKIM or SPF.  Since I also have greylisting in front
of all of that, the invasive SAV calls would be far rarer
and targeted mostly at legit senders rather than forged ones.
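
A rough Python sketch of that fail-over order, to make the idea concrete. Everything here is hypothetical: the domain sets stand in for real DKIM/SPF lookups, and the SAV step only records that a callout would have been made.

```python
from typing import Callable, List, Optional, Tuple

# A check returns True/False when it can give a verdict, or None when the
# domain offers nothing for that mechanism and the next one should be tried.
Check = Callable[[str], Optional[bool]]

# Hypothetical stand-ins for real DKIM/SPF lookups.
DKIM_DOMAINS = {"signed.example"}
SPF_DOMAINS = {"spf.example"}
callouts_made: List[str] = []


def domain_of(sender: str) -> str:
    return sender.rsplit("@", 1)[-1].lower()


def dkim_check(sender: str) -> Optional[bool]:
    return True if domain_of(sender) in DKIM_DOMAINS else None


def spf_check(sender: str) -> Optional[bool]:
    return True if domain_of(sender) in SPF_DOMAINS else None


def sav_check(sender: str) -> Optional[bool]:
    # Placeholder for the invasive callout; here it only records the attempt.
    callouts_made.append(sender)
    return True


CHECKS: List[Tuple[str, Check]] = [
    ("dkim", dkim_check),
    ("spf", spf_check),
    ("sav", sav_check),
]


def verify_sender(sender: str) -> Tuple[str, Optional[bool]]:
    """Try each mechanism in order and stop at the first one with a verdict."""
    for name, check in CHECKS:
        verdict = check(sender)
        if verdict is not None:
            return name, verdict
    return "none", None


print(verify_sender("someone@signed.example"))
print(verify_sender("someone@other.example"))
```

With this ordering a callout is only attempted for senders whose domain gave no DKIM or SPF answer, which is the point of the proposal.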

Thoughts?

Re: Another bad kind of spams, for Pfizer knockoffs with image

Posted by Charles Gregory <cg...@hwcn.org>.
On Fri, 24 Apr 2009, Igor Chudov wrote:
> .... The sales message is contained in a PNG image....
>       http://igor.chudov.com/tmp/spam008.txt
> Any ideas what I can do?

I've been scoring the attachment name pattern with a 'full' test.
But this will only work until they figure ways to randomize 
the attachment names....

On my system I also have SMTP-callbacks, so if the envelope sender is not 
deliverable *and* the message has an attachment "DSL####.png" (or, lately, 
a GIF file with no name), I score it twice as heavily.
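
As a sketch of that scoring logic (in Python rather than SA rules), assuming "####" means exactly four digits and using a made-up base score of 1.0:

```python
import re

# Assumption: "DSL####.png" is read as "DSL" followed by four digits.
DSL_PNG = re.compile(r"^DSL\d{4}\.png$", re.IGNORECASE)


def attachment_score(filename: str, sender_deliverable: bool, base: float = 1.0) -> float:
    """Score the attachment name; double it when the callback failed."""
    if not DSL_PNG.match(filename):
        return 0.0
    return base if sender_deliverable else base * 2


print(attachment_score("DSL1234.png", sender_deliverable=False))
```

The unnamed GIF case is not covered here, and the pattern stops matching as soon as the names get randomized.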

- C

Re: [sa-list] Re: A rant about FUZZY_OCR

Posted by John Hardin <jh...@impsec.org>.
On Mon, 27 Apr 2009, Dan Mahoney, System Admin wrote:

> 3) Wordlists loadable from userprefs, if not bayes.

Along with that, the detected words should be (somehow) fed into bayes for 
analysis along with the other message text.

We touched on that last time fuzzyOCR was active.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Vista is at best mildly annoying and at worst makes you want to
   rush to Redmond, Wash. and rip somebody's liver out.      -- Forbes
-----------------------------------------------------------------------
  96 days since Obama's inauguration and still no unicorn!

Re: [sa-list] Re: A rant about FUZZY_OCR

Posted by "Dan Mahoney, System Admin" <da...@prime.gushi.org>.
On Mon, 27 Apr 2009, Jo Rhett wrote:

> On Apr 27, 2009, at 1:16 PM, Dan Mahoney, System Admin wrote:
>> The problem exists now, there is PNG spam, and there will continue to be, 
>> because it gets through.  Right now the only way I find this blocked is if 
>> spamcop blocks it.
>
>
> Just as a point of reference, I'd like to note that we haven't bothered with 
> FuzzyOCR here and absolutely none of the spam which reaches my inbox is a PNG 
> or JPG or GIF spam.   SA does block it, and it does so without FuzzyOCR.
>
> That said, we have jacked the scores for e-mail with images and no text and 
> that might be why.   We never, ever receive valid e-mail with no text in it.

The spam I've been getting contains text, lots of it.  Markov-chain-like 
crap that is 100 percent irrelevant to the image.

-Dan


-- 

"She's NOT my girlfriend!"

-Dan Mahoney, Quite a bit recently.

--------Dan Mahoney--------
Techie,  Sysadmin,  WebGeek
Gushi on efnet/undernet IRC
ICQ: 13735144   AIM: LarpGM
Site:  http://www.gushi.org
---------------------------


Re: A rant about FUZZY_OCR

Posted by LuKreme <kr...@kreme.com>.
On 27-Apr-2009, at 16:06, Jo Rhett wrote:
> On Apr 27, 2009, at 1:16 PM, Dan Mahoney, System Admin wrote:
>> The problem exists now, there is PNG spam, and there will continue  
>> to be, because it gets through.  Right now the only way I find this  
>> blocked is if spamcop blocks it.
>
> Just as a point of reference, I'd like to note that we haven't  
> bothered with FuzzyOCR here and absolutely none of the spam which  
> reaches my inbox is a PNG or JPG or GIF spam.   SA does block it,  
> and it does so without FuzzyOCR.

Yeah, I've not seen an image spam in my mailboxes in a long time.  I  
figured people were getting spam I'm not getting...

> We never, ever receive valid e-mail with no text in it.

Oh, I do all the time, but it's from people whom the AWL scores well  
down, pulling them out of spam range (My brother often sends me silly  
pictures with nothing else in the email).

BTW, is there any way to see what the AWL adjustment is for a  
particular email or for a specific sender couplet?

-- 
Anybody who could duck the Vietnam war can certainly duck a couple of
shoes. -- Chris Gehlker


Re: [sa-list] Re: A rant about FUZZY_OCR

Posted by Jo Rhett <jr...@netconsonance.com>.
On Apr 27, 2009, at 1:16 PM, Dan Mahoney, System Admin wrote:
> The problem exists now, there is PNG spam, and there will continue  
> to be, because it gets through.  Right now the only way I find this  
> blocked is if spamcop blocks it.


Just as a point of reference, I'd like to note that we haven't  
bothered with FuzzyOCR here and absolutely none of the spam which  
reaches my inbox is a PNG or JPG or GIF spam.   SA does block it, and  
it does so without FuzzyOCR.

That said, we have jacked the scores for e-mail with images and no  
text and that might be why.   We never, ever receive valid e-mail with  
no text in it.

-- 
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source  
and other randomness




Re: [sa-list] Re: A rant about FUZZY_OCR

Posted by "Dan Mahoney, System Admin" <da...@prime.gushi.org>.
On Mon, 27 Apr 2009, Henrik K wrote:
> None of this makes sense. If you don't have a test server, too bad. If
> you don't trust the "score-changing values", too bad. It all worked for me.
>
>> It's a great idea, but I'd like to see it mature some first, especially
>> with respect to its documentation, test emails, word list, and live testing.
>
> It was quickly developed in response to an ongoing problem. The problem
> disappeared years ago. It was mature enough for 99% of users at that time,
> though it did add lots of complexity, and stricter MTA rules etc. handled
> the job just fine as well.

The problem exists now, there is PNG spam, and there will continue to be, 
because it gets through.  Right now the only way I find this blocked is if 
spamcop blocks it.

Ideally, what I'd probably like to see with regard to fuzzyOCR are:

1) Just patch it enough to work with 3.2 and 3.3 -- I don't have the 
internals know-how to do this, and I don't know if Decoder still reads 
this list.

2) A debug mode, whereby the plugin would note its own score, possibly by 
applying an equal negative value.

3) Wordlists loadable from userprefs, if not bayes.

4) A recommended configuration, along with "shortcircuit" documentation.

-Dan

-- 

"Ca. Tas. Tro. Phy."

-John Smedley, March 28th 1998, 3AM

--------Dan Mahoney--------
Techie,  Sysadmin,  WebGeek
Gushi on efnet/undernet IRC
ICQ: 13735144   AIM: LarpGM
Site:  http://www.gushi.org
---------------------------


Re: A rant about FUZZY_OCR

Posted by Henrik K <he...@hege.li>.
On Sun, Apr 26, 2009 at 02:37:06PM -0400, Adam Katz wrote:
> > On Fri, Apr 24, 2009 at 05:14:21PM -0400, Adam Katz wrote:
> >> I wouldn't trust FUZZY_OCR with anything.  12 points is *WAY* too high
> >> for any single thing.  I had to disable this plugin a year or three
> >> ago because it assigned 20+ points to legit screenshots in ham (and
> >> that was /after/ I trimmed its flagging words file down in size)!
> 
> Henrik K wrote:
> > You do realize that it's configurable? Who is to blame if you just run
> > things blindly?
> 
> I expect the defaults to at least border on sane.  As noted before, I've
> tried and failed to configure it.  Could you point me at where the
> configuration options are specified, specifically focr_threshold?  All I
> see is the installation manual and the .cf file, neither of which is
> terribly informative (like say the perldoc pages for other plugins).

Unfortunately it's not a sane world. But if you have any logic, you will see
that focr_base_score and focr_add_score mainly make up the score. One can
argue that the popular "botnet" plugin also doesn't have sane defaults.

> I don't know if I still have the example of the bad hit from those years
> ago, but it made absolutely no sense, hitting dozens of "words found"
> that did not exist ... and this was a PNG screen capture, not even a
> photo or a JPEG-compressed image.  My company deals with screen captures
> a LOT, and I just can't afford for such a poorly designed plugin to run
> amok the way Fuzzy OCR does.

I'm sorry that you are disappointed with the design. If you need "mission
critical" code, then you must expect that code people generously write in
their spare time for free might have a few kinks. Were you on the fuzzyocr
mailing list a few years ago, participating in the development process?

> It's extremely disturbing that there are several tests (which is a good
> thing), but none of them are designed to test for false positives, or
> even to help you tweak the detection threshold.  You're left guessing
> what reasonable levels are, especially when the config file (the best
> docs I could find) points you at the manual (which I believe is the
> install guide, which doesn't even include the string "thresh").
> The last release was two years ago, and even on the svn trunk, the word
> list hasn't been updated ... ever (excepting minor tweaks like a
> threshold change from 0.1 to 0.01).  How is this fair?

The plugin was last needed a few years ago. Why should it have been updated
since then, when there was no image spam? There is not much point in making
general word lists. You put in what your mail flow sees. Someone from a
medical company could be using it and come screaming about the "bad defaults".

> The claim that FUZZY_OCR can't use the Bayesian database is a weak one,
> too; just make a custom prefix to the tokens it creates (I don't know
> SA's bayes token syntax, but other implementations use things like
> "subject:foo" to indicate that the word "foo" in the subject differs
> from the word "foo" elsewhere, so you could have "fuzzyocr:foo"
> instead).  Implement the fuzziness by inserting a dozen tokens for each
> possible parsing.  This would solve the issue of stale or inappropriate
> word lists.

You are free to contribute code. If I remember correctly, someone may have
tried it; some discussion can probably be found in the mailing list archives.

> Finally, I have no way of testing the thing live.  Since FUZZY_OCR is a
> dynamically scored rule, I can't just push it to 0.001 and see the hits,
> the way I can with the BAYES_XX thresholds for example.  (Sure, I can
> make all score-changing values 0.001, but I'm not sure that would
> properly test it, and given my past experiences, I wouldn't be surprised
> if this still causes problems.)

None of this makes sense. If you don't have a test server, too bad. If
you don't trust the "score-changing values", too bad. It all worked for me.

> It's a great idea, but I'd like to see it mature some first, especially
> with respect to its documentation, test emails, word list, and live testing.

It was quickly developed in response to an ongoing problem. The problem
disappeared years ago. It was mature enough for 99% of users at that time,
though it did add lots of complexity, and stricter MTA rules etc. handled
the job just fine as well.

Cheers,
Henrik

A rant about FUZZY_OCR

Posted by Adam Katz <an...@khopis.com>.
> On Fri, Apr 24, 2009 at 05:14:21PM -0400, Adam Katz wrote:
>> I wouldn't trust FUZZY_OCR with anything.  12 points is *WAY* too high
>> for any single thing.  I had to disable this plugin a year or three
>> ago because it assigned 20+ points to legit screenshots in ham (and
>> that was /after/ I trimmed its flagging words file down in size)!

Henrik K wrote:
> You do realize that it's configurable? Who is to blame if you just run
> things blindly?

I expect the defaults to at least border on sane.  As noted before, I've
tried and failed to configure it.  Could you point me at where the
configuration options are specified, specifically focr_threshold?  All I
see is the installation manual and the .cf file, neither of which is
terribly informative (like say the perldoc pages for other plugins).

Searching for it (http://google.com/search?q=FUZZY_OCR) finds an
OVERWHELMING MAJORITY of hits describing false positives and
configuration issues.  The official documentation didn't even make it to
the top 100 hits in Google, and after finding it on the SA wiki (google
hit #59), I found it sparse at best (I had to dive into the svn repo!).

The FAQ, which features only two answered questions, includes an
un-answered question about how to cap the score, which IMHO is a
mission-critical feature.

I don't know if I still have the example of the bad hit from those years
ago, but it made absolutely no sense, hitting dozens of "words found"
that did not exist ... and this was a PNG screen capture, not even a
photo or a JPEG-compressed image.  My company deals with screen captures
a LOT, and I just can't afford for such a poorly designed plugin to run
amok the way Fuzzy OCR does.

It's extremely disturbing that there are several tests (which is a good
thing), but none of them are designed to test for false positives, or
even to help you tweak the detection threshold.  You're left guessing
what reasonable levels are, especially when the config file (the best
docs I could find) points you at the manual (which I believe is the
install guide, which doesn't even include the string "thresh").

The last release was two years ago, and even on the svn trunk, the word
list hasn't been updated ... ever (excepting minor tweaks like a
threshold change from 0.1 to 0.01).  How is this fair?

The claim that FUZZY_OCR can't use the Bayesian database is a weak one,
too; just make a custom prefix to the tokens it creates (I don't know
SA's bayes token syntax, but other implementations use things like
"subject:foo" to indicate that the word "foo" in the subject differs
from the word "foo" elsewhere, so you could have "fuzzyocr:foo"
instead).  Implement the fuzziness by inserting a dozen tokens for each
possible parsing.  This would solve the issue of stale or inappropriate
word lists.
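
A minimal Python sketch of the prefixed-token idea. The function name and the "fuzzyocr" prefix are illustrative only; this is not SpamAssassin's Bayes tokenizer, whose token syntax I have not checked. Each candidate OCR reading contributes its own tokens, so several possible parsings of the same image line can all be fed in:

```python
from typing import Iterable, List


def ocr_tokens(readings: Iterable[str], prefix: str = "fuzzyocr") -> List[str]:
    """Turn candidate OCR readings into prefixed tokens, without duplicates."""
    tokens: List[str] = []
    for reading in readings:
        for word in reading.lower().split():
            token = f"{prefix}:{word}"
            if token not in tokens:
                tokens.append(token)
    return tokens


# Two candidate parsings of the same OCR'd line.
print(ocr_tokens(["special offer", "specia1 offer"]))
```

Because the tokens carry their own prefix, a word seen in an image would be learned separately from the same word in the message text.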

Finally, I have no way of testing the thing live.  Since FUZZY_OCR is a
dynamically scored rule, I can't just push it to 0.001 and see the hits,
the way I can with the BAYES_XX thresholds for example.  (Sure, I can
make all score-changing values 0.001, but I'm not sure that would
properly test it, and given my past experiences, I wouldn't be surprised
if this still causes problems.)


It's a great idea, but I'd like to see it mature some first, especially
with respect to its documentation, test emails, word list, and live testing.

Re: Another bad kind of spams, for Pfizer knockoffs with image

Posted by Henrik K <he...@hege.li>.
On Fri, Apr 24, 2009 at 05:14:21PM -0400, Adam Katz wrote:
> Igor Chudov wrote:
> > Stefan and guys!!! You are awesome!!!
> 
> >   12 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
> >                             [Words found:]
> >                             ["cia***" in 3 lines]
> >                             ["via***" in 3 lines]
> >                             [(9 word occurrences found)]
> 
> I wouldn't trust FUZZY_OCR with anything.  12 points is *WAY* too high
> for any single thing.  I had to disable this plugin a year or three
> ago because it assigned 20+ points to legit screenshots in ham (and
> that was /after/ I trimmed its flagging words file down in size)!

You do realize that it's configurable? Who is to blame if you just run things
blindly?


Re: Another bad kind of spams, for Pfizer knockoffs with image

Posted by Adam Katz <an...@khopis.com>.
Igor Chudov wrote:
> Stefan and guys!!! You are awesome!!!

>   12 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
>                             [Words found:]
>                             ["cia***" in 3 lines]
>                             ["via***" in 3 lines]
>                             [(9 word occurrences found)]

I wouldn't trust FUZZY_OCR with anything.  12 points is *WAY* too high
for any single thing.  I had to disable this plugin a year or three
ago because it assigned 20+ points to legit screenshots in ham (and
that was /after/ I trimmed its flagging words file down in size)!


IMHO, very very few tests should score more than BAYES_99 (3.5 of a
needed 5.0 points).  That's the whole point of using SpamAssassin: a
best-of-breed approach in which you need multiple angles to kill any
message, thus vastly reducing the chance of false positives.

Re: Another bad kind of spams, for Pfizer knockoffs with image

Posted by Igor Chudov <ig...@chudov.com>.
Stefan and guys!!! You are awesome!!!

All I did was aptitude install fuzzyocr. Nothing else. I re-ran the
test again, and this particular spam scored for fuzzyOCR and got a
score of 16!!!

Here's the new score:

#############

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                            [score: 0.5085]
 3.0 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
                            [88.236.102.45 listed in zen.spamhaus.org]
 0.9 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
 0.8 SHORT_HELO_AND_INLINE_IMAGE Short HELO string, with inline image
 0.1 RDNS_NONE              Delivered to trusted network by a host with no rDNS
  12 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            [Words found:]
                            ["cia***" in 3 lines]
                            ["via***" in 3 lines]
                            [(9 word occurrences found)]
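
Adding up the rule scores in this report shows how much of the total rests on FUZZY_OCR alone. The sketch below uses the values as printed above (which may be rounded) and assumes the usual required score of 5.0, which this report does not state:

```python
# Rule scores copied from the report above.
scores = {
    "HTML_MESSAGE": 0.0,
    "BAYES_50": 0.0,
    "RCVD_IN_XBL": 3.0,
    "RCVD_IN_PBL": 0.9,
    "SHORT_HELO_AND_INLINE_IMAGE": 0.8,
    "RDNS_NONE": 0.1,
    "FUZZY_OCR": 12.0,
}
REQUIRED = 5.0  # assumption: the default threshold

total = round(sum(scores.values()), 1)
without_ocr = round(total - scores["FUZZY_OCR"], 1)

print("total:", total)
print("without FUZZY_OCR:", without_ocr)
print("spam without FUZZY_OCR:", without_ocr >= REQUIRED)
```

In other words, the other rules together stay under the assumed threshold, and FUZZY_OCR on its own is what pushes the message over.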

On Fri, Apr 24, 2009 at 10:52:30PM +0200, Stefan Luetje wrote:
> On 24 Apr 2009 at 22:12 CEST, Igor Chudov wrote:
> > I get plenty of these also, and cannot get them to score well. 
> > 
> > These advertise knockoffs of bestselling Pfizer products. The text is
> > meaningless garbage text. The sales message is contained in a PNG
> > image, but it could be other image types like jpeg. 
> > 
> >        http://igor.chudov.com/tmp/spam008.txt
> > 
> > Any ideas what I can do?
> 
> You can install FuzzyOcr
> <http://wiki.apache.org/spamassassin/FuzzyOcrPlugin>
> 
> ,----
> | X-Spam-Status: Yes, score=19.8 required=5.0 tests=BADRELAY,BAYES_99,FUZZY_OCR,
> | 	HK_IMGSPAM,HTML_MESSAGE,SAGREY autolearn=no version=3.2.5
> | X-Spam-Relay-Country: US TR
> | X-Spam-Report: =?ISO-8859-1?Q?
> | 	*  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
> | 	*      [score: 1.0000]
> | 	*  0.3 HTML_MESSAGE BODY: HTML included in message
> | 	*  2.5 BADRELAY bad Relay
> | 	*  2.0 HK_IMGSPAM Inline image in message, Bayes think it's spam
> | 	*   10 FUZZY_OCR BODY:
> | 	*  1.0 SAGREY Adds 1.0 to spam from first-time senders
> `----
> 
> ,----[ fuzzyocr.log ]
> | 2009-04-24 22:30:08 [9756] Scanset "ocrad" found word "cialis" with fuzz of 0.0000
> |                       line: "ur prce viagra  cialis special offer"
> | 2009-04-24 22:30:08 [9756] Scanset "ocrad" found word "cialis" with fuzz of 0.0000
> |                       line: "lgg cialis special offer"
> | 2009-04-24 22:30:08 [9756] Scanset "ocrad" found word "viagra" with fuzz of 0.0000
> |                       line: "ur prce viagra  cialis special offer"
> | 2009-04-24 22:30:08 [9756] Scanset "ocrad" found word "viagra" with fuzz of 0.1667
> |                       line: "l ls lo x vagra loo mg  lo x cals omg"
> | 2009-04-24 22:30:08 [9756] Scanset "ocrad" found word "viagra" with fuzz of 0.0000
> |                       line: " viagra hot offer"
> | 2009-04-24 22:30:08 [9756] Scanset "ocrad" generates enough hits (5), skipping further scansets...
> | 2009-04-24 22:30:08 [9756] Message is spam, score = 10.500
> | 2009-04-24 22:30:08 [9756] Adding Hash to "/home/stefan/.fuzzyocr/FuzzyOcr.hashdb"
> | 2009-04-24 22:30:08 [9756] Words found:
> |                       "cialis" in 2 lines
> |                       "viagra" in 3 lines
> |                       (7.5 word occurrences found)
> `----
> 
> 
> Greets
> Stefan
>   



Re: Another bad kind of spams, for Pfizer knockoffs with image

Posted by Stefan Luetje <st...@t-online.de>.
On 24 Apr 2009 at 22:12 CEST, Igor Chudov wrote:
> I get plenty of these also, and cannot get them to score well. 
> 
> These advertise knockoffs of bestselling Pfizer products. The text is
> meaningless garbage text. The sales message is contained in a PNG
> image, but it could be other image types like jpeg. 
> 
>        http://igor.chudov.com/tmp/spam008.txt
> 
> Any ideas what I can do?

You can install FuzzyOcr
<http://wiki.apache.org/spamassassin/FuzzyOcrPlugin>

,----
| X-Spam-Status: Yes, score=19.8 required=5.0 tests=BADRELAY,BAYES_99,FUZZY_OCR,
| 	HK_IMGSPAM,HTML_MESSAGE,SAGREY autolearn=no version=3.2.5
| X-Spam-Relay-Country: US TR
| X-Spam-Report: =?ISO-8859-1?Q?
| 	*  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
| 	*      [score: 1.0000]
| 	*  0.3 HTML_MESSAGE BODY: HTML included in message
| 	*  2.5 BADRELAY bad Relay
| 	*  2.0 HK_IMGSPAM Inline image in message, Bayes think it's spam
| 	*   10 FUZZY_OCR BODY:
| 	*  1.0 SAGREY Adds 1.0 to spam from first-time senders
`----

,----[ fuzzyocr.log ]
| 2009-04-24 22:30:08 [9756] Scanset "ocrad" found word "cialis" with fuzz of 0.0000
|                       line: "ur prce viagra  cialis special offer"
| 2009-04-24 22:30:08 [9756] Scanset "ocrad" found word "cialis" with fuzz of 0.0000
|                       line: "lgg cialis special offer"
| 2009-04-24 22:30:08 [9756] Scanset "ocrad" found word "viagra" with fuzz of 0.0000
|                       line: "ur prce viagra  cialis special offer"
| 2009-04-24 22:30:08 [9756] Scanset "ocrad" found word "viagra" with fuzz of 0.1667
|                       line: "l ls lo x vagra loo mg  lo x cals omg"
| 2009-04-24 22:30:08 [9756] Scanset "ocrad" found word "viagra" with fuzz of 0.0000
|                       line: " viagra hot offer"
| 2009-04-24 22:30:08 [9756] Scanset "ocrad" generates enough hits (5), skipping further scansets...
| 2009-04-24 22:30:08 [9756] Message is spam, score = 10.500
| 2009-04-24 22:30:08 [9756] Adding Hash to "/home/stefan/.fuzzyocr/FuzzyOcr.hashdb"
| 2009-04-24 22:30:08 [9756] Words found:
|                       "cialis" in 2 lines
|                       "viagra" in 3 lines
|                       (7.5 word occurrences found)
`----
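
The "fuzz of 0.1667" for the line containing "vagra" is consistent with one edit divided by the six letters of "viagra". Whether FuzzyOcr computes fuzz exactly this way is an assumption on my part; the Python sketch below only shows that the number can be reproduced with a plain Levenshtein distance:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        prev = cur
    return prev[-1]


def fuzz(word: str, candidate: str) -> float:
    """Assumed definition: edit distance relative to the word's length."""
    return levenshtein(word, candidate) / len(word)


print(round(fuzz("viagra", "vagra"), 4))
print(round(fuzz("cialis", "cialis"), 4))
```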


Greets
Stefan
  
-- 
,-----------------------------------------------------------------------------.
|         Stefan Lütje        |   "The future will be better tomorrow."    |
|  stefan.luetje@t-online.de  |               George W. Bush               |
`----Key fingerprint = BCB2 48E4 9211 C975 5A3F  B192 9B6E CCCF 99CC 44FA-----'


Re: Another bad kind of spams, for Pfizer knockoffs with image

Posted by SM <sm...@resistor.net>.
At 13:12 24-04-2009, Igor Chudov wrote:
>I get plenty of these also, and cannot get them to score well.
>
>These advertise knockoffs of bestselling Pfizer products. The text is
>meaningless garbage text. The sales message is contained in a PNG
>image, but it could be other image types like jpeg.

The following rule may help.  You'll need the ImageInfo plugin.

body PNG_200_400     eval:image_size_range('png', 200, 400, 250, 450)
describe PNG_200_400 Contains png 200-250 x 400-450
score   PNG_200_400  0.1

Adjust the score to fit your needs.
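
For readers unfamiliar with the eval, this is roughly the kind of check the rule performs, sketched in Python. The argument order (minimum height, minimum width, then optional maximum height and width) is my assumption about the plugin; check the ImageInfo documentation before relying on it.

```python
from typing import Optional


def image_size_in_range(
    height: int,
    width: int,
    min_height: int,
    min_width: int,
    max_height: Optional[int] = None,
    max_width: Optional[int] = None,
) -> bool:
    """True when the image dimensions fall inside the given bounds."""
    if height < min_height or width < min_width:
        return False
    if max_height is not None and height > max_height:
        return False
    if max_width is not None and width > max_width:
        return False
    return True


# Bounds taken from the rule above: 200-250 by 400-450.
print(image_size_in_range(220, 420, 200, 400, 250, 450))
```

A size-only test like this will also hit legitimate images of the same dimensions, which is presumably why the suggested score is only 0.1.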

Regards,
-sm   


Re: Another bad kind of spams, for Pfizer knockoffs with image

Posted by Benny Pedersen <me...@junc.org>.
On Fri, April 24, 2009 22:56, John Hardin wrote:
> I do that check using milter-regex. A sample config file is at
> http://www.impsec.org/~jhardin/antispam/ - you'd have to edit it to match
> your needs for domain names and local MTA IP addresses.

tempfail "helo and ip does not resolve"
helo /\./n and \
connect /\[.*\..*\]/ //

home made :)

I would have liked to make it a DNS test rule, but so far it works well as is.
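
For anyone not using milter-regex, here is the same condition sketched in Python. This is my reading of the rule above and may be off: the "n" flag is taken to mean "does not match", and a connecting hostname in square brackets is taken to mean the IP did not resolve to a name.

```python
import re

HELO_HAS_DOT = re.compile(r"\.")
# Same expression as in the connect test above: a bracketed, dotted address.
UNRESOLVED_HOST = re.compile(r"\[.*\..*\]")


def should_tempfail(helo: str, connect_hostname: str) -> bool:
    """Tempfail when the HELO has no period and the client has no rDNS name."""
    helo_lacks_dot = HELO_HAS_DOT.search(helo) is None
    host_unresolved = UNRESOLVED_HOST.search(connect_hostname) is not None
    return helo_lacks_dot and host_unresolved


print(should_tempfail("pc1", "[192.0.2.45]"))
print(should_tempfail("mail.example.com", "[192.0.2.45]"))
```

Requiring both conditions is narrower than rejecting every HELO without a period, so hosts with working reverse DNS are left alone.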

> I don't have a rule for SA, as I block that at the MTA.

I will send you email privately after this; I have one more rule for milter-regex.

-- 
http://localhost/ 100% uptime and 100% mirrored :)


Re: Another bad kind of spams, for Pfizer knockoffs with image

Posted by John Hardin <jh...@impsec.org>.
On Fri, 24 Apr 2009, Igor Chudov wrote:

> On Fri, Apr 24, 2009 at 01:31:37PM -0700, John Hardin wrote:
>
>> Do you have administrative access to ak74.algebra.com? That looks like
>> it's your MX host.
>
> Yep, it is my MX host. I have root access, it is a 5 year old Fedora 3
> server.

Cool.

>> If so, an MTA rule that rejects any message from the internet having a 
>> HELO without a period may block a lot of that.
>>
>> If not, an SA rule that looks for such a HELO in the Received: header 
>> that ak74.algebra.com adds might help.
>
> Do you have examples of both kinds of such rules?
>
> I am especially interested in the mailserver side, as I have a lot of
> accounts handled by that server.

I do that check using milter-regex. A sample config file is at 
http://www.impsec.org/~jhardin/antispam/ - you'd have to edit it to match 
your needs for domain names and local MTA IP addresses.

I don't have a rule for SA, as I block that at the MTA.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Win95: Where do you want to go today?
   Vista: Where will Microsoft allow you to go today?
-----------------------------------------------------------------------
  Today: Max Planck's 151st birthday

Re: Another bad kind of spams, for Pfizer knockoffs with image

Posted by Igor Chudov <ig...@chudov.com>.
On Fri, Apr 24, 2009 at 01:31:37PM -0700, John Hardin wrote:
> On Fri, 24 Apr 2009, Igor Chudov wrote:
>
>> I get plenty of these also, and cannot get them to score well.
>>
>>       http://igor.chudov.com/tmp/spam008.txt
>>
>> Any ideas what I can do?
>
> Do you have administrative access to ak74.algebra.com? That looks like  
> it's your MX host.

Yep, it is my MX host. I have root access, it is a 5 year old Fedora 3
server. 

> If so, an MTA rule that rejects any message from the internet having a 
> HELO without a period may block a lot of that. I'm seeing an increase in 
> the number of messages with that particular flaw:
>
>     217 Mar 23
>     129 Mar 24
>     208 Mar 25
>     212 Mar 26
>     207 Mar 27
>     149 Mar 28
>     143 Mar 29
>     138 Mar 30
>     135 Mar 31
>     172 Apr 1
>     155 Apr 2
>      83 Apr 3
>     121 Apr 4
>     123 Apr 5
>     126 Apr 6
>     141 Apr 7
>     124 Apr 8
>     151 Apr 9
>     125 Apr 10
>     144 Apr 11
>     139 Apr 12
>     199 Apr 13
>     332 Apr 14
>     197 Apr 15
>     249 Apr 16
>     279 Apr 17
>     385 Apr 18
>     440 Apr 19
>     355 Apr 20
>     419 Apr 21
>     531 Apr 22
>     326 Apr 23
>
> If not, an SA rule that looks for such a HELO in the Received: header that 
> ak74.algebra.com adds might help.
>

Do you have examples of both kinds of such rules? 

I am especially interested in the mailserver side, as I have a lot of
accounts handled by that server. 

i

Re: Another bad kind of spams, for Pfizer knockoffs with image

Posted by John Hardin <jh...@impsec.org>.
On Fri, 24 Apr 2009, Igor Chudov wrote:

> I get plenty of these also, and cannot get them to score well.
>
>       http://igor.chudov.com/tmp/spam008.txt
>
> Any ideas what I can do?

Do you have administrative access to ak74.algebra.com? That looks like 
it's your MX host.

If so, an MTA rule that rejects any message from the internet having a HELO 
without a period may block a lot of that. I'm seeing an increase in the 
number of messages with that particular flaw:

     217 Mar 23
     129 Mar 24
     208 Mar 25
     212 Mar 26
     207 Mar 27
     149 Mar 28
     143 Mar 29
     138 Mar 30
     135 Mar 31
     172 Apr 1
     155 Apr 2
      83 Apr 3
     121 Apr 4
     123 Apr 5
     126 Apr 6
     141 Apr 7
     124 Apr 8
     151 Apr 9
     125 Apr 10
     144 Apr 11
     139 Apr 12
     199 Apr 13
     332 Apr 14
     197 Apr 15
     249 Apr 16
     279 Apr 17
     385 Apr 18
     440 Apr 19
     355 Apr 20
     419 Apr 21
     531 Apr 22
     326 Apr 23

If not, an SA rule that looks for such a HELO in the Received: header that 
ak74.algebra.com adds might help.
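
The "HELO without a period" test can be sketched in a few lines. This is
Python for illustration only, not milter-regex or SA syntax. The helper
names and the sendmail-style Received: layout ("from HELO (rdns [ip]) by
host ...") are assumptions, so the pattern would need adjusting to whatever
your MTA actually writes.

```python
import re

# Assumed sendmail-style layout: "from <helo> (<rdns> [<ip>]) by <host> ..."
RECEIVED_HELO = re.compile(r"^\s*from\s+(\S+)", re.IGNORECASE)


def received_helo(received_value):
    """Return the HELO name claimed in a Received: header value, or None."""
    match = RECEIVED_HELO.match(received_value)
    return match.group(1) if match else None


def helo_lacks_period(helo):
    """True when the HELO string contains no '.' at all (e.g. 'friend')."""
    return "." not in helo


if __name__ == "__main__":
    # Hypothetical header value; the IP is from a documentation range.
    sample = "from friend (unknown [192.0.2.10]) by ak74.algebra.com with ESMTP"
    helo = received_helo(sample)
    print(helo, helo_lacks_period(helo))
```

As with the MTA rule, the check only makes sense for connections arriving
from the internet, so only the Received: header added by your own MX host
should be examined.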


Re: Another bad kind of spams, for Pfizer knockoffs with image

Posted by John Hardin <jh...@impsec.org>.
On Fri, 24 Apr 2009, Igor Chudov wrote:

> The sales message is contained in a PNG image, but it could be other 
> image types like jpeg.

Is it time to dust off FuzzyOCR again?


Re: Another bad kind of spams, for Pfizer knockoffs with image

Posted by Michael Scheidell <sc...@secnap.net>.

Igor Chudov wrote:
> I get plenty of these also, and cannot get them to score well. 
>
> These advertise knockoffs of bestselling Pfizer products. The text is
> meaningless garbage text. The sales message is contained in a PNG
> image, but it could be other image types like jpeg. 
>
>        http://igor.chudov.com/tmp/spam008.txt
>
> Any ideas what I can do?
>   
Sanesecurity and MSRBL image signatures.

-- 
Michael Scheidell, CTO
Phone: 561-999-5000, x 1259
SECNAP Network Security Corporation

    * Certified SNORT Integrator
    * 2008-9 Hot Company Award Winner, World Executive Alliance
    * Five-Star Partner Program 2009, VARBusiness
    * Best Anti-Spam Product 2008, Network Products Guide
    * King of Spam Filters, SC Magazine 2008


Re: Image spam and failing rule

Posted by James Wilkinson <sa...@aprilcottage.co.uk>.
Gary Forrest wrote:
> Hi All
>
> We are receiving the same image spam many times, random text within the  
> body.
> The only common thing is an image attachment, with the filename in the  
> following format
>
>   DSL1234.png
>
> I have made the following ' RAWBODY ' rule
>
> /dsl[0-9]{4}\.png/i
>
> This rule works if the text appears in the body, when testing by hand  
> with telnet to port 25, but fails in practice.
> I think this is because the  RAWBODY rule does not search the text of an  
> attachment.
>
> example text of a spam
>
> ------=_NextPart_000_0075_01C9C5DF.A7950570
> Content-Type: image/png;
>        name="DSL6672.png"
> Content-Transfer-Encoding: base64
> Content-ID: <a0ff8910$101f730d$6c822ecf>
> Content-Disposition: inline
>
> Any ideas ?

mimeheader LOCAL_DSL_ATTACHMENT Content-Type =~ /name="dsl[0-9]{4}\.png"/i
(Untested.)
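
To show why a test on the MIME part headers finds the name where a body
test does not, here is a small Python sketch using the standard library
email parser. It is not SpamAssassin code. The sample message, the boundary
string and the function name are made up for illustration; the regex is the
one from Gary's rule.

```python
import email
import re

# Same pattern as the rule in the thread, case-insensitive.
DSL_NAME = re.compile(r"dsl[0-9]{4}\.png", re.IGNORECASE)

# Made-up message: the attachment name lives in a folded part header.
SAMPLE = (
    "From: a@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/related; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "random words\n"
    "--XYZ\n"
    "Content-Type: image/png;\n"
    '\tname="DSL6672.png"\n'
    "Content-Transfer-Encoding: base64\n"
    "Content-Disposition: inline\n"
    "\n"
    "iVBORw0KGgo=\n"
    "--XYZ--\n"
)


def suspicious_attachment_names(raw_message):
    """Return attachment names, taken from MIME part headers, that match."""
    msg = email.message_from_string(raw_message)
    hits = []
    for part in msg.walk():
        # get_filename() checks Content-Disposition, then Content-Type name=
        name = part.get_filename()
        if name and DSL_NAME.search(name):
            hits.append(name)
    return hits


if __name__ == "__main__":
    print(suspicious_attachment_names(SAMPLE))
```

The name never appears in the decoded body text, only in the part's
Content-Type header, which is the place a mimeheader rule looks.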

Hope this helps,

James.

-- 
E-mail:     james@ | top! to bottom from or backwards read not do I, post top
aprilcottage.co.uk | not do Please
                   |     -- Jeff Vian

Re: Image spam and failing rule

Posted by James Wilkinson <sa...@aprilcottage.co.uk>.
Theo Van Dinter wrote:
> It's already been mentioned, but mimeheader is the right way to look
> at the headers of MIME parts.

Charles Gregory wrote:
> Look more closely at my rule. It is checking for TWO headers,
> one after the other (separated by \n), identifying a gif with no name.
>
>>> full /Content-Type: image\/gif;\n[^a-z]+name=""/

I think you’ll find that’s one header on two lines, and mimeheader copes
with it.

Hope this helps,

James.

-- 
E-mail:     james@ | “As for Nitel, the state telephone monopoly, the less
aprilcottage.co.uk | said the better, which might well be the company’s
                   | motto.”
                   |     -- The Economist, about Nigeria

Re: Image spam and failing rule

Posted by Charles Gregory <cg...@hwcn.org>.
On Sun, 26 Apr 2009, Theo Van Dinter wrote:
> It's already been mentioned, but mimeheader is the right way to look
> at the headers of MIME parts.

Look more closely at my rule. It is checking for TWO headers,
one after the other (separated by \n), identifying a gif with no name.

>> full /Content-Type: image\/gif;\n[^a-z]+name=""/

But yes, I will be keeping 'mimeheader' in mind for tests like
the simple 'DS[LC]' png check. :)

- Charles

Re: Image spam and failing rule

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Mon, 2009-04-27 at 12:16 +0200, Andy Spiegl wrote:
> > It's already been mentioned, but mimeheader is the right way to look
> > at the headers of MIME parts.
> 
> How about multiline Content-Types?

They are wrapped in the raw message, but mimeheader matches against the
unfolded header:

$ grep -A 1 image/ dslxxxx.png.msg
Content-Type: image/png;
	name="DSL9020.png"

$ spamassassin -D \
  --cf="mimeheader TEST Content-Type =~ m~image/png; name=~" \
  < dslxxxx.png.msg  2>&1 | grep 'eval rule TEST'
[4719] dbg: rules: ran eval rule TEST ======> got hit (1)


> I tried without success:
>  mimeheader NAMELESSGIF_ATTACHMENT Content-Type =~ /image\/gif;\n[^a-z]+name=""/
> 
> But this seems to work:
>  mimeheader NAMELESSGIF_ATTACHMENT Content-Type =~ /image\/gif;\s*(\n\s+)?name=""/
                                                                 ^^^^^^^^^^^
The \s* matches a single space. The optional part does not match
anything. :)
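
The effect can be reproduced outside SA. The Python sketch below assumes,
based on the debug output above, that the rule sees a folded header joined
into one line with a single space. The unfold() helper is hypothetical and
is not how SA itself is implemented; the two patterns are Andy's.

```python
import re


def unfold(header_value):
    """Join a folded header: each line break plus indent becomes one space."""
    return re.sub(r"\r?\n[ \t]+", " ", header_value)


# Header value as it appears on the wire, folded onto two lines.
FOLDED = 'image/gif;\n\tname=""'

# First attempt: insists on a literal newline after the semicolon.
STRICT = re.compile(r'image/gif;\n[^a-z]+name=""')

# Second attempt: \s* absorbs the single space left after unfolding.
LOOSE = re.compile(r'image/gif;\s*(\n\s+)?name=""')

if __name__ == "__main__":
    flat = unfold(FOLDED)
    print(repr(flat))
    print("strict on unfolded:", bool(STRICT.search(flat)))
    print("loose on unfolded: ", bool(LOOSE.search(flat)))
    print("optional group:    ", LOOSE.search(flat).group(1))
```

Under that assumption the optional group never takes part in the match, so
a simpler pattern such as /image\/gif;\s*name=""/ should behave the same.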


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Image spam and failing rule

Posted by Andy Spiegl <sp...@br-online.de>.
> > While you are at it, you can also scan for
> >   full /Content-Type: image\/gif;\n[^a-z]+name=""/

> It's already been mentioned, but mimeheader is the right way to look
> at the headers of MIME parts.

How about multiline Content-Types?
I tried without success:
 mimeheader NAMELESSGIF_ATTACHMENT Content-Type =~ /image\/gif;\n[^a-z]+name=""/

But this seems to work:
 mimeheader NAMELESSGIF_ATTACHMENT Content-Type =~ /image\/gif;\s*(\n\s+)?name=""/

Whadya think?

Thx,
 Andy.

Re: Image spam and failing rule

Posted by Theo Van Dinter <fe...@apache.org>.
It's already been mentioned, but mimeheader is the right way to look
at the headers of MIME parts.

The rule of thumb is "if you are using 'full' you're probably doing it
wrong". :)


On Sun, Apr 26, 2009 at 11:57 AM, Charles Gregory <cg...@hwcn.org> wrote:
> On Sat, 25 Apr 2009, Gary Forrest wrote:
>>
>> We are receiving the same image spam many times, random text within the
>> body. The only common thing is an image attachment, with the filename in the
>> following format
>>  DSL1234.png
>> I have made the following ' RAWBODY ' rule
>> /dsl[0-9]{4}\.png/i
>
> You need to use a 'full' rule to scan attachment names.
> While you are at it, you can also scan for
>   full /Content-Type: image\/gif;\n[^a-z]+name=""/
>
> As this seems to be the next evolution of the spam. Nameless gifs.... :)
>
> Enjoy!
>
> - Charles
>
>

Re: Image spam and failing rule

Posted by Charles Gregory <cg...@hwcn.org>.
On Sat, 25 Apr 2009, Gary Forrest wrote:
> We are receiving the same image spam many times, random text within the 
> body. The only common thing is an image attachment, with the filename in 
> the following format
>  DSL1234.png
> I have made the following ' RAWBODY ' rule
> /dsl[0-9]{4}\.png/i

You need to use a 'full' rule to scan attachment names.
While you are at it, you can also scan for
    full /Content-Type: image\/gif;\n[^a-z]+name=""/

As this seems to be the next evolution of the spam. Nameless gifs.... :)

Enjoy!

- Charles


Re: [sa-list] Re: Image spam and failing rule

Posted by Henrik K <he...@hege.li>.
On Sun, Apr 26, 2009 at 04:11:10PM -0400, Dan Mahoney, System Admin wrote:
> On Sat, 25 Apr 2009, John Hardin wrote:
>
>> On Sat, 25 Apr 2009, Gary Forrest wrote:
>>
>>> We are receiving the same image spam many times, random text within 
>>> the body.
>>
>> FuzzyOCR. It seems Spammers are trying image spam again, after giving 
>> up on it for a year or so.
>
> Is there a version of FuzzyOCR that's actually supported with the current 
> SA release?  Or under active development at all?

As I said in the other post, there has been no significant image spam for
years. Why would anyone want to waste time developing it? Mostly that stuff
comes from botnets anyway, which you can easily block even at the MTA.

The author (decoder) reads this list, so maybe he will reply if he has any
thoughts. If another wave of image spam is coming, maybe the development
and mailing list will be revived.


Re: [sa-list] Re: Image spam and failing rule

Posted by "Dan Mahoney, System Admin" <da...@prime.gushi.org>.
On Sat, 25 Apr 2009, John Hardin wrote:

> On Sat, 25 Apr 2009, Gary Forrest wrote:
>
>> We are receiving the same image spam many times, random text within the 
>> body.
>
> FuzzyOCR. It seems Spammers are trying image spam again, after giving up on 
> it for a year or so.

Is there a version of FuzzyOCR that's actually supported with the current 
SA release?  Or under active development at all?

-Dan

--

"Man, this is such a trip"

-Dan Mahoney, October 25, 1997

--------Dan Mahoney--------
Techie,  Sysadmin,  WebGeek
Gushi on efnet/undernet IRC
ICQ: 13735144   AIM: LarpGM
Site:  http://www.gushi.org
---------------------------


Re: Image spam and failing rule

Posted by RW <rw...@googlemail.com>.
On Sat, 25 Apr 2009 16:10:41 -0500
Igor Chudov <ig...@chudov.com> wrote:

> On Sat, Apr 25, 2009 at 02:09:05PM -0700, John Hardin wrote:
> > On Sat, 25 Apr 2009, Gary Forrest wrote:

> > FuzzyOCR. It seems Spammers are trying image spam again, after
> > giving up on it for a year or so.
> >
> 
> Why did spammers give up on it? It seems like a good idea.

There are probably many reasons, but I suspect that fundamentally it
doesn't make good spam. I think that most spam needs one of two things:
either it should look like a legitimate email, or it should have a
clickable link. 

Before image spam, spammers experimented with fragmented URLs, but that
never really caught on. If people won't reassemble URLs with
cut-and-paste, they're even less likely to type them in from
captcha-style text.

Re: Image spam and failing rule

Posted by Igor Chudov <ig...@chudov.com>.
On Sat, Apr 25, 2009 at 02:09:05PM -0700, John Hardin wrote:
> On Sat, 25 Apr 2009, Gary Forrest wrote:
>
>> We are receiving the same image spam many times, random text within the 
>> body.
>
> FuzzyOCR. It seems Spammers are trying image spam again, after giving up  
> on it for a year or so.
>

Why did spammers give up on it? It seems like a good idea.

Re: Image spam and failing rule

Posted by John Hardin <jh...@impsec.org>.
On Sat, 25 Apr 2009, Gary Forrest wrote:

> We are receiving the same image spam many times, random text within the 
> body.

FuzzyOCR. It seems Spammers are trying image spam again, after giving up 
on it for a year or so.


Image spam and failing rule

Posted by Gary Forrest <ga...@netnorth.co.uk>.
Hi All

We are receiving the same image spam many times, with random text in the 
body.
The only common feature is an image attachment, with the filename in the 
following format:

   DSL1234.png

I have made the following 'rawbody' rule:

/dsl[0-9]{4}\.png/i

This rule works when the text appears in the body (tested by hand with 
telnet to port 25), but fails in practice.
I think this is because a RAWBODY rule does not search the text of an 
attachment.

Example text from one such spam:

------=_NextPart_000_0075_01C9C5DF.A7950570
Content-Type: image/png;
        name="DSL6672.png"
Content-Transfer-Encoding: base64
Content-ID: <a0ff8910$101f730d$6c822ecf>
Content-Disposition: inline
 
Any ideas ?

Thanks in advance
Regards
Gary