
Posted to users@spamassassin.apache.org by "Chip M." <sa...@IowaHoneypot.com> on 2006/08/22 07:13:01 UTC

animated GIF spam

While skimming thru my daily rejected spam pile, did a double take when a
GIF spam seemed to "blink" at me.  Thought it was a sw glitch at first...
then realized the sneaky Borg had adapted again.

Took a look at the frames in PaintShopPro's AnimationShop, and the first 
three are all but blank (wee bit of noise), followed by the payload.

Below are links to the raw message, and the extracted GIF:
	http://Puffin.net/software/spam/samples/0001a_animated_gif.eml
	http://Puffin.net/software/spam/samples/0001b_been.gif

Decoder/Chris, I'd view this as a compliment to your FuzzyOCR.  ;)

The good news is that ImageInfo should have no problem with this particular 
instance, as the initial width x height are "correct".

Time to recalibrate those phaser frequencies!  :)
	- "Chip"

Re: animated GIF spam

Posted by decoder <de...@own-hero.net>.

Kenneth Porter wrote:
> --On Tuesday, August 22, 2006 1:07 AM -0500 "Chip M."
> <sa...@IowaHoneypot.com> wrote:
>
>>> For interlaced ... I have no idea.  Depends a lot on how the
>>> interlaced images are stored, I guess.
>>
>> Yes, exactly.  Until there's samples, I'm not going to worry
>> about it.
>
> There's also progressive JPEG.
>
> <http://www.faqs.org/faqs/jpeg-faq/part1/section-11.html>
> <http://en.wikipedia.org/wiki/JPEG>
> <http://en.wikipedia.org/wiki/JPEG_2000>
>
>
These do not pose a problem currently; as far as I am aware, FuzzyOcr
can handle them.


Chris

Re: animated GIF spam

Posted by Kenneth Porter <sh...@sewingwitch.com>.

--On Tuesday, August 22, 2006 1:07 AM -0500 "Chip M." 
<sa...@IowaHoneypot.com> wrote:

>> For interlaced ... I have no idea.  Depends a lot on how the interlaced
>> images are stored, I guess.
>
> Yes, exactly.  Until there's samples, I'm not going to worry about it.

There's also progressive JPEG.

<http://www.faqs.org/faqs/jpeg-faq/part1/section-11.html>
<http://en.wikipedia.org/wiki/JPEG>
<http://en.wikipedia.org/wiki/JPEG_2000>

Re: animated GIF spam

Posted by "Chip M." <sa...@IowaHoneypot.com>.

At 10:26 PM 8/21/2006 -0700, John Rudd wrote:
>I also heard that interlaced gif spam is appearing now.

Yes, I saw that post; however, there wasn't a publicly available sample.
Any such would be much appreciated.

>It'd be interesting to see how to counter them.

Should be easy.  One approach is "pixel density".  What I've been doing is
reading JUST enough of the header to calculate the area (just like Dallas'
excellent ImageInfo plugin), then dividing by the total raw file size of
just the image (i.e. what one gets after base64 decoding just the GIF part),
less the size of the obvious parts of the header.  Works well, and is
blindingly fast.
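A minimal Python sketch of the pixel-density idea. It assumes the "obvious parts of the header" are the 6-byte GIF signature, the 7-byte logical screen descriptor, and the optional global color table; that is our reading, not necessarily Chip's exact code. The function name gif_pixel_density is hypothetical. It reads the canvas width and height (little-endian 16-bit values at bytes 6-9) and divides the area by the remaining bytes:

```python
import struct

def gif_pixel_density(data: bytes) -> float:
    """Canvas pixels per byte of GIF data, after discounting the fixed
    header and global color table (assumed 'obvious parts of the header')."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    # Logical Screen Descriptor: width, height (uint16 LE), then packed flags
    width, height, flags = struct.unpack_from("<HHB", data, 6)
    header_size = 13  # 6-byte signature/version + 7-byte screen descriptor
    if flags & 0x80:  # global color table present: 3 * 2^(N+1) bytes
        header_size += 3 * (1 << ((flags & 0x07) + 1))
    payload = len(data) - header_size
    if payload <= 0:
        raise ValueError("truncated GIF")
    return (width * height) / payload
```

For a 100x50 canvas followed by 1000 bytes of data this gives 5.0 pixels per byte. Only the header is parsed, which is why this approach stays fast: no LZW decoding is needed.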

Ham generally has a much LOWER density, because it's typically clipart,
whereas spam is generally text, which compresses extremely well, resulting
in a much HIGHER density.  It's not foolproof, so I use a sliding scale,
and have had only one FP this month (from an idiot (redundant) recruiter to
one of my testers - the PNG misfiring was only half the points required to
reject, and the able idiot managed to do several other things rare in Ham).
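One way to express a "sliding scale" instead of a hard cutoff is a linear ramp between two density thresholds. The thresholds, the maximum score, and the name density_score are hypothetical placeholders, not values from this thread; they would need tuning against real ham and spam corpora:

```python
def density_score(density: float, low: float = 2.0, high: float = 8.0,
                  max_score: float = 3.0) -> float:
    """Map pixel density to spam points: 0 at or below `low`,
    `max_score` at or above `high`, linear in between (hypothetical values)."""
    if density <= low:
        return 0.0
    if density >= high:
        return max_score
    return max_score * (density - low) / (high - low)
```

With these placeholder values a density of 5.0 scores 1.5 points, so a single borderline image contributes only part of what is needed to reject a message.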

The beauty is that the spammer can "easily" foil this by lowering the
density by adding more complexity, which increases the file size, so more
bandwidth is consumed. :)

Some stock spams do use a fancier font which scores lower, so I'm still 
considering other types of analysis as a backup.

Specifically to address animated GIFs, it would be very easy to "walk" the 
raw image, calculating each frame's pixel density, simply ignoring the 
obvious chaff frames.
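A sketch of walking an animated GIF frame by frame, assuming the standard GIF89a block layout and a well-formed file. For each frame it divides the frame's area by the size of its compressed (LZW) data, and it drops frames below a hypothetical min_bytes size, on the assumption that near-blank chaff frames compress to only a few bytes. The function names are illustrative:

```python
import struct

def _sub_blocks(data: bytes, pos: int):
    """Walk GIF data sub-blocks (length byte + payload, ended by a 0 length).
    Returns (offset just past the terminator, total payload bytes)."""
    total = 0
    while data[pos] != 0:
        total += data[pos]
        pos += data[pos] + 1
    return pos + 1, total

def frame_densities(data: bytes, min_bytes: int = 64):
    """Return area / compressed-bytes for each frame, skipping frames whose
    compressed image data is under min_bytes (assumed to be chaff)."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    flags = data[10]                      # packed field of the screen descriptor
    pos = 13                              # signature (6) + screen descriptor (7)
    if flags & 0x80:                      # skip the global color table
        pos += 3 * (1 << ((flags & 0x07) + 1))
    densities = []
    while data[pos] != 0x3B:              # 0x3B is the trailer
        if data[pos] == 0x21:             # extension: introducer, label, sub-blocks
            pos, _ = _sub_blocks(data, pos + 2)
        elif data[pos] == 0x2C:           # image descriptor starts a frame
            width, height, pflags = struct.unpack_from("<HHB", data, pos + 5)
            pos += 10                     # introducer + 9-byte descriptor
            if pflags & 0x80:             # skip a local color table
                pos += 3 * (1 << ((pflags & 0x07) + 1))
            pos, nbytes = _sub_blocks(data, pos + 1)  # +1 skips LZW min code size
            if nbytes >= min_bytes:
                densities.append(width * height / nbytes)
        else:
            raise ValueError(f"unexpected block 0x{data[pos]:02x} at {pos}")
    return densities
```

A byte-size threshold is only one possible chaff test; comparing each frame's density against the others would be another.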

Tomorrow, I'll write some code to decompose the frames and see what sort of 
numbers I get.

>For interlaced ... I have no idea.  Depends a lot on how the interlaced 
>images are stored, I guess.

Yes, exactly.  Until there's samples, I'm not going to worry about it.

What we also need is a diverse Ham GIF corpus.  Does anyone know of one?
	- "Chip"

P.S.  Dallas:  it never occurred to me to _JUST_ score the area.  My pixel 
density approach fails on multi-GIFs, so you saved my bacon there. ;)

Re: animated GIF spam

Posted by Spamassassin List <sp...@gmail.com>.

> While skimming thru my daily rejected spam pile, did a double take when a
> GIF spam seemed to "blink" at me.  Thought it was a sw glitch at first...
> then realized the sneaky Borg had adapted again.
>
> Took a look at the frames in PaintShopPro's AnimationShop, and the first
> three are all but blank (wee bit of noise), followed by the payload.
>
> Below are links to the raw message, and the extracted GIF:
> http://Puffin.net/software/spam/samples/0001a_animated_gif.eml
> http://Puffin.net/software/spam/samples/0001b_been.gif
>
> Decoder/Chris, I'd view this as a compliment to your FuzzyOCR.  ;)
>
> The good news is that ImageInfo should have no problem with this
> particular instance, as the initial width x height are "correct".

Yes, ImageInfo caught them fine.

Re: animated GIF spam

Posted by Logan Shaw <ls...@emitinc.com>.

On Mon, 21 Aug 2006, John Rudd wrote:
> On Aug 21, 2006, at 10:13 PM, Chip M. wrote:

>> While skimming thru my daily rejected spam pile, did a double take when a
>> GIF spam seemed to "blink" at me.  Thought it was a sw glitch at first...
>> then realized the sneaky Borg had adapted again.
>> 
>> Took a look at the frames in PaintShopPro's AnimationShop, and the first
>> three are all but blank (wee bit of noise), followed by the payload.

Given the way the GIF format works, that is actually a
reasonable way to inject "salt" into a given image to throw
off checksumming.  (If only the programmer who is doing the
technical end of this would get a real job instead of working
for a spammer...)

> For animated, is there a clean break between "frames" of animation, something 
> that netpbm or whatever can easily identify and break out into individual 
> images?

Yes.  Briefly, the GIF format is a sequence of chunks.  Before
any image data comes along, a chunk defines the overall size of
the GIF (sort of the size of the canvas), and then you can have
a series of other chunks.  One type of chunk says "draw this
image on the virtual canvas at these coordinates using this
palette" and another says "delay this long".  Putting these
two types of chunks together in the right sequence gives the
ability to do animations.  (It also, incidentally, gives you
the ability to do full 24-bit color.  Few people know GIF
is actually capable of this.  But even though it is capable,
it is a hack, and very wasteful of space, so maybe that's for
the better.)
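To make the chunk structure concrete, here is a small Python sketch that lists the chunks in a GIF: the canvas size from the header, the per-frame delay from each Graphic Control Extension (label 0xF9, delay in hundredths of a second), and the position and size of each image descriptor. It assumes a well-formed GIF89a file, does no LZW decoding, and the function name list_gif_chunks is hypothetical:

```python
import struct

def _skip_sub_blocks(data: bytes, pos: int) -> int:
    # Data sub-blocks: a length byte followed by that many bytes; length 0 ends the run.
    while data[pos] != 0:
        pos += data[pos] + 1
    return pos + 1

def list_gif_chunks(data: bytes):
    """Describe each chunk of a GIF as a tuple, in file order."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    canvas_w, canvas_h, flags = struct.unpack_from("<HHB", data, 6)
    chunks = [("canvas", canvas_w, canvas_h)]
    pos = 13
    if flags & 0x80:  # global color table
        pos += 3 * (1 << ((flags & 0x07) + 1))
    while True:
        introducer = data[pos]
        if introducer == 0x3B:  # trailer: end of file
            chunks.append(("trailer",))
            return chunks
        if introducer == 0x21:  # extension block
            label = data[pos + 1]
            if label == 0xF9:  # Graphic Control Extension carries the frame delay
                (delay,) = struct.unpack_from("<H", data, pos + 4)
                chunks.append(("delay", delay))
            else:
                chunks.append(("extension", label))
            pos = _skip_sub_blocks(data, pos + 2)
        elif introducer == 0x2C:  # image descriptor: draw at (left, top)
            left, top, w, h, pflags = struct.unpack_from("<HHHHB", data, pos + 1)
            chunks.append(("image", left, top, w, h))
            pos += 10
            if pflags & 0x80:  # local color table ("using this palette")
                pos += 3 * (1 << ((pflags & 0x07) + 1))
            pos = _skip_sub_blocks(data, pos + 1)  # skip LZW min code size + data
        else:
            raise ValueError(f"unexpected byte 0x{introducer:02x} at offset {pos}")
```

Counting the "image" entries gives the frame count, which is all that a check based purely on the number of frames would need.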

> It would be CPU intensive, but the right way to fight it might be to 
> run the FuzzyOCR on each frame.  And/or have a setting for "maximum frames to 
> process", and if the GIF goes over that number of frames, give it a huge spam 
> score.

Yeah, that is a bit tricky.  I can think of a way to do a
denial-of-service attack against the "run it on each frame"
approach, but I won't share what that is.  In theory, if that
happens, one could write a plugin to examine the internal
structure of the GIF and detect that.

The one thing that would be important to guard against is
suddenly flagging all animated GIFs as spam.  Although I think
they're really tacky and annoying, that doesn't mean that they
are actually spam.

> For interlaced ... I have no idea.  Depends a lot on how the interlaced 
> images are stored, I guess.  And whether or not netpbm can generate the final 
> image for processing, instead of having to work on the interlaced data.

I'm pretty sure it should be able to.  If I recall correctly,
interlaced GIFs just have the rows in a different order.
It should be no problem to get the full image.
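This recollection matches the GIF89a specification: an interlaced image stores its rows in four passes (every 8th row starting at row 0, every 8th starting at row 4, every 4th starting at row 2, then every 2nd starting at row 1). A sketch of restoring display order, assuming the rows have already been LZW-decoded; the function names are hypothetical:

```python
def interlaced_row_order(height: int):
    """Display-row index for each stored row of an interlaced GIF image."""
    order = []
    for start, step in ((0, 8), (4, 8), (2, 4), (1, 2)):  # the four passes
        order.extend(range(start, height, step))
    return order

def deinterlace(rows):
    """Reorder rows stored in interlaced order into top-to-bottom order."""
    out = [None] * len(rows)
    for stored_idx, display_row in enumerate(interlaced_row_order(len(rows))):
        out[display_row] = rows[stored_idx]
    return out
```

For a 10-row image the stored order is rows 0, 8, 4, 2, 6, 1, 3, 5, 7, 9. Only the row order changes, so the final image is the same as a non-interlaced one.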

   - Logan

Re: animated GIF spam

Posted by decoder <de...@own-hero.net>.

John Rudd wrote:
>
> On Aug 21, 2006, at 10:13 PM, Chip M. wrote:
>
>> While skimming thru my daily rejected spam pile, did a double take when
>> a GIF spam seemed to "blink" at me.  Thought it was a sw glitch at
>> first... then realized the sneaky Borg had adapted again.
>>
>> Took a look at the frames in PaintShopPro's AnimationShop, and the
>> first three are all but blank (wee bit of noise), followed by the
>> payload.
>>
>> Below are links to the raw message, and the extracted GIF:
>>     http://Puffin.net/software/spam/samples/0001a_animated_gif.eml
>>     http://Puffin.net/software/spam/samples/0001b_been.gif
>>
>> Decoder/Chris, I'd view this as a compliment to your FuzzyOCR.  ;)
I'll implement that in the next release :) thx :D
>>
>> The good news is that ImageInfo should have no problem with this
>> particular instance, as the initial width x height are "correct".
>>
>> Time to recalibrate those phaser frequencies!  :)
>>     - "Chip"
>>
>
> I also heard that interlaced gif spam is appearing now.
This will be supported then, too. Not a big deal:)
>
> It'd be interesting to see how to counter them.
>
> For animated, is there a clean break between "frames" of animation,
> something that netpbm or whatever can easily identify and break out
> into individual images?  It would be CPU intensive, but the right
> way to fight it might be to run the FuzzyOCR on each frame.  And/or
> have a setting for "maximum frames to process", and if the GIF goes
> over that number of frames, give it a huge spam score.  Or "add this
> score per frame", so that the number of frames increases the spam
> score directly, and automatically bail out if they cross a certain
> threshold (score from number of animation frames alone >= 20, then
> just return 20 ... or something; which saves you on processing the
> frames themselves).
Sounds good :) But there might be a better way... but I'm not sure
atm, got to read up on it in the netpbm manual first:)
>
> For interlaced ... I have no idea.  Depends a lot on how the
> interlaced images are stored, I guess.  And whether or not netpbm
> can generate the final image for processing, instead of having to
> work on the interlaced data.

Chris

Re: animated GIF spam

Posted by John Rudd <jr...@ucsc.edu>.

On Aug 21, 2006, at 10:13 PM, Chip M. wrote:

> While skimming thru my daily rejected spam pile, did a double take when
> a GIF spam seemed to "blink" at me.  Thought it was a sw glitch at
> first... then realized the sneaky Borg had adapted again.
>
> Took a look at the frames in PaintShopPro's AnimationShop, and the
> first three are all but blank (wee bit of noise), followed by the
> payload.
>
> Below are links to the raw message, and the extracted GIF:
> 	http://Puffin.net/software/spam/samples/0001a_animated_gif.eml
> 	http://Puffin.net/software/spam/samples/0001b_been.gif
>
> Decoder/Chris, I'd view this as a compliment to your FuzzyOCR.  ;)
>
> The good news is that ImageInfo should have no problem with this
> particular instance, as the initial width x height are "correct".
>
> Time to recalibrate those phaser frequencies!  :)
> 	- "Chip"
>

I also heard that interlaced gif spam is appearing now.

It'd be interesting to see how to counter them.

For animated, is there a clean break between "frames" of animation, 
something that netpbm or whatever can easily identify and break out 
into individual images?  It would be CPU intensive, but the right way 
to fight it might be to run the FuzzyOCR on each frame.  And/or have a 
setting for "maximum frames to process", and if the GIF goes over that 
number of frames, give it a huge spam score.  Or "add this score per 
frame", so that the number of frames increases the spam score directly, 
and automatically bail out if they cross a certain threshold (score 
from number of animation frames alone >= 20, then just return 20 ... or 
something; which saves you on processing the frames themselves).
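A sketch of the "score per frame, bail out at a cap" idea. The per-frame score, the cap of 20, the max_frames limit, and the names FramePlan and plan_animated_gif are hypothetical. As a design choice it scores only frames beyond the first, so a static single-frame GIF contributes nothing:

```python
from typing import NamedTuple

class FramePlan(NamedTuple):
    score: float      # spam points from the frame count alone
    ocr_frames: int   # frames to pass to OCR; 0 means bail out without OCR

def plan_animated_gif(frame_count: int, per_frame: float = 2.0,
                      cap: float = 20.0, max_frames: int = 10) -> FramePlan:
    """Hypothetical policy: score each frame after the first, cap the total,
    and skip OCR entirely once the cap or the frame limit is reached."""
    score = max(frame_count - 1, 0) * per_frame
    if score >= cap or frame_count > max_frames:
        return FramePlan(score=cap, ocr_frames=0)
    return FramePlan(score=score, ocr_frames=frame_count)
```

With these placeholder values a 4-frame GIF scores 6.0 and all four frames go to OCR, while an 11-frame GIF immediately gets the capped score of 20.0 and no frames are processed, which is the CPU saving described above.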

For interlaced ... I have no idea.  Depends a lot on how the interlaced 
images are stored, I guess.  And whether or not netpbm can generate the 
final image for processing, instead of having to work on the interlaced 
data.