Posted to users@spamassassin.apache.org by Kelly Jones <ke...@gmail.com> on 2006/12/23 07:05:07 UTC

Despeckling images for OCR and anti-spam purposes

Spammers are starting to put "speckles" in their images to defeat
OCR-scanning plugins such as FuzzyOCR.

I thought ImageMagick's -despeckle option would help, but it doesn't
seem to, not even when applied multiple times, not even in conjunction
with -monochrome.

I want a filter that does this for each pixel X:

1) if any of X's 8 neighbor pixels is the same color, turn X black
2) otherwise, turn X white

Can some combination of options to convert do this?
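For illustration, the per-pixel rule above can be sketched in plain Python. This is a hypothetical sketch, not code from the thread and not an ImageMagick option. It assumes the image has already been decoded into a 2-D list of palette indices (an indexed-color image such as a GIF), writes 0 for black and 1 for white, and treats pixels beyond the border as non-matching. The names `neighbor_match_filter` and `has_matching_neighbor` are made up for this example.

```python
# Hypothetical sketch: input is a rectangular 2-D list of palette indices.
BLACK, WHITE = 0, 1


def has_matching_neighbor(pixels, y, x):
    """True if any of the (up to) 8 neighbors of (y, x) has the same palette index."""
    height, width = len(pixels), len(pixels[0])
    color = pixels[y][x]
    return any(
        pixels[ny][nx] == color
        # Clamp the 3x3 window to the image so border pixels have fewer neighbors.
        for ny in range(max(0, y - 1), min(height, y + 2))
        for nx in range(max(0, x - 1), min(width, x + 2))
        if (ny, nx) != (y, x)  # skip X itself
    )


def neighbor_match_filter(pixels):
    """Rule 1: BLACK if some neighbor shares X's color. Rule 2: otherwise WHITE."""
    return [
        [BLACK if has_matching_neighbor(pixels, y, x) else WHITE
         for x in range(len(row))]
        for y, row in enumerate(pixels)
    ]
```

For example, `neighbor_match_filter([[5, 5, 2], [1, 3, 2], [4, 6, 7]])` gives `[[0, 0, 0], [1, 1, 0], [1, 1, 1]]`. Note that, as specified, every pixel inside any uniform area (including a plain background) also has a matching neighbor and turns black, so only isolated specks of a unique color end up white.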

I realize that:

1. This will only work w/ indexed-color images (e.g., GIFs) and not JPEGs, etc.
2. Spammers will soon work around this, so this is just a short-term bandage.
3. I could write something in libgd to do this (blech!)

-- 
We're just a Bunch Of Regular Guys, a collective group that's trying
to understand and assimilate technology. We feel that resistance to
new ideas and technology is unwise and ultimately futile.

Re: Despeckling images for OCR and anti-spam purposes

Posted by René Berber <r....@computer.org>.
Kenneth Porter wrote:

> --On Saturday, December 23, 2006 12:43 PM +0100 decoder
> <de...@own-hero.net> wrote:
> 
>> Which images are you referring to? If you can put up a sample, then I
>> can tell you which scanner setting will catch it :)
> 
> Does the SA wiki support uploading of images? Perhaps we could have a
> page of just problem images. [snip]

Bad idea; it would help spammers more than anybody else.
-- 
René Berber


Re: Despeckling images for OCR and anti-spam purposes

Posted by decoder <de...@own-hero.net>.
Kenneth Porter wrote:
> --On Saturday, December 23, 2006 12:43 PM +0100 decoder
> <de...@own-hero.net> wrote:
>
>> Which images are you referring to? If you can put up a sample,
>> then I can tell you which scanner setting will catch it :)
>
> Does the SA wiki support uploading of images? Perhaps we could have
>  a page of just problem images. Such a page is likely to grow large
>  and consume a lot of bandwidth, so perhaps we could get a resource
>  that thumbnails them and runs them through the Coral Cache.
I'm not sure about the SA wiki, but you can create a ticket for it on
our side and attach the picture :) Maybe I can also create a page on
our wiki that allows uploading/appending images. You can find our
site at fuzzyocr.own-hero.net.

Chris


Re: Despeckling images for OCR and anti-spam purposes

Posted by Kenneth Porter <sh...@sewingwitch.com>.
--On Saturday, December 23, 2006 12:43 PM +0100 decoder 
<de...@own-hero.net> wrote:

> Which images are you referring to? If you can put up a sample, then I
> can tell you which scanner setting will catch it :)

Does the SA wiki support uploading of images? Perhaps we could have a page 
of just problem images. Such a page is likely to grow large and consume a 
lot of bandwidth, so perhaps we could get a resource that thumbnails them 
and runs them through the Coral Cache.



Re: Despeckling images for OCR and anti-spam purposes

Posted by decoder <de...@own-hero.net>.
Kelly Jones wrote:
> Spammers are starting to put "speckles" in their images to defeat
> OCR-scanning plugins such as FuzzyOCR.
Which images are you referring to? If you can put up a sample, then I
can tell you which scanner setting will catch it :)


Best regards,

Chris


>
> I thought ImageMagick's -despeckle option would help, but it
> doesn't seem to, not even when applied multiple times, not even in
> conjunction with -monochrome.
>
> I want a filter that does this for each pixel X:
>
> 1) if any of X's 8 neighbor pixels is the same color, turn X black
> 2) otherwise, turn X white
>
> Can some combination of options to convert do this?
>
> I realize that:
>
> 1. This will only work w/ indexed-color images (e.g., GIFs) and not
> JPEGs, etc.
> 2. Spammers will soon work around this, so this is just a
> short-term bandage.
> 3. I could write something in libgd to do this (blech!)
>



Re: Despeckling images for OCR and anti-spam purposes

Posted by René Berber <r....@computer.org>.
Kelly Jones wrote:

> Spammers are starting to put "speckles" in their images to defeat
> OCR-scanning plugins such as FuzzyOCR.

That's a very old technique.

> I thought ImageMagick's -despeckle option would help, but it doesn't
> seem to, not even when applied multiple times, not even in conjunction
> with -monochrome.

Have you tried a simple `gocr -d 4 ...`? It does a good job with those images.

> I want a filter that does this for each pixel X:

man gocr:
...
       -d size
              set dust size in pixels (clusters smaller than this are
              removed), 0 means no clusters are removed, the default is -1
              for auto detection
...
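The dust option quoted above amounts to deleting small connected clusters of dark pixels. What follows is a minimal Python sketch of that idea, not gocr's actual implementation. It assumes a binary image given as a rectangular 2-D list (0 = black, 1 = white), uses 8-connectivity, and measures cluster size as a pixel count; gocr's auto-detection default (-1) is not reproduced. `remove_dust` is a hypothetical name.

```python
from collections import deque

BLACK, WHITE = 0, 1


def remove_dust(pixels, min_size):
    """Return a copy of a binary image with black clusters of fewer than
    min_size pixels turned white. min_size <= 0 removes nothing."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    out = [row[:] for row in pixels]
    seen = [[False] * width for _ in range(height)]
    for sy in range(height):
        for sx in range(width):
            if pixels[sy][sx] != BLACK or seen[sy][sx]:
                continue
            # Breadth-first flood fill collects one 8-connected cluster.
            cluster = []
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and not seen[ny][nx]
                                and pixels[ny][nx] == BLACK):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Clusters smaller than the dust size are erased.
            if len(cluster) < min_size:
                for y, x in cluster:
                    out[y][x] = WHITE
    return out
```

With `img = [[0, 0, 1, 1, 1], [0, 0, 1, 1, 0], [1, 1, 1, 1, 1]]`, `remove_dust(img, 4)` erases the isolated speck at the right but keeps the 4-pixel block at the top left. Unlike the same-color rule, this judges a speck by the size of the blob it belongs to, so a solid background is never affected.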
> 1) if any of X's 8 neighbor pixels is the same color, turn X black
> 2) otherwise, turn X white
> 
> Can some combination of options to convert do this?
> 
> I realize that:
> 
> 1. This will only work w/ indexed-color images (e.g., GIFs) and not JPEGs,
> etc.
> 2. Spammers will soon work around this, so this is just a short-term
> bandage.
> 3. I could write something in libgd to do this (blech!)

Whatever.
-- 
René Berber