Posted to users@spamassassin.apache.org by Kelly Jones <ke...@gmail.com> on 2006/12/23 07:05:07 UTC
Despeckling images for OCR and anti-spam purposes
Spammers are starting to put "speckles" in their images to defeat
OCR-scanning plugins such as FuzzyOCR.
I thought ImageMagick's -despeckle option would help, but it doesn't
seem to, not even when applied multiple times, not even in conjunction
with -monochrome.
I want a filter that does this for each pixel X:
1) if any of X's 8 neighbor pixels is the same color, turn X black
2) otherwise, turn X white
Can some combination of options to convert do this?
I realize that:
1. This will only work w/ indexed-color images (eg, GIFs) and not JPEGs, etc.
2. Spammers will soon work around this, so this is just a short-term bandage.
3. I could write something in libgd to do this (blech!)
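A minimal sketch of the per-pixel filter described above, in plain Python rather than convert or libgd. It assumes the image has already been decoded into a 2D list of palette indices (the decoding step is left out). The function name `neighbor_filter` and the values 0 for black and 255 for white are illustrative choices, not part of any library.

```python
BLACK = 0    # assumed output value for "black"
WHITE = 255  # assumed output value for "white"


def neighbor_filter(pixels):
    """Apply the rule from the post to a 2D list of palette indices.

    A pixel becomes BLACK if at least one of its 8 neighbors has the
    same palette index, and WHITE otherwise. Neighbors that fall
    outside the image are ignored.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    out = [[WHITE] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            color = pixels[y][x]
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue  # skip the pixel itself
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and pixels[ny][nx] == color:
                        out[y][x] = BLACK
    return out


# A single speckle (index 2) surrounded by a uniform color (index 1).
sample = [
    [1, 1, 1],
    [1, 2, 1],
    [1, 1, 1],
]
result = neighbor_filter(sample)
```

Applied literally, the rule turns every pixel that has a same-colored neighbor black, background included, so only isolated single-pixel speckles come out white.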
--
We're just a Bunch Of Regular Guys, a collective group that's trying
to understand and assimilate technology. We feel that resistance to
new ideas and technology is unwise and ultimately futile.
Re: Despeckling images for OCR and anti-spam purposes
Posted by René Berber <r....@computer.org>.
Kenneth Porter wrote:
> --On Saturday, December 23, 2006 12:43 PM +0100 decoder
> <de...@own-hero.net> wrote:
>
>> Which images are you referring to? If you can put up a sample, then I
>> can tell you which scanner setting will catch it :)
>
> Does the SA wiki support uploading of images? Perhaps we could have a
> page of just problem images. [snip]
Bad idea, it would help spammers more than anybody else.
--
René Berber
Re: Despeckling images for OCR and anti-spam purposes
Posted by decoder <de...@own-hero.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Kenneth Porter wrote:
> --On Saturday, December 23, 2006 12:43 PM +0100 decoder
> <de...@own-hero.net> wrote:
>
>> Which images are you referring to? If you can put up a sample,
>> then I can tell you which scanner setting will catch it :)
>
> Does the SA wiki support uploading of images? Perhaps we could have
> a page of just problem images. Such a page is likely to grow large
> and consume a lot of bandwidth, so perhaps we could get a resource
> that thumbnails them and runs them through the Coral Cache.
I'm not sure about the SA wiki but you can create a ticket for it on
our side and attach the picture :) Maybe I can create a wiki page for
it as well on our page that allows uploading/appending of images. You
can find the page at fuzzyocr.own-hero.net.
Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFFjYNrJQIKXnJyDxURAs8PAJ0TMpqHh47zay0wN8MPwFkcyluknQCeJU9m
YOi1MNkEKQ/0YcIe4VhCVSs=
=2LK1
-----END PGP SIGNATURE-----
Re: Despeckling images for OCR and anti-spam purposes
Posted by Kenneth Porter <sh...@sewingwitch.com>.
--On Saturday, December 23, 2006 12:43 PM +0100 decoder
<de...@own-hero.net> wrote:
> Which images are you referring to? If you can put up a sample, then I
> can tell you which scanner setting will catch it :)
Does the SA wiki support uploading of images? Perhaps we could have a page
of just problem images. Such a page is likely to grow large and consume a
lot of bandwidth, so perhaps we could get a resource that thumbnails them
and runs them through the Coral Cache.
Re: Despeckling images for OCR and anti-spam purposes
Posted by decoder <de...@own-hero.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Kelly Jones wrote:
> Spammers are starting to put "speckles" in their images to defeat
> OCR-scanning plugins such as FuzzyOCR.
Which images are you referring to? If you can put up a sample, then I
can tell you which scanner setting will catch it :)
Best regards,
Chris
>
> I thought ImageMagick's -despeckle option would help, but it
> doesn't seem to, not even when applied multiple times, not even in
> conjunction with -monochrome.
>
> I want a filter that does this for each pixel X:
>
> 1) if any of X's 8 neighbor pixels is the same color, turn X black
> 2) otherwise, turn X white
>
> Can some combination of options to convert do this?
>
> I realize that:
>
> 1. This will only work w/ indexed-color images (eg, GIFs) and not
>    JPEGs, etc.
> 2. Spammers will soon work around this, so this is just a short-term
>    bandage.
> 3. I could write something in libgd to do this (blech!)
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFFjRZqJQIKXnJyDxURAt4YAKCCpRPORjqRy2l6UejArzZKH6Ar1ACghlCC
PcRpJ+Ur+RUvHMy0OY6eDms=
=EJCE
-----END PGP SIGNATURE-----
Re: Despeckling images for OCR and anti-spam purposes
Posted by René Berber <r....@computer.org>.
Kelly Jones wrote:
> Spammers are starting to put "speckles" in their images to defeat
> OCR-scanning plugins such as FuzzyOCR.
That's a very old technique.
> I thought ImageMagick's -despeckle option would help, but it doesn't
> seem to, not even when applied multiple times, not even in conjunction
> with -monochrome.
Have you tried a simple `gocr -d 4 ...`? It does a good job with those images.
> I want a filter that does this for each pixel X:
man gocr:
...
    -d size
        set dust size in pixels (clusters smaller than this are
        removed), 0 means no clusters are removed, the default is -1 for
        auto detection
...
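To illustrate what "clusters smaller than this are removed" can mean, here is a rough sketch of dust removal by connected-component size. It is not gocr's actual implementation. The function name `remove_dust`, the use of 8-connectivity, measuring cluster size as a pixel count, and the 1 = ink / 0 = background encoding are all assumptions made for the example.

```python
from collections import deque


def remove_dust(image, dust_size):
    """Clear ink clusters with fewer than dust_size pixels.

    image is a 2D list where 1 means ink and 0 means background
    (an assumed encoding). Clusters are found with a breadth-first
    search over 8-connected ink pixels.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Collect every ink pixel connected to (y, x).
            cluster = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                cluster.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Small clusters are treated as dust and erased.
            if len(cluster) < dust_size:
                for cy, cx in cluster:
                    out[cy][cx] = 0
    return out


# One isolated speckle at the top left, one 2x2 blob of "text".
sample = [
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
cleaned = remove_dust(sample, 4)
```

With a threshold of 4 the lone pixel is erased and the 2x2 blob survives; a threshold of 0 leaves the image unchanged, matching the "0 means no clusters are removed" wording in the man page excerpt.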
> 1) if any of X's 8 neighbor pixels is the same color, turn X black
> 2) otherwise, turn X white
>
> Can some combination of options to convert do this?
>
> I realize that:
>
> 1. This will only work w/ indexed-color images (eg, GIFs) and not JPEGs,
> etc.
> 2. Spammers will soon work around this, so this is just a short-term
> bandage.
> 3. I could write something in libgd to do this (blech!)
Whatever.
--
René Berber