Posted to users@spamassassin.apache.org by Jeff Chan <je...@surbl.org> on 2006/10/30 16:19:44 UTC

ocrtext vs FuzzyOCR?

Does anyone have any opinions on which of these is better:

  http://wiki.apache.org/spamassassin/CustomPlugins

OCR scanner and image validator SA-plugin
Checks for specific keywords in gif/jpg/png attachments, using
gocr. This can be used to detect spam that puts all the real
content in an attached image, accompanied by random text and
html (no URLs, etc). There are also various rules to validate
attached images and detect forged content types or broken images.
This plugin needs SpamAssassin 3.1.1 or later. Version 2.0 is
able to defeat recent gif animations which use gif tricks to
avoid OCR.
Created by: Martin Blapp
Contact: mb -at- imp -dot- ch
License Type: BSD
Status: active
Available at: http://antispam.imp.ch/patches/patch-ocrtext
Note: Feedback and new sample images are welcome. Please test and send reports.
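
To make "detect forged content types" concrete, here is a minimal sketch of
one way such a check could work: compare the declared MIME type against the
file's leading magic bytes. This is an illustration only, not code from the
ocrtext plugin; the names MAGIC, sniff_image_type and content_type_is_forged
are hypothetical.

```python
# Illustrative sketch only; NOT taken from the ocrtext plugin.
# All names here (MAGIC, sniff_image_type, content_type_is_forged) are hypothetical.

# Well-known leading byte signatures for the three image formats mentioned.
MAGIC = {
    "image/gif": (b"GIF87a", b"GIF89a"),
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
}


def sniff_image_type(data: bytes):
    """Return the MIME type implied by the magic bytes, or None if unknown."""
    for mime, signatures in MAGIC.items():
        if data.startswith(signatures):  # startswith accepts a tuple of prefixes
            return mime
    return None


def content_type_is_forged(declared: str, data: bytes) -> bool:
    """True when the data is a recognised image but the declared type disagrees."""
    # Drop parameters such as '; name="x.gif"' and normalise case.
    declared = declared.split(";")[0].strip().lower()
    if declared == "image/jpg":  # common non-standard alias
        declared = "image/jpeg"
    actual = sniff_image_type(data)
    return actual is not None and actual != declared
```

For example, a GIF payload sent with "Content-Type: image/jpeg" would be
flagged, while unrecognised data is left alone rather than guessed at.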


Fuzzy OCR Plugin
Derived from OcrPlugin (see above), but has many feature
enhancements, including an approximate matching algorithm to
compensate for recognition errors and obfuscation, support for
broken gifs, jpeg and png, dynamic scoring, automatic content-type
independent format detection and many more.
Created by: Christian Holler
Contact: decoder_at_own-hero_dot_net
License Type: Same as SpamAssassin
Status: active
Available at: FuzzyOcrPlugin
Note: Feedback and new sample images are welcome. Please test and send reports. 
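
As a rough illustration of what "approximate matching" can mean here, the
sketch below slides a keyword-sized window over OCR output and accepts a
window if its edit (Levenshtein) distance to the keyword is small. This is an
assumption about the general technique, not FuzzyOcr's actual algorithm; the
function names and the max_errors threshold are hypothetical.

```python
# Illustrative sketch only; NOT FuzzyOcr's actual matching code.
# edit_distance, fuzzy_contains and max_errors are hypothetical names.

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance using a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_contains(text: str, keyword: str, max_errors: int = 2) -> bool:
    """True if some keyword-length window of text is within max_errors edits."""
    text, keyword = text.lower(), keyword.lower()
    n = len(keyword)
    if n == 0:
        return False
    if len(text) < n:
        return edit_distance(text, keyword) <= max_errors
    return any(
        edit_distance(text[i:i + n], keyword) <= max_errors
        for i in range(len(text) - n + 1)
    )
```

A fixed-size window keeps the sketch short, but it only partly tolerates
inserted or dropped characters, and a threshold that is too generous for a
short keyword will produce false positives.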

Jeff C.
-- 
Jeff Chan
mailto:jeffc@surbl.org
http://www.surbl.org/


Re: ocrtext vs FuzzyOCR?

Posted by James Lay <jl...@slave-tothe-box.net>.
On Mon, 30 Oct 2006 17:15:51 +0100
decoder <de...@own-hero.net> wrote:

> James Lay wrote:
>
> > I'd like to see something on this myself.  The segfault patch for
> > Fuzzy OCR failed, so I stopped right there as I wasn't sure what to
> > do next.
> >
> This is not a patch for FuzzyOcr but for gocr. You will need it with
> every OCR plugin that uses gocr... It should work with version 0.40.
> 
> Best regards,
> 
> Chris
> 
> > James
> 

Interesting.  Here's what I get when patching gocr-0.41.  It patched
fine with 0.40 though.  Guess this is just an FYI really.

 patching file src/pgm2asc.c
Hunk #1 FAILED at 1200.
Hunk #2 succeeded at 1719 with fuzz 2 (offset 466 lines).
1 out of 2 hunks FAILED -- saving rejects to file src/pgm2asc.c.rej

James

Re: ocrtext vs FuzzyOCR?

Posted by decoder <de...@own-hero.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



James Lay wrote:
> On Mon, 30 Oct 2006 07:19:44 -0800 Jeff Chan <je...@surbl.org>
> wrote:
>
>> Does anyone have any opinions on which of these is better:
>
> I'd like to see something on this myself.  The segfault patch for
> Fuzzy OCR failed, so I stopped right there as I wasn't sure what to
> do next.
>
This is not a patch for FuzzyOcr but for gocr. You will need it with
every OCR plugin that uses gocr... It should work with version 0.40.

Best regards,

Chris

> James

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFRiU2JQIKXnJyDxURAhB4AJ4vDRdlck+1I0D0HSNu0AFikgn13QCffOyi
0Tq0HJJvW7lrUGUKEKwX/EE=
=xWpz
-----END PGP SIGNATURE-----

Re: ocrtext vs FuzzyOCR?

Posted by James Lay <jl...@slave-tothe-box.net>.
On Mon, 30 Oct 2006 07:19:44 -0800
Jeff Chan <je...@surbl.org> wrote:

> Does anyone have any opinions on which of these is better:

I'd like to see something on this myself.  The segfault patch for Fuzzy
OCR failed, so I stopped right there as I wasn't sure what to do next.

James