Posted to users@spamassassin.apache.org by decoder <de...@own-hero.net> on 2006/08/08 23:04:50 UTC

subject was meant to be "new version, please test" ;) -nt-

decoder wrote:
> decoder wrote:
> >> Hello there,
> >>
> >> I have improved the original OcrPlugin (found at
> >> http://wiki.apache.org/spamassassin/OcrPlugin) so that it performs
> >> fuzzy matching. That way, OCR mistakes or intentional obfuscations
> >> in the text no longer prevent recognition. This is done with a
> >> relative distance calculation between the pattern (a word from a
> >> given word list) and a line in the recognized input. The plugin
> >> also uses dynamic scoring (more matched words mean a higher score;
> >> this can be adjusted in the source).
> >>
> >> You can find a full description and an example in the wiki under:
> >>
> >> http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
> >>
> >>
> >> Ideas for improvements or criticism are always welcome :)
> >>
> >>
> >> Best regards,
> >>
> >>
> >> Chris
>
> See http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
>
> Major changes: replaced ImageMagick with netpbm, added PNG support,
> giffix is now invoked for broken GIFs, the image format is detected
> by magic bytes rather than by Content-Type, and various configuration
> options were added.
>
> Feedback is welcome  :)
>
> Chris
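
The fuzzy matching and dynamic scoring described in the quoted announcement can be sketched roughly as follows. This is only an illustration in Python, not the plugin's actual code: the function names, the 0.25 threshold and the score values are all assumptions, and the distance used here is a plain edit distance against the best-matching substring of each line, divided by the word length.

```python
def best_substring_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between pattern and any substring of text."""
    # Row 0 is all zeros: a match may start at any position in the text.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        # Column 0: i pattern characters matched against nothing.
        cur = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            cost = 0 if pc == tc else 1
            cur[j] = min(
                prev[j - 1] + cost,  # match or substitution
                prev[j] + 1,         # pattern character missing from the text
                cur[j - 1] + 1,      # extra character in the text
            )
        prev = cur
    # The match may also end anywhere, so take the best end position.
    return min(prev)


def relative_distance(word: str, line: str) -> float:
    """Edit distance scaled by word length (assumes word is non-empty)."""
    return best_substring_distance(word.lower(), line.lower()) / len(word)


def fuzzy_score(lines, words, threshold=0.25, base_score=1.0, per_word=0.5):
    """Return (score, matched words); all numeric defaults are made up."""
    matched = set()
    for word in words:
        if not word:
            continue  # skip empty word-list entries
        if any(relative_distance(word, line) <= threshold for line in lines):
            matched.add(word)
    if not matched:
        return 0.0, matched
    # Dynamic scoring: each additional matched word raises the score.
    return base_score + per_word * (len(matched) - 1), matched


ocr_lines = ["Buy V1AGRA n0w", "cheap rep1ica watches"]
word_list = ["viagra", "replica", "mortgage"]
score, hits = fuzzy_score(ocr_lines, word_list)
print(score, sorted(hits))
```

With these made-up inputs, "viagra" and "replica" are each one substitution away from the OCR text, so both fall under the threshold, while "mortgage" does not.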

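Detecting the image format from magic bytes rather than from the declared Content-Type, as listed in the major changes, might look like the sketch below. It is a Python illustration, not the plugin's code: the function name is hypothetical, the signatures are the standard file headers for PNG, GIF and JPEG, and JPEG is included only for illustration since the announcement mentions GIF and PNG.

```python
# Leading bytes that identify each format (standard file signatures).
MAGIC_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xff\xd8\xff", "jpeg"),
)


def sniff_image_format(data: bytes):
    """Return 'png', 'gif' or 'jpeg' based on the leading bytes, else None."""
    for signature, name in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return name
    return None


# An attachment declared as image/gif that actually contains PNG data
# is still recognised as PNG, because the declared type is never consulted.
declared_type = "image/gif"
payload = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
print(declared_type, "->", sniff_image_format(payload))
```

The point of the design is that the declared type is under the sender's control, while the leading bytes have to be right for the image to render at all.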

Re: new version, please test

Posted by Mathias Tauber <ta...@hdpnet.de>.
Just a little typo I think:

On the wiki it says:

"... focr_tmp_path - String determining the absolute path to a directory where the plugin may write temporary files to (without trailing slash)
focr_verbosity - Verbose level (0 - 2). ..."

As far as I can tell, it should be "focr_verbose" and not "focr_verbosity". The example config file has it right.


Mathias

Re: new version, please test

Posted by Matthias Keller <li...@matthias-keller.ch>.
Hi Chris

Wanted to report back: works like a charm here. Thanks for the PNG
support, I got one today with 23 hits :)

Now I just have to figure out why I get such poor results on colourful
images with gocr...

Thanks for your work!!

Matt