Posted to users@spamassassin.apache.org by decoder <de...@own-hero.net> on 2006/08/10 15:56:04 UTC

New version available

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

decoder wrote:
> Hello there,
>
> I have improved the original OcrPlugin (found at
> http://wiki.apache.org/spamassassin/OcrPlugin), so it contains
> fuzzy matching. That way, mistakes made by the OCR engine or
> intentional obfuscations in the text no longer make recognition
> impossible. This is done with a relative distance calculation
> between the pattern (a word from a given word list) and a line in
> the recognized input. The plugin also uses dynamic scoring (more
> matched words mean a higher score; this can be adjusted in the
> source).
>
> You can find a full description and an example in the wiki under:
>
> http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
>
>
> Ideas for improvements or criticism are always welcome :)
>
>
> Best regards,
>
>
> Chris

Hello again,


I just released a new version which contains all suggestions made here
on the mailing list. Changelog:

* Added scoring for wrong content-type
* Added scoring for broken gif images
* Added configuration for helper applications
* Added autodisable_score feature to disable the OCR engine if the
message already has enough points


You can now obtain the plugin as a tarball, the download URL is at the
end of the wiki page. (http://wiki.apache.org/spamassassin/FuzzyOcrPlugin)

All new options in the config file, especially score adjustments for
the new features, are explained there as well and in the sample cf file.

Chris

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE2zr0JQIKXnJyDxURAu16AKDG9S0aRKxo6PFKKfpHpNUD0WpcNwCgrDJb
lLq+DegAxIQbFXOn26TfNxA=
=UMKm
-----END PGP SIGNATURE-----
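
The fuzzy matching and dynamic scoring described in the announcement above can be illustrated with a small sketch. It is not the plugin's code (the plugin is a SpamAssassin plugin, and its source is not shown in this thread). It assumes that "relative distance" means the smallest edit distance between a word and any part of a recognized line, divided by the word length. The function names, the 0.3 threshold, and the scoring constants are hypothetical values chosen for illustration only.

```python
def substring_edit_distance(pattern: str, text: str) -> int:
    """Smallest Levenshtein distance between pattern and any substring of text."""
    # Row 0 is all zeros: a match may start at any position in the text for free.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        cur = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            cost = 0 if pc == tc else 1
            cur[j] = min(
                prev[j - 1] + cost,  # characters match, or one is substituted
                prev[j] + 1,         # pattern character missing from the text
                cur[j - 1] + 1,      # extra character in the text
            )
        prev = cur
    # The match may also end at any position in the text.
    return min(prev)


def relative_distance(word: str, line: str) -> float:
    """Edit distance scaled by the word length (assumes a non-empty word)."""
    return substring_edit_distance(word.lower(), line.lower()) / len(word)


def count_hits(words, lines, threshold=0.3):
    """Count the words that fuzzily match at least one OCR line.

    The threshold is a hypothetical value, not the plugin's default.
    """
    return sum(
        1
        for word in words
        if any(relative_distance(word, line) <= threshold for line in lines)
    )


def dynamic_score(hits, min_hits=2, base=4.0, per_extra_hit=1.0):
    """More matched words give a higher score (all constants are hypothetical)."""
    if hits < min_hits:
        return 0.0
    return base + (hits - min_hits) * per_extra_hit


ocr_lines = ["buy v1agra now", "c1alis cheap"]
hits = count_hits(["viagra", "cialis"], ocr_lines)
print(hits, dynamic_score(hits))
```

Dividing by the word length keeps the tolerance proportional: one wrong character in a six-letter word is accepted, while a short word with the same absolute error would not be.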


Re: New version available

Posted by Max de Mendizábal <ma...@upn.mx>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Just one clarification about the wiki page: on Fedora Core 5 you should
install the package libgif-utils, which contains the giffix program.

Regards,
Max

decoder wrote:
> decoder wrote:
>>> Hello there,
>>>
>>> I have improved the original OcrPlugin (found at
>>> http://wiki.apache.org/spamassassin/OcrPlugin), so it contains
>>> fuzzy matching. That way, mistakes made by the OCR engine or
>>> intentional obfuscations in the text no longer make recognition
>>> impossible. This is done with a relative distance calculation
>>> between the pattern (a word from a given word list) and a line in
>>> the recognized input. The plugin also uses dynamic scoring (more
>>> matched words mean a higher score; this can be adjusted in the
>>> source).
>>>
>>> You can find a full description and an example in the wiki under:
>>>
>>> http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
>>>
>>>
>>> Ideas for improvements or criticism are always welcome :)
>>>
>>>
>>> Best regards,
>>>
>>>
>>> Chris
> 
> Hello again,
> 
> 
> I just released a new version which contains all suggestions made here
> on the mailing list. Changelog:
> 
> * Added scoring for wrong content-type
> * Added scoring for broken gif images
> * Added configuration for helper applications
> * Added autodisable_score feature to disable the OCR engine if the
> message already has enough points
> 
> 
> You can now obtain the plugin as a tarball, the download URL is at the
> end of the wiki page. (http://wiki.apache.org/spamassassin/FuzzyOcrPlugin)
> 
> All new options in the config file, especially score adjustments for
> the new features, are explained there as well and in the sample cf file.
> 
> Chris
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE20T5QEy5rMguIAsRAmyaAJ440ojnfgl7I/7EYdp2pIyu7QG7twCdEB6K
ur+4xqHMnZwTvROGujPRoyo=
=2u5i
-----END PGP SIGNATURE-----

Re: New version available

Posted by John Andersen <js...@pen.homeip.net>.
On Thursday 10 August 2006 05:56, decoder wrote:
> Hello again,
>
>
> I just released a new version which contains all suggestions made here
> on the mailing list. Changelog:
>
> * Added scoring for wrong content-type
> * Added scoring for broken gif images

Decoder:  Could I convince you to say WHAT you are releasing
a new version OF, rather than leaving the impression 
Spamassassin released a new version?



-- 
_____________________________________
John Andersen

Re: New version available

Posted by decoder <de...@own-hero.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Matthias Keller wrote:
> decoder wrote:
>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>>
>> decoder wrote:
>>
>>> Hello there,
>>>
>>> I have improved the original OcrPlugin (found at
>>> http://wiki.apache.org/spamassassin/OcrPlugin), so it contains
>>> fuzzy matching. That way, mistakes made by the OCR engine or
>>> intentional obfuscations in the text no longer make recognition
>>> impossible. This is done with a relative distance calculation
>>> between the pattern (a word from a given word list) and a line in
>>> the recognized input. The plugin also uses dynamic scoring (more
>>> matched words mean a higher score; this can be adjusted in the
>>> source).
>>>
>>> You can find a full description and an example in the wiki
>>> under:
>>>
>>> http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
>>>
>>>
>>> Ideas for improvements or criticism are always welcome :)
>>>
>>>
>>> Best regards,
>>>
>>>
>>> Chris
>>>
>>
>> Hello again,
>>
>>
>> I just released a new version which contains all suggestions made
>> here on the mailing list. Changelog:
>>
>> * Added scoring for wrong content-type
>> * Added scoring for broken gif images
>> * Added configuration for helper applications
>> * Added autodisable_score feature to disable the OCR engine if the
>> message already has enough points
>>
>>
>> You can now obtain the plugin as a tarball, the download URL is
>> at the end of the wiki page.
>> (http://wiki.apache.org/spamassassin/FuzzyOcrPlugin)
>>
>> All new options in the config file, especially score adjustments
>> for the new features, are explained there as well and in the
>> sample cf file.
>>
> Hi
> I get the following warnings when linting:
> [29661] warn: config: warning: description exists for non-existent rule FUZZY_OCR_CORRUPT_IMG
> [29661] warn: config: warning: description exists for non-existent rule FUZZY_OCR_WRONG_CTYPE
> [29661] warn: lint: 2 issues detected, please rerun with debug enabled for more information
>

Indeed, I didn't notice that. It runs fine though. I'll fix it anyway
by moving the descriptions into the plugin config so that they are
parsed by the plugin.

Thx
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE20FxJQIKXnJyDxURAuIdAJ9PccWoKPz7mL0MyMqoEN6UMTh5WQCff09N
FMEIgWO7UpMe8ziacyS/tuo=
=6czY
-----END PGP SIGNATURE-----


Re: New version available

Posted by Matthias Keller <li...@matthias-keller.ch>.
decoder wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> decoder wrote:
>   
>> Hello there,
>>
>> I have improved the original OcrPlugin (found at
>> http://wiki.apache.org/spamassassin/OcrPlugin), so it contains
>> fuzzy matching. That way, mistakes made by the OCR engine or
>> intentional obfuscations in the text no longer make recognition
>> impossible. This is done with a relative distance calculation
>> between the pattern (a word from a given word list) and a line in
>> the recognized input. The plugin also uses dynamic scoring (more
>> matched words mean a higher score; this can be adjusted in the
>> source).
>>
>> You can find a full description and an example in the wiki under:
>>
>> http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
>>
>>
>> Ideas for improvements or criticism are always welcome :)
>>
>>
>> Best regards,
>>
>>
>> Chris
>>     
>
> Hello again,
>
>
> I just released a new version which contains all suggestions made here
> on the mailing list. Changelog:
>
> * Added scoring for wrong content-type
> * Added scoring for broken gif images
> * Added configuration for helper applications
> * Added autodisable_score feature to disable the OCR engine if the
> message already has enough points
>
>
> You can now obtain the plugin as a tarball, the download URL is at the
> end of the wiki page. (http://wiki.apache.org/spamassassin/FuzzyOcrPlugin)
>
> All new options in the config file, especially score adjustments for
> the new features, are explained there as well and in the sample cf file.
>   
Hi
I get the following warnings when linting:
[29661] warn: config: warning: description exists for non-existent rule FUZZY_OCR_CORRUPT_IMG
[29661] warn: config: warning: description exists for non-existent rule FUZZY_OCR_WRONG_CTYPE
[29661] warn: lint: 2 issues detected, please rerun with debug enabled for more information


Re: New version available

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Aug 10, 2006 at 04:42:00PM +0200, decoder wrote:
> Is there a way to make the OCR Plugin run as one of the last tests?

You can set priority for a rule (RTM for details).

-- 
Randomly Generated Tagline:
Is that seat saved?  No, but we are praying for it...

Re: New version available

Posted by decoder <de...@own-hero.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Coffey, Neal wrote:
> decoder wrote:
>> You can now obtain the plugin as a tarball, the download URL is
>> at the end of the wiki page.
>> (http://wiki.apache.org/spamassassin/FuzzyOcrPlugin)
>>
>
> I haven't installed this plugin on my systems yet, but I've been
> watching it pretty closely (and I'm sorely tempted to put it into
> place now). I noticed a couple of things on the wiki page that I
> wanted to comment on, though...
>
>> The variable $treshold is similarly adjusted with the
>> configuration file parameter focr_treshold.
>
> Nit-picky, but...shouldn't that be "threshold"?  I've made a lot of
>  configuration errors in the past when programs used a different
> spelling than expected.  (Especially greylist/graylist.)
You are absolutely right, the correct spelling is threshold and I will
change that in the next version.
>
>> focr_autodisable_score - If the message has already more points
>> than this value, then the plugin will cancel all further OCR
>> checking.
>
> Can SA plugins look up the required_score setting?  If so, can the
> focr_disable_score default to required_score if not explicitly set
> otherwise?

Yes, this is possible. But you need to keep in mind that, at the time
the OCR plugin runs, probably not all tests have been run yet, and
some later tests could still add a negative score, so the mail could
effectively go through as ham.

Is there a way to make the OCR Plugin run as one of the last tests?

Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE20W4JQIKXnJyDxURAoChAJ9dsJbCCXm3yUekwsms9DmQeKajdwCgp6Wm
sVAdu8LVJPbh7Dt2QKmI7so=
=a9UW
-----END PGP SIGNATURE-----
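
The caveat in the reply above can be made concrete with a short sketch. It models the autodisable check as a plain comparison and follows the suggestion to fall back to required_score when no explicit limit is set. The function name, the default of 5.0, and the example scores are assumptions for illustration; the real check lives inside the plugin and is not shown in this thread.

```python
def should_skip_ocr(current_score, autodisable_score=None, required_score=5.0):
    """Return True if OCR should be skipped because the score is already high.

    If no explicit autodisable_score is given, fall back to required_score,
    as suggested in the thread. All values here are hypothetical.
    """
    limit = required_score if autodisable_score is None else autodisable_score
    return current_score > limit


# Score accumulated by the rules that ran before the OCR plugin.
score_before_ocr = 6.0
skipped = should_skip_ocr(score_before_ocr)

# A rule that runs later contributes a negative score.
final_score = score_before_ocr + (-2.0)

# OCR was skipped, yet the message ends up below the spam threshold.
print(skipped, final_score, final_score >= 5.0)
```

This is why the order of execution matters: if the limit equals required_score and the plugin runs early, a message can skip OCR and still be delivered as ham.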


RE: Content Filter problem

Posted by Thomas Lindell <tl...@adlmail.com>.
Thank you, that seemed to do it.

I had only done a reload after doing postsuper -r ALL.

Stopping and starting fixed it.


Thanks 

-----Original Message-----
From: Gary V [mailto:mr88talent@hotmail.com] 
Sent: Thursday, August 10, 2006 10:27 AM
To: users@spamassassin.apache.org
Subject: RE: Content Filter problem

>
>My log file is filling up with this error message:
>
>warning: connect to transport spamassassin: Connection refused
>
>
>This started after I installed amavis.
>
>I previously had spamassassin working perfectly. I have a working
>installation, but I am getting this warning.
>Spamassassin via amavis is working; I think this is just left over from
>the stand-alone solution.
>
>
>
>Any help?
>
>Tom
>

This is a postfix question. Assuming postfix settings reside in
/etc/postfix, start with

grep spamassassin /etc/postfix/*
grep amavis /etc/postfix/*

and show the results - so we get an idea of what you have set up.

If you have changed transports and have old mail trying to use a transport
that is no longer available, do:

postsuper -r ALL

to requeue the mail, then wait a few minutes (or more) for mail to process,
then stop and start postfix (not reload)

Gary V



RE: Content Filter problem

Posted by Gary V <mr...@hotmail.com>.
>
>My log file is filling up with this error message:
>
>warning: connect to transport spamassassin: Connection refused
>
>
>This started after I installed amavis.
>
>I previously had spamassassin working perfectly.
>I have a working installation, but I am getting this warning.
>Spamassassin via amavis is working; I think this is just left over from the
>stand-alone solution.
>
>
>
>Any help?
>
>Tom
>

This is a postfix question. Assuming postfix settings reside in 
/etc/postfix, start with

grep spamassassin /etc/postfix/*
grep amavis /etc/postfix/*

and show the results - so we get an idea of what you have set up.

If you have changed transports and have old mail trying to use a transport 
that is no longer available, do:

postsuper -r ALL

to requeue the mail, then wait a few minutes (or more) for mail to process, 
then stop and start postfix (not reload)

Gary V



Content Filter problem

Posted by Thomas Lindell <tl...@adlmail.com>.
My log file is filling up with this error message:

warning: connect to transport spamassassin: Connection refused


This started after I installed amavis.

I previously had spamassassin working perfectly.
I have a working installation, but I am getting this warning.
Spamassassin via amavis is working; I think this is just left over from the
stand-alone solution.



Any help?

Tom


RE: New version available

Posted by "Coffey, Neal" <nc...@langeveld.com>.
decoder wrote:
> You can now obtain the plugin as a tarball, the download URL is at the
> end of the wiki page.
> (http://wiki.apache.org/spamassassin/FuzzyOcrPlugin) 
> 

I haven't installed this plugin on my systems yet, but I've been
watching it pretty closely (and I'm sorely tempted to put it into place
now). I noticed a couple of things on the wiki page that I wanted to
comment on, though...

> The variable $treshold is similarly adjusted with the configuration
> file parameter focr_treshold. 

Nit-picky, but...shouldn't that be "threshold"?  I've made a lot of
configuration errors in the past when programs used a different spelling
than expected.  (Especially greylist/graylist.)

> focr_autodisable_score - If the message has already more points than
> this value, then the plugin will cancel all further OCR checking.

Can SA plugins look up the required_score setting?  If so, can the
focr_disable_score default to required_score if not explicitly set
otherwise?