Posted to users@spamassassin.apache.org by decoder <de...@own-hero.net> on 2006/12/10 16:53:55 UTC
FuzzyOcr 3.5.0 RC1 Test version released
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello all,
After a long period of development, we are proud to release a first
test version of the 3.5.x branch, which is also the candidate for the
next stable release. Please note that this is an experimental release;
use it at your own risk. Feedback is highly appreciated. Please report
bugs using the ticket system at our Trac, which can be found at
http://fuzzyocr.own-hero.net/
The release (3.5.0 RC1) can be found at the download page:
http://fuzzyocr.own-hero.net/wiki/Downloads
Changelog: http://fuzzyocr.own-hero.net/wiki/Changelog-3.x#version3.5.0
Installation instructions:
http://fuzzyocr.own-hero.net/wiki/Installation-3.5.x
Again, please note that these documents are all very new and might
contain errors. Please report those as well :) I won't be at home for
the next 3 days, so I can't fix bugs during that time, but I promise to
take care of everything that was reported/mailed/asked etc. when I am
back :)
Best regards,
Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFFfC2SJQIKXnJyDxURAlUZAKDJIU6sZE91CW4FR/pgjq3XXB/WcACcDyNe
hJOwEqCt4JBTbZiW1m0ThE0=
=3hNG
-----END PGP SIGNATURE-----
Re: FuzzyOcr 3.5.0 RC1 Test version released
Posted by snowcrash <sc...@gmail.com>.
> buffer thing, 10 is the default but there is a typo in Config.pm
>
> push (@cmds, {
>     setting => 'focr_autosort_buffer',
>     default => 10,
> -   type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
> +   type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
> });
ok. i made the change as suggested.
re-enabled,
- #focr_autosort_buffer 10
+ focr_autosort_buffer 10
in the .cf, and re-lint-ed
all's ok.
one very minor issue in the default .cf/doc
# This is a parameter for the focr_autosort_scanset function, and specifies
# the maximum value of the effectiveness counter used in each scanset. If you
# increase this, it will take longer until the autosort function adapts to new
# types of spam, setting it too low will lower the effectiveness of the
# function.
# Default value: 10.
focr_autosort_buffer 10
NOTE that the default value is doc'd as "10."
In my setup, i've just an integer value, "10".
SHOULD there be a decimal point? does it matter?
> at /etc/mail/spamassassinFuzzyOcr/Config.pm line 680.
> > [27949] warn: untie attempted while 1 inner references still exist
> > at /etc/mail/spamassassinFuzzyOcr/Config.pm line 701.
> These seem to be caused by my calling ->UnLock and then untie. The
> untie after the call to unlock might be unnecessary; the MLDBM::Sync
> module states that unlock already does an untie. I'll read into it
> though. :) But this shouldn't pose a problem.
thanks,
finally, one minor typo that showed up while grep'ing to change
hardcoded paths ...
in
Utils/fuzzy-find
note
fuzzy-find: print " Default: /etc/mail/spamassasin/FuzzyOcr.cf\n";
should be
fuzzy-find: print " Default: /etc/mail/spamassassin/FuzzyOcr.cf\n";
^^^^
thanks!
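A recursive grep is one way to check whether the same misspelling occurs elsewhere in the tree. The sketch below runs against a scratch directory created for the demonstration; the directory layout and the second file (ok.pl) are made up, and only the fuzzy-find line comes from the report above.

```shell
#!/bin/sh
# Scratch tree standing in for the FuzzyOcr source directory (hypothetical).
tree=$(mktemp -d)
mkdir -p "$tree/Utils"
printf '%s\n' 'print " Default: /etc/mail/spamassasin/FuzzyOcr.cf\n";' > "$tree/Utils/fuzzy-find"
printf '%s\n' 'my $dir = "/etc/mail/spamassassin";' > "$tree/ok.pl"

# The misspelling "spamassasin" is not a substring of the correct
# "spamassassin", so a plain fixed-string search only lists affected files.
hits=$(grep -rlF 'spamassasin' "$tree")
printf '%s\n' "$hits"
rm -rf "$tree"
```

Run against a real checkout, the same grep would be pointed at the unpacked FuzzyOcr directory instead of the scratch tree.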
Re: FuzzyOcr 3.5.0 RC1 Test version released
Posted by decoder <de...@own-hero.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Thank you very much, I'll look at the warnings. About the autosort
buffer thing, 10 is the default but there is a typo in Config.pm
push (@cmds, {
    setting => 'focr_autosort_buffer',
    default => 10,
-   type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
+   type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
});
You can fix this if you want to change the value to experiment with
it... simply change the BOOL to NUMERIC
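For anyone who prefers to script the change rather than edit Config.pm by hand, here is a minimal sketch using sed. It works on a scratch file that imitates the relevant part of Config.pm; the first setting (focr_some_flag) is invented for the demonstration to show that other BOOL settings are left alone, and the real file may be laid out differently.

```shell
#!/bin/sh
# Scratch file that mimics two entries of Config.pm.
# 'focr_some_flag' is a hypothetical setting used only as a control.
sample=$(mktemp)
cat > "$sample" <<'EOF'
push (@cmds, {
    setting => 'focr_some_flag',
    default => 0,
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
});
push (@cmds, {
    setting => 'focr_autosort_buffer',
    default => 10,
    type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
});
EOF

# Restrict the substitution to the lines between the focr_autosort_buffer
# setting and the closing "});" of that entry.
fixed=$(sed '/focr_autosort_buffer/,/});/ s/CONF_TYPE_BOOL/CONF_TYPE_NUMERIC/' "$sample")
printf '%s\n' "$fixed"
rm -f "$sample"
```

On a real installation the same sed expression would be applied to a backup copy of Config.pm, followed by another spamassassin --lint run.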
> looking at FuzzyOcr-3.5.0-rc1 tests in more depth, i see,
>
> spamassassin -t -x <
> /projects/FuzzyOcr-3.5.0-rc1/samples/ocr-animated.eml
> [27949] warn: untie attempted while 1 inner references still exist
> at /etc/mail/spamassassinFuzzyOcr/Config.pm line 680.
> [27949] warn: untie attempted while 1 inner references still exist
> at /etc/mail/spamassassinFuzzyOcr/Config.pm line 701.
These seem to be caused by my calling ->UnLock and then untie. The
untie after the call to unlock might be unnecessary; the MLDBM::Sync
module states that unlock already does an untie. I'll read into it
though. :) But this shouldn't pose a problem.
Best regards,
Chris
>
> spamassassin -t -x < /projects/FuzzyOcr-3.5.0-rc1/samples/ocr-gif.eml
> [9041] warn: untie attempted while 1 inner references still exist at
> /etc/mail/spamassassin/FuzzyOcr/Config.pm line 680.
> [9041] warn: untie attempted while 1 inner references still exist at
> /etc/mail/spamassassin/FuzzyOcr/Config.pm line 701.
> [9041] warn: untie attempted while 1 inner references still exist at
> ../FuzzyOcr/Hashing.pm line 193.
> [9041] warn: untie attempted while 1 inner references still exist at
> ../FuzzyOcr/Hashing.pm line 193.
> [9041] warn: untie attempted while 1 inner references still exist at
> ../FuzzyOcr/Hashing.pm line 302.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFFfI47JQIKXnJyDxURAhHvAJ0VHz6DsTvEmcM2CkDfCubLpBjcswCfe7rd
DiGjDUayRYUPFu20/tLfP28=
=hpCB
-----END PGP SIGNATURE-----
Re: Some further notes
Posted by decoder <de...@own-hero.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
snowcrash+spamassassin wrote:
> and, i still see,
>
> [9390] warn: (?:(?<=[\s,]))* matches null string many times in
> regex; marked by <-- HERE in m/\G(?:(?<=[\s,]))* <-- HERE \Z/ at
> privlib/Text/Wrap.pm line 47.
Seems to be a bug in Wrap.pm, it was reported earlier for SA:
http://www.nabble.com/-Bug-5056---New:-(-:(-%3C%3D-%5C%5Cs,-))*-matches-null-string-many-times-in-regex--marked-by-%3C---HERE-in-m-%5C%5CG(-:(-%3C%3D-%5C%5Cs,-))*-%3C---HERE-%5C%5CZ--at--usr-lib64-perl5-5.8.8-Text-Wrap.pm-line-46.-t2134161.html
long url ;D but exactly the same error...
In a few moments, I'll put up a patch against RC1 which fixes the
currently known bugs :)
Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFFfKRcJQIKXnJyDxURAiahAKCyq5FNhDi2ZjTA2XB7Y/32dV0nXACfXc42
I2NybRTH0ijmVcRGRhFecCE=
=GHNA
-----END PGP SIGNATURE-----
Re: FuzzyOcr 3.5.0 RC1 Test version released
Posted by snowcrash+spamassassin <sc...@gmail.com>.
looking at FuzzyOcr-3.5.0-rc1 tests in more depth, i see,
spamassassin -t -x < /projects/FuzzyOcr-3.5.0-rc1/samples/ocr-animated.eml
[27949] warn: untie attempted while 1 inner references still exist
at /etc/mail/spamassassinFuzzyOcr/Config.pm line 680.
[27949] warn: untie attempted while 1 inner references still exist
at /etc/mail/spamassassinFuzzyOcr/Config.pm line 701.
spamassassin -t -x < /projects/FuzzyOcr-3.5.0-rc1/samples/ocr-gif.eml
[9041] warn: untie attempted while 1 inner references still exist at
/etc/mail/spamassassin/FuzzyOcr/Config.pm line 680.
[9041] warn: untie attempted while 1 inner references still exist at
/etc/mail/spamassassin/FuzzyOcr/Config.pm line 701.
[9041] warn: untie attempted while 1 inner references still exist at
../FuzzyOcr/Hashing.pm line 193.
[9041] warn: untie attempted while 1 inner references still exist at
../FuzzyOcr/Hashing.pm line 193.
[9041] warn: untie attempted while 1 inner references still exist at
../FuzzyOcr/Hashing.pm line 302.
Re: FuzzyOcr 3.5.0 RC1 Test version released
Posted by snowcrash+spamassassin <sc...@gmail.com>.
(focr_autosort_buffer --lints as invalid)
updated to latest FuzzyOcr-3.5.0-rc1
changed configs appropriately. also changed instances of hard-coded default,
/etc/mail/spamassassin/
to point to my spamassassin build's config dir path.
on --lint, i get one warn/fail,
[6155] warn: config: SpamAssassin failed to parse line, "10." is not
valid for "focr_autosort_buffer", skipping: focr_autosort_buffer 10.
noting in my config,
# This is a parameter for the focr_autosort_scanset function, and specifi
# the maximum value of the effectiveness counter used in each scanset. If
# increase this, it will take longer until the autosort function adapts t
# types of spam, setting it too low will lower the effectiveness of the
# function.
# Default value: 10.
focr_autosort_buffer 10.
if i DISable the config item,
- focr_autosort_buffer 10.
+ #focr_autosort_buffer 10.
now --lints without failure.
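The trailing period in the value looks like it was carried over from the "Default value: 10." comment, where it simply ends the sentence. A quick way to spot values of that shape in a .cf file is sketched below; the sample file contents and the setting name focr_example_setting are made up for the demonstration, and the pattern only covers focr_* lines holding a single integer.

```shell
#!/bin/sh
# Hypothetical excerpt of a FuzzyOcr.cf; only one active setting has the
# stray period. 'focr_example_setting' is an invented control line.
cf=$(mktemp)
cat > "$cf" <<'EOF'
# Default value: 10.
focr_autosort_buffer 10.
focr_example_setting 15
EOF

# Match uncommented "focr_* <integer>." lines: digits, then a period, then
# nothing but optional whitespace. Comment lines start with '#' and are skipped.
suspects=$(grep -nE '^[[:space:]]*focr_[a-z_]+[[:space:]]+[0-9]+\.[[:space:]]*$' "$cf")
printf '%s\n' "$suspects"
rm -f "$cf"
```

Each hit is printed with its line number, so the value can be changed to a plain integer before running --lint again.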
and a 1st test with,
spamassassin -D -t -x < /tmp/ocr-gif.eml
seems to exec ok.
Content analysis details: (24.9 points, 4.0 required)
 pts rule name              description
---- ---------------------- --------------------------------------------------
 2.0 RELAY_TW               Relayed through Taiwan
 0.0 RELAY_DE               Relayed through Germany
 0.0 DK_POLICY_SIGNSOME     Domain Keys: policy says domain signs some mails
 0.0 BOTNET_CLIENTWORDS     Hostname contains client-like substrings
 0.0 BOTNET_IPINHOSTNAME    Hostname contains its own IP address
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.0 BOTNET_CLIENT          Hostname looks like a client hostname
 0.9 MY_CID_AND_CLOSING     SARE cid and closing
 1.0 SHORT_HELO_AND_INLINE_IMAGE Short HELO string, with inline image
 5.0 BOTNET                 The submitting mail server looks like part
                            of a Botnet
 1.5 FUZZY_OCR_WRONG_CTYPE  BODY: Img with wrong content-type set
                            Image has format "GIF" but content-type is
                            "image/jpeg"
 2.5 FUZZY_OCR_CORRUPT_IMG  BODY: Corrupted img
                            Corrupt image: GIF-LIB error: Image is
                            defective, decoding aborted.
  12 FUZZY_OCR_KNOWN_HASH   BODY: Image with known hash
                            Words found:
                            "target" in 1 lines
                            "service" in 1 lines
                            "stock" in 2 lines
                            "price" in 2 lines
                            "company" in 1 lines
                            "alert" in 1 lines
                            "trade" in 1 lines
                            "recommendation" in 1 lines
                            (10 word occurrences found)