Posted to users@spamassassin.apache.org by decoder <de...@own-hero.net> on 2006/12/10 16:53:55 UTC

FuzzyOcr 3.5.0 RC1 Test version released

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Hello all,


after a long time of development, we are proud to release a first
test version of the 3.5.x branch, which is also the candidate for the
next stable release. Please note that this is an experimental
release; use it at your own risk. Feedback is highly appreciated.
Please report bugs using the ticket system at our Trac, which can be
found at http://fuzzyocr.own-hero.net/


The release (3.5.0 RC1) can be found at the download page:
http://fuzzyocr.own-hero.net/wiki/Downloads

Changelog: http://fuzzyocr.own-hero.net/wiki/Changelog-3.x#version3.5.0

Installation instructions:
http://fuzzyocr.own-hero.net/wiki/Installation-3.5.x


Again, please note that these documents are all very new, so they might
contain bugs. Please report them as well :) I won't be at home for the
next 3 days, so I can't fix bugs during that time, but I promise to
take care of everything that was reported/mailed/asked etc. when I am
back :)


Best regards,


Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFfC2SJQIKXnJyDxURAlUZAKDJIU6sZE91CW4FR/pgjq3XXB/WcACcDyNe
hJOwEqCt4JBTbZiW1m0ThE0=
=3hNG
-----END PGP SIGNATURE-----


Re: FuzzyOcr 3.5.0 RC1 Test version released

Posted by snowcrash <sc...@gmail.com>.
> buffer thing, 10 is the default but there is a typo in Config.pm
>
>   push (@cmds, {
>         setting => 'focr_autosort_buffer',
>         default => 10,
>         - type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
>         + type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
>     });

ok. i made the change as suggested.

re-enabled,

   - #focr_autosort_buffer 10
   + focr_autosort_buffer 10

in the .cf, and re-lint-ed

all's ok.

one very minor issue in the default .cf/doc

   # This is a parameter for the focr_autosort_scanset function, and specifies
   # the maximum value of the effectiveness counter used in each scanset. If you
   # increase this, it will take longer until the autosort function adapts to new
   # types of spam, setting it too low will lower the effectiveness of the
   # function.
   # Default value: 10.
   focr_autosort_buffer 10

NOTE that Default value is doc'd as "10."

In my setup, i've just an integer value, "10".

SHOULD there be a decimal point? does it matter?

> at /etc/mail/spamassassinFuzzyOcr/Config.pm line 680.
> >  [27949] warn: untie attempted while 1 inner references still exist
> > at /etc/mail/spamassassinFuzzyOcr/Config.pm line 701.
> These seem to be caused because I do both ->UnLock and then untie. The
> untie after the call to unlock might be unnecessary; the MLDBM::Sync
> module states that unlock already does an untie. I'll read into it
> though. :) But this shouldn't pose a problem.

thanks,

finally, one minor typo that showed up while grep'ing to change
hardcoded paths ...

in

   Utils/fuzzy-find

note

   fuzzy-find:    print "             Default:
/etc/mail/spamassasin/FuzzyOcr.cf\n";

should be

fuzzy-find:    print "             Default:
/etc/mail/spamassassin/FuzzyOcr.cf\n";
                 ^^^^

thanks!

Re: FuzzyOcr 3.5.0 RC1 Test version released

Posted by decoder <de...@own-hero.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Thank you very much, I'll look at the warnings. About the autosort
buffer thing, 10 is the default but there is a typo in Config.pm

  push (@cmds, {
        setting => 'focr_autosort_buffer',
        default => 10,
        - type => $Mail::SpamAssassin::Conf::CONF_TYPE_BOOL
        + type => $Mail::SpamAssassin::Conf::CONF_TYPE_NUMERIC
    });

You can fix this if you want to change the value to experiment with
it... simply change the BOOL to NUMERIC
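
To illustrate why the setting type matters for --lint, the sketch below
checks a few candidate values against two hypothetical validators. The
helper names looks_bool and looks_numeric and both patterns are
assumptions made for illustration; they are not taken from
SpamAssassin's source, so treat the result as a plausible explanation
rather than a confirmed one.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a BOOL and a NUMERIC config validator.
# Both patterns are assumptions, not copied from SpamAssassin.
sub looks_bool {
    my ($value) = @_;
    return lc($value) =~ /^(?:yes|no|1|0)$/ ? 1 : 0;
}

sub looks_numeric {
    my ($value) = @_;
    # Optional sign, digits, and an optional fraction that needs digits.
    return $value =~ /^-?\d+(?:\.\d+)?$/ ? 1 : 0;
}

# Print one line per candidate value with the verdict of each validator.
for my $value ('10', '10.', '10.0', '1') {
    printf "%-5s bool=%d numeric=%d\n",
        $value, looks_bool($value), looks_numeric($value);
}
```

Under these assumed patterns, "10" is rejected as a boolean but accepted
as a number, while "10." is rejected by both. That would be consistent
with the lint failure reported in this thread for
"focr_autosort_buffer 10.".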

> looking at FuzzyOcr-3.5.0-rc1 tests in more depth, i see,
>
> spamassassin -t -x <
> /projects/FuzzyOcr-3.5.0-rc1/samples/ocr-animated.eml
>  [27949] warn: untie attempted while 1 inner references still exist
> at /etc/mail/spamassassinFuzzyOcr/Config.pm line 680.
>  [27949] warn: untie attempted while 1 inner references still exist
> at /etc/mail/spamassassinFuzzyOcr/Config.pm line 701.
These seem to be caused because I do both ->UnLock and then untie. The
untie after the call to unlock might be unnecessary; the MLDBM::Sync
module states that unlock already does an untie. I'll read into it
though. :) But this shouldn't pose a problem.
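
For readers unfamiliar with this warning: one general way to trigger it
in Perl is to call untie while another reference to the tied object is
still alive. The sketch below reproduces that with a hypothetical
minimal tie class, Demo::Hash, standing in for the real MLDBM::Sync
backed hash. Whether this is exactly what happens in Config.pm is not
confirmed by this thread; it only shows the mechanism behind the
message.

```perl
use strict;
use warnings;

# Hypothetical minimal tie class (no UNTIE method), for illustration only.
package Demo::Hash;
sub TIEHASH { my ($class) = @_; return bless {}, $class }
sub STORE   { my ($self, $key, $value) = @_; $self->{$key} = $value }
sub FETCH   { my ($self, $key) = @_; return $self->{$key} }

package main;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my %db;
my $handle = tie %db, 'Demo::Hash';   # $handle is an extra reference
$db{seen} = 1;
untie %db;                            # warns: $handle is still alive
print "captured: $_" for @warnings;

# Dropping the extra reference before untie avoids the warning.
my %clean;
my $extra = tie %clean, 'Demo::Hash';
undef $extra;
untie %clean;
print scalar(@warnings), " warning(s) captured in total\n";
```

The first untie produces the "untie attempted while 1 inner references
still exist" message; the second one is silent because the extra
reference was released first.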


Best regards,


Chris
>
> spamassassin -t -x < /projects/FuzzyOcr-3.5.0-rc1/samples/ocr-gif.eml
>  [9041] warn: untie attempted while 1 inner references still exist at
> /etc/mail/spamassassin/FuzzyOcr/Config.pm line 680.
>  [9041] warn: untie attempted while 1 inner references still exist at
> /etc/mail/spamassassin/FuzzyOcr/Config.pm line 701.
>  [9041] warn: untie attempted while 1 inner references still exist at
> ../FuzzyOcr/Hashing.pm line 193.
>  [9041] warn: untie attempted while 1 inner references still exist at
> ../FuzzyOcr/Hashing.pm line 193.
>  [9041] warn: untie attempted while 1 inner references still exist at
> ../FuzzyOcr/Hashing.pm line 302.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFfI47JQIKXnJyDxURAhHvAJ0VHz6DsTvEmcM2CkDfCubLpBjcswCfe7rd
DiGjDUayRYUPFu20/tLfP28=
=hpCB
-----END PGP SIGNATURE-----


Re: Some further notes

Posted by decoder <de...@own-hero.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


snowcrash+spamassassin wrote:
> and, i still see,
>
> [9390] warn: (?:(?<=[\s,]))* matches null string many times in
> regex; marked by <-- HERE in m/\G(?:(?<=[\s,]))* <-- HERE \Z/ at
> privlib/Text/Wrap.pm line 47.

Seems to be a bug in Wrap.pm, it was reported earlier for SA:

http://www.nabble.com/-Bug-5056---New:-(-:(-%3C%3D-%5C%5Cs,-))*-matches-null-string-many-times-in-regex--marked-by-%3C---HERE-in-m-%5C%5CG(-:(-%3C%3D-%5C%5Cs,-))*-%3C---HERE-%5C%5CZ--at--usr-lib64-perl5-5.8.8-Text-Wrap.pm-line-46.-t2134161.html


long url ;D but exactly the same error...

In a few moments, I'll put up a patch against RC1 which fixes the
currently known bugs :)


Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFfKRcJQIKXnJyDxURAiahAKCyq5FNhDi2ZjTA2XB7Y/32dV0nXACfXc42
I2NybRTH0ijmVcRGRhFecCE=
=GHNA
-----END PGP SIGNATURE-----


Re: FuzzyOcr 3.5.0 RC1 Test version released

Posted by snowcrash+spamassassin <sc...@gmail.com>.
looking at FuzzyOcr-3.5.0-rc1 tests in more depth, i see,

spamassassin -t -x < /projects/FuzzyOcr-3.5.0-rc1/samples/ocr-animated.eml
  [27949] warn: untie attempted while 1 inner references still exist
at /etc/mail/spamassassinFuzzyOcr/Config.pm line 680.
  [27949] warn: untie attempted while 1 inner references still exist
at /etc/mail/spamassassinFuzzyOcr/Config.pm line 701.

spamassassin -t -x < /projects/FuzzyOcr-3.5.0-rc1/samples/ocr-gif.eml
  [9041] warn: untie attempted while 1 inner references still exist at
/etc/mail/spamassassin/FuzzyOcr/Config.pm line 680.
  [9041] warn: untie attempted while 1 inner references still exist at
/etc/mail/spamassassin/FuzzyOcr/Config.pm line 701.
  [9041] warn: untie attempted while 1 inner references still exist at
../FuzzyOcr/Hashing.pm line 193.
  [9041] warn: untie attempted while 1 inner references still exist at
../FuzzyOcr/Hashing.pm line 193.
  [9041] warn: untie attempted while 1 inner references still exist at
../FuzzyOcr/Hashing.pm line 302.

Re: FuzzyOcr 3.5.0 RC1 Test version released

Posted by snowcrash+spamassassin <sc...@gmail.com>.
(focr_autosort_buffer --lints as invalid)

updated to latest FuzzyOcr-3.5.0-rc1

changed configs appropriately. also changed instances of hard-coded default,

	/etc/mail/spamassassin/

to point to my spamassassin build's config dir path.

on --lint, i get one warn/fail,

	[6155] warn: config: SpamAssassin failed to parse line, "10." is not valid for "focr_autosort_buffer", skipping: focr_autosort_buffer 10.

noting in my config,

	# This is a parameter for the focr_autosort_scanset function, and specifies
	# the maximum value of the effectiveness counter used in each scanset. If you
	# increase this, it will take longer until the autosort function adapts to new
	# types of spam, setting it too low will lower the effectiveness of the
	# function.
	# Default value: 10.
	focr_autosort_buffer 10.

if i DISable the config item,

	-	focr_autosort_buffer 10.
	+	#focr_autosort_buffer 10.

now --lints without failure.

and a 1st test with,

	spamassassin -D -t -x < /tmp/ocr-gif.eml

seems to exec ok.

Content analysis details:   (24.9 points, 4.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 2.0 RELAY_TW               Relayed through Taiwan
 0.0 RELAY_DE               Relayed through Germany
 0.0 DK_POLICY_SIGNSOME     Domain Keys: policy says domain signs some mails
 0.0 BOTNET_CLIENTWORDS     Hostname contains client-like substrings
 0.0 BOTNET_IPINHOSTNAME    Hostname contains its own IP address
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.0 BOTNET_CLIENT          Hostname looks like a client hostname
 0.9 MY_CID_AND_CLOSING     SARE cid and closing
 1.0 SHORT_HELO_AND_INLINE_IMAGE Short HELO string, with inline image
 5.0 BOTNET                 The submitting mail server looks like part
                            of a Botnet
 1.5 FUZZY_OCR_WRONG_CTYPE  BODY: Img with wrong content-type set
                            Image has format "GIF" but content-type is
                            "image/jpeg"
 2.5 FUZZY_OCR_CORRUPT_IMG  BODY: Corrupted img
                            Corrupt image: GIF-LIB error: Image is
                            defective, decoding aborted.
  12 FUZZY_OCR_KNOWN_HASH   BODY: Image with known hash
                            Words found:
                            "target" in 1 lines
                            "service" in 1 lines
                            "stock" in 2 lines
                            "price" in 2 lines
                            "company" in 1 lines
                            "alert" in 1 lines
                            "trade" in 1 lines
                            "recommendation" in 1 lines
                            (10 word occurrences found)