Posted to users@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2007/01/25 18:57:18 UTC

NOTICE: 3.2.0 rescoring mass-checks

hi all --

OK, if you're planning to send us mass-check logs for the 3.2.0 rescoring,
now's the time!

http://wiki.apache.org/spamassassin/RescoreDetails has all the details.

Note that the deadline for result submission is Tuesday, Feb 6 as
described at http://wiki.apache.org/spamassassin/Release320Schedule .

cheers!

--j.

Re: NOTICE: 3.2.0 rescoring mass-checks

Posted by Doc Schneider <ma...@maddoc.net>.
Fred Tarasevicius wrote:
> Hello Justin,
> 
> Thursday, January 25, 2007, 12:57:18 PM, you wrote:
> 
>> hi all --
> 
>> OK, if you're planning to send us mass-check logs for the 3.2.0 rescoring,
>> now's the time!
> 
> OK, so we can start running the tests now?  To make sure I have the
> procedure right: we just svn update to the latest release, start the
> mass-checks as outlined on the wiki page and send the results in when
> we are done?
> 

Nope, you need to go to the wiki page he mentioned. There is a custom
tarball for mass-checking.

See:

http://wiki.apache.org/spamassassin/RescoreDetails


-- 

  -Doc

  Penguins: Do it on the ice.
    1:04pm  up 11 days, 22:02, 15 users,  load average: 0.34, 0.50, 0.56

  SARE HQ  http://www.rulesemporium.com/

Re: NOTICE: 3.2.0 rescoring mass-checks

Posted by Fred Tarasevicius <te...@i-is.com>.
Hello Justin,

Thursday, January 25, 2007, 12:57:18 PM, you wrote:

> hi all --

> OK, if you're planning to send us mass-check logs for the 3.2.0 rescoring,
> now's the time!

OK, so we can start running the tests now?  To make sure I have the
procedure right: we just svn update to the latest release, start the
mass-checks as outlined on the wiki page and send the results in when
we are done?

-- 
Best regards,
 Fred                            mailto:tech2@i-is.com


Re: NOTICE: 3.2.0 rescoring mass-checks

Posted by Matthias Leisi <ma...@leisi.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

My mass-check run shows a considerable number of bayes/locking errors:

| bayes: cannot open bayes databases
| /opt/masscheck-3.2.0/mcsnapshot/masses/spamassassin/bayes_* R/W: lock
| failed: Interrupted system call

nohup.out has 118 such entries for 17'912 mails done. Is this something
I should be worried about or even worthy of opening a bug?

It's a bit hard for me to diagnose this in more detail, as I don't want
to break the actual mass-check run. I'm running it with

| nohup ./mass-check --progress --bayes --net -j 4 --restart=400 \
|    --learn=35 --reuse --after=1072933200 \
|    spam:dir:...
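
A rough way to put a number on the error rate reported above is to count
the matching log lines and divide by the messages processed. This is only
a minimal sketch: the helper names count_lock_errors and error_rate are
made up for illustration, and it assumes each failure shows up in
nohup.out as a single line containing "bayes: cannot open bayes
databases" (the excerpt above is wrapped for the email).

```shell
#!/bin/sh
# Hypothetical helpers, not part of mass-check.

# count_lock_errors LOGFILE
# Prints how many lines in LOGFILE report a failed Bayes database open.
count_lock_errors() {
    # grep -c prints the match count; it exits non-zero when the count
    # is 0, so "|| true" keeps the function from reporting a failure.
    grep -c 'bayes: cannot open bayes databases' "$1" || true
}

# error_rate ERRORS TOTAL
# Prints ERRORS as a percentage of TOTAL, to two decimal places.
error_rate() {
    awk -v e="$1" -v t="$2" \
        'BEGIN { printf "%.2f\n", (t > 0) ? 100 * e / t : 0 }'
}

# Figures quoted in the message: 118 errors for 17912 mails.
error_rate 118 17912

# Against a live log (assumed to be nohup.out in the current directory):
#   error_rate "$(count_lock_errors nohup.out)" 17912
```

For the figures quoted, that works out to about 0.66% of messages.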

- -- Matthias
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFFump+xbHw2nyi/okRAlE1AJ9XeguMdklC2JgjE8NGkTM/g+e9xQCgiKs6
Ow+hx/2QnybFWIxWFmjw8fk=
=4SPo
-----END PGP SIGNATURE-----

Re: NOTICE: 3.2.0 rescoring mass-checks

Posted by Doc Schneider <ma...@maddoc.net>.
Matthias Leisi wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hi all,
> 
> As you may know, I'm running the dnswl.org project. Thanks to the rules
> in Theo's sandbox [1], the mass-checks will query dnswl.org.
> 
> In order to estimate the performance / bandwidth impact on the server
> side, I would ask you to provide me with the name / IP address of the
> DNS servers you use to run the mass-checks and roughly the date/time
> (incl. timezone) when you started the mass-check.
> 
> Only one of the servers is writing detailed logs, and this will not
> influence the actual mass-check results, but it is an opportunity to
> assess the SpamAssassin-related impact.
> 
> Thanks for your help,
> - -- Matthias
> 
> [1]
> http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/felicity/70_dnswl.cf?view=markup
>

I'm using a machine in the 64.21.208.208/28 netblock. I haven't yet 
decided which one I'll be using... am still trying to sort through my 
500k spam.

Just curious: is this a new network test and list? If so, you might
want to look into finding some more DNS mirrors to host it. (Yes, I
might be interested in doing this; contact me privately.)
-- 

  -Doc

  SA/SARE -- Ninja
   10:08am  up 12 days, 19:06, 15 users,  load average: 0.41, 1.74, 1.52

  SARE HQ  http://www.rulesemporium.com/

Re: NOTICE: 3.2.0 rescoring mass-checks

Posted by Matthias Leisi <ma...@leisi.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

As you may know, I'm running the dnswl.org project. Thanks to the rules
in Theo's sandbox [1], the mass-checks will query dnswl.org.

In order to estimate the performance / bandwidth impact on the server
side, I would ask you to provide me with the name / IP address of the
DNS servers you use to run the mass-checks and roughly the date/time
(incl. timezone) when you started the mass-check.

Only one of the servers is writing detailed logs, and this will not
influence the actual mass-check results, but it is an opportunity to
assess the SpamAssassin-related impact.

Thanks for your help,
- -- Matthias

[1]
http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/felicity/70_dnswl.cf?view=markup

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFFug+QxbHw2nyi/okRAgL3AJ4nbLi65IIMja5GTZZXG8DkTeDSbwCgpu4E
6/OmkA0qHu1p5n22hT6TrYE=
=ru0Y
-----END PGP SIGNATURE-----

Re: NOTICE: 3.2.0 rescoring mass-checks

Posted by Michael Parker <pa...@pobox.com>.
Daryl C. W. O'Shea wrote:
> Justin Mason wrote:
>> hi all --
>>
>> OK, if you're planning to send us mass-check logs for the 3.2.0
>> rescoring,
>> now's the time!
>>
>> http://wiki.apache.org/spamassassin/RescoreDetails has all the details.
> 
> Why do the instructions have bayes auto learning and AWL turned off?
> 

AWL isn't scored, so no need to slow things down with the DB interaction.

The --learn=35 option does 35% random Bayes learning, which simulates
human learning, so auto-learn is turned off.

Michael

>   echo "bayes_auto_learn 0" > spamassassin/user_prefs
>   echo "lock_method flock" >> spamassassin/user_prefs
>   echo "bayes_store_module Mail::SpamAssassin::BayesStore::SDBM" >> spamassassin/user_prefs
>   echo "use_auto_whitelist 0" >> spamassassin/user_prefs
>   echo "whitelist_bounce_relays example.com" >> spamassassin/user_prefs
> 
> 
> Daryl
> 


Re: NOTICE: 3.2.0 rescoring mass-checks

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
Never mind... it looks like my Meridian phone system isn't the only thing
that lost its memory today.


Daryl C. W. O'Shea wrote:
> Justin Mason wrote:
>> hi all --
>>
>> OK, if you're planning to send us mass-check logs for the 3.2.0 
>> rescoring,
>> now's the time!
>>
>> http://wiki.apache.org/spamassassin/RescoreDetails has all the details.
> 
> Why do the instructions have bayes auto learning and AWL turned off?
> 
>   echo "bayes_auto_learn 0" > spamassassin/user_prefs
>   echo "lock_method flock" >> spamassassin/user_prefs
>   echo "bayes_store_module Mail::SpamAssassin::BayesStore::SDBM" >> spamassassin/user_prefs
>   echo "use_auto_whitelist 0" >> spamassassin/user_prefs
>   echo "whitelist_bounce_relays example.com" >> spamassassin/user_prefs
> 
> 
> Daryl
> 


RE: NOTICE: 3.2.0 rescoring mass-checks

Posted by Giampaolo Tomassoni <g....@libero.it>.
From: Theo Van Dinter [mailto:felicity@apache.org]
> 
> On Mon, Jan 29, 2007 at 10:32:43PM +0100, Giampaolo Tomassoni wrote:
> > > Why do the instructions have bayes auto learning and AWL turned off?
> > 
> > I guess because mass-check logs must be based on an absolute
> > basis: two copies of the very same e-mail checked at beginning
> > and at end of the list shall score the same. This wouldn't hold
> > with AWL and bayes auto-learning.
> 
> Messages aren't going to score the same at the beginning and end 
> with Bayes.
> The idea is that you *want* to learn from mails as they go through.
> 
> The reasons are:
> 
> a) AWL is meaningless for score runs, so don't bother.
> b) "mass-check --learn" forces autolearning in mass-check on a percentage
>    basis, versus the normal autolearn system which is just based on
>    scores -- which aren't set yet.

OK, I guess I should try this tool. At the very least, it would save me a lot of "guessing"... :)

Thanks,

Giampaolo

> 
> -- 
> Randomly Selected Tagline:
> The following two statements are usually both true:
>  	There's not enough documentation.
>  	There's too much documentation.
>               -- Larry Wall in <19...@wall.org>
> 


Re: NOTICE: 3.2.0 rescoring mass-checks

Posted by Theo Van Dinter <fe...@apache.org>.
On Mon, Jan 29, 2007 at 10:32:43PM +0100, Giampaolo Tomassoni wrote:
> > Why do the instructions have bayes auto learning and AWL turned off?
> 
> I guess because mass-check logs must be based on an absolute basis: two copies of the very same e-mail checked at beginning and at end of the list shall score the same. This wouldn't hold with AWL and bayes auto-learning.

Messages aren't going to score the same at the beginning and end with Bayes.
The idea is that you *want* to learn from mails as they go through.

The reasons are:

a) AWL is meaningless for score runs, so don't bother.
b) "mass-check --learn" forces autolearning in mass-check on a percentage
   basis, versus the normal autolearn system which is just based on
   scores -- which aren't set yet.
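
Point (b) can be illustrated with a small sketch. It is not mass-check's
actual code: pick_for_learning is a hypothetical helper that keeps each
input line with a fixed probability, independent of any score, which is
the general idea behind learning on a percentage basis. The seed
argument is also an assumption, added only so that runs are repeatable.

```shell
#!/bin/sh
# Hypothetical helper, not part of mass-check.

# pick_for_learning PERCENT [SEED]
# Reads one message path per line on stdin and prints roughly PERCENT
# percent of them, chosen at random without looking at any score.
pick_for_learning() {
    awk -v pct="$1" -v seed="${2:-1}" '
        BEGIN { srand(seed) }
        # rand() is uniform in [0, 1), so this keeps a line with
        # probability pct/100.
        rand() * 100 < pct { print }
    '
}

# Example: select about 35% of the listed messages for learning.
#   ls spam/ | pick_for_learning 35
```

A score-based autolearner would instead compare each message's score to
a threshold, which is not useful while the scores are still being
generated.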

-- 
Randomly Selected Tagline:
The following two statements are usually both true:
 	There's not enough documentation.
 	There's too much documentation.
              -- Larry Wall in <19...@wall.org>

RE: NOTICE: 3.2.0 rescoring mass-checks

Posted by Giampaolo Tomassoni <g....@libero.it>.
From: Daryl C. W. O'Shea [mailto:spamassassin@dostech.ca]
> 
> Justin Mason wrote:
> > hi all --
> > 
> > OK, if you're planning to send us mass-check logs for the 3.2.0
> > rescoring,
> > now's the time!
> > 
> > http://wiki.apache.org/spamassassin/RescoreDetails has all the details.
> 
> Why do the instructions have bayes auto learning and AWL turned off?

I guess it's because mass-check logs must be computed on an absolute basis: two copies of the very same e-mail checked at the beginning and at the end of the list should score the same. This wouldn't hold with AWL and Bayes auto-learning.

Cheers,

Giampaolo


> 
>    echo "bayes_auto_learn 0" > spamassassin/user_prefs
>    echo "lock_method flock" >> spamassassin/user_prefs
>    echo "bayes_store_module Mail::SpamAssassin::BayesStore::SDBM" >> spamassassin/user_prefs
>    echo "use_auto_whitelist 0" >> spamassassin/user_prefs
>    echo "whitelist_bounce_relays example.com" >> spamassassin/user_prefs
> 
> 
> Daryl


Re: NOTICE: 3.2.0 rescoring mass-checks

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
Justin Mason wrote:
> hi all --
> 
> OK, if you're planning to send us mass-check logs for the 3.2.0 rescoring,
> now's the time!
> 
> http://wiki.apache.org/spamassassin/RescoreDetails has all the details.

Why do the instructions have bayes auto learning and AWL turned off?

   echo "bayes_auto_learn 0" > spamassassin/user_prefs
   echo "lock_method flock" >> spamassassin/user_prefs
   echo "bayes_store_module Mail::SpamAssassin::BayesStore::SDBM" >> spamassassin/user_prefs
   echo "use_auto_whitelist 0" >> spamassassin/user_prefs
   echo "whitelist_bounce_relays example.com" >> spamassassin/user_prefs
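
The same five settings can also be written in one step with a
here-document, which avoids the risk of a long line being wrapped in
transit. This is a minimal sketch: write_user_prefs is a hypothetical
helper name, and the mkdir -p call is an addition that assumes the
target directory may not exist yet.

```shell
#!/bin/sh
# Hypothetical helper; the settings are the ones quoted above.

# write_user_prefs [DIR]
# Writes the five settings to DIR/user_prefs (DIR defaults to
# "spamassassin"), replacing any existing file.
write_user_prefs() {
    dir="${1:-spamassassin}"
    mkdir -p "$dir"
    # The quoted 'EOF' stops the shell from expanding anything in the body.
    cat > "$dir/user_prefs" <<'EOF'
bayes_auto_learn 0
lock_method flock
bayes_store_module Mail::SpamAssassin::BayesStore::SDBM
use_auto_whitelist 0
whitelist_bounce_relays example.com
EOF
}

# Example, run from the directory that holds spamassassin/:
#   write_user_prefs spamassassin
```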


Daryl
