Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2009/09/04 16:51:33 UTC

NOTICE: SpamAssassin 3.3.0 mass-checks now starting

OK, if you're planning to send us mass-check logs for the
3.3.0 rescoring, now's the time!

http://wiki.apache.org/spamassassin/RescoreDetails has all the details.

cheers!

--j.

Re: SpamAssassin 3.3.0 mass-checks now starting

Posted by Mark Martinec <Ma...@ijs.si>.
> OK, if you're planning to send us mass-check logs for the
> 3.3.0 rescoring, now's the time!
> http://wiki.apache.org/spamassassin/RescoreDetails has all the details.

Docs don't say where one is supposed to put a local.cf with
options which are ignored in masses/spamassassin/user_prefs
(like Bayes SQL options, DCC, Pyzor timeouts etc).

I tried to place local.cf into masses/spamassassin/, with
horrible results (some directives in local.cf were reported as
invalid, apparently because plugins have not yet been loaded
when this file is parsed, but only later).

I finally placed it into ../rules/ as mylocal.cf, which
works as expected, but I wonder if this is the proper
solution. Should be documented I guess...

  Mark

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Jason Bertoch <ja...@i6ix.com>.
Warren Togami wrote:
> One day from the deadline for spamassassin-3.3.0 scoring and we 
> currently have only three people reporting.
>
Would love to help, but my current mail environment isn't conducive to 
keeping a corpus of mail to feed the tests.


Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Henrik Krohns <he...@hege.li>.
On Sat, Sep 19, 2009 at 03:33:51PM -0400, Warren Togami wrote:
> On 09/16/2009 11:47 AM, Warren Togami wrote:
>> On 09/04/2009 10:51 AM, Justin Mason wrote:
>>> OK, if you're planning to send us mass-check logs for the
>>> 3.3.0 rescoring, now's the time!
>>>
>>> http://wiki.apache.org/spamassassin/RescoreDetails has all the details.
>>>
>>> cheers!
>>>
>>> --j.
>>
>> -rw-r--r-- 174911850 2009/09/16 01:03:40 ham-bayes-net-hege.log
>> -rw-r--r-- 36909774 2009/09/11 20:39:47 ham-bayes-net-mmartinec.log
>> -rw-r--r-- 3179193 2009/09/14 23:16:15 ham-bayes-net-wt-en1.log
>> -rw-r--r-- 1591286 2009/09/14 23:24:19 ham-bayes-net-wt-en2.log
>> -rw-r--r-- 5687443 2009/09/14 23:53:41 ham-bayes-net-wt-en3.log
>> -rw-r--r-- 354 2009/09/14 23:56:00 ham-bayes-net-wt-en4.log
>> -rw-r--r-- 575780 2009/09/14 22:13:01 ham-bayes-net-wt-jp1.log
>> -rw-r--r-- 2139873 2009/09/14 22:23:07 ham-bayes-net-wt-jp2.log
>> -rw-r--r-- 40760753 2009/09/16 01:04:24 spam-bayes-net-hege.log
>> -rw-r--r-- 35666309 2009/09/11 20:52:01 spam-bayes-net-mmartinec.log
>> -rw-r--r-- 4341537 2009/09/14 23:16:16 spam-bayes-net-wt-en1.log
>> -rw-r--r-- 1576 2009/09/14 23:24:20 spam-bayes-net-wt-en2.log
>> -rw-r--r-- 310 2009/09/14 23:53:42 spam-bayes-net-wt-en3.log
>> -rw-r--r-- 494742 2009/09/14 23:56:00 spam-bayes-net-wt-en4.log
>> -rw-r--r-- 79101 2009/09/14 22:13:02 spam-bayes-net-wt-jp1.log
>> -rw-r--r-- 311 2009/09/14 22:23:08 spam-bayes-net-wt-jp2.log
>>
>> One day from the deadline for spamassassin-3.3.0 scoring and we
>> currently have only three people reporting.
>
> The deadline has been extended until Monday, September 21st.  But at  
> this moment the number of logs reporting for the rescore masscheck has  
> not changed.
>
> Are the uploaded corpa being processed?
>
> Who else is still working on their own corpus?

Hopefully I'll have my new stuff there in time. I'm separating all the
Finnish stuff (it's over 70%) into its own corpus. I guess it's much more
useful that way.


Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 19/09/2009 3:33 PM, Warren Togami wrote:
> On 09/16/2009 11:47 AM, Warren Togami wrote:
>> On 09/04/2009 10:51 AM, Justin Mason wrote:
>>> OK, if you're planning to send us mass-check logs for the
>>> 3.3.0 rescoring, now's the time!
>>>
>>> http://wiki.apache.org/spamassassin/RescoreDetails has all the details.
>>>
>>> cheers!
>>>
>>> --j.
>>
>> -rw-r--r-- 174911850 2009/09/16 01:03:40 ham-bayes-net-hege.log
>> -rw-r--r-- 36909774 2009/09/11 20:39:47 ham-bayes-net-mmartinec.log
>> -rw-r--r-- 3179193 2009/09/14 23:16:15 ham-bayes-net-wt-en1.log
>> -rw-r--r-- 1591286 2009/09/14 23:24:19 ham-bayes-net-wt-en2.log
>> -rw-r--r-- 5687443 2009/09/14 23:53:41 ham-bayes-net-wt-en3.log
>> -rw-r--r-- 354 2009/09/14 23:56:00 ham-bayes-net-wt-en4.log
>> -rw-r--r-- 575780 2009/09/14 22:13:01 ham-bayes-net-wt-jp1.log
>> -rw-r--r-- 2139873 2009/09/14 22:23:07 ham-bayes-net-wt-jp2.log
>> -rw-r--r-- 40760753 2009/09/16 01:04:24 spam-bayes-net-hege.log
>> -rw-r--r-- 35666309 2009/09/11 20:52:01 spam-bayes-net-mmartinec.log
>> -rw-r--r-- 4341537 2009/09/14 23:16:16 spam-bayes-net-wt-en1.log
>> -rw-r--r-- 1576 2009/09/14 23:24:20 spam-bayes-net-wt-en2.log
>> -rw-r--r-- 310 2009/09/14 23:53:42 spam-bayes-net-wt-en3.log
>> -rw-r--r-- 494742 2009/09/14 23:56:00 spam-bayes-net-wt-en4.log
>> -rw-r--r-- 79101 2009/09/14 22:13:02 spam-bayes-net-wt-jp1.log
>> -rw-r--r-- 311 2009/09/14 22:23:08 spam-bayes-net-wt-jp2.log
>>
>> One day from the deadline for spamassassin-3.3.0 scoring and we
>> currently have only three people reporting.
> 
> The deadline has been extended until Monday, September 21st.  But at
> this moment the number of logs reporting for the rescore masscheck has
> not changed.
> 
> Are the uploaded corpa being processed?

They'll all be processed together when it's declared that the time to
submit has expired.

> Who else is still working on their own corpus?

Due to memory leaks in haldaemon on my machines (unrelated to SA), and me
not noticing and instead fighting with Perl to build modules, I'm just
starting my mass-check now.

I imagine that it will be sometime Tuesday after work before I have
results submitted.

Daryl


Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Warren Togami <wt...@redhat.com>.
On 09/16/2009 11:47 AM, Warren Togami wrote:
> On 09/04/2009 10:51 AM, Justin Mason wrote:
>> OK, if you're planning to send us mass-check logs for the
>> 3.3.0 rescoring, now's the time!
>>
>> http://wiki.apache.org/spamassassin/RescoreDetails has all the details.
>>
>> cheers!
>>
>> --j.
>
> -rw-r--r-- 174911850 2009/09/16 01:03:40 ham-bayes-net-hege.log
> -rw-r--r-- 36909774 2009/09/11 20:39:47 ham-bayes-net-mmartinec.log
> -rw-r--r-- 3179193 2009/09/14 23:16:15 ham-bayes-net-wt-en1.log
> -rw-r--r-- 1591286 2009/09/14 23:24:19 ham-bayes-net-wt-en2.log
> -rw-r--r-- 5687443 2009/09/14 23:53:41 ham-bayes-net-wt-en3.log
> -rw-r--r-- 354 2009/09/14 23:56:00 ham-bayes-net-wt-en4.log
> -rw-r--r-- 575780 2009/09/14 22:13:01 ham-bayes-net-wt-jp1.log
> -rw-r--r-- 2139873 2009/09/14 22:23:07 ham-bayes-net-wt-jp2.log
> -rw-r--r-- 40760753 2009/09/16 01:04:24 spam-bayes-net-hege.log
> -rw-r--r-- 35666309 2009/09/11 20:52:01 spam-bayes-net-mmartinec.log
> -rw-r--r-- 4341537 2009/09/14 23:16:16 spam-bayes-net-wt-en1.log
> -rw-r--r-- 1576 2009/09/14 23:24:20 spam-bayes-net-wt-en2.log
> -rw-r--r-- 310 2009/09/14 23:53:42 spam-bayes-net-wt-en3.log
> -rw-r--r-- 494742 2009/09/14 23:56:00 spam-bayes-net-wt-en4.log
> -rw-r--r-- 79101 2009/09/14 22:13:02 spam-bayes-net-wt-jp1.log
> -rw-r--r-- 311 2009/09/14 22:23:08 spam-bayes-net-wt-jp2.log
>
> One day from the deadline for spamassassin-3.3.0 scoring and we
> currently have only three people reporting.

The deadline has been extended until Monday, September 21st.  But at 
this moment the number of logs reporting for the rescore masscheck has 
not changed.

Are the uploaded corpora being processed?

Who else is still working on their own corpus?

Warren Togami
wtogami@redhat.com

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Henrik K <he...@hege.li>.
On Thu, Sep 17, 2009 at 02:34:24PM +0200, Mark Martinec wrote:
> Austin,
> 
> > > now hope to do this Thursday/Friday.  I should be able to scan my
> > > million or so messages in a day on my cluster.
> > 
> > Wow, that makes me feel inadequate :)  I'm struggling to clean up my
> > little ham sample of 3600 messages, and looking at another couple
> > thousand that I'll do if I've got time...
> 
> Thanks, that will be nice to have. As the rulesqa site can distinguish
> results based on a corpus submitter, even a small but carefully checked
> collection is worth having.
> 
> I found it valuable to double check ham samples which fire rules
> URIBL_JP_SURBL, URIBL_WS_SURBL, URIBL_OB_SURBL,
> RCVD_IN_PBL, RCVD_IN_XBL, RCVD_IN_PSBL, RCVD_IN_SSBL

There's lots that one can do..

- analyze corpuses through dspam_train, spots misfiles quite nicely (might
  also use crm114, haven't tried)

- clamscan hams with sanesecurity etc

- grep ham/spam.log for rules with S/O >= ~0.98 (most likely includes all
  that Mark said and more)

- grep Subjects from spams and grep all those from ham (and vice versa)

- fuzzily hash duplicate mails away, so miscategorized mails have smaller
  effect on the totals (or does it make good rules seem worse? heh..), you
  can also spot similar mails that are in both ham+spam for double checking
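The S/O check in the list above could be scripted roughly as follows. This is only a sketch: it assumes each log line looks like the sample quoted elsewhere in this thread (marker, score, path, comma-separated rule names, then key=value metadata, all whitespace-separated, with no spaces in the path), and it takes S/O to mean the share of a rule's hit rate that comes from spam. The function names are made up for illustration.

```python
from collections import Counter


def rule_hits(lines):
    """Return (per-rule message counts, number of messages) for one log."""
    counts, total = Counter(), 0
    for line in lines:
        fields = line.split()
        # Skip blank lines, comment lines and anything too short to be a result.
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        total += 1
        # Assumption: field 4 is the rule list; a '=' means it is metadata
        # (i.e. the message hit no rules at all).
        if len(fields) > 3 and "=" not in fields[3]:
            counts.update({r for r in fields[3].split(",") if r})
    return counts, total


def so_ratios(spam_lines, ham_lines):
    """Map rule name -> S/O, using hit rates so corpus sizes do not skew it."""
    spam, n_spam = rule_hits(spam_lines)
    ham, n_ham = rule_hits(ham_lines)
    ratios = {}
    for rule in set(spam) | set(ham):
        s = spam[rule] / n_spam if n_spam else 0.0
        h = ham[rule] / n_ham if n_ham else 0.0
        ratios[rule] = s / (s + h)
    return ratios


def spammy_rules(ratios, threshold=0.98):
    """Rules whose S/O is at or above the threshold, sorted by name."""
    return sorted(rule for rule, so in ratios.items() if so >= threshold)
```

Ham messages that hit any of the rules returned by spammy_rules() are the ones worth a second look by hand.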

Sadly I don't have a cleanly defined process yet, it's all scripts and
memorized one-liners. Finding FPs from spam-corpus is more important but
harder..
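For the duplicate-hashing idea, one crude possibility is sketched below. It is not a real fuzzy hash: it simply normalises the body (lowercase, digits collapsed, whitespace squeezed) and takes a SHA-1 of the result, so only near-identical mails collide. The names fuzzy_key, group_duplicates and cross_filed are hypothetical, and extracting the body from each mail is left out.

```python
import hashlib
import re
from collections import defaultdict


def fuzzy_key(body):
    """Hash a normalised copy of the body so trivial variations collide."""
    text = body.lower()
    text = re.sub(r"\d+", "0", text)          # order numbers, dates, counters
    text = re.sub(r"\s+", " ", text).strip()  # wrapping and spacing changes
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def group_duplicates(messages):
    """messages: iterable of (label, name, body); returns key -> [(label, name)]."""
    groups = defaultdict(list)
    for label, name, body in messages:
        groups[fuzzy_key(body)].append((label, name))
    return groups


def cross_filed(groups):
    """Groups whose members carry more than one label (e.g. ham and spam)."""
    return [members for members in groups.values()
            if len({label for label, _ in members}) > 1]
```

Keeping one mail per group thins out the duplicates, and anything returned by cross_filed() is filed as both ham and spam and is a candidate for double checking.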


Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Warren Togami <wt...@redhat.com>.
On 09/17/2009 08:34 AM, Mark Martinec wrote:
> Austin,
>
>>> now hope to do this Thursday/Friday.  I should be able to scan my
>>> million or so messages in a day on my cluster.
>>
>> Wow, that makes me feel inadequate :)  I'm struggling to clean up my
>> little ham sample of 3600 messages, and looking at another couple
>> thousand that I'll do if I've got time...
>
> Thanks, that will be nice to have. As the rulesqa site can distinguish
> results based on a corpus submitter, even a small but carefully checked
> collection is worth having.
>
> I found it valuable to double check ham samples which fire rules
> URIBL_JP_SURBL, URIBL_WS_SURBL, URIBL_OB_SURBL,
> RCVD_IN_PBL, RCVD_IN_XBL, RCVD_IN_PSBL, RCVD_IN_SSBL

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6156
Be aware that gmail, yahoo.co.jp and rr.com were whitelisted from new 
inclusion only 5 days ago.  IPs listed before then could still be listed 
until the 2-week timeout expires.  Auto-whitelisting of yahoo.com is not 
yet implemented.  riel is working on DKIM checking in order to whitelist 
yahoo.com.

FPs of PSBL are already rare, but they should become rarer.

Please let us know if you see FPs from a legitimate ISP MTA server. 
That MTA can be whitelisted from PSBL by either listing itself in DNSWL, 
or letting us know to check it by SPF or DKIM.

Warren Togami
wtogami@redhat.com

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Mark Martinec <Ma...@ijs.si>.
Austin,

> > now hope to do this Thursday/Friday.  I should be able to scan my
> > million or so messages in a day on my cluster.
> 
> Wow, that makes me feel inadequate :)  I'm struggling to clean up my
> little ham sample of 3600 messages, and looking at another couple
> thousand that I'll do if I've got time...

Thanks, that will be nice to have. As the rulesqa site can distinguish
results based on a corpus submitter, even a small but carefully checked
collection is worth having.

I found it valuable to double check ham samples which fire rules
URIBL_JP_SURBL, URIBL_WS_SURBL, URIBL_OB_SURBL,
RCVD_IN_PBL, RCVD_IN_XBL, RCVD_IN_PSBL, RCVD_IN_SSBL
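A small script can pull those samples out of a ham log for review. The sketch below assumes the log line layout shown in the sample later in this message (marker, score, path, comma-separated rule names, metadata) and paths without spaces; the rule names are the ones listed above, and ham_hits is just an illustrative name.

```python
# Rules named above; ham that fires any of them deserves a second look.
WATCH = frozenset({
    "URIBL_JP_SURBL", "URIBL_WS_SURBL", "URIBL_OB_SURBL",
    "RCVD_IN_PBL", "RCVD_IN_XBL", "RCVD_IN_PSBL", "RCVD_IN_SSBL",
})


def ham_hits(lines, watch=WATCH):
    """Yield (path, sorted matching rules) for log lines hitting a watched rule."""
    for line in lines:
        fields = line.split()
        # Skip comments, blank lines and lines without a rule field.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        matched = watch.intersection(fields[3].split(","))
        if matched:
            yield fields[2], sorted(matched)
```

For example, looping over ham_hits(open("ham.log")) and printing each path gives a list of messages to re-read by hand (the file name here is only a placeholder).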

> Also, I need some advice, if someone can provide it.  I'm looking at a
> message (and I have several like this in my corpus at present) which
> generates the following log line
> 
> .  1 /home/gems/ham//cur/n8500ejj019591:2,S
> MISSING_DATE,MISSING_HEADERS,MISSING_MID,T_FSL_HELO_NON_FQDN_2,__DKIM_DEPENDABLE,__DNS_FROM_RFC_ABUSE,__DOS_DIRECT_TO_MX,__DOS_HAS_ANY_URI,__DOS_RCVD_FRI,__DOS_SINGLE_EXT_RELAY,__HAS_ANY_EMAIL,__HAS_ANY_URI,__HAS_RCVD,__HAS_SUBJECT,__HAVE_BOUNCE_RELAYS,__LAST_EXTERNAL_RELAY_NO_AUTH,__LAST_UNTRUSTED_RELAY_NO_AUTH,__MISSING_REF,__MISSING_REPLY,__MISSING_THREAD,__NONEMPTY_BODY,__NUMBERS_IN_SUBJ,__RCVD_IN_2WEEKS,__RFC_IGNORANT_ENVFROM,__TO_NO_ARROWS_R,__TVD_BODY
> learn=ham,time=1252108840,scantime=1,format=f,reuse=no,set=1
> 
> It's clearly a poorly constructed message, but it's also clearly ham
> (it originated from an application that someone somewhere in my
> organization runs).  It had one header: Subject.  Then a body.  Should
> I leave stuff like this in?  I mean, it is ham, but...

I can't offer a definite answer (other comments are welcome), but I'd say
keep a few samples in your ham collection, but not in many copies.

  Mark

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Austin <me...@gmail.com>.
On Thu, Sep 17, 2009 at 11:39 AM, John Hardin <jh...@impsec.org> wrote:
> On Thu, 17 Sep 2009, LuKreme wrote:
>
>> On Sep 16, 2009, at 22:13, Austin <me...@gmail.com> wrote:
>>
>>> It had one header: Subject.  Then a body.  Should
>>> I leave stuff like this in?  I mean, it is ham, but...
>>
>> My feeling would be if it is local only then don't include it.
>
> Agreed.

Thanks for the guidance, all.  I'll toss the absurd things that never
left our network.  There aren't all that many of them, but I wouldn't
want to pollute the pool.

Austin.

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by John Hardin <jh...@impsec.org>.
On Thu, 17 Sep 2009, LuKreme wrote:

> On Sep 16, 2009, at 22:13, Austin <me...@gmail.com> wrote:
>
>> It had one header: Subject.  Then a body.  Should
>> I leave stuff like this in?  I mean, it is ham, but...
>
> My feeling would be if it is local only then don't include it.

Agreed.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Perfect Security and Absolute Safety are unattainable; beware
   those who would try to sell them to you, regardless of the cost,
   for they are trying to sell you your own slavery.
-----------------------------------------------------------------------
  Today: the 222nd anniversary of the signing of the U.S. Constitution

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by LuKreme <kr...@kreme.com>.
On Sep 16, 2009, at 22:13, Austin <me...@gmail.com>  
wrote:

> It had one header: Subject.  Then a body.  Should
> I leave stuff like this in?  I mean, it is ham, but...

My feeling would be if it is local only then don't include it.

-- 
Sent from my iPhone


Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Austin <me...@gmail.com>.
On Wed, Sep 16, 2009 at 8:16 PM, Daryl C. W. O'Shea
<sp...@dostech.ca> wrote:
[snip]
> now hope to do this Thursday/Friday.  I should be able to scan my
> million or so messages in a day on my cluster.

Wow, that makes me feel inadequate :)  I'm struggling to clean up my
little ham sample of 3600 messages, and looking at another couple
thousand that I'll do if I've got time...

Also, I need some advice, if someone can provide it.  I'm looking at a
message (and I have several like this in my corpus at present) which
generates the following log line

.  1 /home/gems/ham//cur/n8500ejj019591:2,S
MISSING_DATE,MISSING_HEADERS,MISSING_MID,T_FSL_HELO_NON_FQDN_2,__DKIM_DEPENDABLE,__DNS_FROM_RFC_ABUSE,__DOS_DIRECT_TO_MX,__DOS_HAS_ANY_URI,__DOS_RCVD_FRI,__DOS_SINGLE_EXT_RELAY,__HAS_ANY_EMAIL,__HAS_ANY_URI,__HAS_RCVD,__HAS_SUBJECT,__HAVE_BOUNCE_RELAYS,__LAST_EXTERNAL_RELAY_NO_AUTH,__LAST_UNTRUSTED_RELAY_NO_AUTH,__MISSING_REF,__MISSING_REPLY,__MISSING_THREAD,__NONEMPTY_BODY,__NUMBERS_IN_SUBJ,__RCVD_IN_2WEEKS,__RFC_IGNORANT_ENVFROM,__TO_NO_ARROWS_R,__TVD_BODY
learn=ham,time=1252108840,scantime=1,format=f,reuse=no,set=1

It's clearly a poorly constructed message, but it's also clearly ham
(it originated from an application that someone somewhere in my
organization runs).  It had one header: Subject.  Then a body.  Should
I leave stuff like this in?  I mean, it is ham, but...

thanks in advance for any guidance,
Austin.

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by John Hardin <jh...@impsec.org>.
On Thu, 17 Sep 2009, Justin Mason wrote:

> On Thu, Sep 17, 2009 at 04:01, Warren Togami <wt...@redhat.com> wrote:
>> On 09/16/2009 11:25 PM, Justin Mason wrote:
>>>
>>> excellent. That's 2 people who could do with an extension, then!
>>
>> Could we state with clarity the new deadline? I might have other 
>> people with data depending on the extended deadline.
>
> Let's push it out until Monday.

Cool! I may be able to contribute local masscheck results too, then!

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Perfect Security and Absolute Safety are unattainable; beware
   those who would try to sell them to you, regardless of the cost,
   for they are trying to sell you your own slavery.
-----------------------------------------------------------------------
  Today: the 222nd anniversary of the signing of the U.S. Constitution

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Justin Mason <jm...@jmason.org>.
On Thu, Sep 17, 2009 at 04:01, Warren Togami <wt...@redhat.com> wrote:
> On 09/16/2009 11:25 PM, Justin Mason wrote:
>>
>> excellent.  That's 2 people who could do with an extension, then!
>
> Could we state with clarity the new deadline?  I might have other people
> with data depending on the extended deadline.

Let's push it out until Monday.

regarding corpus cleaning, RTFM:
http://wiki.apache.org/spamassassin/CorpusCleaning (linked from the
RescoreDetails page)

-- 
--j.

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Warren Togami <wt...@redhat.com>.
On 09/16/2009 11:25 PM, Justin Mason wrote:
> excellent.  That's 2 people who could do with an extension, then!

Could we state with clarity the new deadline?  I might have other people 
with data depending on the extended deadline.



Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Justin Mason <jm...@jmason.org>.
excellent.  That's 2 people who could do with an extension, then!

On Wed, Sep 16, 2009 at 20:16, Daryl C. W. O'Shea
<sp...@dostech.ca> wrote:
> On 16/09/2009 4:03 PM, Justin Mason wrote:
>> Who is running a mass-check that's still in progress?  (fwiw, I am ;)
>
> I had a NAS failure over the weekend that consumed the time I was
> planning on getting my systems right up-to-date for the mass-check.  I
> now hope to do this Thursday/Friday.  I should be able to scan my
> million or so messages in a day on my cluster.
>
> Daryl
>
>



-- 
--j.

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 16/09/2009 4:03 PM, Justin Mason wrote:
> Who is running a mass-check that's still in progress?  (fwiw, I am ;)

I had a NAS failure over the weekend that consumed the time I was
planning on getting my systems right up-to-date for the mass-check.  I
now hope to do this Thursday/Friday.  I should be able to scan my
million or so messages in a day on my cluster.

Daryl


Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Mark Martinec <Ma...@ijs.si>.
On Wednesday September 16 2009 22:03:17 Justin Mason wrote:
> Who is running a mass-check that's still in progress?  (fwiw, I am ;)
> It'll be at least 5 users (with myself and John), but that's not a
> great population of training data.

I spent a couple of afternoons cleaning up my corpus of 60,000 messages
(of which 39,000 are ham, checked and rechecked). I have already uploaded
my results, although I will probably do another iteration of hand-weeding
based on nightly ruleqa results - it will be there by the end of the day.

  Mark

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Justin Mason <jm...@jmason.org>.
On Wed, Sep 16, 2009 at 15:47, Warren Togami <wt...@redhat.com> wrote:
> On 09/04/2009 10:51 AM, Justin Mason wrote:
>>
>> OK, if you're planning to send us mass-check logs for the
>> 3.3.0 rescoring, now's the time!
>>
>> http://wiki.apache.org/spamassassin/RescoreDetails has all the details.
>>
>> cheers!
>>
>> --j.
>
> -rw-r--r--   174911850 2009/09/16 01:03:40 ham-bayes-net-hege.log
> -rw-r--r--    36909774 2009/09/11 20:39:47 ham-bayes-net-mmartinec.log
> -rw-r--r--     3179193 2009/09/14 23:16:15 ham-bayes-net-wt-en1.log
> -rw-r--r--     1591286 2009/09/14 23:24:19 ham-bayes-net-wt-en2.log
> -rw-r--r--     5687443 2009/09/14 23:53:41 ham-bayes-net-wt-en3.log
> -rw-r--r--         354 2009/09/14 23:56:00 ham-bayes-net-wt-en4.log
> -rw-r--r--      575780 2009/09/14 22:13:01 ham-bayes-net-wt-jp1.log
> -rw-r--r--     2139873 2009/09/14 22:23:07 ham-bayes-net-wt-jp2.log
> -rw-r--r--    40760753 2009/09/16 01:04:24 spam-bayes-net-hege.log
> -rw-r--r--    35666309 2009/09/11 20:52:01 spam-bayes-net-mmartinec.log
> -rw-r--r--     4341537 2009/09/14 23:16:16 spam-bayes-net-wt-en1.log
> -rw-r--r--        1576 2009/09/14 23:24:20 spam-bayes-net-wt-en2.log
> -rw-r--r--         310 2009/09/14 23:53:42 spam-bayes-net-wt-en3.log
> -rw-r--r--      494742 2009/09/14 23:56:00 spam-bayes-net-wt-en4.log
> -rw-r--r--       79101 2009/09/14 22:13:02 spam-bayes-net-wt-jp1.log
> -rw-r--r--         311 2009/09/14 22:23:08 spam-bayes-net-wt-jp2.log
>
> One day from the deadline for spamassassin-3.3.0 scoring and we currently
> have only three people reporting.

Who is running a mass-check that's still in progress?  (fwiw, I am ;)

It'll be at least 5 users (with myself and John), but that's not a
great population of training data.

-- 
--j.

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Austin <me...@gmail.com>.
I've definitely got the volume required.  I'm currently selecting a
random sample of 1% of my site's inbound & outbound mail on both ham &
spam sides, and will still be reviewing the corpus for several (many?)
hours today to make sure it's clean.  I'll see how soon I can get all
of the pieces in place and fire you a link to the files.  Collecting
all of the samples together seems to be taking me quite a bit longer
than I thought (of course).  Don't hold things up on my account, but
I'm hoping to have some results to share by the deadline.
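For anyone wanting to do the same kind of sampling, a minimal sketch is below. It assumes the messages are already available as a list of file paths (how they are collected is site-specific), and sample_paths is a made-up helper name. Passing a seed makes the selection repeatable, which helps if the sample has to be rebuilt later.

```python
import random


def sample_paths(paths, fraction=0.01, seed=None):
    """Pick roughly `fraction` of the paths uniformly at random, no repeats."""
    rng = random.Random(seed)
    ordered = sorted(paths)  # fixed order so the same seed gives the same sample
    if not ordered:
        return []
    k = max(1, round(len(ordered) * fraction))
    return rng.sample(ordered, k)
```

Running it separately over the ham and spam file lists keeps the two sides sampled at the same rate.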

I've had the wiki page open since Justin sent the initial request, but
hadn't gotten around to the soul crushing work of reviewing thousands
of messages yet...


On Wed, Sep 16, 2009 at 11:43 AM, Warren Togami <wt...@redhat.com> wrote:
> On 09/16/2009 01:01 PM, Austin wrote:
>>
>> Would it be worth contributing data from a brand-new corpus of mail
>> from the last few days?  That's the best I can do presently.
>>
>> I have plenty of dreams of creating a good, hand verified, corpus of
>> mail from the last several months, but the development work keeps
>> getting bumped...
>>
>
> Do you have > 1000+ ham, human verified to contain no spam?  If so I suppose
> it is worthwhile.
>
> http://wiki.apache.org/spamassassin/RescoreDetails
> If you follow these instructions and put your logs somewhere I can grab them
> (preferably via HTTP) I can upload your logs for this one-time rescoring
> masscheck.
>
> http://wiki.apache.org/spamassassin/NightlyMassCheck
> If you want to participate in nightly masscheck you should request your own
> account.
>
> Warren Togami
> wtogami@redhat.com
>

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Warren Togami <wt...@redhat.com>.
On 09/16/2009 01:01 PM, Austin wrote:
> Would it be worth contributing data from a brand-new corpus of mail
> from the last few days?  That's the best I can do presently.
>
> I have plenty of dreams of creating a good, hand verified, corpus of
> mail from the last several months, but the development work keeps
> getting bumped...
>

Do you have 1000+ ham, human-verified to contain no spam?  If so I 
suppose it is worthwhile.

http://wiki.apache.org/spamassassin/RescoreDetails
If you follow these instructions and put your logs somewhere I can grab 
them (preferably via HTTP) I can upload your logs for this one-time 
rescoring masscheck.

http://wiki.apache.org/spamassassin/NightlyMassCheck
If you want to participate in nightly masscheck you should request your 
own account.

Warren Togami
wtogami@redhat.com

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Austin <me...@gmail.com>.
Would it be worth contributing data from a brand-new corpus of mail
from the last few days?  That's the best I can do presently.

I have plenty of dreams of creating a good, hand-verified corpus of
mail from the last several months, but the development work keeps
getting bumped...

On Wed, Sep 16, 2009 at 8:47 AM, Warren Togami <wt...@redhat.com> wrote:
> On 09/04/2009 10:51 AM, Justin Mason wrote:
>>
>> OK, if you're planning to send us mass-check logs for the
>> 3.3.0 rescoring, now's the time!
>>
>> http://wiki.apache.org/spamassassin/RescoreDetails has all the details.
>>
>> cheers!
>>
>> --j.
>
> -rw-r--r--   174911850 2009/09/16 01:03:40 ham-bayes-net-hege.log
> -rw-r--r--    36909774 2009/09/11 20:39:47 ham-bayes-net-mmartinec.log
> -rw-r--r--     3179193 2009/09/14 23:16:15 ham-bayes-net-wt-en1.log
> -rw-r--r--     1591286 2009/09/14 23:24:19 ham-bayes-net-wt-en2.log
> -rw-r--r--     5687443 2009/09/14 23:53:41 ham-bayes-net-wt-en3.log
> -rw-r--r--         354 2009/09/14 23:56:00 ham-bayes-net-wt-en4.log
> -rw-r--r--      575780 2009/09/14 22:13:01 ham-bayes-net-wt-jp1.log
> -rw-r--r--     2139873 2009/09/14 22:23:07 ham-bayes-net-wt-jp2.log
> -rw-r--r--    40760753 2009/09/16 01:04:24 spam-bayes-net-hege.log
> -rw-r--r--    35666309 2009/09/11 20:52:01 spam-bayes-net-mmartinec.log
> -rw-r--r--     4341537 2009/09/14 23:16:16 spam-bayes-net-wt-en1.log
> -rw-r--r--        1576 2009/09/14 23:24:20 spam-bayes-net-wt-en2.log
> -rw-r--r--         310 2009/09/14 23:53:42 spam-bayes-net-wt-en3.log
> -rw-r--r--      494742 2009/09/14 23:56:00 spam-bayes-net-wt-en4.log
> -rw-r--r--       79101 2009/09/14 22:13:02 spam-bayes-net-wt-jp1.log
> -rw-r--r--         311 2009/09/14 22:23:08 spam-bayes-net-wt-jp2.log
>
> One day from the deadline for spamassassin-3.3.0 scoring and we currently
> have only three people reporting.
>
> Warren Togami
> wtogami@redhat.com
>

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 16/09/2009 5:48 PM, Warren Togami wrote:
> Let's process the uploaded corpora and see how well the generated scores
> do.  It will probably be better than the ancient 3.2.x with all these
> bug fixes?  Just get it out the door, then focus on cleaning up the
> documentation/tools and recruiting a greater variety of corpus or
> masscheck participants.

IMO score generation is not something we should take, nor have in the
past taken, lightly.  I'm not sure how we "would see how well the
generated scores do" before potentially inflicting pain on the user
base.  A bad score set could have a drastically bad result on the end
product.  The 3.2 scoreset may be old, but it's known to be pretty safe
for not catching ham.  Generating scores with a small collection of ham
increases the risk of ham being caught by the wider user base.

> If we improve the masscheck sample size substantially, we could safely
> redo the scores entirely for 3.3.1.

I have no objection to frequent updates to the scoreset provided they
are quality updates.

Daryl


Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Warren Togami <wt...@redhat.com>.
This is turning into a bit of a failboat.  Many of the people I asked 
would like to help but three key problems stop them:

* They don't trust anyone else with their ham so they won't upload their 
corpora anywhere.
* The docs to run masscheck yourself are very poorly written and 
confusing.  When I personally got started on this I had to ask a ton of 
questions here on the list to be sure I was doing the right thing.
* All of their mail is on a remote server.  (Syncing this could be done, 
but there isn't a good solution for repeated syncing later that 
automatically removes mail that you subsequently deleted at the remote 
source.)
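
As an illustration of the syncing behaviour asked for in the last point, here is a minimal sketch of a one-way mirror that also removes local copies of messages deleted at the source. It assumes the remote mail is reachable as a directory tree with one immutable file per message (for example a mounted Maildir); the function name `mirror` and the paths are hypothetical, and fetching mail from a server that is only reachable over a mail protocol is not covered. Where rsync access is available, its `--delete` option gives similar behaviour.

```python
import shutil
from pathlib import Path


def mirror(source, dest):
    """Make dest contain exactly the files present in source.

    Copies files that are new at the source and deletes files from dest
    that no longer exist at the source. Files already present are assumed
    unchanged (one immutable file per message) and are not re-copied.
    Returns (number_copied, number_removed).
    """
    source, dest = Path(source), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)

    wanted = {p.relative_to(source) for p in source.rglob("*") if p.is_file()}
    present = {p.relative_to(dest) for p in dest.rglob("*") if p.is_file()}

    to_copy = sorted(wanted - present)
    to_remove = sorted(present - wanted)

    for rel in to_copy:
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source / rel, target)

    for rel in to_remove:
        (dest / rel).unlink()

    return len(to_copy), len(to_remove)


if __name__ == "__main__":
    # Hypothetical paths: a mounted remote mailbox and a local corpus copy.
    copied, removed = mirror("/mnt/remote-mail/ham", "corpus/ham")
    print(f"copied {copied}, removed {removed}")
```

Running this repeatedly keeps the local corpus in step with the source, including deletions, so mail removed remotely (for example after being found misclassified) does not linger in the local copy.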

Let's process the uploaded corpora and see how well the generated scores 
do.  It will probably be better than the ancient 3.2.x with all these 
bug fixes?  Just get it out the door, then focus on cleaning up the 
documentation/tools and recruiting a greater variety of corpus or 
masscheck participants.

If we improve the masscheck sample size substantially, we could safely 
redo the scores entirely for 3.3.1.

Warren Togami
wtogami@redhat.com

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by John Hardin <jh...@impsec.org>.
On Wed, 16 Sep 2009, Warren Togami wrote:

> On 09/16/2009 04:50 PM, John Hardin wrote:
>>  On Wed, 16 Sep 2009, Justin Mason wrote:
>> 
>> >  oh crap, I didn't even think of that :( I can certainly use that
>> >  uploaded corpora. (_when_, though, is another matter ;)
>>
>>  Related question: if we also have private corpora and wish to include
>>  their results in the masscheck, do we do a local masscheck on just the
>>  private corpora and upload those results per the wiki, and both those
>>  and the masscheck results on the uploaded corpora will be included in
>>  score generation?
>
> http://wiki.apache.org/spamassassin/RescoreDetails
> Did you ever read this page?

Only in passing. My focus is still on creating rules, not on release score 
generation. I will get onboard for that, the timing of the 3.3.0 release 
was just a bit soon for me to do it this time.

> The instructions here are to run masscheck on your local corpora and 
> upload only the logs.

That's why I asked the initial question about the uploaded corpora. The 
wiki _can_ be out of date, and it struck me as odd that the uploaded 
corpora wouldn't be used in score generation.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Therapeutic Phrenologist - send email for affordable rate schedule.
-----------------------------------------------------------------------
  Tomorrow: the 222nd anniversary of the signing of the U.S. Constitution

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Warren Togami <wt...@redhat.com>.
On 09/16/2009 04:50 PM, John Hardin wrote:
> On Wed, 16 Sep 2009, Justin Mason wrote:
>
>> oh crap, I didn't even think of that :( I can certainly use that
>> uploaded corpora. (_when_, though, is another matter ;)
>
> Related question: if we also have private corpora and wish to include
> their results in the masscheck, do we do a local masscheck on just the
> private corpora and upload those results per the wiki, and both those
> and the masscheck results on the uploaded corpora will be included in
> score generation?
>

http://wiki.apache.org/spamassassin/RescoreDetails
Did you ever read this page?  The instructions here are to run masscheck 
on your local corpora and upload only the logs.

http://wiki.apache.org/spamassassin/NightlyMassCheck
This is the nightly version of it.

Warren

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by John Hardin <jh...@impsec.org>.
On Wed, 16 Sep 2009, Justin Mason wrote:

> oh crap, I didn't even think of that :( I can certainly use that 
> uploaded corpora.  (_when_, though, is another matter ;)

Related question: if we also have private corpora and wish to include 
their results in the masscheck, do we do a local masscheck on just the 
private corpora and upload those results per the wiki, and both those and 
the masscheck results on the uploaded corpora will be included in score 
generation?

I have several corpora that I'm not willing to upload that I'd be willing 
to run local masschecks against...

> On Wed, Sep 16, 2009 at 17:19, John Hardin <jh...@impsec.org> wrote:
>> On Wed, 16 Sep 2009, Warren Togami wrote:
>>
>>> One day from the deadline for spamassassin-3.3.0 scoring and we 
>>> currently have only three people reporting.
>>
>> Does making sure my uploaded corpora are current suffice? Or am I 
>> _required_ to run a masscheck locally if my corpora are to affect score 
>> generation?

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Gun Control is marketed to the public using the appealing delusion
   that violent criminals will obey the law.
-----------------------------------------------------------------------
  Tomorrow: the 222nd anniversary of the signing of the U.S. Constitution

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Justin Mason <jm...@jmason.org>.
oh crap, I didn't even think of that :(   I can certainly use those
uploaded corpora.  (_when_, though, is another matter ;)

--j.

On Wed, Sep 16, 2009 at 17:19, John Hardin <jh...@impsec.org> wrote:
> On Wed, 16 Sep 2009, Warren Togami wrote:
>
>> One day from the deadline for spamassassin-3.3.0 scoring and we currently
>> have only three people reporting.
>
> Does making sure my uploaded corpora are current suffice? Or am I _required_
> to run a masscheck locally if my corpora are to affect score generation?
>
> --
>  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
>  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
>  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
> -----------------------------------------------------------------------
>  It is not the place of government to make right every tragedy and
>  woe that befalls every resident of the nation.
> -----------------------------------------------------------------------
>  Tomorrow: the 222nd anniversary of the signing of the U.S. Constitution
>
>



-- 
--j.

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by John Hardin <jh...@impsec.org>.
On Wed, 16 Sep 2009, Warren Togami wrote:

> One day from the deadline for spamassassin-3.3.0 scoring and we 
> currently have only three people reporting.

Does making sure my uploaded corpora are current suffice? Or am I 
_required_ to run a masscheck locally if my corpora are to affect score 
generation?

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   It is not the place of government to make right every tragedy and
   woe that befalls every resident of the nation.
-----------------------------------------------------------------------
  Tomorrow: the 222nd anniversary of the signing of the U.S. Constitution

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Warren Togami <wt...@redhat.com>.
On 09/04/2009 10:51 AM, Justin Mason wrote:
> OK, if you're planning to send us mass-check logs for the
> 3.3.0 rescoring, now's the time!
>
> http://wiki.apache.org/spamassassin/RescoreDetails has all the details.
>
> cheers!
>
> --j.

-rw-r--r--   174911850 2009/09/16 01:03:40 ham-bayes-net-hege.log
-rw-r--r--    36909774 2009/09/11 20:39:47 ham-bayes-net-mmartinec.log
-rw-r--r--     3179193 2009/09/14 23:16:15 ham-bayes-net-wt-en1.log
-rw-r--r--     1591286 2009/09/14 23:24:19 ham-bayes-net-wt-en2.log
-rw-r--r--     5687443 2009/09/14 23:53:41 ham-bayes-net-wt-en3.log
-rw-r--r--         354 2009/09/14 23:56:00 ham-bayes-net-wt-en4.log
-rw-r--r--      575780 2009/09/14 22:13:01 ham-bayes-net-wt-jp1.log
-rw-r--r--     2139873 2009/09/14 22:23:07 ham-bayes-net-wt-jp2.log
-rw-r--r--    40760753 2009/09/16 01:04:24 spam-bayes-net-hege.log
-rw-r--r--    35666309 2009/09/11 20:52:01 spam-bayes-net-mmartinec.log
-rw-r--r--     4341537 2009/09/14 23:16:16 spam-bayes-net-wt-en1.log
-rw-r--r--        1576 2009/09/14 23:24:20 spam-bayes-net-wt-en2.log
-rw-r--r--         310 2009/09/14 23:53:42 spam-bayes-net-wt-en3.log
-rw-r--r--      494742 2009/09/14 23:56:00 spam-bayes-net-wt-en4.log
-rw-r--r--       79101 2009/09/14 22:13:02 spam-bayes-net-wt-jp1.log
-rw-r--r--         311 2009/09/14 22:23:08 spam-bayes-net-wt-jp2.log

One day from the deadline for spamassassin-3.3.0 scoring and we 
currently have only three people reporting.

Warren Togami
wtogami@redhat.com
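
The "three people reporting" count can be read off the listing above by grouping the log files by submitter. The following is a rough sketch of that grouping, assuming the file names follow the pattern `(ham|spam)-bayes-net-<submitter>[-<corpus>].log` as they appear in the listing; the regular expression and the function name `summarize` are illustrative and not part of the rescoring tools.

```python
import re
from collections import defaultdict

# The listing as posted above: mode, size in bytes, date, time, file name.
LISTING = """\
-rw-r--r--   174911850 2009/09/16 01:03:40 ham-bayes-net-hege.log
-rw-r--r--    36909774 2009/09/11 20:39:47 ham-bayes-net-mmartinec.log
-rw-r--r--     3179193 2009/09/14 23:16:15 ham-bayes-net-wt-en1.log
-rw-r--r--     1591286 2009/09/14 23:24:19 ham-bayes-net-wt-en2.log
-rw-r--r--     5687443 2009/09/14 23:53:41 ham-bayes-net-wt-en3.log
-rw-r--r--         354 2009/09/14 23:56:00 ham-bayes-net-wt-en4.log
-rw-r--r--      575780 2009/09/14 22:13:01 ham-bayes-net-wt-jp1.log
-rw-r--r--     2139873 2009/09/14 22:23:07 ham-bayes-net-wt-jp2.log
-rw-r--r--    40760753 2009/09/16 01:04:24 spam-bayes-net-hege.log
-rw-r--r--    35666309 2009/09/11 20:52:01 spam-bayes-net-mmartinec.log
-rw-r--r--     4341537 2009/09/14 23:16:16 spam-bayes-net-wt-en1.log
-rw-r--r--        1576 2009/09/14 23:24:20 spam-bayes-net-wt-en2.log
-rw-r--r--         310 2009/09/14 23:53:42 spam-bayes-net-wt-en3.log
-rw-r--r--      494742 2009/09/14 23:56:00 spam-bayes-net-wt-en4.log
-rw-r--r--       79101 2009/09/14 22:13:02 spam-bayes-net-wt-jp1.log
-rw-r--r--         311 2009/09/14 22:23:08 spam-bayes-net-wt-jp2.log
"""

# Assumed naming scheme: <type>-bayes-net-<submitter>[-<corpus>].log
NAME_RE = re.compile(r"^(ham|spam)-bayes-net-([^-.]+)(?:-([^.]+))?\.log$")


def summarize(listing):
    """Return {submitter: {"ham": bytes, "spam": bytes}} for a listing."""
    totals = defaultdict(lambda: {"ham": 0, "spam": 0})
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # not a listing line
        size, name = int(fields[1]), fields[4]
        match = NAME_RE.match(name)
        if match is None:
            continue  # file name does not follow the assumed scheme
        kind, submitter = match.group(1), match.group(2)
        totals[submitter][kind] += size
    return dict(totals)


if __name__ == "__main__":
    for submitter, sizes in sorted(summarize(LISTING).items()):
        print(f"{submitter}: ham {sizes['ham']} bytes, spam {sizes['spam']} bytes")
```

Under that naming assumption the sixteen files collapse to three submitters (hege, mmartinec and wt). Note that log size is only a rough proxy for message count, and the logs of a few hundred bytes probably contain little or no data.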

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Justin Mason <jm...@jmason.org>.
On Sat, Sep 12, 2009 at 00:19, Mark Martinec <Ma...@ijs.si> wrote:
> Warren writes:
>> > Will there be a ruleqa-like URL to view the mcs logs like we can
>> > currently see with nightly?
>
> Justin writes:
>> Hmm - good idea. I'll put them up at a nightly URL to do that.
>
> Thanks, I've uploaded my results to /home/corpus-rsync/corpus/submit/,
> seems I'm the only one so far - or have I missed the procedure in some way?

Well, my mass-check machine seemed to have spontaneously rebooted halfway
through :( so I'm way behind schedule.

-- 
--j.

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Mark Martinec <Ma...@ijs.si>.
Warren writes:
> > Will there be a ruleqa-like URL to view the mcs logs like we can
> > currently see with nightly?

Justin writes:
> Hmm - good idea. I'll put them up at a nightly URL to do that.

Thanks, I've uploaded my results to /home/corpus-rsync/corpus/submit/,
seems I'm the only one so far - or have I missed the procedure in some way?

  Mark

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Justin Mason <jm...@jmason.org>.
On Sat, Sep 12, 2009 at 15:31, Warren Togami <wt...@redhat.com> wrote:
> On 09/11/2009 04:17 PM, Justin Mason wrote:
>>
>> Hmm - good idea. I'll put them up at a nightly URL to do that.
>
> Did this go anywhere?

I haven't done it yet!

> http://ruleqa.spamassassin.org/
> Did this cause the nightlies to stop updating?  I know they aren't very
> useful right now, but I still like to see them as I add more corpora to the
> mix.

The nightlies should still be updating.

-- 
--j.

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Warren Togami <wt...@redhat.com>.
On 09/11/2009 04:17 PM, Justin Mason wrote:
> Hmm - good idea. I'll put them up at a nightly URL to do that.

Did this go anywhere?

http://ruleqa.spamassassin.org/
Did this cause the nightlies to stop updating?  I know they aren't very 
useful right now, but I still like to see them as I add more corpora to 
the mix.

Warren Togami
wtogami@redhat.com

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Warren Togami <wt...@redhat.com>.
On 09/11/2009 04:17 PM, Justin Mason wrote:
> Hmm - good idea. I'll put them up at a nightly URL to do that.
>

Sorry to bother you, but will this still happen?  It would be very 
helpful. =)

Warren


Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Justin Mason <jm...@jmason.org>.
Hmm - good idea. I'll put them up at a nightly URL to do that.

On Friday, September 11, 2009, Warren Togami <wt...@redhat.com> wrote:
> On 09/11/2009 06:58 AM, Justin Mason wrote:
>> On Fri, Sep 11, 2009 at 04:00, Warren Togami<wt...@redhat.com>  wrote:
>>> On 09/04/2009 10:51 AM, Justin Mason wrote:
>>>>
>>>> OK, if you're planning to send us mass-check logs for the
>>>> 3.3.0 rescoring, now's the time!
>>>>
>>>> http://wiki.apache.org/spamassassin/RescoreDetails has all the details.
>>>>
>>>> cheers!
>>>>
>>>> --j.
>>>
>>> $ rsync wtogami@rsync.spamassassin.org::submit/
>>> @ERROR: auth failed on module submit
>>> rsync error: error starting client-server protocol (code 5) at main.c(1296)
>>> [receiver=2.6.8]
>>>
>>> I'm able to get into the ::corpus/ rsync module, but not submit as written
>>> in the RescoreDetails page.  Is the page wrong, or this is an auth problem?
>>
>> the latter -- now fixed! ;)
>>
>> --j.
>>
>
> Will there be a ruleqa-like URL to view the mcs logs like we can
> currently see with nightly?
>
> Warren
>

-- 
--j.

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Warren Togami <wt...@redhat.com>.
On 09/11/2009 06:58 AM, Justin Mason wrote:
> On Fri, Sep 11, 2009 at 04:00, Warren Togami<wt...@redhat.com>  wrote:
>> On 09/04/2009 10:51 AM, Justin Mason wrote:
>>>
>>> OK, if you're planning to send us mass-check logs for the
>>> 3.3.0 rescoring, now's the time!
>>>
>>> http://wiki.apache.org/spamassassin/RescoreDetails has all the details.
>>>
>>> cheers!
>>>
>>> --j.
>>
>> $ rsync wtogami@rsync.spamassassin.org::submit/
>> @ERROR: auth failed on module submit
>> rsync error: error starting client-server protocol (code 5) at main.c(1296)
>> [receiver=2.6.8]
>>
>> I'm able to get into the ::corpus/ rsync module, but not submit as written
>> in the RescoreDetails page.  Is the page wrong, or this is an auth problem?
>
> the latter -- now fixed! ;)
>
> --j.
>

Will there be a ruleqa-like URL to view the mcs logs like we can 
currently see with nightly?

Warren

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Justin Mason <jm...@jmason.org>.
On Fri, Sep 11, 2009 at 04:00, Warren Togami <wt...@redhat.com> wrote:
> On 09/04/2009 10:51 AM, Justin Mason wrote:
>>
>> OK, if you're planning to send us mass-check logs for the
>> 3.3.0 rescoring, now's the time!
>>
>> http://wiki.apache.org/spamassassin/RescoreDetails has all the details.
>>
>> cheers!
>>
>> --j.
>
> $ rsync wtogami@rsync.spamassassin.org::submit/
> @ERROR: auth failed on module submit
> rsync error: error starting client-server protocol (code 5) at main.c(1296)
> [receiver=2.6.8]
>
> I'm able to get into the ::corpus/ rsync module, but not submit as written
> in the RescoreDetails page.  Is the page wrong, or this is an auth problem?

the latter -- now fixed! ;)

--j.

> wtogami@redhat.com
>
>



-- 
--j.

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

Posted by Warren Togami <wt...@redhat.com>.
On 09/04/2009 10:51 AM, Justin Mason wrote:
> OK, if you're planning to send us mass-check logs for the
> 3.3.0 rescoring, now's the time!
>
> http://wiki.apache.org/spamassassin/RescoreDetails has all the details.
>
> cheers!
>
> --j.

$ rsync wtogami@rsync.spamassassin.org::submit/
@ERROR: auth failed on module submit
rsync error: error starting client-server protocol (code 5) at 
main.c(1296) [receiver=2.6.8]

I'm able to get into the ::corpus/ rsync module, but not submit as 
written in the RescoreDetails page.  Is the page wrong, or is this an 
auth problem?

Warren Togami
wtogami@redhat.com
