Posted to dev@spamassassin.apache.org by "Daryl C. W. O'Shea" <sp...@dostech.ca> on 2010/07/14 06:06:38 UTC
3.3 Rule Auto-Update Mass-Checks
We haven't had an auto-update for a while as ham submissions have dropped.
> HAM: 145387 (150000 required)
> SPAM: 546888 (150000 required)
> Insufficient ham corpus to generate scores; aborting.
Currently the cut-off is set for ham no more than 38 months old.
Can anyone contribute more recent ham?
Should we consider allowing ham even older than 3.5 years?
Regards,
Daryl
Wed Jul 14 02:26:57 GMT 2010
[ running log-grep-recent ]
Month distribution:
5311 ( 2%) 5311 0-1 months old
13148 ( 5%) 7837 1-2 months old
20375 ( 8%) 7227 2-3 months old
27714 ( 11%) 7339 3-4 months old
32787 ( 13%) 5073 4-5 months old
36494 ( 15%) 3707 5-6 months old
39234 ( 16%) 2740 6-7 months old
41809 ( 17%) 2575 7-8 months old
44474 ( 18%) 2665 8-9 months old
47950 ( 20%) 3476 9-10 months old
51521 ( 21%) 3571 10-11 months old
54813 ( 22%) 3292 11-12 months old
57906 ( 24%) 3093 12-13 months old
61668 ( 25%) 3762 13-14 months old
64831 ( 27%) 3163 14-15 months old
67841 ( 28%) 3010 15-16 months old
72094 ( 30%) 4253 16-17 months old
76773 ( 32%) 4679 17-18 months old
80058 ( 33%) 3285 18-19 months old
83953 ( 35%) 3895 19-20 months old
87731 ( 36%) 3778 20-21 months old
91384 ( 38%) 3653 21-22 months old
94964 ( 39%) 3580 22-23 months old
98822 ( 41%) 3858 23-24 months old
102412 ( 42%) 3590 24-25 months old
105262 ( 44%) 2850 25-26 months old
108156 ( 45%) 2894 26-27 months old
111131 ( 46%) 2975 27-28 months old
114019 ( 47%) 2888 28-29 months old
117134 ( 48%) 3115 29-30 months old
120535 ( 50%) 3401 30-31 months old
123600 ( 51%) 3065 31-32 months old
126361 ( 52%) 2761 32-33 months old
129122 ( 54%) 2761 33-34 months old
132251 ( 55%) 3129 34-35 months old
135418 ( 56%) 3167 35-36 months old
139040 ( 58%) 3622 36-37 months old
142735 ( 59%) 3695 37-38 months old
145387 ( 60%) 2652 38-39 months old
149082 ( 62%) 3695 39-40 months old
153448 ( 64%) 4366 40-41 months old
158760 ( 66%) 5312 41-42 months old
163672 ( 68%) 4912 42-43 months old
167346 ( 69%) 3674 43-44 months old
172051 ( 71%) 4705 44-45 months old
176994 ( 74%) 4943 45-46 months old
179454 ( 75%) 2460 46-47 months old
182938 ( 76%) 3484 47-48 months old
186115 ( 77%) 3177 48-49 months old
189062 ( 79%) 2947 49-50 months old
192196 ( 80%) 3134 50-51 months old
195329 ( 81%) 3133 51-52 months old
196639 ( 82%) 1310 52-53 months old
196901 ( 82%) 262 53-54 months old
197203 ( 82%) 302 54-55 months old
197468 ( 82%) 265 55-56 months old
197648 ( 82%) 180 56-57 months old
197815 ( 82%) 167 57-58 months old
197954 ( 82%) 139 58-59 months old
198084 ( 82%) 130 59-60 months old
198232 ( 82%) 148 60-61 months old
198363 ( 82%) 131 61-62 months old
198493 ( 83%) 130 62-63 months old
198621 ( 83%) 128 63-64 months old
198750 ( 83%) 129 64-65 months old
198859 ( 83%) 109 65-66 months old
198974 ( 83%) 115 66-67 months old
199017 ( 83%) 43 67-68 months old
199048 ( 83%) 31 68-69 months old
199144 ( 83%) 96 69-70 months old
199189 ( 83%) 45 70-71 months old
199277 ( 83%) 88 71-72 months old
199363 ( 83%) 86 72-73 months old
199397 ( 83%) 34 73-74 months old
199458 ( 83%) 61 74-75 months old
199561 ( 83%) 103 75-76 months old
199597 ( 83%) 36 76-77 months old
199662 ( 83%) 65 77-78 months old
199678 ( 83%) 16 78-79 months old
199704 ( 83%) 26 79-80 months old
199714 ( 83%) 10 80-81 months old
199719 ( 83%) 5 81-82 months old
199723 ( 83%) 4 82-83 months old
199725 ( 83%) 2 83-84 months old
199731 ( 83%) 6 84-85 months old
199732 ( 83%) 1 85-86 months old
199737 ( 83%) 5 86-87 months old
199741 ( 83%) 4 87-88 months old
199748 ( 83%) 7 88-89 months old
199750 ( 83%) 2 89-90 months old
199753 ( 83%) 3 90-91 months old
199758 ( 83%) 5 91-92 months old
199759 ( 83%) 1 92-93 months old
199762 ( 83%) 3 93-94 months old
199768 ( 83%) 6 94-95 months old
199772 ( 83%) 4 95-96 months old
199780 ( 83%) 8 96-97 months old
199797 ( 83%) 17 97-98 months old
200276 ( 83%) 479 98-99 months old
200655 ( 83%) 379 99-100 months old
201038 ( 84%) 383 100-101 months old
202566 ( 84%) 1528 101-102 months old
206095 ( 86%) 3529 102-103 months old
211077 ( 88%) 4982 103-104 months old
214961 ( 89%) 3884 104-105 months old
224099 ( 93%) 9138 105-106 months old
233245 ( 97%) 9146 106-107 months old
236036 ( 98%) 2791 107-108 months old
237743 ( 99%) 1707 108-109 months old
238632 ( 99%) 889 109-110 months old
238665 ( 99%) 33 110-111 months old
238677 ( 99%) 12 111-112 months old
238691 ( 99%) 14 112-113 months old
238709 ( 99%) 18 113-114 months old
238724 ( 99%) 15 114-115 months old
238732 ( 99%) 8 115-116 months old
238743 ( 99%) 11 116-117 months old
238755 ( 99%) 12 117-118 months old
238781 ( 99%) 26 118-119 months old
238797 ( 99%) 16 119-120 months old
238804 ( 99%) 7 120-121 months old
238817 ( 99%) 13 121-122 months old
238823 ( 99%) 6 122-123 months old
238835 ( 99%) 12 123-124 months old
238850 ( 99%) 15 124-125 months old
238862 ( 99%) 12 125-126 months old
238878 ( 99%) 16 126-127 months old
238887 ( 99%) 9 127-128 months old
238893 ( 99%) 6 128-129 months old
238897 ( 99%) 4 129-130 months old
238914 ( 99%) 17 130-131 months old
238932 ( 99%) 18 131-132 months old
238951 ( 99%) 19 132-133 months old
238978 ( 99%) 27 133-134 months old
238989 ( 99%) 11 134-135 months old
239018 ( 99%) 29 135-136 months old
239034 ( 99%) 16 136-137 months old
239046 ( 99%) 12 137-138 months old
239049 ( 99%) 3 138-139 months old
239056 ( 99%) 7 139-140 months old
239066 ( 99%) 10 140-141 months old
239068 ( 99%) 2 141-142 months old
239070 ( 99%) 2 143-144 months old
239071 ( 99%) 1 144-145 months old
239072 ( 99%) 1 145-146 months old
239073 ( 99%) 1 151-152 months old
239075 ( 99%) 2 154-155 months old
239078 ( 99%) 3 155-156 months old
239080 (100%) 2 156-157 months old
Month distribution:
433651 ( 69%) 433651 0-1 months old
533568 ( 85%) 99917 1-2 months old
546888 ( 87%) 13320 2-3 months old
552339 ( 88%) 5451 3-4 months old
553792 ( 88%) 1453 4-5 months old
611606 ( 98%) 57814 5-6 months old
621333 ( 99%) 9727 6-7 months old
621561 ( 99%) 228 7-8 months old
622012 ( 99%) 451 8-9 months old
622247 ( 99%) 235 9-10 months old
622258 ( 99%) 11 10-11 months old
622266 ( 99%) 8 12-13 months old
622271 ( 99%) 5 13-14 months old
622295 ( 99%) 24 14-15 months old
622321 ( 99%) 26 15-16 months old
622412 ( 99%) 91 16-17 months old
622429 ( 99%) 17 17-18 months old
622435 (100%) 6 18-19 months old
HAM: 145387 (150000 required)
SPAM: 546888 (150000 required)
Insufficient ham corpus to generate scores; aborting.
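[Editorial sketch, not part of the original message.] The arithmetic behind the abort can be checked against the ham distribution in the log above. This is a minimal illustration: it assumes the reported HAM total corresponds to the cumulative count at the "38-39 months old" bucket (the two numbers match in the log), and the names CUMULATIVE_HAM, REQUIRED and smallest_sufficient_cutoff are hypothetical, not part of the SpamAssassin tooling.

```python
# Cumulative ham totals copied from the log, keyed by the upper edge
# (in months) of each age bucket. Only the buckets near the cut-off are listed.
CUMULATIVE_HAM = {
    37: 139040,  # 36-37 months old
    38: 142735,  # 37-38 months old
    39: 145387,  # 38-39 months old (matches the reported HAM total)
    40: 149082,  # 39-40 months old
    41: 153448,  # 40-41 months old
    42: 158760,  # 41-42 months old
}
REQUIRED = 150000  # minimum ham count quoted in the log


def smallest_sufficient_cutoff(cumulative, required):
    """Return the smallest bucket edge whose cumulative total meets `required`.

    Returns None if no listed bucket reaches the requirement.
    """
    for months in sorted(cumulative):
        if cumulative[months] >= required:
            return months
    return None


shortfall = REQUIRED - CUMULATIVE_HAM[39]
cutoff = smallest_sufficient_cutoff(CUMULATIVE_HAM, REQUIRED)
print(f"shortfall at current cut-off: {shortfall}")
print(f"first bucket edge meeting the requirement: {cutoff} months")
```

Under these assumptions the corpus is a few thousand messages short, and the requirement would first be met by including the "40-41 months old" bucket, still inside the 3.5 year window mentioned in the message.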
Re: 3.3 Rule Auto-Update Mass-Checks
Posted by Warren Togami <wt...@gmail.com>.
Sorry folks, I have fallen behind in sorting my corpora. I will fix this
situation soon.
I would be against increasing the age of validity of ham.
Warren
Re: 3.3 Rule Auto-Update Mass-Checks
Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 14/07/2010 1:06 AM, John Hardin wrote:
> On Wed, 14 Jul 2010, Daryl C. W. O'Shea wrote:
>
>> We haven't had an auto-update for a while as ham submissions have
>> dropped.
>>
>>> HAM: 145387 (150000 required)
>>> SPAM: 546888 (150000 required)
>>> Insufficient ham corpus to generate scores; aborting.
>>
>> Currently the cut-off is set for ham no more than 38 months old.
>>
>> Should we consider allowing ham even older than 3.5 years?
>
> Spam is ever-changing, but does ham change that much over time?
>
> I'd say yes.
I would say yes, too. Hence the 3.5 year limit. We used to have a
similar age limit for "organized" .0 release mass-checks.
> Doesn't the 38 months rule discard that 38k+ Enron ham corpus?
I don't know the exact age of that corpus, but I would say yes, it'd be
older than 38 months.
I suppose that the worst case is that old mail will only lessen scores
for spam rules and not cause an increase in FPs, so it'd be technically
safe to include older mail.
Daryl
Re: 3.3 Rule Auto-Update Mass-Checks
Posted by John Hardin <jh...@impsec.org>.
On Wed, 14 Jul 2010, Daryl C. W. O'Shea wrote:
> We haven't had an auto-update for a while as ham submissions have
> dropped.
>
>> HAM: 145387 (150000 required)
>> SPAM: 546888 (150000 required)
>> Insufficient ham corpus to generate scores; aborting.
>
> Currently the cut-off is set for ham no more than 38 months old.
>
> Should we consider allowing ham even older than 3.5 years?
Spam is ever-changing, but does ham change that much over time?
I'd say yes.
Doesn't the 38 months rule discard that 38k+ Enron ham corpus?
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Where We Want You To Go Today 09/13/07: Microsoft patents in-OS
adware architecture that incorporates monitoring and analysis of
user actions and interrupting the user to display apparently
relevant advertisements (U.S. Patent #20070214042)
-----------------------------------------------------------------------
3 days until the 65th anniversary of the dawn of the Atomic Age