Posted to users@spamassassin.apache.org by Amir Reza Rahbaran <am...@gmail.com> on 2014/04/05 12:16:21 UTC

sa-update

Hi list,
I want to know how long it takes for custom signatures to be updated by sa-update.

Thank you.
-- 
Amir Reza Rahbaran

Re: sa-update

Posted by John Hardin <jh...@impsec.org>.
On Mon, 7 Apr 2014, Dave Warren wrote:

> On 2014-04-06 17:21, John Hardin wrote:
>>  On Sun, 6 Apr 2014, Dave Warren wrote:
>> 
>> >  Is older ham useful? It specifically mentions that older spam isn't 
>> >  useful, and why, but I'm thinking older ham is probably useful since old 
>> >  mail clients and legitimately sent mail never die. But I could filter 
>> >  based on date.
>>
>>  There's some debate about that. :)
>>
>>  I personally agree with you. Others disagree.
>
> I've been giving it some thought and I think that perhaps limiting it to the 
> last few months will make it easier to get a sane set of TRUSTED_NETWORKS and 
> INTERNAL_NETWORKS; I've got mail going back to 
> ~2002 but no real recollection of how things were set up or named prior 
> to 2007 or so.
>
> Initially I'll limit it to mail within the last couple of months, but perhaps 
> expand that up to 24-36 months for non-spam and 6 months for spam. Is that 
> sane/reasonable?

Sure.

>>  Yes, ham-only masscheck submissions would be very welcome.
>
> Perfect, glad to hear it. At this point I've built a dedicated box to run the 
> masscheck scripts, so now it's just a matter of putting together a corpus and 
> doing some sanity checking and testing.
>
> My current thought is to take user-fed spam and non-spam folders and place 
> copies of messages into a staging path which will then be reviewed before 
> being added to the corpus for learning. Hopefully I'll be ready to go live 
> within a day or two.

Thanks for your participation!

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   ...every time I sit down in front of a Windows machine I feel as
   if the computer is just a place for the manufacturers to put their
   advertising.                                 -- fwadling on Y! SCOX
-----------------------------------------------------------------------
  6 days until Thomas Jefferson's 271st Birthday

Re: sa-update

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 4/7/2014 3:17 AM, Dave Warren wrote:
> On 2014-04-06 17:21, John Hardin wrote:
>> On Sun, 6 Apr 2014, Dave Warren wrote:
>>
>>> Is older ham useful? It specifically mentions that older spam isn't 
>>> useful, and why, but I'm thinking older ham is probably useful since 
>>> old mail clients and legitimately sent mail never die. But I could 
>>> filter based on date.
>>
>> There's some debate about that. :)
>>
>> I personally agree with you. Others disagree.
>
> I've been giving it some thought and I think that perhaps limiting it 
> to the last few months will make it easier to get a sane set of 
> TRUSTED_NETWORKS and INTERNAL_NETWORKS; I've got mail going back to 
> ~2002 but no real recollection of how things were set up or named 
> prior to 2007 or so.
>
> Initially I'll limit it to mail within the last couple of months, but 
> perhaps expand that up to 24-36 months for non-spam and 6 months for 
> spam. Is that sane/reasonable?
I think 3 years makes a lot of sense for reasons I'd rather not discuss 
on-list for fear the spammers will learn more than I will be able to 
usefully convey.

Regards,
KAM

Re: sa-update

Posted by Dave Warren <da...@hireahit.com>.
On 2014-04-06 17:21, John Hardin wrote:
> On Sun, 6 Apr 2014, Dave Warren wrote:
>
>> Is older ham useful? It specifically mentions that older spam isn't 
>> useful, and why, but I'm thinking older ham is probably useful since 
>> old mail clients and legitimately sent mail never die. But I could 
>> filter based on date.
>
> There's some debate about that. :)
>
> I personally agree with you. Others disagree.

I've been giving it some thought and I think that perhaps limiting it to 
the last few months will make it easier to get a sane set of 
TRUSTED_NETWORKS and INTERNAL_NETWORKS; I've got mail going back to 
~2002 but no real recollection of how things were set up or named prior 
to 2007 or so.

Initially I'll limit it to mail within the last couple of months, but 
perhaps expand that up to 24-36 months for non-spam and 6 months for 
spam. Is that sane/reasonable?
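The date-window idea above can be applied mechanically before a corpus is fed to masscheck. The following is a minimal Python sketch, assuming the corpus is kept in mbox files and using the windows floated in this thread (about 36 months for ham, about 6 months for spam, with a month approximated as 30 days). The names `MAX_AGE`, `message_age_ok` and `select_messages` are hypothetical and are not part of SpamAssassin or the masscheck scripts.

```python
import mailbox
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical retention windows based on the figures discussed in this
# thread; a "month" is approximated as 30 days.
MAX_AGE = {
    "ham": timedelta(days=36 * 30),
    "spam": timedelta(days=6 * 30),
}


def message_age_ok(date_header, kind, now):
    """Return True if the Date header falls inside the window for `kind`."""
    if not date_header:
        return False  # no Date header: leave the message out rather than guess
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return False  # unparseable Date header (error type varies by Python version)
    if sent.tzinfo is None:
        sent = sent.replace(tzinfo=timezone.utc)  # "-0000" zones parse as naive; treat as UTC
    age = now - sent
    # Future-dated messages (negative age) are treated as suspect and skipped.
    return timedelta(0) <= age <= MAX_AGE[kind]


def select_messages(mbox_path, kind, now=None):
    """Yield messages from an existing mbox whose Date is inside the window."""
    if now is None:
        now = datetime.now(timezone.utc)
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            if message_age_ok(msg.get("Date"), kind, now):
                yield msg
    finally:
        box.close()
```

Messages without a usable Date header are dropped rather than kept. This choice is deliberate in the sketch and errs toward a smaller but better-dated corpus. A site that prefers to keep them could route them to the manual-review staging path instead.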


> Yes, ham-only masscheck submissions would be very welcome.

Perfect, glad to hear it. At this point I've built a dedicated box to 
run the masscheck scripts, so now it's just a matter of putting together 
a corpus and doing some sanity checking and testing.

My current thought is to take user-fed spam and non-spam folders and 
place copies of messages into a staging path which will then be reviewed 
before being added to the corpus for learning. Hopefully I'll be ready 
to go live within a day or two.


-- 
Dave Warren
http://www.hireahit.com/
http://ca.linkedin.com/in/davejwarren



Re: sa-update

Posted by John Hardin <jh...@impsec.org>.
On Sun, 6 Apr 2014, Dave Warren wrote:

> On 2014-04-05 09:14, John Hardin wrote:
>>  On Sat, 5 Apr 2014, Amir Reza Rahbaran wrote:
>> 
>> >  I want to know how long it takes for custom signatures to be updated by sa-update.
>>
>>  Daily, if the corpora are sufficient for masscheck scoring to run.
>>
>>  At the moment the masscheck corpus is ham-starved. There's not quite
>>  enough ham available for reliable scores to be generated and published.
>>
>>  Once again, participation as a mass-checker, especially if you can provide
>>  a non-English ham corpus, is solicited. If you have access to thousands of
>>  reliably-categorized messages and can set up a box to run SpamAssassin to
>>  scan them to test the performance of the base rules, please consider
>>  becoming a masscheck contributor. The content of private messages is not
>>  exposed by this process, only the rule hits are public.
>>
>>  If you can do this, see the wiki for the process and contact Kevin McGrail
>>  for upload credentials. Thanks!
>
> I've been idly debating figuring out how to contribute, but having read the 
> wiki articles, I have a few questions:
>
> Is older ham useful? It specifically mentions that older spam isn't useful, 
> and why, but I'm thinking older ham is probably useful since old mail clients 
> and legitimately sent mail never die. But I could filter based on date.

There's some debate about that. :)

I personally agree with you. Others disagree.

> Is mail from the "Sent" folder of any use? I suspect not, since there's not 
> necessarily a Received header yet (although there might be, depending on how 
> the user sent the message), so direct-to-MX and similar rules would skew the results.
>
> Is a ham-only corpus submission useful? Our ham is well cleaned, but we don't 
> archive spam on an ongoing basis, and users primarily just delete spam. But 
> most of our users archive ham and retain it, so depending on what the results 
> look like, it might be a useful data source.

Yes, ham-only masscheck submissions would be very welcome.


Re: sa-update

Posted by Dave Warren <da...@hireahit.com>.
On 2014-04-05 09:14, John Hardin wrote:
> On Sat, 5 Apr 2014, Amir Reza Rahbaran wrote:
>
>> I want to know how long it takes for custom signatures to be updated by sa-update.
>
> Daily, if the corpora are sufficient for masscheck scoring to run.
>
> At the moment the masscheck corpus is ham-starved. There's not quite 
> enough ham available for reliable scores to be generated and published.
>
> Once again, participation as a mass-checker, especially if you can 
> provide a non-English ham corpus, is solicited. If you have access to 
> thousands of reliably-categorized messages and can set up a box to run 
> SpamAssassin to scan them to test the performance of the base rules, 
> please consider becoming a masscheck contributor. The content of 
> private messages is not exposed by this process, only the rule hits 
> are public.
>
> If you can do this, see the wiki for the process and contact Kevin 
> McGrail for upload credentials. Thanks!

I've been idly debating figuring out how to contribute, but having read 
the wiki articles, I have a few questions:

Is older ham useful? It specifically mentions that older spam isn't 
useful, and why, but I'm thinking older ham is probably useful since old 
mail clients and legitimately sent mail never die. But I could filter 
based on date.

Is mail from the "Sent" folder of any use? I suspect not, since there's not 
necessarily a Received header yet (although there might be, depending 
on how the user sent the message), so direct-to-MX and similar rules 
would skew the results.

Is a ham-only corpus submission useful? Our ham is well cleaned, but we 
don't archive spam on an ongoing basis, and users primarily just delete 
spam. But most of our users archive ham and retain it, so depending on 
what the results look like, it might be a useful data source.




Re: sa-update

Posted by jdebert <jd...@garlic.com>.
On Mon, 07 Apr 2014 17:22:37 -0400
Thomas Harold <th...@nybeta.com> wrote:

> On 4/6/2014 11:25 PM, jdebert wrote:
> > 
> > Does this explain why SA is not catching any spam here? After
> > applying updates 1584283 and then 1585021, all spam is being passed.
> > Nothing else was done. No other changes made.
> > 
> 
> Our setup is still catching spam, but the performance has definitely
> trended downward in the last week or two.  A lot more stuff is getting
> through to the inbox than before.
> 
> My guess is that the spammers have changed their tactics, again, and
> are now ahead of the various block lists.
> 

After another update, it started catching spam but most was still
getting past. After a second update, it has been catching most spam.

jd




Re: sa-update

Posted by Thomas Harold <th...@nybeta.com>.
On 4/6/2014 11:25 PM, jdebert wrote:
> 
> Does this explain why SA is not catching any spam here? After
> applying updates 1584283 and then 1585021, all spam is being passed. Nothing
> else was done. No other changes made.
> 

Our setup is still catching spam, but the performance has definitely
trended downward in the last week or two.  A lot more stuff is getting
through to the inbox than before.

My guess is that the spammers have changed their tactics, again, and are
now ahead of the various block lists.

Re: sa-update

Posted by Dave Warren <da...@hireahit.com>.
On 2014-04-06 20:25, jdebert wrote:
> On Sat, 5 Apr 2014 09:14:56 -0700 (PDT)
> John Hardin <jh...@impsec.org> wrote:
>
>> On Sat, 5 Apr 2014, Amir Reza Rahbaran wrote:
>>
>>> I want to know how long it takes for custom signatures to be
>>> updated by sa-update.
>> Daily, if the corpora are sufficient for masscheck scoring to run.
>>
>> At the moment the masscheck corpus is ham-starved. There's not quite
>> enough ham available for reliable scores to be generated and
>> published.
> Does this explain why SA is not catching any spam here? After
> applying updates 1584283 and then 1585021, all spam is being passed. Nothing
> else was done. No other changes made.

No. This issue just means that rule updates may not get created, but 
the last valid set of rules will still be available to sa-update.




Re: sa-update

Posted by John Hardin <jh...@impsec.org>.
On Sun, 6 Apr 2014, jdebert wrote:

> On Sat, 5 Apr 2014 09:14:56 -0700 (PDT)
> John Hardin <jh...@impsec.org> wrote:
>
>> On Sat, 5 Apr 2014, Amir Reza Rahbaran wrote:
>>
>>> I want to know how long it takes for custom signatures to be
>>> updated by sa-update.
>>
>> Daily, if the corpora are sufficient for masscheck scoring to run.
>>
>> At the moment the masscheck corpus is ham-starved. There's not quite
>> enough ham available for reliable scores to be generated and
>> published.
>
> Does this explain why SA is not catching any spam here? After
> applying updates 1584283 and then 1585021, all spam is being passed. Nothing
> else was done. No other changes made.

I was mistaken; the ham corpus has actually been over the threshold for 
several days now. Apologies.

If your install is not catching *any* spam, then there's something else 
going on. Are there any headers in FNs (false negatives) indicating that SA is scanning the 
message at all? Are there any error messages in the logs indicating 
problems attempting to run SA? Is the spamd daemon running?


Re: sa-update

Posted by jdebert <jd...@garlic.com>.
On Sat, 5 Apr 2014 09:14:56 -0700 (PDT)
John Hardin <jh...@impsec.org> wrote:

> On Sat, 5 Apr 2014, Amir Reza Rahbaran wrote:
> 
> > I want to know how long it takes for custom signatures to be
> > updated by sa-update.
> 
> Daily, if the corpora are sufficient for masscheck scoring to run.
> 
> At the moment the masscheck corpus is ham-starved. There's not quite 
> enough ham available for reliable scores to be generated and
> published.

Does this explain why SA is not catching any spam here? After
applying updates 1584283 and then 1585021, all spam is being passed. Nothing
else was done. No other changes made.

jd



Re: sa-update (nightly mass-check)

Posted by John Hardin <jh...@impsec.org>.
On Mon, 7 Apr 2014, Dave Warren wrote:

> On 2014-04-07 19:23, Thomas Harold wrote:
>> >  NOTE: New masscheck contributors are now being accepted since about 
>> >  2012-08-09.
>>  Is that supposed to say "now being" or "not being"?
>
> I'm assuming "now being" since there are regular mentions of a need for ham 
> corpus. But that's just a hopeful guess, given that I've put some resources 
> into setting up appropriate systems and preparing some messages to start the 
> process.

Get in touch with Kevin McGrail for submission credentials.


Re: sa-update (nightly mass-check)

Posted by Dave Warren <da...@hireahit.com>.
On 2014-04-08 11:17, Kevin A. McGrail wrote:
> On 4/8/2014 2:15 PM, Dave Warren wrote:
>> On 2014-04-08 03:56, Kevin A. McGrail wrote:
>>> On 4/8/2014 1:16 AM, Dave Warren wrote:
>>>> On 2014-04-07 19:23, Thomas Harold wrote:
>>>>>> NOTE: New masscheck contributors are now being accepted since 
>>>>>> about 2012-08-09.
>>>>> Is that supposed to say "now being" or "not being"?
>>>>
>>>> I'm assuming "now being" since there are regular mentions of a need 
>>>> for ham corpus. But that's just a hopeful guess, given that I've 
>>>> put some resources into setting up appropriate systems and 
>>>> preparing some messages to start the process.
>>>>
>>> Yes, we can make accounts again.  Did you send a request?
>>
>> Indeed, I sent a message to private@ as described on the wiki.
> OK, cc me and send again please. It might not have been moderated 
> through.

Sent and CC'd, thanks!




Re: sa-update (nightly mass-check)

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 4/8/2014 2:15 PM, Dave Warren wrote:
> On 2014-04-08 03:56, Kevin A. McGrail wrote:
>> On 4/8/2014 1:16 AM, Dave Warren wrote:
>>> On 2014-04-07 19:23, Thomas Harold wrote:
>>>>> NOTE: New masscheck contributors are now being accepted since 
>>>>> about 2012-08-09.
>>>> Is that supposed to say "now being" or "not being"?
>>>
>>> I'm assuming "now being" since there are regular mentions of a need 
>>> for ham corpus. But that's just a hopeful guess, given that I've put 
>>> some resources into setting up appropriate systems and preparing 
>>> some messages to start the process.
>>>
>> Yes, we can make accounts again.  Did you send a request?
>
> Indeed, I sent a message to private@ as described on the wiki.
OK, cc me and send again please. It might not have been moderated through.

Re: sa-update (nightly mass-check)

Posted by Dave Warren <da...@hireahit.com>.
On 2014-04-08 03:56, Kevin A. McGrail wrote:
> On 4/8/2014 1:16 AM, Dave Warren wrote:
>> On 2014-04-07 19:23, Thomas Harold wrote:
>>>> NOTE: New masscheck contributors are now being accepted since about 
>>>> 2012-08-09.
>>> Is that supposed to say "now being" or "not being"?
>>
>> I'm assuming "now being" since there are regular mentions of a need 
>> for ham corpus. But that's just a hopeful guess, given that I've put 
>> some resources into setting up appropriate systems and preparing some 
>> messages to start the process.
>>
> Yes, we can make accounts again.  Did you send a request?

Indeed, I sent a message to private@ as described on the wiki.

>
> However, the ham is not starved.  We have been publishing rules. Not 
> sure where the disconnect on the firing of the script is coming from.

Understood. However, over the last couple years, there have been 
multiple times that this was mentioned (whether it was actually true or 
not), which is what motivated me to attempt to contribute.




Re: sa-update (nightly mass-check)

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 4/9/2014 2:53 PM, Thomas Harold wrote:
> On 4/8/2014 6:56 AM, Kevin A. McGrail wrote:
>> Yes, we can make accounts again.  Did you send a request?
>>
>> However, the ham is not starved.  We have been publishing rules. Not
>> sure where the disconnect on the firing of the script is coming from.
>>
>> Regards,
>> KAM
>>
>>
> Assisting in the mass-check is on my to-do list later this year.  I have
> a mail account that gets 80+ spam per day and I get 5-30 outside mails
> each day.
Every little bit helps!

Re: sa-update (nightly mass-check)

Posted by Thomas Harold <th...@nybeta.com>.
On 4/8/2014 6:56 AM, Kevin A. McGrail wrote:
> Yes, we can make accounts again.  Did you send a request?
> 
> However, the ham is not starved.  We have been publishing rules. Not
> sure where the disconnect on the firing of the script is coming from.
> 
> Regards,
> KAM
> 
> 

Assisting in the mass-check is on my to-do list later this year.  I have
a mail account that gets 80+ spam per day and I get 5-30 outside mails
each day.



Re: sa-update (nightly mass-check)

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 4/8/2014 1:16 AM, Dave Warren wrote:
> On 2014-04-07 19:23, Thomas Harold wrote:
>>> NOTE: New masscheck contributors are now being accepted since about 
>>> 2012-08-09.
>> Is that supposed to say "now being" or "not being"?
>
> I'm assuming "now being" since there are regular mentions of a need 
> for ham corpus. But that's just a hopeful guess, given that I've put 
> some resources into setting up appropriate systems and preparing some 
> messages to start the process.
>
Yes, we can make accounts again.  Did you send a request?

However, the ham is not starved.  We have been publishing rules. Not 
sure where the disconnect on the firing of the script is coming from.

Regards,
KAM



Re: sa-update (nightly mass-check)

Posted by Dave Warren <da...@hireahit.com>.
On 2014-04-07 19:23, Thomas Harold wrote:
>> NOTE: New masscheck contributors are now being accepted since about 2012-08-09.
> Is that supposed to say "now being" or "not being"?

I'm assuming "now being" since there are regular mentions of a need for 
ham corpus. But that's just a hopeful guess, given that I've put some 
resources into setting up appropriate systems and preparing some 
messages to start the process.




Re: sa-update (nightly mass-check)

Posted by Thomas Harold <th...@nybeta.com>.
On 4/5/2014 12:14 PM, John Hardin wrote:
> On Sat, 5 Apr 2014, Amir Reza Rahbaran wrote:
> 
>> I want to know how long it takes for custom signatures to be updated by sa-update.
> 
> Daily, if the corpora are sufficient for masscheck scoring to run.
> 
> At the moment the masscheck corpus is ham-starved. There's not quite
> enough ham available for reliable scores to be generated and published.
> 

http://wiki.apache.org/spamassassin/NightlyMassCheck

> NOTE: New masscheck contributors are now being accepted since about 2012-08-09. 

Is that supposed to say "now being" or "not being"?


Re: sa-update

Posted by John Hardin <jh...@impsec.org>.
On Sat, 5 Apr 2014, Amir Reza Rahbaran wrote:

> I want to know how long it takes for custom signatures to be updated by sa-update.

Daily, if the corpora are sufficient for masscheck scoring to run.

At the moment the masscheck corpus is ham-starved. There's not quite 
enough ham available for reliable scores to be generated and published.

Once again, participation as a mass-checker, especially if you can provide 
a non-English ham corpus, is solicited. If you have access to thousands of 
reliably-categorized messages and can set up a box to run SpamAssassin to 
scan them to test the performance of the base rules, please consider 
becoming a masscheck contributor. The content of private messages is not 
exposed by this process, only the rule hits are public.

If you can do this, see the wiki for the process and contact Kevin McGrail 
for upload credentials. Thanks!
