Posted to dev@spamassassin.apache.org by John Hardin <jh...@impsec.org> on 2014/04/01 02:19:52 UTC
Re: seek-phrases-in-corpus output
On Mon, 31 Mar 2014, John Hardin wrote:
> On Mon, 31 Mar 2014, darxus@chaosreigns.com wrote:
>
>> I've been getting more missed spam lately. Ran
>> seek-phrases-in-corpus from SA svn. Output is here:
>> http://www.chaosreigns.com/sa/seek-2014-03-31.txt I'd
>> like to see some of these tested, particularly these:
>> http://www.chaosreigns.com/sa/seek-good-2014-03-31.txt
>
> Any objections to my adding these to my sandbox?
Hearing no objections I've added the seek-good ones (plus one that matches
text I've seen in spams before) to my sandbox. Search using /__DX_TEST_
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
People think they're trading chaos for order [by ceding more and
more power to the Government], but they're just trading normal
human evil for the really dangerous organized kind of evil, the
kind that simply does not give a shit. Only bureaucrats can give
you true evil. -- Larry Correia
-----------------------------------------------------------------------
Tomorrow: April Fools' day
Re: seek-phrases-in-corpus output
Posted by John Hardin <jh...@impsec.org>.
On Wed, 2 Apr 2014, darxus@chaosreigns.com wrote:
> On 04/01, John Hardin wrote:
>> These appear to be doing pretty good, I've exposed them for scoring
>> and renamed them to __DX_TEXT_*
>
> Cool, thanks.
Question:
I'm seeing spams that are minor variations on those rules. Do you think
it's better to add more individual static (non-alternation) rules, or
modify those to cover the variants?
Which would be better from the point of view of external optimization
tools, e.g. the rules compiler?
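
As a rough illustration of the two options, here is a sketch that compares several static patterns against one alternation pattern covering the same variants. The phrases and rule names are made up for the example (they are not the actual __DX_TEXT_* rules), and Python's `re` module only stands in for SpamAssassin's own regex matching:

```python
import re

# Hypothetical phrase variants; not taken from the actual __DX_TEXT_* rules.
STATIC_RULES = {
    "DEMO_TEXT_01": re.compile(r"claim your free prize today"),
    "DEMO_TEXT_02": re.compile(r"claim your free reward today"),
    "DEMO_TEXT_03": re.compile(r"claim your free gift today"),
}

# One rule that folds the same variants into a single alternation.
ALTERNATION_RULE = re.compile(r"claim your free (?:prize|reward|gift) today")


def static_hits(text):
    """Return the names of the static rules that match the text."""
    return sorted(name for name, rx in STATIC_RULES.items() if rx.search(text))


def alternation_hit(text):
    """Return True if the single alternation rule matches the text."""
    return ALTERNATION_RULE.search(text) is not None


if __name__ == "__main__":
    for body in ("please claim your free gift today", "claim your free lunch today"):
        print(body, static_hits(body), alternation_hit(body))
```

In this toy case both forms hit the same messages. The static form reports which variant matched and keeps each pattern a plain literal string; the alternation form keeps the rule count down.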
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Where We Want You To Go Today 09/13/07: Microsoft patents in-OS
adware architecture that incorporates monitoring and analysis of
user actions and interrupting the user to display apparently
relevant advertisements (U.S. Patent #20070214042)
-----------------------------------------------------------------------
671 days since the first successful private support mission to ISS (SpaceX)
Re: seek-phrases-in-corpus output
Posted by Axb <ax...@gmail.com>.
On 04/02/2014 07:29 PM, darxus@chaosreigns.com wrote:
> 2) More automation of rule generation and testing.... Maybe modify
> the mass check script to run seek-phrases-in-corpus, only on spams
> below the default threshold? Upload results automatically, score them
> automatically?
No need to modify the masscheck scripts.
You masscheck your spam corpus and run seek-phrases-in-log.
IOW, run the SOUGHT rule routine.
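
The general idea behind this kind of phrase mining can be sketched in a few lines. This is a toy illustration only, not how seek-phrases-in-log or the SOUGHT routine is actually implemented; the function names, the phrase length, the hit threshold, and the sample messages are all assumptions made for the example:

```python
from collections import Counter


def ngrams(text, n):
    """Yield every run of n consecutive lowercased words in text."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def candidate_phrases(spam, ham, n=3, min_spam_hits=2):
    """Return n-word phrases found in at least min_spam_hits spam
    messages and in no ham message, sorted alphabetically."""
    spam_counts = Counter()
    for msg in spam:
        # set() so that each message counts a given phrase only once
        spam_counts.update(set(ngrams(msg, n)))
    ham_phrases = set()
    for msg in ham:
        ham_phrases.update(ngrams(msg, n))
    return sorted(
        phrase for phrase, hits in spam_counts.items()
        if hits >= min_spam_hits and phrase not in ham_phrases
    )


if __name__ == "__main__":
    # Hypothetical sample messages, standing in for missed (low-scoring) spam.
    spam = [
        "act now to claim your prize",
        "hurry and claim your prize before friday",
    ]
    ham = ["did you claim your expenses yet"]
    print(candidate_phrases(spam, ham))
```

Restricting the spam list to messages that scored below the threshold would be one way to focus the candidates on spam the existing rules miss.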
Re: seek-phrases-in-corpus output
Posted by da...@chaosreigns.com.
On 04/01, John Hardin wrote:
> These appear to be doing pretty good, I've exposed them for scoring
> and renamed them to __DX_TEXT_*
Cool, thanks. Two related thoughts:
1) Has there been any consideration in recent years of adjusting whatever
thresholds limit the number of rules, to include more rules? That's some
kind of automated decision, right?
2) More automation of rule generation and testing.... Maybe modify
the mass check script to run seek-phrases-in-corpus, only on spams
below the default threshold? Upload results automatically, score them
automatically?
--
"Hello, babies. Welcome to Earth. It's hot in the summer and cold in
the winter. It's round and wet and crowded. At the outside, babies,
you've got about a hundred years here. There's only one rule that I
know of, babies—God damn it, you've got to be kind." - Kurt Vonnegut
http://www.ChaosReigns.com
Re: seek-phrases-in-corpus output
Posted by John Hardin <jh...@impsec.org>.
On Mon, 31 Mar 2014, John Hardin wrote:
> On Mon, 31 Mar 2014, John Hardin wrote:
>
>> On Mon, 31 Mar 2014, darxus@chaosreigns.com wrote:
>>
>> > I've been getting more missed spam lately. Ran
>> > seek-phrases-in-corpus from SA svn. Output is here:
>> > http://www.chaosreigns.com/sa/seek-2014-03-31.txt I'd
>> > like to see some of these tested, particularly these:
>> > http://www.chaosreigns.com/sa/seek-good-2014-03-31.txt
>>
>> Any objections to my adding these to my sandbox?
>
> Hearing no objections I've added the seek-good ones (plus one that matches
> text I've seen in spams before) to my sandbox. Search using /__DX_TEST_
These appear to be doing pretty good, I've exposed them for scoring and
renamed them to __DX_TEXT_*
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
An AR-15 in civilian hands used to defend a home or business:
a High Velocity Assault Weapon with High Capacity Magazines
An AR-15 in Law Enforcement Officer hands used to murder six kids:
a Police-Style Patrol Rifle
-----------------------------------------------------------------------
Today: April Fools' day