Posted to dev@spamassassin.apache.org by John Hardin <jh...@impsec.org> on 2014/04/01 02:19:52 UTC

Re: seek-phrases-in-corpus output

On Mon, 31 Mar 2014, John Hardin wrote:

> On Mon, 31 Mar 2014, darxus@chaosreigns.com wrote:
>
>>  I've been getting more missed spam lately.  Ran
>>  seek-phrases-in-corpus from SA svn.  Output is here:
>>  http://www.chaosreigns.com/sa/seek-2014-03-31.txt I'd
>>  like to see some of these tested, particularly these:
>>  http://www.chaosreigns.com/sa/seek-good-2014-03-31.txt
>
> Any objections to my adding these to my sandbox?

Hearing no objections, I've added the seek-good ones (plus one that matches 
text I've seen in spams before) to my sandbox. Search using /__DX_TEST_

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   People think they're trading chaos for order [by ceding more and
   more power to the Government], but they're just trading normal
   human evil for the really dangerous organized kind of evil, the
   kind that simply does not give a shit. Only bureaucrats can give
   you true evil.                                     -- Larry Correia
-----------------------------------------------------------------------
  Tomorrow: April Fools' day

Re: seek-phrases-in-corpus output

Posted by John Hardin <jh...@impsec.org>.
On Wed, 2 Apr 2014, darxus@chaosreigns.com wrote:

> On 04/01, John Hardin wrote:
>> These appear to be doing pretty well; I've exposed them for scoring
>> and renamed them to __DX_TEXT_*
>
> Cool, thanks.

Question:

I'm seeing spams that contain minor variations on the phrases those rules
match. Do you think it's better to add more individual static
(non-alternation) rules, or to modify the existing ones to cover the
variants?

Which would be better from the point of view of external optimization 
tools, e.g. the rules compiler?
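To make the trade-off in this question concrete, here is a rough Python sketch (Python's re module, not SpamAssassin's Perl rule engine) comparing several static phrase patterns with a single alternation pattern. The phrases are hypothetical placeholders, not the actual __DX_TEXT_* rule bodies.

```python
import re

# Hypothetical phrase variants; placeholders, NOT the real __DX_TEXT_* bodies.
VARIANTS = [
    "claim your free gift",
    "claim your free prize",
    "collect your free gift",
]

# Option A: one static (non-alternation) pattern per known variant.
STATIC_RULES = [re.compile(re.escape(v), re.IGNORECASE) for v in VARIANTS]

# Option B: a single pattern that uses alternation to cover the variants.
COMBINED_RULE = re.compile(r"(?:claim|collect) your free (?:gift|prize)", re.IGNORECASE)


def hits_static(text: str) -> bool:
    """True if any one of the individual static patterns matches."""
    return any(rule.search(text) for rule in STATIC_RULES)


def hits_combined(text: str) -> bool:
    """True if the single alternation pattern matches."""
    return COMBINED_RULE.search(text) is not None


samples = [
    "Click now to claim your FREE gift!",
    "Collect your free prize today",  # not in VARIANTS; only the alternation hits
    "Nothing to see here",
]

for s in samples:
    print(f"static={hits_static(s)!s:5} combined={hits_combined(s)!s:5} {s}")
```

In this toy case the alternation also catches a combination ("collect ... prize") that none of the static patterns list, which widens coverage but can also raise false-positive risk. Separate static rules, by contrast, would each show up with their own hit counts in mass-check results, so a weak variant can be spotted and dropped on its own.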

-- 
-----------------------------------------------------------------------
   Where We Want You To Go Today 09/13/07: Microsoft patents in-OS
   adware architecture that incorporates monitoring and analysis of
   user actions and interrupting the user to display apparently
   relevant advertisements (U.S. Patent #20070214042)
-----------------------------------------------------------------------
  671 days since the first successful private support mission to ISS (SpaceX)

Re: seek-phrases-in-corpus output

Posted by Axb <ax...@gmail.com>.
On 04/02/2014 07:29 PM, darxus@chaosreigns.com wrote:
> 2) More automation of rule generation and testing... Maybe modify
> the mass check script to run seek-phrases-in-corpus only on spams
> below the default threshold?  Upload results automatically, score them
> automatically?

No need to modify the masscheck scripts.

You masscheck your spam corpus and run seek-phrases-in-log.
IOW, run the SOUGHT rule routine.
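For readers unfamiliar with the SOUGHT approach, the general idea (as I understand it) is to mine phrases that recur across recent spam but never appear in ham. The toy Python sketch below illustrates only that idea; it is not the real seek-phrases-in-log algorithm, and the function name seek_phrases, the word-trigram size, the min_hits cutoff, and the sample messages are all hypothetical.

```python
from collections import Counter


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the distinct lowercase word n-grams in a message body."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def seek_phrases(spam: list[str], ham: list[str], n: int = 3, min_hits: int = 2) -> list[str]:
    """Toy phrase miner: n-grams found in >= min_hits spams and in no ham."""
    ham_grams: set[tuple[str, ...]] = set()
    for body in ham:
        ham_grams |= word_ngrams(body, n)
    counts: Counter = Counter()
    for body in spam:
        # Count each n-gram at most once per message, ignoring any seen in ham.
        counts.update(word_ngrams(body, n) - ham_grams)
    # Sort by descending hit count, then alphabetically, for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [" ".join(gram) for gram, hits in ranked if hits >= min_hits]


spam = ["act now to claim your reward", "please act now to claim it", "win big today"]
ham = ["please act now on the report"]
print(seek_phrases(spam, ham))  # ['act now to', 'now to claim']
```

Note that "please act now" appears in two messages but is discarded because it also occurs in ham; a zero-ham requirement like this is what keeps mined phrases from causing false positives.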

Re: seek-phrases-in-corpus output

Posted by da...@chaosreigns.com.
On 04/01, John Hardin wrote:
> These appear to be doing pretty well; I've exposed them for scoring
> and renamed them to __DX_TEXT_*

Cool, thanks.  Two related thoughts:

1) Has there been any consideration in recent years of adjusting whatever
thresholds limit the number of rules, to include more rules?  That's some
kind of automated decision, right?

2) More automation of rule generation and testing... Maybe modify
the mass check script to run seek-phrases-in-corpus only on spams
below the default threshold?  Upload results automatically, score them
automatically?
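A rough sketch of the filtering step proposed in (2): pick out spam-corpus messages that scored below SpamAssassin's default required_score of 5.0. The log line format "<flag> <score> <path> ..." is a simplified assumption (the real mass-check log carries more fields), and the helper name missed_spams and the sample paths are hypothetical.

```python
# SpamAssassin's default required_score.
DEFAULT_THRESHOLD = 5.0


def missed_spams(log_lines, threshold=DEFAULT_THRESHOLD):
    """Yield paths of spam-corpus messages that scored below threshold.

    ASSUMPTION: a simplified, whitespace-separated line format
        <flag> <score> <path> [rules ...]
    with '#' comment lines. The real mass-check log has more fields.
    """
    for raw in log_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # too short to hold flag, score and path
        try:
            score = float(fields[1])
        except ValueError:
            continue  # skip lines whose second field is not a score
        if score < threshold:
            yield fields[2]


# Hypothetical log excerpt from a spam-only corpus.
log = [
    "# hypothetical spam.log excerpt",
    "Y 12 /corpus/spam/0001 RULE_A,RULE_B",
    ". 3 /corpus/spam/0002 RULE_A",
    ". -0.5 /corpus/spam/0003",
]
print(list(missed_spams(log)))  # ['/corpus/spam/0002', '/corpus/spam/0003']
```

The resulting paths are the missed spams a phrase-seeking step would then work on. The naive whitespace split would break on paths containing spaces.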

-- 
"Hello, babies. Welcome to Earth. It's hot in the summer and cold in
the winter. It's round and wet and crowded. At the outside, babies,
you've got about a hundred years here. There's only one rule that I
know of, babies—God damn it, you've got to be kind." - Kurt Vonnegut
http://www.ChaosReigns.com

Re: seek-phrases-in-corpus output

Posted by John Hardin <jh...@impsec.org>.
On Mon, 31 Mar 2014, John Hardin wrote:

> On Mon, 31 Mar 2014, John Hardin wrote:
>
>>  On Mon, 31 Mar 2014, darxus@chaosreigns.com wrote:
>> 
>> >   I've been getting more missed spam lately.  Ran
>> >   seek-phrases-in-corpus from SA svn.  Output is here:
>> >   http://www.chaosreigns.com/sa/seek-2014-03-31.txt I'd
>> >   like to see some of these tested, particularly these:
>> >   http://www.chaosreigns.com/sa/seek-good-2014-03-31.txt
>>
>>  Any objections to my adding these to my sandbox?
>
> Hearing no objections, I've added the seek-good ones (plus one that matches 
> text I've seen in spams before) to my sandbox. Search using /__DX_TEST_

These appear to be doing pretty well; I've exposed them for scoring and 
renamed them to __DX_TEXT_*

-- 
-----------------------------------------------------------------------
   An AR-15 in civilian hands used to defend a home or business:
     a High Velocity Assault Weapon with High Capacity Magazines
   An AR-15 in Law Enforcement Officer hands used to murder six kids:
     a Police-Style Patrol Rifle
-----------------------------------------------------------------------
  Today: April Fools' day