Posted to users@spamassassin.apache.org by Michael Orlitzky <mi...@orlitzky.com> on 2012/11/08 03:34:03 UTC

Claims manager / LOTTO_AGENT

So, LOTTO_AGENT will hit the string "Claims Manager" for 3.5 points.
This is bad news for,

  Barbara R. Krieg, Claims Manager
  Foodliner, Inc. / Quest Liner / Truck Country P.O. Box 1565 Dubuque,IA

who has a signature at the bottom of her messages.

This is compounded by the fact that LOTTO_AGENT also feeds a meta rule,

  ADVANCE_FEE_2_NEW_MONEY = __ADVANCE_FEE_2_NEW_MONEY && ...
  __ADVANCE_FEE_2_NEW_MONEY = LOTS_OF_MONEY && __ADVANCE_FEE_2_NEW
  __ADVANCE_FEE_2_NEW  = (__AFRICAN_STATE + ... + LOTTO_AGENT + ... > 1)

which brings the total score to around 7.8. Believe it or not, claims
managers talk about LOTS_OF_MONEY =)
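
As a rough illustration of why the meta rule compounds the problem, here is
a minimal Python sketch of how a counting expression of the form
(A + B + ... > 1) behaves. It assumes each sub-rule contributes 1 when it
hits and 0 otherwise. The sub-rules elided with "..." above are left out,
and __OTHER_SUBRULE is a hypothetical stand-in for whichever second
sub-rule fires. This is not SpamAssassin's actual evaluator.

```python
def advance_fee_2_new(hits):
    """Model (__AFRICAN_STATE + ... + LOTTO_AGENT + ... > 1): two or more sub-rules hit."""
    # The last name is hypothetical; the real list is elided in the post above.
    subrules = ["__AFRICAN_STATE", "LOTTO_AGENT", "__OTHER_SUBRULE"]
    return sum(1 for name in subrules if name in hits) > 1


def advance_fee_2_new_money(hits):
    """Model __ADVANCE_FEE_2_NEW_MONEY = LOTS_OF_MONEY && __ADVANCE_FEE_2_NEW."""
    return "LOTS_OF_MONEY" in hits and advance_fee_2_new(hits)


# A legitimate message whose signature hits LOTTO_AGENT, which also mentions
# money and trips one more (hypothetical) sub-rule.
legit_mail = {"LOTTO_AGENT", "LOTS_OF_MONEY", "__OTHER_SUBRULE"}
print(advance_fee_2_new_money(legit_mail))                    # True
print(advance_fee_2_new_money(legit_mail - {"LOTTO_AGENT"}))  # False
```

In this sketch the job title alone is what tips the count past 1, so the
message collects the meta rule's score on top of LOTTO_AGENT's own.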

Can one of these be made a little more strict? Sorry to be a pain and
submit these one at a time, but most of the ones that give me trouble
are confidential.

Re: Claims manager / LOTTO_AGENT

Posted by Michael Orlitzky <mi...@orlitzky.com>.
On 11/07/2012 10:21 PM, darxus@chaosreigns.com wrote:
> On 11/07, Michael Orlitzky wrote:
>> On 11/07/2012 09:49 PM, darxus@chaosreigns.com wrote:
>>> On 11/07, Michael Orlitzky wrote:
>>>> So, LOTTO_AGENT will hit the string "Claims Manager" for 3.5 points.
>>>> This is bad news for,
>>>>
>>>>   Barbara R. Krieg, Claims...
>>>
>>> When you put a string in an email that hits a spamassassin rule... your
>>> email then hits that spamassassin rule.  You should generally try to avoid
>>> that.
>>
>> Yeah, well it's her job title, so...? You misunderstand statistics. The
>> data aren't wrong.
> 
> After re-reading, I think you may have misunderstood my suggestion to avoid
> putting stuff in emails that is known to hit spam rules.  I wasn't
> suggesting that Barbara R. Krieg change her signature, I was suggesting
> that you not include it intact when posting to this mailing list about it.
> 

I see. My apologies. Disregard the first half of that last message.

Re: Claims manager / LOTTO_AGENT

Posted by da...@chaosreigns.com.
On 11/07, Michael Orlitzky wrote:
> On 11/07/2012 09:49 PM, darxus@chaosreigns.com wrote:
> > On 11/07, Michael Orlitzky wrote:
> >> So, LOTTO_AGENT will hit the string "Claims Manager" for 3.5 points.
> >> This is bad news for,
> >>
> >>   Barbara R. Krieg, Claims...
> > 
> > When you put a string in an email that hits a spamassassin rule... your
> > email then hits that spamassassin rule.  You should generally try to avoid
> > that.
> 
> Yeah, well it's her job title, so...? You misunderstand statistics. The
> data aren't wrong.

After re-reading, I think you may have misunderstood my suggestion to avoid
putting stuff in emails that is known to hit spam rules.  I wasn't
suggesting that Barbara R. Krieg change her signature, I was suggesting
that you not include it intact when posting to this mailing list about it.

-- 
"You shall know the truth, and it shall make you odd."
-- Flannery O'Connor
http://www.ChaosReigns.com

Re: Claims manager / LOTTO_AGENT

Posted by Alexandre Boyer <bi...@gmail.com>.
Hello there,

Well, if you feel uncomfortable with running mass-check and sending data
(not the emails themselves, just the rules they hit, as Darxus is
pointing out), you may want to override the score for those rules in
your local.cf.
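
To see what a local score override would buy, here is a small Python sketch
of the arithmetic. It assumes the figures quoted earlier in the thread
(LOTTO_AGENT at 3.5, about 7.8 combined with ADVANCE_FEE_2_NEW_MONEY) and
SpamAssassin's default required_score of 5.0. The replacement score of 0.5
is a hypothetical choice, and any other rules a message hits are ignored.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default threshold; a site may use another value
LOTTO_AGENT = 3.5     # stock score quoted in the thread
COMBINED = 7.8        # approximate LOTTO_AGENT + ADVANCE_FEE_2_NEW_MONEY total from the thread


def total_with_override(lotto_agent_score):
    """Total for the two rules after overriding LOTTO_AGENT's score locally."""
    other = COMBINED - LOTTO_AGENT  # roughly what ADVANCE_FEE_2_NEW_MONEY contributes
    return round(other + lotto_agent_score, 2)


for score in (LOTTO_AGENT, 0.5):  # 0.5 is a hypothetical override
    total = total_with_override(score)
    print(score, total, "spam" if total >= REQUIRED_SCORE else "ham")
```

Note that in this model the override only changes LOTTO_AGENT's own
contribution; the meta rule still fires, so its share of the score remains.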

You may even write your own rules to compensate for those false positives.

If you can't contribute to SA by giving feedback via the mass-check,
then do what you need to do on your side. Everybody here will be glad to
help ;)

Alex, from prypiat.
Yes, I recycle.


On 12-11-07 11:02 PM, Michael Orlitzky wrote:
> On 11/07/2012 10:36 PM, darxus@chaosreigns.com wrote:
>> On 11/07, Michael Orlitzky wrote:
>>> Sorry, I was a little rude. But saying that she shouldn't put her job
>>> title anywhere in an email, ever, is ridiculous. 
>> Certainly.
>>
>>> The inputs (spam, ham)
>>> to the classifier are assumed god-given; and the classification needs to
>>> reflect the data, not the other way around.
>> If "the classifier" is spamassassin, and "The inputs" are the spam
>> and ham data provided via masscheck, then... the scores provided via
>> sa-update *do* reflect the data.  So I'm not sure what you mean.
>>
>> The ideal rule scores are chosen to cause one false positive (ham flagged
>> as spam) in every 2,500 hams, while maximizing the number of spams
>> correctly flagged as spams.  With so few hams hitting this rule in the
>> masscheck corpora, we're way below that threshold based on the data we
>> have.
>>
> I wrote that before I saw your clarification, sorry again for coming off
> as a jerk. Ignore it.
>
>
>>> This is my fault, of course, but I'm not allowed to mass-check this
>>> stuff. It's ongoing legal correspondence.
>> Er, what?  You're not allowed to provide a list of which rules hit each
>> of your emails?  Or you're not allowed to run a program on your emails
>> that isn't spamassassin?  Or did I just not put "This does not require
>> sending us your email" in bold enough times on the masscheck page?
>>
> This is a client of ours (a law firm) and not the company that I work
> for. *I* know there's probably nothing sensitive in there, but just to
> cover my ass I'd need to get permission to send the results off-site.
> From their perspective, it's just simpler to say no: it's not worth the
> time or effort to even think about if there's a minute chance of it
> coming back to bite them legally.


Re: Claims manager / LOTTO_AGENT

Posted by John Hardin <jh...@impsec.org>.
On Thu, 8 Nov 2012, Michael Orlitzky wrote:
> On 11/08/2012 10:44 AM, John Hardin wrote:
>>
>> I will take a look at "claims manager" in the 419 rules.
>
> I appreciate it, thanks.

Okay, I've committed some tuning for that rule. It will probably take a 
couple of days before it shows up in a rules update.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   When I say "I don't want the government to do X", do not
   automatically assume that means I don't want X to happen.
-----------------------------------------------------------------------
  3 days until Veterans Day

Re: Claims manager / LOTTO_AGENT

Posted by Michael Orlitzky <mi...@orlitzky.com>.
On 11/08/2012 10:44 AM, John Hardin wrote:
>>
>> This is a client of ours (a law firm) and not the company that I work
>> for. *I* know there's probably nothing sensitive in there, but just to
>> cover my ass I'd need to get permission to send the results off-site.
> 
> Only the list of rules which hit is publicly visible, the actual content 
> of the message is not. Any leakage of confidential information is very 
> unlikely.

I know, but the chance isn't zero. For example, I wouldn't want to
mass-check a corpus of emails to my girlfriend, and have it report that
they hit LOTS_OF_VIAGRA.

Likewise, things like LOTTO_AGENT can reveal that someone communicated
with a claims manager. I've explained both sides, and as long as it's a
non-zero chance, they aren't having it. It isn't even that there's a
risk of leaking anything -- the fact that anything at all is sent could
be used as justification for a pain-in-the-ass investigation that nobody
wants.


>> From their perspective, it's just simpler to say no: it's not worth the
>> time or effort to even think about if there's a minute chance of it
>> coming back to bite them legally.
> 
> I will take a look at "claims manager" in the 419 rules.
> 

I appreciate it, thanks.

Re: Claims manager / LOTTO_AGENT

Posted by John Hardin <jh...@impsec.org>.
On Wed, 7 Nov 2012, Michael Orlitzky wrote:

> On 11/07/2012 10:36 PM, darxus@chaosreigns.com wrote:
>> On 11/07, Michael Orlitzky wrote:
>>> This is my fault, of course, but I'm not allowed to mass-check this
>>> stuff. It's ongoing legal correspondence.
>>
>> Er, what?  You're not allowed to provide a list of which rules hit each
>> of your emails?  Or you're not allowed to run a program on your emails
>> that isn't spamassassin?  Or did I just not put "This does not require
>> sending us your email" in bold enough times on the masscheck page?
>
> This is a client of ours (a law firm) and not the company that I work
> for. *I* know there's probably nothing sensitive in there, but just to
> cover my ass I'd need to get permission to send the results off-site.

Only the list of rules which hit is publicly visible, the actual content 
of the message is not. Any leakage of confidential information is very 
unlikely.

> From their perspective, it's just simpler to say no: it's not worth the
> time or effort to even think about if there's a minute chance of it
> coming back to bite them legally.

I will take a look at "claims manager" in the 419 rules.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   ...the good of having the government prohibited from doing harm
   far outweighs the harm of having it obstructed from doing good.
                                                    -- Mike@mike-istan
-----------------------------------------------------------------------
  3 days until Veterans Day

Re: Claims manager / LOTTO_AGENT

Posted by Michael Orlitzky <mi...@orlitzky.com>.
On 11/07/2012 10:36 PM, darxus@chaosreigns.com wrote:
> On 11/07, Michael Orlitzky wrote:
>> Sorry, I was a little rude. But saying that she shouldn't put her job
>> title anywhere in an email, ever, is ridiculous. 
> 
> Certainly.
> 
>> The inputs (spam, ham)
>> to the classifier are assumed god-given; and the classification needs to
>> reflect the data, not the other way around.
> 
> If "the classifier" is spamassassin, and "The inputs" are the spam
> and ham data provided via masscheck, then... the scores provided via
> sa-update *do* reflect the data.  So I'm not sure what you mean.
> 
> The ideal rule scores are chosen to cause one false positive (ham flagged
> as spam) in every 2,500 hams, while maximizing the number of spams
> correctly flagged as spams.  With so few hams hitting this rule in the
> masscheck corpora, we're way below that threshold based on the data we
> have.
> 

I wrote that before I saw your clarification, sorry again for coming off
as a jerk. Ignore it.


>> This is my fault, of course, but I'm not allowed to mass-check this
>> stuff. It's ongoing legal correspondence.
> 
> Er, what?  You're not allowed to provide a list of which rules hit each
> of your emails?  Or you're not allowed to run a program on your emails
> that isn't spamassassin?  Or did I just not put "This does not require
> sending us your email" in bold enough times on the masscheck page?
> 

This is a client of ours (a law firm) and not the company that I work
for. *I* know there's probably nothing sensitive in there, but just to
cover my ass I'd need to get permission to send the results off-site.
From their perspective, it's just simpler to say no: it's not worth the
time or effort to even think about if there's a minute chance of it
coming back to bite them legally.

Re: Claims manager / LOTTO_AGENT

Posted by da...@chaosreigns.com.
On 11/07, Michael Orlitzky wrote:
> Sorry, I was a little rude. But saying that she shouldn't put her job
> title anywhere in an email, ever, is ridiculous. 

Certainly.

> The inputs (spam, ham)
> to the classifier are assumed god-given; and the classification needs to
> reflect the data, not the other way around.

If "the classifier" is spamassassin, and "The inputs" are the spam
and ham data provided via masscheck, then... the scores provided via
sa-update *do* reflect the data.  So I'm not sure what you mean.

The ideal rule scores are chosen to cause one false positive (ham flagged
as spam) in every 2,500 hams, while maximizing the number of spams
correctly flagged as spams.  With so few hams hitting this rule in the
masscheck corpora, we're way below that threshold based on the data we
have.
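
The comparison being made here can be worked through in a few lines of
Python. The sketch uses the corpus figures quoted elsewhere in this thread
(2 ham hits out of 180,272) and treats the one-in-2,500 target as a simple
per-rule ceiling, which is a simplification of how scores are actually
generated.

```python
TARGET_FP_RATE = 1 / 2500         # one false positive per 2,500 hams
ham_hits, ham_total = 2, 180_272  # LOTTO_AGENT figures quoted in the thread

observed = ham_hits / ham_total   # fraction of ham that hits the rule

print(f"target:   {TARGET_FP_RATE:.4%}")             # target:   0.0400%
print(f"observed: {observed:.4%}")                   # observed: 0.0011%
print(f"headroom: {TARGET_FP_RATE / observed:.0f}x")  # headroom: 36x
```

On these numbers the rule's ham hit rate sits well under the target, which
is why the generated score stays high until more ham that hits it is
contributed.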

> This is my fault, of course, but I'm not allowed to mass-check this
> stuff. It's ongoing legal correspondence.

Er, what?  You're not allowed to provide a list of which rules hit each
of your emails?  Or you're not allowed to run a program on your emails
that isn't spamassassin?  Or did I just not put "This does not require
sending us your email" in bold enough times on the masscheck page?

-- 
"It's never too late to panic."
http://www.ChaosReigns.com

Re: Claims manager / LOTTO_AGENT

Posted by Michael Orlitzky <mi...@orlitzky.com>.
On 11/07/2012 10:12 PM, darxus@chaosreigns.com wrote:
> On 11/07, Michael Orlitzky wrote:
>> Yeah, well it's her job title, so...? You misunderstand statistics. The
>> data aren't wrong.
> 
> Do I?  I think it's more likely that you misunderstand what is expected of
> spamassassin rules.
> 

Sorry, I was a little rude. But saying that she shouldn't put her job
title anywhere in an email, ever, is ridiculous. The inputs (spam, ham)
to the classifier are assumed god-given; and the classification needs to
reflect the data, not the other way around.


> Somebody really should put up a page in the wiki explaining that rules all
> have false positives, and that's the entire reason we don't flag an email
> as spam for any one rule, etc.

Sure, that's why I pointed out that LOTTO_AGENT also helps trigger
ADVANCE_FEE_2_NEW_MONEY, and combined they score 7.8.


> But if you provide us with more masscheck data, we can do a better job of
> automatically calculating ideal scores.

This is my fault, of course, but I'm not allowed to mass-check this
stuff. It's ongoing legal correspondence.

Re: Claims manager / LOTTO_AGENT

Posted by da...@chaosreigns.com.
On 11/07, Michael Orlitzky wrote:
> Yeah, well it's her job title, so...? You misunderstand statistics. The
> data aren't wrong.

Do I?  I think it's more likely that you misunderstand what is expected of
spamassassin rules.

Somebody really should put up a page in the wiki explaining that rules all
have false positives, and that's the entire reason we don't flag an email
as spam for any one rule, etc.


But if you provide us with more masscheck data, we can do a better job of
automatically calculating ideal scores.

-- 
"Of course there's strength in numbers. But there's strength in sharp
weaponry too. Ironically, this lead to what we call 'civilization'."
- spore
http://www.ChaosReigns.com

Re: Claims manager / LOTTO_AGENT

Posted by Michael Orlitzky <mi...@orlitzky.com>.
On 11/07/2012 09:49 PM, darxus@chaosreigns.com wrote:
> On 11/07, Michael Orlitzky wrote:
>> So, LOTTO_AGENT will hit the string "Claims Manager" for 3.5 points.
>> This is bad news for,
>>
>>   Barbara R. Krieg, Claims...
> 
> When you put a string in an email that hits a spamassassin rule... your
> email then hits that spamassassin rule.  You should generally try to avoid
> that.
> 

Yeah, well it's her job title, so...? You misunderstand statistics. The
data aren't wrong.

Re: Claims manager / LOTTO_AGENT

Posted by da...@chaosreigns.com.
Just in case nobody has pointed you toward it before:
https://wiki.apache.org/spamassassin/NightlyMassCheck

Stats we currently have on that rule:
http://ruleqa.spamassassin.org/?daterev=20121103&rule=LOTTO_AGENT

  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME         WHO/AGE
      0   0.5022   0.0011   0.998    0.74    3.50  LOTTO_AGENT

It hits 2 of the 180,272 non-spams we have for use in optimal score
generation.  
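
For anyone wanting to check the HAM% and S/O columns against the hit
counts, here is a minimal Python sketch. It assumes S/O is computed as
SPAM% / (SPAM% + HAM%), which is my reading of the ruleqa column rather
than something stated on this page; the function names are made up for
illustration.

```python
def ham_percent(ham_hits, ham_total):
    """Share of ham messages the rule hits, as a percentage."""
    return 100.0 * ham_hits / ham_total


def spam_over_overall(spam_pct, ham_pct):
    """S/O: spam hit rate divided by the sum of the spam and ham hit rates."""
    return spam_pct / (spam_pct + ham_pct)


ham_pct = ham_percent(2, 180_272)
print(f"{ham_pct:.4f}")                             # 0.0011
print(f"{spam_over_overall(0.5022, ham_pct):.3f}")  # 0.998
```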


On 11/07, Michael Orlitzky wrote:
> So, LOTTO_AGENT will hit the string "Claims Manager" for 3.5 points.
> This is bad news for,
> 
>   Barbara R. Krieg, Claims...

When you put a string in an email that hits a spamassassin rule... your
email then hits that spamassassin rule.  You should generally try to avoid
that.

-- 
"It's never too late to panic."
http://www.ChaosReigns.com