Posted to ruleqa@spamassassin.apache.org by Michael Monnerie <li...@is.it-management.at> on 2012/08/21 18:12:10 UTC

crazy rule FROM_12LTRDOM

I had this rule in a HAM:

 3.7 FROM_12LTRDOM          From a 12-letter domain

which caused it to be identified as SPAM. Who wrote that rule?
There may have been a spammer who used 12-letter domains, but hitting
EVERY 12-letter domain with 3.7 points is... too much.

Maybe the mass-check script "learned" that this rule should have 3.7, or
who decided on 3.7? If it was mass-check, there should be a way for a
human to limit the points for a rule to a realistic range. For example, to
define that FROM_12LTRDOM could have from 0.1 to 1.5 points, so we don't
cause nearly every mail from a 12-letter domain to be marked as spam.
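
A minimal sketch of that idea, in Python. It is only an illustration of
clamping a generated score to a human-defined range and does not describe
how SpamAssassin's scoring actually works. The names ScoreRange,
HUMAN_LIMITS and clamp_score are hypothetical; the 0.1 to 1.5 range is the
one suggested above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRange:
    """Human-chosen lower and upper bound for one rule's score."""
    low: float
    high: float


# Hypothetical table of per-rule limits maintained by a person.
HUMAN_LIMITS = {"FROM_12LTRDOM": ScoreRange(0.1, 1.5)}


def clamp_score(rule, generated, limits=HUMAN_LIMITS):
    """Force a generated score into the rule's human-defined range.

    Rules without an entry in `limits` keep their generated score.
    """
    rng = limits.get(rule)
    if rng is None:
        return generated
    return max(rng.low, min(rng.high, generated))


# A generated 3.7 would be capped at the upper bound of the range.
print(clamp_score("FROM_12LTRDOM", 3.7))
```

With such a cap, a single rule could still contribute to a spam verdict
but could not push a mail most of the way there on its own.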

I hope this is the right list to ask this?

-- 
Kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531

Re: crazy rule FROM_12LTRDOM

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 8/24/2012 8:33 AM, Daniel McDonald wrote:
> No, auto-reporting is *bad*.  Masschecks are done on human-verified ham and
> spam.  If something is misclassified by SpamAssassin as spam, and then the
> rules are tuned to classify more things that look like that as spam, you end
> up with a cascade of FPs.  Masschecks try to break that cycle by using
> humans.
>
> Also, ham is just as valuable (perhaps more so) as spam.
But centralized concepts are good.  One of the reasons ruleqa is not 
just masschecks is that it really encompasses everything we do for 
rules.  We need to expand on data users can provide to continue 
improving rules.

regards,
KAM

Re: crazy rule FROM_12LTRDOM

Posted by John Hardin <jh...@impsec.org>.
On Fri, 24 Aug 2012, Daniel McDonald wrote:

> On 8/24/12 5:58 AM, "Michael Monnerie"
> <li...@is.it-management.at> wrote:
>
>> On Wednesday, 22 August 2012, 07:34:23, Kevin A. McGrail wrote:
>>> Yup, that's exactly why we need more masscheck data, and we all know
>>> it.  Basing most SA scores on fifteen people's email who are not
>>> representative of all email users (we're all computer geeks who mostly
>>> speak English) is *bad*.  Please fix it
>>
>> Make an autoreporter in SA, sending e-mails with >15 or so points to a
>> central report address. Must be enabled, user must subscribe to get an
>> id, that id is reported with the mail. You'll get a lot of reports if
>> subscription is made easy.
>
> No, auto-reporting is *bad*.
>
> Also, ham is just as valuable (perhaps more so) as spam.

Lack of ham is what caused the 12LTRDOM rules to get such high generated 
scores in the first place...

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The social contract exists so that everyone doesn't have to squat
   in the dust holding a spear to protect his woman and his meat all
   day every day. It does not exist so that the government can take
   your spear, your meat, and your woman because it knows better what
   to do with them.                           -- Dagny @ Ace of Spades
-----------------------------------------------------------------------
  Today: the 1933rd anniversary of the destruction of Pompeii

Re: crazy rule FROM_12LTRDOM

Posted by Daniel McDonald <da...@austinenergy.com>.
On 8/24/12 5:58 AM, "Michael Monnerie"
<li...@is.it-management.at> wrote:

> On Wednesday, 22 August 2012, 07:34:23, Kevin A. McGrail wrote:
>> Yup, that's exactly why we need more masscheck data, and we all know
>> it.  Basing most SA scores on fifteen people's email who are not
>> representative of all email users (we're all computer geeks who mostly
>> speak English) is *bad*.  Please fix it
> 
> Make an autoreporter in SA, sending e-mails with >15 or so points to a
> central report address. Must be enabled, user must subscribe to get an
> id, that id is reported with the mail. You'll get a lot of reports if
> subscription is made easy.

No, auto-reporting is *bad*.  Masschecks are done on human-verified ham and
spam.  If something is misclassified by SpamAssassin as spam, and then the
rules are tuned to classify more things that look like that as spam, you end
up with a cascade of FPs.  Masschecks try to break that cycle by using
humans.

Also, ham is just as valuable (perhaps more so) as spam.


-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281


Re: crazy rule FROM_12LTRDOM

Posted by Michael Monnerie <li...@is.it-management.at>.
On Wednesday, 22 August 2012, 07:34:23, Kevin A. McGrail wrote:
> What is the score line in 72_active.cf that you show for
> FROM_12LTRDOM?
> 
> 3.004000/updates_spamassassin_org/72_scores.cf:score FROM_12LTRDOM 0.099 0.098 0.099 0.098

Yes, that's what's there now. It was crazy high (3.7), hope it stays 
down now.

And darxus answered on the other list:
> As to what I believe is your objection to the existence of the rule: 
> I understand, I sympathize.  There are rules against doing other
> entirely valid things that I do, for example not having a "real name"
> in my From address.  But these rules exist because they might
> correspond highly to spam, and are very carefully automatically scored
> accordingly (when we have enough data).  It's a complicated subject.

True. Those 0.1 points for 12-letter domains will at least fix *my*
problems; I don't know about others, though.

> Yup, that's exactly why we need more masscheck data, and we all know
> it.  Basing most SA scores on fifteen people's email who are not
> representative of all email users (we're all computer geeks who mostly
> speak English) is *bad*.  Please fix it 

Make an autoreporter in SA, sending e-mails with >15 or so points to a 
central report address. Must be enabled, user must subscribe to get an 
id, that id is reported with the mail. You'll get a lot of reports if
subscription is made easy.

-- 
Kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531

Re: crazy rule FROM_12LTRDOM

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
Moving to Dev:

On 8/22/2012 1:18 AM, Michael Monnerie wrote:
>> Current score on this rule is 0.098. Have you updated your rules
>> recently?
> Yes, thank you. I just worked up some false positive reports and found
> that FROM_12LTRDOM hit some of our customers, who could not receive from
> their customers anymore, as that was all wrongly classified as spam
> suddenly.
What is the score line in 72_active.cf that you show for FROM_12LTRDOM?

I show:

3.004000/updates_spamassassin_org/72_scores.cf:score FROM_12LTRDOM 0.099 0.098 0.099 0.098
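
For readers unfamiliar with the four numbers: a SpamAssassin score line
with four values gives one score per score set, chosen by whether Bayes
and network tests are enabled. The Python sketch below splits such a line
into its parts. The function name parse_score_line and the label strings
are made up for illustration; this is not code from SpamAssassin.

```python
# Meaning of the four positions in a SpamAssassin `score` line (score sets 0-3).
SCORE_SETS = (
    "set 0: Bayes off, network tests off",
    "set 1: Bayes off, network tests on",
    "set 2: Bayes on, network tests off",
    "set 3: Bayes on, network tests on",
)


def parse_score_line(line):
    """Return (rule_name, {score_set: score}) for one `score` line.

    Accepts grep-style output with a leading "path:" prefix.
    """
    # Keep only the text after the last ':' (the whole line if there is none).
    directive = line.rpartition(":")[2]
    fields = directive.split()
    if len(fields) < 3 or fields[0] != "score":
        raise ValueError(f"not a score line: {line!r}")
    name = fields[1]
    values = [float(v) for v in fields[2:]]
    if len(values) == 1:
        values = values * 4  # a single score applies to every score set
    if len(values) != 4:
        raise ValueError(f"expected 1 or 4 scores, got {len(values)}")
    return name, dict(zip(SCORE_SETS, values))


line = ("3.004000/updates_spamassassin_org/72_scores.cf:score "
        "FROM_12LTRDOM                         0.099 0.098 0.099 0.098")
name, scores = parse_score_line(line)
for score_set, value in scores.items():
    print(f"{name}  {score_set}: {value}")
```

Read this way, the line above gives FROM_12LTRDOM roughly 0.1 points in
every configuration.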

Regards,
KAM

Re: crazy rule FROM_12LTRDOM

Posted by da...@chaosreigns.com.
On 08/22, Michael Monnerie wrote:
> As a last statement I'd like to say that I found the reasoning for the 
> FROM_12LTRDOM strange: It's obviously harsh to punish such domains. If 

I haven't read through all the relevant discussion, but I suspect everyone
involved mostly agrees with you.  I think the creation of a significantly
large score for it was the result of insufficient masscheck data, a
situation which substantially improved about the time this list was
created.  We can always use more data exactly to avoid problems like this.
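
To illustrate why too little ham can make a rule look better than it is,
here is a small Python sketch. It is not the masscheck or rescoring
algorithm, and all corpus sizes and hit counts are invented for the
example. The helper spam_share is hypothetical; it reports what share of
a rule's combined hit rate comes from spam.

```python
def spam_share(spam_hits, spam_total, ham_hits, ham_total):
    """Share of a rule's combined hit rate that comes from spam.

    1.0 means the rule was only ever seen on spam; 0.5 means it hits
    spam and ham equally often.
    """
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    if spam_rate + ham_rate == 0:
        return 0.5  # the rule never fired, so there is no evidence either way
    return spam_rate / (spam_rate + ham_rate)


# Invented numbers: the rule fires on 5% of spam in both cases.
# Tiny ham corpus: 30 messages, none happen to come from a 12-letter domain.
small_corpus = spam_share(500, 10_000, 0, 30)
# Larger ham corpus: 30,000 messages, 2% come from a 12-letter domain.
large_corpus = spam_share(500, 10_000, 600, 30_000)

print(f"tiny ham corpus:   {small_corpus:.3f}")
print(f"larger ham corpus: {large_corpus:.3f}")
```

With the tiny ham corpus the rule looks like a perfect spam indicator
simply because no legitimate 12-letter-domain mail was in the sample;
with more ham the same rule looks far less convincing.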

As to what I believe is your objection to the existence of the rule:  I
understand, I sympathize.  There are rules against doing other entirely
valid things that *I* do, for example not having a "real name" in my From
address.  But these rules exist because they might correspond highly to
spam, and are very carefully automatically scored accordingly (when we have
enough data).  It's a complicated subject.

> you'd own such a domain and are a seller, and suddenly 70% of your 
> customers don't get your mails anymore because they are spam filtered, 
> you are in deep trouble. And you can't do anything, as all customers 
> have their own SA installation. Just because the small mass-check team 
> has good results doesn't make it better for any owner of a 12 letter 
> domain.

Yup, that's exactly why we need more masscheck data, and we all know
it.  Basing most SA scores on fifteen people's email who are *not*
representative of all email users (we're all computer geeks who mostly
speak English) is *bad*.  Please fix it :)

> Please don't answer to this on this list, just PM if you feel so.

Heh.

-- 
"Go forth, and be excellent to one another." - http://www.jhuger.com/fredski.php
http://www.ChaosReigns.com

Re: crazy rule FROM_12LTRDOM

Posted by Michael Monnerie <li...@is.it-management.at>.
On Tuesday, 21 August 2012, 12:27:11, darxus@chaosreigns.com wrote:
> This was recently discussed on the users list:
> http://www.gossamer-threads.com/lists/spamassassin/users/173721

Thank you. I didn't find that because I searched for FROM_12LTRDOM, which
wasn't mentioned in those mails.

> I'd still like to try to keep the traffic on this list to a minimum to
> avoid masscheck contributors unsubscribing due to noise, so we can
> contact them when something actually specific to masschecking comes
> up.

OK, from the discussion on @users I found the discussion on @dev. Looks 
like @dev is the list to discuss rule things.

As a last statement I'd like to say that I found the reasoning for the 
FROM_12LTRDOM strange: It's obviously harsh to punish such domains. If 
you'd own such a domain and are a seller, and suddenly 70% of your 
customers don't get your mails anymore because they are spam filtered, 
you are in deep trouble. And you can't do anything, as all customers 
have their own SA installation. Just because the small mass-check team 
has good results doesn't make it better for any owner of a 12 letter 
domain.

Please don't answer to this on this list, just PM if you feel so.

> Current score on this rule is 0.098. Have you updated your rules
> recently?

Yes, thank you. I just worked up some false positive reports and found 
that FROM_12LTRDOM hit some of our customers, who could not receive from 
their customers anymore, as that was all wrongly classified as spam 
suddenly.

-- 
Kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531

Re: crazy rule FROM_12LTRDOM

Posted by da...@chaosreigns.com.
This was recently discussed on the users list:
http://www.gossamer-threads.com/lists/spamassassin/users/173721

I'd still like to try to keep the traffic on this list to a minimum to
avoid masscheck contributors unsubscribing due to noise, so we can contact
them when something actually specific to masschecking comes up.

On 08/21, Michael Monnerie wrote:
> I had this rule in a HAM:
> 
>  3.7 FROM_12LTRDOM          From a 12-letter domain
> 
> which caused it to be identified as SPAM. Who wrote that rule?
> There may have been a spammer who used 12-letter domains, but hitting
> EVERY 12-letter domain with 3.7 points is... too much.
>
> Maybe the mass-check script "learned" that this rule should have 3.7, or
> who decided on 3.7? If it was mass-check, there should be a way for a
> human to limit the points for a rule to a realistic range. For example, to
> define that FROM_12LTRDOM could have from 0.1 to 1.5 points, so we don't
> cause nearly every mail from a 12-letter domain to be marked as spam.
> 
> I hope this is the right list to ask this?

-- 
"Whom God wishes to destroy, he first makes mad."
- Euripides (c.480 - 406 BC).
http://www.ChaosReigns.com

Re: crazy rule FROM_12LTRDOM

Posted by Bowie Bailey <Bo...@BUC.com>.
On 8/21/2012 12:12 PM, Michael Monnerie wrote:
> I had this rule in a HAM:
>
>   3.7 FROM_12LTRDOM          From a 12-letter domain
>
> which caused it to be identified as SPAM. Who wrote that rule?
> There may have been a spammer who used 12-letter domains, but hitting
> EVERY 12-letter domain with 3.7 points is... too much.
>
> Maybe the mass-check script "learned" that this rule should have 3.7, or
> who decided on 3.7? If it was mass-check, there should be a way for a
> human to limit the points for a rule to a realistic range. For example, to
> define that FROM_12LTRDOM could have from 0.1 to 1.5 points, so we don't
> cause nearly every mail from a 12-letter domain to be marked as spam.
>
> I hope this is the right list to ask this?

Current score on this rule is 0.098.  Have you updated your rules recently?

-- 
Bowie