Posted to users@spamassassin.apache.org by jimimaseye <gr...@yahoo.com> on 2016/06/09 11:55:23 UTC

Where to find DETAIL for spamassassin default RULES

Once upon a time the rules included with SpamAssassin were published on its wiki
(example here: http://spamassassin.apache.org/tests_3_3_x.html), which in
turn linked to an explanation of each individual rule.

However, as you know, these wiki pages are no longer updated due to "rules
being updated nightly".  Googling an individual rule only turns up
something useful if it appeared in the old 3.3 wiki (as in the
link above). So where does one find details of new rules?

For those of us who have no idea about the behind-the-scenes workings of
SpamAssassin (including the development contributions, scoring evaluations,
etc.), could someone please give me a pointer to where I can start to look, or
to a page giving detail similar to the above, so that I can look rules up?
(I assume that every rule has some form of explanation attached before it gets
committed and included.)  How do I find such detail in a readable form that an
end user can understand?  (Links to development, discussion, commits or
whatever are fine, as long as they ultimately lead to the rule
detail.)

(Currently it seems I just have to trust whatever scores are given to
the rules, and that the rules are pertinent to every system.  Without an
explanation of the rules, it seems a little strange that we, as admins, are
allowed to tailor the scoring of such rules (if we wish to) even though
we have no idea what the rules are in the first place.)

TIA



--
View this message in context: http://spamassassin.1065346.n5.nabble.com/Where-to-find-DETAIL-for-spamassassin-default-RULES-tp121218.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.

Re: Where to find DETAIL for spamassassin default RULES

Posted by Joe Quinn <jq...@pccc.com>.
I have a bookmark in Firefox that points to 
http://ruleqa.spamassassin.org/?rule=%s&srcpath=&g=Change which is the 
status page for the nightly rule updates and is likely what you are 
looking for.

I give it a keyword too, so I can type "ruleqa RULENAME" and it will 
replace the "%s" with whatever I type.
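The keyword bookmark is just a string substitution, so the same lookup URL can be built outside the browser. A minimal sketch in Python, assuming the URL template quoted above is still valid; the helper name `ruleqa_url` is made up for illustration and is not part of any SpamAssassin tooling.

```python
from urllib.parse import quote

# URL template copied from the message above; "%s" marks where the rule name goes.
RULEQA_TEMPLATE = "http://ruleqa.spamassassin.org/?rule=%s&srcpath=&g=Change"


def ruleqa_url(rule_name: str) -> str:
    """Return the rule QA status URL for one rule name (hypothetical helper)."""
    # Percent-encode the name so unusual characters cannot break the query string.
    return RULEQA_TEMPLATE.replace("%s", quote(rule_name, safe=""))


if __name__ == "__main__":
    print(ruleqa_url("ONLINE_PHARMACY"))
```

Rule names are normally plain uppercase identifiers, so the encoding step rarely changes anything; it is there only as a precaution.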

Re: Where to find DETAIL for spamassassin default RULES

Posted by Groach <gr...@yahoo.com>.
On 12/06/2016 21:14, Bill Cole wrote:
> but can you explain why the world needs yet another new mail server 
> implementation?
>
> As an example of why I ask this, consider that Microsoft rewrote the 
> SMTP implementation in Exchange 2013 and did it wrong,

A question and answer all in one.  I like it.

(P.S. No one ever said anything about my project being a NEW mail server.)

Re: Where to find DETAIL for spamassassin default RULES

Posted by Groach <gr...@yahoo.com>.
On 12/06/2016 21:14, Bill Cole wrote:
>
> I was not at all confused, but sometimes when people are Wrong On The 
> Internet in special ways I cannot resist the urge to respond with a 
> paraphrased geek meme...
>
> Look up Jamie Zawinski's famous "2 problems" quote regarding regular 
> expressions. It is a perfect fit for the application of regular 
> expressions to address validation.
>
>> It is actually for another software project (a mail server)
>
> Please don't take this as derogatory, because I DO NOT mean it to be, 
> but can you explain why the world needs yet another new mail server 
> implementation?
>
> As an example of why I ask this, consider that Microsoft rewrote the 
> SMTP implementation in Exchange 2013 and did it wrong, breaking 
> multi-recipient message handling. I guess they had some reason, but 
> the point is that new code means new bugs, even when you have an 
> elaborate QA organization in place to prevent that.
>
>> that, being a mail server, must ensure email addresses are valid.
>
> Not really. It needs to make sure that it never generates invalid 
> addresses and it probably should check addresses in its inputs for 
> types of invalidity that your later code will assume not to be 
> present, but those are both far from a need to validate addresses 
> perfectly (or even near-perfectly) to the RFC specification. Having a 
> logical set of addresses that you'd never generate but will still 
> blindly and harmlessly work with, some of which may not fit the RFC 
> specs, is a NON-PROBLEM.
>
> Even if you wanted to draw a RFC-perfect boundary between valid and 
> invalid addresses, complex regular expressions are a poor tool for 
> that because the logic of REs doesn't align with that of the ABNF used in 
> RFCs. A single regular expression CANNOT precisely match the whole
> RFC822/2822/5322 address space. The closest approximation in Perl RE 
> is huge, indecipherable, and machine-generated. It also cannot deal 
> with nested comments, a valid albeit pathological address structure 
> under the ABNF definition. In POSIX RE the problems are MUCH worse.
>
> On the other hand, you COULD use very simple REs to serially and 
> recursively decompose addresses into the constructs defined by the 
> ABNF spec, using the same logic as the spec to validate addresses. 
> This is not as interesting a "problem" as writing the One True RFC822 
> RE, but it is a fairly trivial coding exercise and would run more 
> efficiently than a single RE with the benefit of being more readable 
> and debuggable.
>
>> I quoted the regexp to make my point about how 
>> 'squiggly' they can be and that I am able to read them... to a 
>> point. (I was proud because googling for a regex email 
>> address validator string turns up some VERY suspicious and 
>> exorbitantly, seemingly unnecessarily, long offerings. So I had a go 
>> myself.)
>
> And just like a hilariously long list of predecessors, came up with a 
> RE which fails to precisely reproduce the ABNF definition of a valid 
> address for message headers. This is why you now have 2 problems:
>
> 1. The one you invented of needing to precisely validate email 
> addresses to a RFC specification that is not a perfect match for the 
> addressing supported by any coherent package of production-grade mail 
> software.
>
> 2. A regular expression that is absurdly complex which you incorrectly 
> believe solves (1) while in fact it does not. It is maybe good enough, 
> but maybe not. It's an untestable approximation of its design goal, 
> which is an intrinsic problem for software.




.......AND relax!

Re: Where to find DETAIL for spamassassin default RULES

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 11 Jun 2016, at 4:21, Groach wrote:

> On 11/06/2016 05:09, Bill Cole wrote:
>> So, you thought validating email addresses was a problem demanding a 
>> solution? And you "solved" it with a regular expression?
>>
>> Congratulations on now having 2 problems. They should be very happy 
>> together.
>
> The regex I quoted was out of context and completely unrelated to the 
> problem (sorry if that confused you).

I was not at all confused, but sometimes when people are Wrong On The 
Internet in special ways I cannot resist the urge to respond with a 
paraphrased geek meme...

Look up Jamie Zawinski's famous "2 problems" quote regarding regular 
expressions. It is a perfect fit for the application of regular 
expressions to address validation.

> It is actually for another software project (a mail server)

Please don't take this as derogatory, because I DO NOT mean it to be, 
but can you explain why the world needs yet another new mail server 
implementation?

As an example of why I ask this, consider that Microsoft rewrote the 
SMTP implementation in Exchange 2013 and did it wrong, breaking 
multi-recipient message handling. I guess they had some reason, but the 
point is that new code means new bugs, even when you have an elaborate 
QA organization in place to prevent that.

> that, being a mail server, must ensure email addresses are valid.

Not really. It needs to make sure that it never generates invalid 
addresses and it probably should check addresses in its inputs for types 
of invalidity that your later code will assume not to be present, but 
those are both far from a need to validate addresses perfectly (or even 
near-perfectly) to the RFC specification. Having a logical set of 
addresses that you'd never generate but will still blindly and 
harmlessly work with, some of which may not fit the RFC specs, is a 
NON-PROBLEM.

Even if you wanted to draw a RFC-perfect boundary between valid and 
invalid addresses, complex regular expressions are a poor tool for that 
because the logic of REs doesn't align with that of the ABNF used in RFCs. A 
single regular expression CANNOT precisely match the whole
RFC822/2822/5322 address space. The closest approximation in Perl RE is 
huge, indecipherable, and machine-generated. It also cannot deal with 
nested comments, a valid albeit pathological address structure under the 
ABNF definition. In POSIX RE the problems are MUCH worse.

On the other hand, you COULD use very simple REs to serially and 
recursively decompose addresses into the constructs defined by the ABNF 
spec, using the same logic as the spec to validate addresses. This is 
not as interesting a "problem" as writing the One True RFC822 RE, but it 
is a fairly trivial coding exercise and would run more efficiently than 
a single RE with the benefit of being more readable and debuggable.
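A rough sketch of that decomposition approach in Python, offered as an illustration rather than a complete validator. It covers only a subset of the RFC 5322 addr-spec grammar (dot-atom or quoted-string local part, dot-atom or domain-literal domain). Comments, folding whitespace and the obsolete forms are deliberately left out, so nested comments are not handled. All function and constant names here are hypothetical.

```python
import re

# atext from RFC 5322: letters, digits and a fixed set of punctuation.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
ATOM = re.compile(ATEXT + "+")

# Simplified quoted-string: qtext, space/tab, or a backslash-escaped character.
QUOTED_STRING = re.compile(r'"(?:[\x21\x23-\x5B\x5D-\x7E \t]|\\[\x21-\x7E \t])*"')

# Simplified domain-literal: "[" dtext* "]" with no folding whitespace.
DOMAIN_LITERAL = re.compile(r"\[[\x21-\x5A\x5E-\x7E]*\]")


def is_dot_atom(text):
    """dot-atom-text: one or more atoms separated by single dots."""
    return all(ATOM.fullmatch(part) for part in text.split("."))


def split_addr_spec(addr):
    """Split an addr-spec into (local_part, domain), or return None."""
    if addr.startswith('"'):
        # A quoted local part may itself contain "@", so consume it first.
        match = QUOTED_STRING.match(addr)
        if match is None:
            return None
        local, rest = match.group(0), addr[match.end():]
    else:
        local, sep, tail = addr.partition("@")
        rest = sep + tail
    if not rest.startswith("@"):
        return None
    return local, rest[1:]


def is_valid_addr_spec(addr):
    """Validate each construct separately, mirroring the grammar's structure."""
    parts = split_addr_spec(addr)
    if parts is None:
        return False
    local, domain = parts
    local_ok = is_dot_atom(local) or QUOTED_STRING.fullmatch(local) is not None
    domain_ok = is_dot_atom(domain) or DOMAIN_LITERAL.fullmatch(domain) is not None
    return local_ok and domain_ok


if __name__ == "__main__":
    for candidate in ("user@example.com", '"john smith"@example.com', "a..b@example.com"):
        print(candidate, is_valid_addr_spec(candidate))
```

Each regular expression stays short enough to check against the grammar rule it stands for, and a rejected address can be traced to the specific construct that failed.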

> I quoted the regexp to make my point about how 
> 'squiggly' they can be and that I am able to read them... to a point. 
> (I was proud because googling for a regex email address 
> validator string turns up some VERY suspicious and 
> exorbitantly, seemingly unnecessarily, long offerings. So I had a go 
> myself.)

And just like a hilariously long list of predecessors, came up with a RE 
which fails to precisely reproduce the ABNF definition of a valid 
address for message headers. This is why you now have 2 problems:

1. The one you invented of needing to precisely validate email addresses 
to a RFC specification that is not a perfect match for the addressing 
supported by any coherent package of production-grade mail software.

2. A regular expression that is absurdly complex which you incorrectly 
believe solves (1) while in fact it does not. It is maybe good enough, 
but maybe not. It's an untestable approximation of its design goal, 
which is an intrinsic problem for software.

Re: Where to find DETAIL for spamassassin default RULES

Posted by Groach <gr...@yahoo.com>.
On 11/06/2016 05:09, Bill Cole wrote:
> So, you thought validating email addresses was a problem demanding a 
> solution? And you "solved" it with a regular expression?
>
> Congratulations on now having 2 problems. They should be very happy 
> together.

The regex I quoted was out of context and completely unrelated to the 
problem (sorry if that confused you).  It is actually for 
another software project (a mail server) that, being a mail server, must 
ensure email addresses are valid.  I quoted the regexp to make my point 
about how 'squiggly' they can be and that I am able to 
read them... to a point. (I was proud because googling for a 
regex email address validator string turns up some VERY suspicious and 
exorbitantly, seemingly unnecessarily, long offerings. So I had a go 
myself.)

Re: Where to find DETAIL for spamassassin default RULES

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 10 Jun 2016, at 3:09, jimimaseye wrote:

> REGEXP:  I don't mind having a go at reading them (I have written some
> myself) but, as you know, even though some are easy and obvious, sometimes it
> can be like reading music: a blur of blobs, dots and squiggles that takes a
> lot of deciphering.  Of course, many of them rely on the functionality of the
> plugins (which I can't say I fully understand) and on an understanding of
> the rule structure (some are easy and obvious, some are very convoluted).
>
> (I recently developed this one from scratch.  It's an RFC 2822 email address
> validator:
> ^(?=.{1,64}@)("[^<>@\\]+"|(?!\.|.*\.(\.|@))[^<> @\\"]+)@(\[(\d{1,3}\.){3}\d{1,3}\]|\[IPv6:(?:[A-Fa-f\d]{1,4}:){7}[A-Fa-f\d]{1,4}\]|(?=.{1,255}$)((?!-|\.|\d+($|\.))[a-zA-Z\d-]{0,62}[a-zA-Z\d])(|\.(?!-|\.|\d+($|\.))[a-zA-Z\d-]{0,62}[a-zA-Z\d]){1,126})$
>
> Very proud of it too.)


So, you thought validating email addresses was a problem demanding a 
solution? And you "solved" it with a regular expression?

Congratulations on now having 2 problems. They should be very happy 
together.

Re: Where to find DETAIL for spamassassin default RULES

Posted by jimimaseye <gr...@yahoo.com>.
Thanks for the replies, guys.

So in essence, there is no user-friendly method as there was before.

On 09/06/2016 14:19, Joe Quinn wrote:
> I have a bookmark in Firefox that points to
> http://ruleqa.spamassassin.org/?rule=%s&srcpath=&g=Change which is the
> status page for the nightly rule updates and is likely what you are
> looking for.
>
> I give it a keyword too, so I can type "ruleqa RULENAME" and it will
> replace the "%s" with whatever I type. 

As for looking up and searching those nightly listings, it's true I can find an
individual rule, but I can't see how to drill into it and see
its expression or detail. I can only see a load of links showing how
effective it is in tests (which is not really what I was looking for).  Am I
missing something?


REGEXP:  I don't mind having a go at reading them (I have written some
myself) but, as you know, even though some are easy and obvious, sometimes it
can be like reading music: a blur of blobs, dots and squiggles that takes a
lot of deciphering.  Of course, many of them rely on the functionality of the
plugins (which I can't say I fully understand) and on an understanding of
the rule structure (some are easy and obvious, some are very convoluted).

(I recently developed this one from scratch.  It's an RFC 2822 email address
validator:
^(?=.{1,64}@)("[^<>@\\]+"|(?!\.|.*\.(\.|@))[^<> @\\"]+)@(\[(\d{1,3}\.){3}\d{1,3}\]|\[IPv6:(?:[A-Fa-f\d]{1,4}:){7}[A-Fa-f\d]{1,4}\]|(?=.{1,255}$)((?!-|\.|\d+($|\.))[a-zA-Z\d-]{0,62}[a-zA-Z\d])(|\.(?!-|\.|\d+($|\.))[a-zA-Z\d-]{0,62}[a-zA-Z\d]){1,126})$

Very proud of it too.)





Re: Where to find DETAIL for spamassassin default RULES

Posted by Bowie Bailey <Bo...@BUC.com>.

The best place to find detail about the rules is in the rule files 
themselves.  You can get the description, scores, and actual regex for 
the rule.  Of course, you do need to be able to read the regex...

For example:

$ cd /var/lib/spamassassin/3.004001/updates_spamassassin_org
$ grep ONLINE_PHARMACY *
20_drugs.cf:body ONLINE_PHARMACY                /\bonline pharmacy|\b(?:drugs|medications) online/i
20_drugs.cf:describe ONLINE_PHARMACY    Online Pharmacy
50_scores.cf:score ONLINE_PHARMACY 0.843 2.371 0.008 0.650

Your rule files may be in a slightly different place depending on your 
OS and how you installed.
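The same lookup can be scripted. The sketch below is in Python and takes the rules directory from the grep example above as an assumption about your install. Unlike grep it matches only lines whose second word is exactly the rule name, so it will miss lines prefixed with something like `lang xx`. It also labels the four numbers on a `score` line with the score set each one applies to (Bayes off or on, network tests off or on, in that order). The function names are made up for illustration.

```python
from pathlib import Path

# Order of the four values on a "score" line (score sets 0 to 3).
SCORE_SET_LABELS = (
    "Bayes off, network tests off",
    "Bayes off, network tests on",
    "Bayes on, network tests off",
    "Bayes on, network tests on",
)


def find_rule_lines(rules_dir, rule_name):
    """Return (file name, line) pairs for .cf lines whose second word is rule_name."""
    hits = []
    for cf_file in sorted(Path(rules_dir).glob("*.cf")):
        for line in cf_file.read_text(errors="replace").splitlines():
            fields = line.split(None, 2)
            if len(fields) >= 2 and fields[1] == rule_name:
                hits.append((cf_file.name, line.strip()))
    return hits


def parse_score_line(line):
    """Map a 'score NAME v0 [v1 v2 v3]' line to {score set label: value}."""
    fields = line.split()
    if len(fields) < 3 or fields[0] != "score":
        raise ValueError(f"not a score line: {line!r}")
    values = [float(v) for v in fields[2:]]
    if len(values) == 1:
        # A single score applies to every score set.
        values *= 4
    if len(values) != 4:
        raise ValueError(f"expected 1 or 4 scores: {line!r}")
    return dict(zip(SCORE_SET_LABELS, values))


if __name__ == "__main__":
    # Path taken from the example above; adjust for your OS and install method.
    rules_dir = "/var/lib/spamassassin/3.004001/updates_spamassassin_org"
    for file_name, line in find_rule_lines(rules_dir, "ONLINE_PHARMACY"):
        print(f"{file_name}:{line}")
        if line.startswith("score "):
            for label, value in parse_score_line(line).items():
                print(f"    {label}: {value}")
```

For the `ONLINE_PHARMACY` line shown above, this would report 0.650 as the score used when both Bayes and network tests are enabled.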

-- 
Bowie

Re: Where to find DETAIL for spamassassin default RULES

Posted by Reindl Harald <h....@thelounge.net>.

On 09.06.2016 at 13:55, jimimaseye wrote:
> Once upon a time the rules included with SpamAssassin were published on its wiki
> (example here: http://spamassassin.apache.org/tests_3_3_x.html), which in
> turn linked to an explanation of each individual rule.
>
> However, as you know, these wiki pages are no longer updated due to "rules
> being updated nightly".  Googling an individual rule only turns up
> something useful if it appeared in the old 3.3 wiki (as in the
> link above). So where does one find details of new rules?

Frankly, I would be happy if rules without a description were simply 
refused for publication, so that you could at least grasp from the 
report header why a rule hit.