Posted to users@spamassassin.apache.org by jimimaseye <gr...@yahoo.com> on 2016/06/09 11:55:23 UTC
Where to find DETAIL for spamassassin default RULES
Once upon a time the included rules for spamassassin were published in its wiki
(example here: http://spamassassin.apache.org/tests_3_3_x.html) which in
turn gave a link to an 'explanation' detail of the individual rules.
However, as you know, these wiki pages are no longer updated due to "rules
being updated nightly". And googling an individual rule only gives
something useful if it appeared in the old 3.3 wiki (like in the
link above). So where does one find details of new rules?
For those of us that have no idea of the behind-the-scenes workings of
spamassassin (including the development contributions, scoring evaluations
etc), could someone give me a pointer please to where I can start to look or
find a page giving detail similar to the above so that I can then look rules up.
(I assume that every rule has some form of explanation attached before it gets
committed and included). How do I find such detail (in a readable, end-user
understandable form)? (Links to development, discussion, commits or
whatever are fine just as long as ultimately it ends up giving the rule
detail).
(Currently it seems I just have 'to trust' whatever scores are given to
the rules, and that the rules are pertinent to every system. And without an
explanation of the rules, it seems a little strange that we, as admins, are
allowed to tailor the scoring of such rules (if we wish to) even though
we have no idea what the rules are in the first place).
TIA
--
View this message in context: http://spamassassin.1065346.n5.nabble.com/Where-to-find-DETAIL-for-spamassassin-default-RULES-tp121218.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Where to find DETAIL for spamassassin default RULES
Posted by Joe Quinn <jq...@pccc.com>.
On 6/9/2016 7:55 AM, jimimaseye wrote:
> Once upon a time the included rules for spamassassin were published in its wiki
> (example here: http://spamassassin.apache.org/tests_3_3_x.html) which in
> turn gave a link to an 'explanation' detail of the individual rules.
>
> However, as you know, these wiki pages are no longer updated due to "rules
> being updated nightly". And googling an individual rule only gives
> something useful if it appeared in the old 3.3 wiki (like in the
> link above). So where does one find details of new rules?
>
> For those of us that have no idea of the behind-the-scenes workings of
> spamassassin (including the development contributions, scoring evaluations
> etc), could someone give me a pointer please to where I can start to look or
> find a page giving detail similar to the above so that I can then look rules up.
> (I assume that every rule has some form of explanation attached before it gets
> committed and included). How do I find such detail (in a readable, end-user
> understandable form)? (Links to development, discussion, commits or
> whatever are fine just as long as ultimately it ends up giving the rule
> detail).
>
> (Currently it seems I just have 'to trust' whatever scores are given to
> the rules, and that the rules are pertinent to every system. And without an
> explanation of the rules, it seems a little strange that we, as admins, are
> allowed to tailor the scoring of such rules (if we wish to) even though
> we have no idea what the rules are in the first place).
>
> TIA
I have a bookmark in Firefox that points to
http://ruleqa.spamassassin.org/?rule=%s&srcpath=&g=Change which is the
status page for the nightly rule updates and is likely what you are
looking for.
I give it a keyword too, so I can type "ruleqa RULENAME" and it will
replace the "%s" with whatever I type.
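For anyone not using Firefox keywords, the same "%s" substitution can be sketched in a few lines of Python. This is only an illustration: the helper name ruleqa_url is made up here, and the only thing taken from the bookmark above is the URL template itself.

```python
from urllib.parse import quote

# URL template copied from the bookmark; "%s" marks where the rule name goes.
RULEQA_TEMPLATE = "http://ruleqa.spamassassin.org/?rule=%s&srcpath=&g=Change"


def ruleqa_url(rule_name):
    """Build the ruleqa status-page URL for one rule (hypothetical helper)."""
    # Percent-encode the name so unusual characters cannot break the query string.
    return RULEQA_TEMPLATE.replace("%s", quote(rule_name.strip(), safe=""))
```

Calling ruleqa_url("ONLINE_PHARMACY") gives the address to open in a browser for that rule.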
Re: Where to find DETAIL for spamassassin default RULES
Posted by Groach <gr...@yahoo.com>.
On 12/06/2016 21:14, Bill Cole wrote:
> but can you explain why the world needs yet another new mail server
> implementation?
>
> As an example of why I ask this, consider that Microsoft rewrote the
> SMTP implementation in Exchange 2013 and did it wrong,
A question and answer all in one. I like it.
(p.s no one ever said anything about my project being a NEW mail server)
Re: Where to find DETAIL for spamassassin default RULES
Posted by Groach <gr...@yahoo.com>.
On 12/06/2016 21:14, Bill Cole wrote:
>
> I was not at all confused, but sometimes when people are Wrong On The
> Internet in special ways I cannot resist the urge to respond with a
> paraphrased geek meme...
>
> Look up Jamie Zawinski's famous "2 problems" quote regarding regular
> expressions. It is a perfect fit for the application of regular
> expressions to address validation.
>
>> It is actually for another software project (a mail server)
>
> Please don't take this as derogatory, because I DO NOT mean it to be,
> but can you explain why the world needs yet another new mail server
> implementation?
>
> As an example of why I ask this, consider that Microsoft rewrote the
> SMTP implementation in Exchange 2013 and did it wrong, breaking
> multi-recipient message handling. I guess they had some reason, but
> the point is that new code means new bugs, even when you have an
> elaborate QA organization in place to prevent that.
>
>> that, being a mail server, must ensure email addresses are valid.
>
> Not really. It needs to make sure that it never generates invalid
> addresses and it probably should check addresses in its inputs for
> types of invalidity that your later code will assume not to be
> present, but those are both far from a need to validate addresses
> perfectly (or even near-perfectly) to the RFC specification. Having a
> logical set of addresses that you'd never generate but will still
> blindly and harmlessly work with, some of which may not fit the RFC
> specs, is a NON-PROBLEM.
>
> Even if you wanted to draw an RFC-perfect boundary between valid and
> invalid addresses, complex regular expressions are a poor tool for
> that because the logic of REs doesn't align with that of the ABNF used
> in RFCs. A single regular expression CANNOT precisely match the whole
> RFC822/2822/5322 address space. The closest approximation in Perl RE
> is huge, indecipherable, and machine-generated. It also cannot deal
> with nested comments, a valid albeit pathological address structure
> under the ABNF definition. In POSIX RE the problems are MUCH worse.
>
> On the other hand, you COULD use very simple REs to serially and
> recursively decompose addresses into the constructs defined by the
> ABNF spec, using the same logic as the spec to validate addresses.
> This is not as interesting a "problem" as writing the One True RFC822
> RE, but it is a fairly trivial coding exercise and would run more
> efficiently than a single RE with the benefit of being more readable
> and debuggable.
>
>> I quoted the regexp in context of showing my point about how
>> 'squiggly' they can be and that I am able to read them.....to a
>> point. (I was proud because 'googling' around for a regex email
>> address validator string shows some VERY suspicious and
>> extortionately, seemingly unnecessarily, long offerings. So I had a go
>> myself).
>
> And just like a hilariously long list of predecessors, you came up
> with an RE which fails to precisely reproduce the ABNF definition of a
> valid address for message headers. This is why you now have 2 problems:
>
> 1. The one you invented of needing to precisely validate email
> addresses to an RFC specification that is not a perfect match for the
> addressing supported by any coherent package of production-grade mail
> software.
>
> 2. A regular expression that is absurdly complex which you incorrectly
> believe solves (1) while in fact it does not. It is maybe good enough,
> but maybe not. It's an untestable approximation of its design goal,
> which is an intrinsic problem for software.
.......AND relax!
Re: Where to find DETAIL for spamassassin default RULES
Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 11 Jun 2016, at 4:21, Groach wrote:
> On 11/06/2016 05:09, Bill Cole wrote:
>> So, you thought validating email addresses was a problem demanding a
>> solution? And you "solved" it with a regular expression?
>>
>> Congratulations on now having 2 problems. They should be very happy
>> together.
>
> The regex I quoted was out of context to the problem and completely
> unrelated (sorry if you feel so confused with that).
I was not at all confused, but sometimes when people are Wrong On The
Internet in special ways I cannot resist the urge to respond with a
paraphrased geek meme...
Look up Jamie Zawinski's famous "2 problems" quote regarding regular
expressions. It is a perfect fit for the application of regular
expressions to address validation.
> It is actually for another software project (a mail server)
Please don't take this as derogatory, because I DO NOT mean it to be,
but can you explain why the world needs yet another new mail server
implementation?
As an example of why I ask this, consider that Microsoft rewrote the
SMTP implementation in Exchange 2013 and did it wrong, breaking
multi-recipient message handling. I guess they had some reason, but the
point is that new code means new bugs, even when you have an elaborate
QA organization in place to prevent that.
> that, being a mail server, must ensure email addresses are valid.
Not really. It needs to make sure that it never generates invalid
addresses and it probably should check addresses in its inputs for types
of invalidity that your later code will assume not to be present, but
those are both far from a need to validate addresses perfectly (or even
near-perfectly) to the RFC specification. Having a logical set of
addresses that you'd never generate but will still blindly and
harmlessly work with, some of which may not fit the RFC specs, is a
NON-PROBLEM.
Even if you wanted to draw an RFC-perfect boundary between valid and
invalid addresses, complex regular expressions are a poor tool for that
because the logic of REs doesn't align with that of the ABNF used in RFCs.
A single regular expression CANNOT precisely match the whole
RFC822/2822/5322 address space. The closest approximation in Perl RE is
huge, indecipherable, and machine-generated. It also cannot deal with
nested comments, a valid albeit pathological address structure under the
ABNF definition. In POSIX RE the problems are MUCH worse.
On the other hand, you COULD use very simple REs to serially and
recursively decompose addresses into the constructs defined by the ABNF
spec, using the same logic as the spec to validate addresses. This is
not as interesting a "problem" as writing the One True RFC822 RE, but it
is a fairly trivial coding exercise and would run more efficiently than
a single RE with the benefit of being more readable and debuggable.
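A minimal sketch of that decomposition idea, assuming only a simplified subset of the RFC 5322 addr-spec grammar: it handles dot-atom and quoted-string local parts and dot-atom or domain-literal domains, and it deliberately ignores comments, folding white space and the obsolete forms (so there is no recursion here). The function and pattern names are invented for illustration and do not come from any mail package.

```python
import re

# One small pattern per ABNF construct (simplified from RFC 5322).
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# dot-atom: runs of atext separated by single dots
DOT_ATOM = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")
# quoted-string: qtext (plus space/tab) or a quoted-pair, between double quotes
QUOTED_STRING = re.compile(r'"(?:[\x21\x23-\x5b\x5d-\x7e \t]|\\[\x21-\x7e \t])*"')
# domain-literal: dtext between square brackets
DOMAIN_LITERAL = re.compile(r"\[[\x21-\x5a\x5e-\x7e]*\]")


def is_valid_addr_spec(address):
    """Check local-part "@" domain one construct at a time (assumed subset)."""
    # Step 1: local-part = dot-atom / quoted-string, matched at the start.
    pattern = QUOTED_STRING if address.startswith('"') else DOT_ATOM
    local = pattern.match(address)
    if local is None:
        return False
    rest = address[local.end():]
    # Step 2: a literal "@" must follow the local part immediately.
    if not rest.startswith("@"):
        return False
    domain = rest[1:]
    # Step 3: domain = dot-atom / domain-literal, and it must use up the rest.
    pattern = DOMAIN_LITERAL if domain.startswith("[") else DOT_ATOM
    return pattern.fullmatch(domain) is not None
```

Each step can be tested and debugged on its own, which is the point being made above; supporting comments would mean adding a small recursive step rather than growing any one pattern.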
> I quoted the regexp in context of showing my point about how
> 'squiggly' they can be and that I am able to read them.....to a point.
> (I was proud because 'googling' around for a regex email address
> validator string shows some VERY suspicious and
> extortionately, seemingly unnecessarily, long offerings. So I had a go
> myself).
And just like a hilariously long list of predecessors, you came up with
an RE which fails to precisely reproduce the ABNF definition of a valid
address for message headers. This is why you now have 2 problems:
1. The one you invented of needing to precisely validate email addresses
to an RFC specification that is not a perfect match for the addressing
supported by any coherent package of production-grade mail software.
2. A regular expression that is absurdly complex which you incorrectly
believe solves (1) while in fact it does not. It is maybe good enough,
but maybe not. It's an untestable approximation of its design goal,
which is an intrinsic problem for software.
Re: Where to find DETAIL for spamassassin default RULES
Posted by Groach <gr...@yahoo.com>.
On 11/06/2016 05:09, Bill Cole wrote:
> So, you thought validating email addresses was a problem demanding a
> solution? And you "solved" it with a regular expression?
>
> Congratulations on now having 2 problems. They should be very happy
> together.
The regex I quoted was out of context to the problem and completely
unrelated (sorry if you feel so confused with that). It is actually for
another software project (a mail server) that, being a mail server, must
ensure email addresses are valid. I quoted the regexp in context of
showing my point about how 'squiggly' they can be and that I am able to
read them.....to a point. (I was proud because 'googling' around for a
regex email address validator string shows some VERY suspicious and
extortionately, seemingly unnecessarily, long offerings. So I had a go
myself).
Re: Where to find DETAIL for spamassassin default RULES
Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 10 Jun 2016, at 3:09, jimimaseye wrote:
> REGEXP: I don't mind having a go at reading them (I have written some
> myself) but, as you know, even though some are easy and obvious,
> sometimes it can be like reading music - a blur of blobs, dots and
> squiggles that take a lot of deciphering. Of course, many of them rely
> on 'functionality' of the plugins (which I can't say I would fully
> understand) and the understanding of a RULE structure (some are easy
> and obvious, some are very convoluted).
>
> (I recently developed this one from scratch: It's an RFC2822 email
> address validator:
> ^(?=.{1,64}@)("[^<>@\\]+"|(?!\.|.*\.(\.|@))[^<> @\\"]+)@(\[(\d{1,3}\.){3}\d{1,3}\]|\[IPv6:(?:[A-Fa-f\d]{1,4}:){7}[A-Fa-f\d]{1,4}\]|(?=.{1,255}$)((?!-|\.|\d+($|\.))[a-zA-Z\d-]{0,62}[a-zA-Z\d])(|\.(?!-|\.|\d+($|\.))[a-zA-Z\d-]{0,62}[a-zA-Z\d]){1,126})$
>
> Very proud of it too. )
So, you thought validating email addresses was a problem demanding a
solution? And you "solved" it with a regular expression?
Congratulations on now having 2 problems. They should be very happy
together.
Re: Where to find DETAIL for spamassassin default RULES
Posted by jimimaseye <gr...@yahoo.com>.
Thanks for the replies, guys.
So in essence, there is no user-friendly method as there was before.
On 09/06/2016 14:19, Joe Quinn wrote:
> I have a bookmark in Firefox that points to
> http://ruleqa.spamassassin.org/?rule=%s&srcpath=&g=Change which is the
> status page for the nightly rule updates and is likely what you are
> looking for.
>
> I give it a keyword too, so I can type "ruleqa RULENAME" and it will
> replace the "%s" with whatever I type.
As for looking up and searching those nightly listings, it's true I can find
an individual rule, but then I can't exactly see how to drill into it and see
its expression or detail - I can only see a load of links showing how
effective it is in tests (it's not really what I was looking for). Am I
missing something?
REGEXP: I don't mind having a go at reading them (I have written some
myself) but, as you know, even though some are easy and obvious, sometimes it
can be like reading music - a blur of blobs, dots and squiggles that take a
lot of deciphering. Of course, many of them rely on 'functionality' of the
plugins (which I can't say I would fully understand) and the understanding of
a RULE structure (some are easy and obvious, some are very convoluted).
(I recently developed this one from scratch: It's an RFC2822 email address
validator:
^(?=.{1,64}@)("[^<>@\\]+"|(?!\.|.*\.(\.|@))[^<> @\\"]+)@(\[(\d{1,3}\.){3}\d{1,3}\]|\[IPv6:(?:[A-Fa-f\d]{1,4}:){7}[A-Fa-f\d]{1,4}\]|(?=.{1,255}$)((?!-|\.|\d+($|\.))[a-zA-Z\d-]{0,62}[a-zA-Z\d])(|\.(?!-|\.|\d+($|\.))[a-zA-Z\d-]{0,62}[a-zA-Z\d]){1,126})$
Very proud of it too. )
Re: Where to find DETAIL for spamassassin default RULES
Posted by Bowie Bailey <Bo...@BUC.com>.
On 6/9/2016 7:55 AM, jimimaseye wrote:
> Once upon a time the included rules for spamassassin were published in its wiki
> (example here: http://spamassassin.apache.org/tests_3_3_x.html) which in
> turn gave a link to an 'explanation' detail of the individual rules.
>
> However, as you know, these wiki pages are no longer updated due to "rules
> being updated nightly". And googling an individual rule only gives
> something useful if it appeared in the old 3.3 wiki (like in the
> link above). So where does one find details of new rules?
>
> For those of us that have no idea of the behind-the-scenes workings of
> spamassassin (including the development contributions, scoring evaluations
> etc), could someone give me a pointer please to where I can start to look or
> find a page giving detail similar to the above so that I can then look rules up.
> (I assume that every rule has some form of explanation attached before it gets
> committed and included). How do I find such detail (in a readable, end-user
> understandable form)? (Links to development, discussion, commits or
> whatever are fine just as long as ultimately it ends up giving the rule
> detail).
>
> (Currently it seems I just have 'to trust' whatever scores are given to
> the rules, and that the rules are pertinent to every system. And without an
> explanation of the rules, it seems a little strange that we, as admins, are
> allowed to tailor the scoring of such rules (if we wish to) even though
> we have no idea what the rules are in the first place).
The best place to find detail about the rules is in the rule files
themselves. You can get the description, scores, and actual regex for
the rule. Of course, you do need to be able to read the regex...
For example:
$ cd /var/lib/spamassassin/3.004001/updates_spamassassin_org
$ grep ONLINE_PHARMACY *
20_drugs.cf:body ONLINE_PHARMACY /\bonline pharmacy|\b(?:drugs|medications) online/i
20_drugs.cf:describe ONLINE_PHARMACY Online Pharmacy
50_scores.cf:score ONLINE_PHARMACY 0.843 2.371 0.008 0.650
Your rule files may be in a slightly different place depending on your
OS and how you installed.
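If grep is not to hand, roughly the same lookup can be sketched in Python. This is an illustration only: find_rule_lines is a made-up name, and the sketch assumes the rule files are plain *.cf text files sitting directly in one directory, as in the listing above.

```python
import re
from pathlib import Path


def find_rule_lines(rules_dir, rule_name):
    """Return "file:line" strings for each .cf line that names the rule."""
    # Match the name as a whole word so longer rule names containing it are skipped.
    word = re.compile(
        rf"(?<![A-Za-z0-9_]){re.escape(rule_name)}(?![A-Za-z0-9_])"
    )
    hits = []
    for cf in sorted(Path(rules_dir).glob("*.cf")):
        for line in cf.read_text(errors="replace").splitlines():
            if word.search(line):
                hits.append(f"{cf.name}:{line.strip()}")
    return hits
```

Called as find_rule_lines("/var/lib/spamassassin/3.004001/updates_spamassassin_org", "ONLINE_PHARMACY"), it should list the same kind of body, describe and score lines as the grep above. Unlike plain grep it matches whole rule names only, which is a design choice of this sketch.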
--
Bowie
Re: Where to find DETAIL for spamassassin default RULES
Posted by Reindl Harald <h....@thelounge.net>.
On 09.06.2016 at 13:55, jimimaseye wrote:
> Once upon a time the included rules for spamassassin were published in its wiki
> (example here: http://spamassassin.apache.org/tests_3_3_x.html) which in
> turn gave a link to an 'explanation' detail of the individual rules.
>
> However, as you know, these wiki pages are no longer updated due to "rules
> being updated nightly". And googling an individual rule only gives
> something useful if it appeared in the old 3.3 wiki (like in the
> link above). So where does one find details of new rules?
Frankly, I would be happy if rules without a description were simply
refused for publication, so that you can at least grasp from the report
header why a rule hit.