You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by Henrik Krohns <he...@hege.li> on 2021/05/01 09:54:41 UTC
Re: svn commit: r1889364 -
/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
These kinds of changes just make you wonder what's the point of doing such
plugins inside SA distribution.. if we ever do get 4.0 released, I really
doubt if there are enough resources in the project to even release monthly
updates after that..
On Sat, May 01, 2021 at 09:41:28AM -0000, gbechis@apache.org wrote:
> Author: gbechis
> Date: Sat May 1 09:41:28 2021
> New Revision: 1889364
>
> URL: http://svn.apache.org/viewvc?rev=1889364&view=rev
> Log:
> cope with recent MailUP changes
>
> Modified:
> spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
>
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm?rev=1889364&r1=1889363&r2=1889364&view=diff
> ==============================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm Sat May 1 09:41:28 2021
> @@ -388,20 +388,27 @@ sub esp_sendinblue_check {
>
> sub esp_mailup_check {
> my ($self, $pms) = @_;
> - my $mailup_id;
> + my ($mailup_id, $xabuse, $listid);
>
> my $rulename = $pms->get_current_eval_rule_name();
>
> # All Mailup emails have the X-CSA-Complaints header set to whitelist-complaints@eco.de
> my $xcsa = $pms->get("X-CSA-Complaints", undef);
> - if((not defined $xcsa) or ($xcsa !~ /whitelist-complaints\@eco\.de/)) {
> + if((not defined $xcsa) or ($xcsa !~ /complaints\@eco\.de/)) {
> return;
> }
> # All Mailup emails have the X-Abuse header that must match
> - $mailup_id = $pms->get("X-Abuse", undef);
> - return if not defined $mailup_id;
> - $mailup_id =~ /Please report abuse here: http\:\/\/.*\.musvc([0-9]+)\.net\/p\?c=([0-9]+)/;
> - $mailup_id = $2;
> + $xabuse = $pms->get("X-Abuse", undef);
> + return if not defined $xabuse;
> + if($xabuse =~ /Please report abuse here: http\:\/\/.*\.musvc([0-9]+)\.net\/p\?c=([0-9]+)/) {
> + $mailup_id = $2;
> + }
> + if(not defined $mailup_id) {
> + $listid = $pms->get("list-id", undef);
> + if($listid =~ /\<(\d+)\.\d+\>/) {
> + $mailup_id = $1;
> + }
> + }
> # if regexp doesn't match it's not Mailup
> return if not defined $mailup_id;
> chomp($mailup_id);
>
Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by Loren Wilton <lw...@earthlink.net>.
> I guess the risk is exactly the same as rulenames colliding.. better not
> use very generic names and you can always prepend the rulename yourself.
> :-)
My other concern is thta as far as I know, SA rules are still limited to a
single line of text. If the rule name plus item name gets long, the rule
text using rule_name:item_name starts to become very long and unreadable,
espcially when multiple items are used in a single rule body.
Loren
Re: svn commit: r1889364 -
/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by John Hardin <jh...@impsec.org>.
On Sat, 8 May 2021, Henrik K wrote:
> On Fri, May 07, 2021 at 02:44:48PM -0700, John Hardin wrote:
>> On Fri, 7 May 2021, Loren Wilton wrote:
>>
>>> The only nitpick I'd offer is that I'd prefer that the capture tokens be
>>> at a single level, like rule names. So you might get:
>>>
>>>> $pms->{captured_values}->{NAME} = $+{NAME};
>>>>
>>>> Then use it in a rule:
>>>>
>>>> body MATCHER /My name is ${NAME}/
>>
>> The risk with that is rules from multiple sources using colliding variable
>> names.
>>
>> body MATCHER /My name is ${FROM_NAME:NAME}/
>>
>> ...is explicit and doesn't carry that risk.
>
> I guess the risk is exactly the same as rulenames colliding.. better not
> use very generic names and you can always prepend the rulename yourself. :-)
heh. True.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Tomorrow: the 76th anniversary of VE day
Re: svn commit: r1889364 -
/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by Henrik K <he...@hege.li>.
On Fri, May 07, 2021 at 02:44:48PM -0700, John Hardin wrote:
> On Fri, 7 May 2021, Loren Wilton wrote:
>
> > The only nitpick I'd offer is that I'd prefer that the capture tokens be
> > at a single level, like rule names. So you might get:
> >
> > > $pms->{captured_values}->{NAME} = $+{NAME};
> > >
> > > Then use it in a rule:
> > >
> > > body MATCHER /My name is ${NAME}/
>
> The risk with that is rules from multiple sources using colliding variable
> names.
>
> body MATCHER /My name is ${FROM_NAME:NAME}/
>
> ...is explicit and doesn't carry that risk.
I guess the risk is exactly the same as rulenames colliding.. better not
use very generic names and you can always prepend the rulename yourself. :-)
Re: svn commit: r1889364 -
/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by Henrik K <he...@hege.li>.
On Fri, May 07, 2021 at 10:34:10AM -0700, Loren Wilton wrote:
>
> The only nitpick I'd offer is that I'd prefer that the capture tokens be at
> a single level, like rule names. So you might get:
> >
> > body MATCHER /My name is ${NAME}/
Yes, probably more flexible this way.
Re: Capturing and reusing strings for matching across rules
Posted by John Hardin <jh...@impsec.org>.
On Sun, 15 May 2022, Michael Storz wrote:
> Just use a different sigil than $. Perl uses $, @, %, & and *. Looking at my
> keyboard, I see §
Difficult on US keyboards and possibly others, but compose-able.
> and #
Comment start, must be escaped.
> which could be used.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
715 days since the first private commercial manned orbital mission (SpaceX)
Re: Capturing and reusing strings for matching across rules
Posted by Michael Storz <Mi...@lrz.de>.
Am 2022-05-14 17:43, schrieb Henrik K:
> On Sat, May 14, 2022 at 04:54:01PM +0200, Michael Storz wrote:
>>
>> After Henrik has presented his implementation, I guess I have to tell
>> you
>> what I have been working on lately. I am working on a general Tag.pm
>> Plugin.
>> I took the Tagmatch.pm plugin from Paul and rewrote and extended it.
>> With
>> Paul's plugin you can do all kinds of operations on tags (I use tag
>> instead
>> of tagmatch because this looks similar to the header and body
>> keywords). I
>> extended it with a settag command that allows you to extract data from
>> header, body or other tags via regexp and assign it to a tag. These
>> tags can
>> then be used as usual. Coming back to the Esp.pm plugin: for me the
>> definition for an ESP looks like this:
>>
>> ####################
>> #
>> # Mailchimp
>> #
>> ####################
>>
>> # header field X-MC-User has the customer-id
>> settag _LRZ_MCID_ X-MC-User =~
>> /^([0-9a-z]{25})$/
>
> Maybe we can consider tags and regex captures the same in the future..
> they
> are simply global variables. In that case, a separate "settag" command
> wouldn't even be needed, since you could just do the "header FOO
> /(?<LRZMCID>bar)/" stanza.
>
> Btw we already agreed somewhere that tags are not supposed to contain
> underscores, since it's the tag delimiter itself. It could be awkward
> to
> parse and make sense.
I know and I do not agree :-) The _ around a tag are only needed as an
explizit representation to distinguish them from other stuff. For me
tags will play a big role in SpamAssassin. Since tags should always be
in uppercase we need _ to make them more readable. I do not think there
will be big problems in parsing with templates. But we'll see.
>
>> thing which I have not done yet, is using tags in regexps like the
>> example
>> above
>>
>> body MATCHER /My name is ${FROM_NAME:NAME}/
>
> There should consensus for a general form that will work well in the
> future
> for all these causes. If we consider that there is only one type of
> global
> variable/tag, it would be simpler.
Yes.
>
> I really dislike anything resembling $ { } because they are valid
> regexp
> meta characters, that's asking for some trouble.
Just use a different sigil than $. Perl uses $, @, %, & and *. Looking
at my keyboard, I see § and # which could be used.
>
> If we start adding lot of "tag" stuff in the mix too, there will
> probably be
> a horrible web of dependencies all around. Not sure if there is
> anything we
> can do up front to ease it. Whole SA with it's arcane priority system
> and
> dozen plugins doing their thing in independent ways would really need
> to be
> rewritten from ground up. And then we can likely forget any backwards
> compatibility for people that are still using years old versions. Any
> takers? :-D
I think we are getting the asynchronous stuff working with 4.0 Therefore
I do not see problems with this approach.
>
>> However, to fully create this design, I believe more time is needed
>> and such
>> functionality should not be incorporated into SpamAssassin until after
>> the
>> 4.0 release. First the handling of the tags must be improved, which is
>> currently totally broken. I am still writing together where the
>> problems
>> with the tags are and how to fix them.
>
> Good to see some enthusiasm. Personally I will be satisfied after
> 4.0.0 is
> released and will stay lurking and acting on any bugs, but that's
> probably
> it on my behalf.. there's hobbies and then there's hobbies that start
> to
> feel like a payless job..
Michael
Re: Capturing and reusing strings for matching across rules
Posted by Henrik K <he...@hege.li>.
On Sat, May 14, 2022 at 04:54:01PM +0200, Michael Storz wrote:
>
> After Henrik has presented his implementation, I guess I have to tell you
> what I have been working on lately. I am working on a general Tag.pm Plugin.
> I took the Tagmatch.pm plugin from Paul and rewrote and extended it. With
> Paul's plugin you can do all kinds of operations on tags (I use tag instead
> of tagmatch because this looks similar to the header and body keywords). I
> extended it with a settag command that allows you to extract data from
> header, body or other tags via regexp and assign it to a tag. These tags can
> then be used as usual. Coming back to the Esp.pm plugin: for me the
> definition for an ESP looks like this:
>
> ####################
> #
> # Mailchimp
> #
> ####################
>
> # header field X-MC-User has the customer-id
> settag _LRZ_MCID_ X-MC-User =~ /^([0-9a-z]{25})$/
Maybe we can consider tags and regex captures the same in the future.. they
are simply global variables. In that case, a separate "settag" command
wouldn't even be needed, since you could just do the "header FOO
/(?<LRZMCID>bar)/" stanza.
Btw we already agreed somewhere that tags are not supposed to contain
underscores, since it's the tag delimiter itself. It could be awkward to
parse and make sense.
> thing which I have not done yet, is using tags in regexps like the example
> above
>
> body MATCHER /My name is ${FROM_NAME:NAME}/
There should consensus for a general form that will work well in the future
for all these causes. If we consider that there is only one type of global
variable/tag, it would be simpler.
I really dislike anything resembling $ { } because they are valid regexp
meta characters, that's asking for some trouble.
If we start adding lot of "tag" stuff in the mix too, there will probably be
a horrible web of dependencies all around. Not sure if there is anything we
can do up front to ease it. Whole SA with it's arcane priority system and
dozen plugins doing their thing in independent ways would really need to be
rewritten from ground up. And then we can likely forget any backwards
compatibility for people that are still using years old versions. Any
takers? :-D
> However, to fully create this design, I believe more time is needed and such
> functionality should not be incorporated into SpamAssassin until after the
> 4.0 release. First the handling of the tags must be improved, which is
> currently totally broken. I am still writing together where the problems
> with the tags are and how to fix them.
Good to see some enthusiasm. Personally I will be satisfied after 4.0.0 is
released and will stay lurking and acting on any bugs, but that's probably
it on my behalf.. there's hobbies and then there's hobbies that start to
feel like a payless job..
Re: Capturing and reusing strings for matching across rules
Posted by Henrik K <he...@hege.li>.
On Sat, May 14, 2022 at 04:54:01PM +0200, Michael Storz wrote:
> And the last point is modifier functions, like Henrik
> implemented for the HEADER tag: :addr, :name, :trim, :base64, :domain, :lc,
> :uc, :pop, :first, you name it. It would be best if these modifier functions
> could be registered by a plugin and then used similarly to eval functions,
> which are also registered and then used.
I think you might be little bit confused about what is going on here.
First of all, no such thing as "tag modifiers" similar to :addr exists as
you imply.
Some tags in the past have indeed been made as if they act as a "function",
and take some parameter, for example:
_HAMMYTOKENS(N)_ the N most significant hammy tokens (default, 5)
_TESTS(,)_ tests hit separated by "," (or other separator)
_HEADER(NAME)_ includes the value of a message header. value is the same
as is found for header rules (see elsewhere in this doc)
All the :addr :name modifiers you refer to, are specific to HEADER RULE
SELECTOR ($pms->get) and have absolutely nothing to do with tags.
Proposing that all tags could accept a generic modifier is a completely
separate issue and the format would need to be specified.
Re: Capturing and reusing strings for matching across rules
Posted by Henrik K <he...@hege.li>.
On Sat, May 14, 2022 at 04:54:01PM +0200, Michael Storz wrote:
> It would be best if these modifier functions could be registered by a
> plugin and then used similarly to eval functions, which are also
> registered and then used.
Tags are SpamAssassin core (PerMsgStatus) function. I always shun when I
see proposal to "pluginize" something. It means creating more hooks and
stuff that can break. The less public APIs/hooks we offer, the simpler it
is for us to maintain into the future.
Quoting you from recent post:
"Recently the question was asked why Check.pm is a plugin if it is not
optional. Check.pm is a plugin so that you can implement more than one
check plugin."
You can implement more than one check plugin? Do you realize what effort it
would take to create your "own check plugin"? There's a bazillion things in
PerMsgStatus etc that Check depends on and vice versa. It's almost
impossible for anyone outside of SA developers to understand the internal
dependencies and idiosyncrasies that have been piling up over the years.
In the past there was developer talk for example about all rule types being
plugins, yadda yadda. I agree that in IDEAL world everything would be
pluginized and wildly customizable by anyone. That's just not reality with
the resources the project has. For example a small little change somewhere
can easily break sa-compile users, because that's pluginized and obscured
away doing it's own thing with rules etc, yet all that is highly coupled
into how PMS/Check internals work. All of the SpamAssassin test suite
assumes that sa-compile is not used, it only has a single t/sa_compile.t
which doesn't test much.
> For a good and effizient use of lists a full rewrite of the WLBL.pm plugin
> is needed. E.g. enlist_addrlist can be used for Mailchimp because the
> customer id is a lowercase hex string, whereas the cid for Salesforce uses
> lowercase and uppercase chars. Therefore we need lists where we can specify
> the syntax of the list members.
Addrlist should remain as (email/domain) addrlist and not be expanded into a
generic [key/]value lookup store.
Anything that is a large list, should be looked up from an external database
(redis, sql, DNS etc). As a last resort a generic FileDB.pm could be
provided that loads things into a memory hash. But I guess reading a file
is few lines of code in a Plugin, so probably not needed. As long as it
loads (and reloads?) the stuff pre-fork, so it doesn't waste memory.
Re: Capturing and reusing strings for matching across rules
Posted by Michael Storz <Mi...@lrz.de>.
Am 2022-05-14 14:27, schrieb Henrik K:
> On Fri, May 07, 2021 at 07:23:05PM +0300, Henrik K wrote:
>> On Fri, May 07, 2021 at 09:07:08AM -0700, Loren Wilton wrote:
>> > > > > header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1
>> > > >
>> > > > Would :capture play well with (e.g.) :addr, :name, :raw, etc?
>> > >
>> > > It might as well be a tflag or something. Why limit capturing to headers
>> > > only?
>> >
>> > I hadn't intended it to be limited to headers only, but I guess the syntax
>> > woudl have to be a little different for raw, body, full, etc, since they
>> > don't have a part keyword in the rule syntax.
>>
>> Perl already has named capture groups as legit syntax, so it would be
>> most
>> simple to actually use them.
>>
>> https://perldoc.perl.org/perlre#(?%3CNAME%3Epattern)
>>
>> header FROM_NAME /^From: "(?<NAME>\w+)/
>>
>> ... just save the matches it in the rule code
>> $pms->{captured_values}->{FROM_NAME}->{NAME} = $+{NAME};
>>
>> Then use it in a rule:
>>
>> body MATCHER /My name is ${FROM_NAME:NAME}/
>>
>> Don't nitpick on ${}, could be any similar syntax. Code adds this
>> rule to
>> FROM_NAME dependency chain. When FROM_NAME hits, run MATCHER regex
>> (obviously first recompile the regexp).
>
> Implementation pending:
>
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7992
After Henrik has presented his implementation, I guess I have to tell
you what I have been working on lately. I am working on a general Tag.pm
Plugin. I took the Tagmatch.pm plugin from Paul and rewrote and extended
it. With Paul's plugin you can do all kinds of operations on tags (I use
tag instead of tagmatch because this looks similar to the header and
body keywords). I extended it with a settag command that allows you to
extract data from header, body or other tags via regexp and assign it to
a tag. These tags can then be used as usual. Coming back to the Esp.pm
plugin: for me the definition for an ESP looks like this:
####################
#
# Mailchimp
#
####################
# header field X-MC-User has the customer-id
settag _LRZ_MCID_ X-MC-User =~ /^([0-9a-z]{25})$/
# check of tag _LRZ_MCID_, different possibilities
# askdns __LRZ_MCID_FOUND _LRZ_MCID_.esp.dnsbl.lrz.de A
127.0.0.5
# tag __LRZ_MCID_FOUND _LRZ_MCID_ =~
/^566e95f0930918dfb8d575a40$/
# header __LRZ_MCID_FOUND
eval:check_in_addrlist('_LRZ_MCID_', Mailchimp)
# tflags __LRZ_MCID_FOUND tagify
header __LRZ_MCID_FOUND
eval:check_tag_in_addrlist('_LRZ_MCID_', Mailchimp)
# all Mailchimp emails have the X-Mailer header set to "MailChimp
Mailer"
header __LRZ_XM_MAILCHIMP X-Mailer =~ /MailChimp\sMailer/
# scoring rule
meta LRZ_MCID_FOUND (__LRZ_MCID_FOUND &&
__LRZ_XM_MAILCHIMP)
score LRZ_MCID_FOUND 7.2
# list of Mailchimp-IDs
enlist_addrlist (Mailchimp) 4ecb620f8ed264d1d84aa0981
enlist_addrlist (Mailchimp) 566e95f0930918dfb8d575a40
At the moment I am working on the tflags tagify. This should take a
normal eval function and automatically allow the usage of tags as
arguments. With the above example, it takes the eval function
check_in_addrlist which normally would only allow strings as argument
and make it work with tags instead. At the moment I have to use the eval
function check_tag_in_addrlist where the ability to work with a tag is
coded into the function. The other thing which I have not done yet, is
using tags in regexps like the example above
body MATCHER /My name is ${FROM_NAME:NAME}/
I had the same idea, instead of the explicit representation _TAG_ for
the tag TAG, you could use the alternative form ${TAG} in regular
expressions (and maybe templates). And the last point is modifier
functions, like Henrik implemented for the HEADER tag: :addr, :name,
:trim, :base64, :domain, :lc, :uc, :pop, :first, you name it. It would
be best if these modifier functions could be registered by a plugin and
then used similarly to eval functions, which are also registered and
then used.
For a good and effizient use of lists a full rewrite of the WLBL.pm
plugin is needed. E.g. enlist_addrlist can be used for Mailchimp because
the customer id is a lowercase hex string, whereas the cid for
Salesforce uses lowercase and uppercase chars. Therefore we need lists
where we can specify the syntax of the list members.
However, to fully create this design, I believe more time is needed and
such functionality should not be incorporated into SpamAssassin until
after the 4.0 release. First the handling of the tags must be improved,
which is currently totally broken. I am still writing together where the
problems with the tags are and how to fix them.
Michael
Capturing and reusing strings for matching across rules
Posted by Henrik K <he...@hege.li>.
On Fri, May 07, 2021 at 07:23:05PM +0300, Henrik K wrote:
> On Fri, May 07, 2021 at 09:07:08AM -0700, Loren Wilton wrote:
> > > > > header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1
> > > >
> > > > Would :capture play well with (e.g.) :addr, :name, :raw, etc?
> > >
> > > It might as well be a tflag or something. Why limit capturing to headers
> > > only?
> >
> > I hadn't intended it to be limited to headers only, but I guess the syntax
> > woudl have to be a little different for raw, body, full, etc, since they
> > don't have a part keyword in the rule syntax.
>
> Perl already has named capture groups as legit syntax, so it would be most
> simple to actually use them.
>
> https://perldoc.perl.org/perlre#(?%3CNAME%3Epattern)
>
> header FROM_NAME /^From: "(?<NAME>\w+)/
>
> ... just save the matches it in the rule code
> $pms->{captured_values}->{FROM_NAME}->{NAME} = $+{NAME};
>
> Then use it in a rule:
>
> body MATCHER /My name is ${FROM_NAME:NAME}/
>
> Don't nitpick on ${}, could be any similar syntax. Code adds this rule to
> FROM_NAME dependency chain. When FROM_NAME hits, run MATCHER regex
> (obviously first recompile the regexp).
Implementation pending:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7992
Re: svn commit: r1889364 -
/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by John Hardin <jh...@impsec.org>.
On Fri, 7 May 2021, Loren Wilton wrote:
> The only nitpick I'd offer is that I'd prefer that the capture tokens be at a
> single level, like rule names. So you might get:
>
>> $pms->{captured_values}->{NAME} = $+{NAME};
>>
>> Then use it in a rule:
>>
>> body MATCHER /My name is ${NAME}/
The risk with that is rules from multiple sources using colliding
variable names.
body MATCHER /My name is ${FROM_NAME:NAME}/
...is explicit and doesn't carry that risk.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Autocorrect is the work of the Devil, and whoever invented it
should go straight to hello. -- Windy Wilson
-----------------------------------------------------------------------
Tomorrow: the 76th anniversary of VE day
Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by Loren Wilton <lw...@earthlink.net>.
> Perl already has named capture groups as legit syntax, so it would be most
> simple to actually use them.
>
> https://perldoc.perl.org/perlre#(?%3CNAME%3Epattern)
>
> header FROM_NAME /^From: "(?<NAME>\w+)/
Good. I thought there was someting there, but I didn't remember the exact
syntax and was too lazy to dig it out. Works for me.
>
> ... just save the matches it in the rule code
> $pms->{captured_values}->{FROM_NAME}->{NAME} = $+{NAME};
>
> Then use it in a rule:
>
> body MATCHER /My name is ${FROM_NAME:NAME}/
>
> Don't nitpick on ${}, could be any similar syntax. Code adds this rule to
> FROM_NAME dependency chain. When FROM_NAME hits, run MATCHER regex
> (obviously first recompile the regexp).
The only nitpick I'd offer is that I'd prefer that the capture tokens be at
a single level, like rule names. So you might get:
> $pms->{captured_values}->{NAME} = $+{NAME};
>
> Then use it in a rule:
>
> body MATCHER /My name is ${NAME}/
Loren
Re: svn commit: r1889364 -
/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by Henrik K <he...@hege.li>.
On Fri, May 07, 2021 at 09:07:08AM -0700, Loren Wilton wrote:
> > > > header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1
> > >
> > > Would :capture play well with (e.g.) :addr, :name, :raw, etc?
> >
> > It might as well be a tflag or something. Why limit capturing to headers
> > only?
>
> I hadn't intended it to be limited to headers only, but I guess the syntax
> woudl have to be a little different for raw, body, full, etc, since they
> don't have a part keyword in the rule syntax.
Perl already has named capture groups as legit syntax, so it would be most
simple to actually use them.
https://perldoc.perl.org/perlre#(?%3CNAME%3Epattern)
header FROM_NAME /^From: "(?<NAME>\w+)/
... just save the matches it in the rule code
$pms->{captured_values}->{FROM_NAME}->{NAME} = $+{NAME};
Then use it in a rule:
body MATCHER /My name is ${FROM_NAME:NAME}/
Don't nitpick on ${}, could be any similar syntax. Code adds this rule to
FROM_NAME dependency chain. When FROM_NAME hits, run MATCHER regex
(obviously first recompile the regexp).
Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by Loren Wilton <lw...@earthlink.net>.
>> > header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1
>>
>> Would :capture play well with (e.g.) :addr, :name, :raw, etc?
>
> It might as well be a tflag or something. Why limit capturing to headers
> only?
I hadn't intended it to be limited to headers only, but I guess the syntax
woudl have to be a little different for raw, body, full, etc, since they
don't have a part keyword in the rule syntax.
Originally I hadn't wanted to have the ":Capture" part, just have the
capture assignment following the rule body. But then, how do you know if
there is a capture assignment at the end? I didn't like the idea of trying
to stick it into the match flags, especially for the (probably rare) case of
multiple captures in a single rule.
I suppose that the rule scanner probably is looking past the flags that may
follow a regex closing bracket, so would pick up an assignment if there was
one there. So, for instance, this should work:
body SOME_RULE /Your (\w+) Order/i $(__COMPANY)=\1
Alternately (which I don't much care for) we could have
body SOME_RULE /Your (\w+) Order for \$(\d+)/i
assign __COMPANY,__AMOUNT
or keyworded
assign 1=__COMPANY,2=__AMOUNT
What worries me about that sort of syntax is there is no real
juxtapositioning requirement between a rule name definition and any modifier
flag lines with the same rule name. The capture could be in a completely
different rule file, and I suppose could even be before the defining rule by
a thousand lines or so in a single file. But you pretty much need to see
both the regex and the assignments to know what is happening to what. So
allowing the assignments to be separated from the regex isn't necessarily
good.
Loren
Re: svn commit: r1889364 -
/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by Henrik K <he...@hege.li>.
On Fri, May 07, 2021 at 06:08:00PM +0300, Henrik K wrote:
>
> All this is petty details compared to the overall logic that is required in
> the background.
I'm mostly interested in tackling the meta-rule dependency mess right now:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735#c7
Variable capturing would be a logical enhancement that follows it, as the
supporting dependency logic would likely be already implemented then.
Any thoughts on implementing it efficiently are welcome.
Re: svn commit: r1889364 -
/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by Henrik K <he...@hege.li>.
On Fri, May 07, 2021 at 07:58:18AM -0700, John Hardin wrote:
> On Sun, 2 May 2021, Loren Wilton wrote:
>
> > Now consider variable capture from the message:
> >
> > header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1
>
> I like this syntax. I was thinking that the capture would be implied - any
> capturing group in a rule would automagically save its (single) match in a
> variable named after the rule (kept separate from the rule's score) for
> later use, but I like the explicit nature of this approach.
>
> Would :capture play well with (e.g.) :addr, :name, :raw, etc?
It might as well be a tflag or something. Why limit capturing to headers
only?
Or not a tflag at all. Just a uncommon enclosure format that is parsed from
_any_ regex anywhere.
All this is petty details compared to the overall logic that is required in
the background.
Re: svn commit: r1889364 -
/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by Henrik K <he...@hege.li>.
On Sat, May 08, 2021 at 08:51:29AM -0700, Loren Wilton wrote:
> > An alternative approach is creating new strings from parsed data:
> >
> > string TO_BODY = TO:addr ":" BODY(500)
> >
> > string TO_BODY ~= /<whatever>/
> >
> > the advantage of this is that there are no dependencies.
> >
> > I'm thinking that BODY(500) would be a multi-line string constructed
> > from the first 500 byte of the rendered body. For me having multi-line
> > body matching is more important than any of this.
>
> Hum, interesting.
>
> As a small nitpick, maybe it's just my 40 years programming C++, but I'm
> bothered by using 'string' for both the creation operation and rule-parsing
> operation. I realize they are differentiated by the operator, but that just
> seems too easy to screw up, at least for me and my bad eyesight. Maybe
> 'makestring' and 'string' or some other non-identical pair of words,
> whatever seems nice.
>
> I assume you would want BODY, RAWBODY, FULL, etc. as possibilities. At least
> I would.
>
> I think that rather than the character count, I'd do a range: BODY(1:500) or
> the like. This lets you capture from an offset location.
>
> Actually I think I'd prefer a regex there, at least as an alternative:
> BODY/(.{500})/m to get the equivalent first 500 characters. Or BODY/Your
> order number (\d+)/ to get a capture of a specific thing from the body.
>
> Thoughts?
There's already a bug discussing something like this:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=4691
But really, perl/memory/CPU are not a bottleneck anymore, there is no
problem matching a ~50k text with regexp. The simplest solution is just
ditching the "body chunking" completely in 4.0, or implement new methods for
full body matching like I proposed:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7745
Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by Loren Wilton <lw...@earthlink.net>.
> An alternative approach is creating new strings from parsed data:
>
> string TO_BODY = TO:addr ":" BODY(500)
>
> string TO_BODY ~= /<whatever>/
>
> the advantage of this is that there are no dependencies.
>
> I'm thinking that BODY(500) would be a multi-line string constructed
> from the first 500 byte of the rendered body. For me having multi-line
> body matching is more important than any of this.
Hum, interesting.
As a small nitpick, maybe it's just my 40 years programming C++, but I'm
bothered by using 'string' for both the creation operation and rule-parsing
operation. I realize they are differentiated by the operator, but that just
seems too easy to screw up, at least for me and my bad eyesight. Maybe
'makestring' and 'string' or some other non-identical pair of words,
whatever seems nice.
I assume you would want BODY, RAWBODY, FULL, etc. as possibilities. At least
I would.
I think that rather than the character count, I'd do a range: BODY(1:500) or
the like. This lets you capture from an offset location.
Actually I think I'd prefer a regex there, at least as an alternative:
BODY/(.{500})/m to get the equivalent first 500 characters. Or BODY/Your
order number (\d+)/ to get a capture of a specific thing from the body.
Thoughts?
Loren
Re: svn commit: r1889364 -
/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by RW <rw...@googlemail.com>.
On Fri, 7 May 2021 07:58:18 -0700 (PDT)
John Hardin wrote:
> On Sun, 2 May 2021, Loren Wilton wrote:
>
> > Now consider variable capture from the message:
> >
> > header __SUB_CAP Subject:Capture /Your (\w+) Order/i
> > $(__COMPANY)=\1
>
> I like this syntax. I was thinking that the capture would be implied
> - any capturing group in a rule would automagically save its (single)
> match in a variable named after the rule (kept separate from the
> rule's score) for later use, but I like the explicit nature of this
> approach.
An alternative approach is creating new strings from parsed data:
string TO_BODY = TO:addr ":" BODY(500)
string TO_BODY ~= /<whatever>/
the advantage of this is that there are no dependencies.
I'm thinking that BODY(500) would be a multi-line string constructed
from the first 500 byte of the rendered body. For me having multi-line
body matching is more important than any of this.
Re: svn commit: r1889364 -
/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by John Hardin <jh...@impsec.org>.
On Sun, 2 May 2021, Loren Wilton wrote:
> Now consider variable capture from the message:
>
> header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1
I like this syntax. I was thinking that the capture would be implied - any
capturing group in a rule would automagically save its (single) match in a
variable named after the rule (kept separate from the rule's score) for
later use, but I like the explicit nature of this approach.
Would :capture play well with (e.g.) :addr, :name, :raw, etc?
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Tomorrow: the 76th anniversary of VE day
Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by Loren Wilton <lw...@earthlink.net>.
> Now consider variable capture from the message:
>
> header __SUB_CAP Subject:Capture /Your (\w+) Order/i
> $(__COMPANY)=\1
The text above was intended to all appear on one line. "$(__COMPANY)=\1"
followed /i.
Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by Loren Wilton <lw...@earthlink.net>.
John Hardin wrote:
>> An awful lot I think could be done simply by having rules that can
>> capture to named per-message-global variables, and allowing those
>> variables to be used in other (or the same) rules.
>
> I've been wanting this for years.
Proposal for discussion:
Consider the following rules that could be in a user_prefs on a system that
allows per-user rules:
header __TO_ME To:Addr /<me\@myhost\.com>/
meta NOT_TO_ME !__TO_ME
This could be useful for a single user, but obviously could not be
site-wide, even if the site found such a thing useful, as all users have
their own email addresses. The problem is obviously the hard-coded address
string.
PMS has a number of per-message variables attached to it that can be used in
the Perl code for various things. I'm proposing a way to add per-message
constants and variables to this collection, and a way to access them in rule
text.
Consider this variation on the above rules:
variable __ME /me\@myhost\.com/
header __TO_ME To:Addr /<$(__ME)>/
meta NOT_TO_ME !__TO_ME
The format of the "variable" declaration is deliberately the simplified form
of a rule declaration to simplify parsing and help file description
considerations. Since it is actually defining a constant it could be called
"constant". I called it "variable" because the thing it defines could vary
from user to user and message to message.
Now we can put the __ME in each user_prefs and have a global rule in some
site-global rules file. Each __ME instance would stick the string into a PMS
variable for the duration of the message being parsed. The name of the PMS
variable would be some variation on the "rule" name of the variable.
The text of the rules with a $(name) string in them, to be compiled, would
have to have a way to reach into the relevent PMS variable to resolve that
part of the string. Perhaps this means that rules containing variables could
not be compiled. As the number of them is likely to be relatively small
compared to the number of all rules, that is probably an acceptable
tradeoff.
There are no execution ordering problems as long as 'variable' declarations
are parsed before rules are run.
Now consider variable capture from the message:
header __SUB_CAP Subject:Capture /Your (\w+) Order/i
$(__COMPANY)=\1
Here we can define a PMS variable and populate it on a rule match. The rule
can be used as any normal rule, it just additionally captures one or more
variables while it runs.
We can use this is a match against some other message part with a rule
similar to the __TO_ME rule above. Obviously in this case we have rule
ordering to consider, since we have to capture (or attempt the capture; the
string may come up null) before we can run the rule depending on the string.
An alternative to the above capture symtax could be:
header __COMPANY Subject /Your (\w+) Order/
The disadvantages I see to this are that you can only capture one string
from the match, and you now have to wonder whether a rule name represents
and integer or a string or both. I'm not in favor of the mess this could
create.
Note that you could extend this fairly trivially (in a syntactic sense) to
allow a match against multiple captured strings in a pattern:
variable __GROUP /Order for $(__ME_) from $(__ORDER)/
body MY_ORDER /$(__GROUP)/i # __GROUP exists in the body
This also gets into problems of rule dependencies, since now some 'variable'
declarations could not be executed until other rules have run. But it might
be worth considering as an extension. Likely the mechanisms to implement the
constant declaration, capture, and match code would be most all that was
necessary to implement this too.
I think that the above would do most of what people would like to do.
Errors:
A reference to an undefined variable would be a rule syntax error,
invalidating that rule.
A poorly formatted capture would be a rule syntax error
A poorly formatted variable would essentially be a rule syntax error.
Circular references would be a rule syntax error, invalidating the rule
where it was detected (which could then invalidate other depending rules)
A dependency on text from a net rule would push other depending rule
evaluation to after the net rules returned results. I assume this is already
done for meta rules that depend on net rules. But I could see this
potentially being a pain point.
I'm not married to any of the above suggested syntax; it just seemed like a
reasonable starting point, and simple to describe and to use. Discussion and
suggestions on the formats are welcome.
I don't know what it would mean to put a 'variable' name in a meta. Likely
it would be meaningless and probably disallowed. Likewise I don't see much
point in assigning a score or a description to a varable, since they aren't
really rules themselves.
Discussion is open!
Loren
Re: svn commit: r1889364 -
/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by John Hardin <jh...@impsec.org>.
On Sat, 1 May 2021, Loren Wilton wrote:
>> Ideally rules could be written with some pseudo-language that could do
>> complex things, grabbing things into variables, modifying, comparing to
>> other things etc. Then there wouldn't be any need for Perl plugins doing
>> some trivial stuff.
>
> An awful lot I think could be done simply by having rules that can capture to
> named per-message-global variables, and allowing those variables to be used
> in other (or the same) rules. The Perl RE syntax almost allows for this
> as-is, so it shouldn't be a great stretch to modify the rules parser to allow
> such things and capture the names of the variables and create the variables.
I've been wanting this for years.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
The yardstick you should use when considering whether to support a
given piece of legislation is "what if my worst enemy is chosen to
administer this law?"
-----------------------------------------------------------------------
Today: May Day - Remember 110 million people murdered by Communism
Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by Loren Wilton <lw...@earthlink.net>.
> Ideally rules could be written with some pseudo-language that could do
> complex things, grabbing things into variables, modifying, comparing to
> other things etc. Then there wouldn't be any need for Perl plugins doing
> some trivial stuff.
An awful lot I think could be done simply by having rules that can capture
to named per-message-global variables, and allowing those variables to be
used in other (or the same) rules. The Perl RE syntax almost allows for this
as-is, so it shouldn't be a great stretch to modify the rules parser to
allow such things and capture the names of the variables and create the
variables.
Obviously though this gets into complexities of rule dependencies and
creates possible circular dependencies. I'd just treat any such as rule
errors and let the user worry about it. Usage of undefined (ungenerated)
variables could also just be treated as errors. But it would still take
evaluating the dependency chains, and that could reorder rule priorities and
the like, so it would take some thought.
There should also be some way to simply define variable values without
parsing anything. For instance, I have a bunch of rules that are dependent
on knowing my email address and the name string I expect in a proper To
address, and seeing if the mail is sent to me or the like. On a system that
allows user_prefs it would be trivial to have these strings placed there for
use by various rules. Currently I have the strings hard-coded in a bunch of
different rules, making them unuseful for anyone but me.
Re: svn commit: r1889364 -
/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by Henrik K <he...@hege.li>.
On Sat, May 01, 2021 at 04:01:05AM -0700, Loren Wilton wrote:
>
> Given that plugins are by and large the basis for (some) rules, and rule
> updates happen frequently, some thought should be given to treating at least
> those plugins called from rules as in fact being rules themselves, at least
> as far as packaging and distribution is concerned.
The problem is that distributing Perl code with sa-update is inherently
dangerous and should be considered deprecated (which is why I renamed
--allowplugins too). Only official SA version releases go through proper
scrutiny to release code that can run with root permissions around the
world. I don't think having a separate "plugins-release" would make any
difference, same problems remain with with vendors OS packaging etc.
Ideally rules could be written with some pseudo-language that could do
complex things, grabbing things into variables, modifying, comparing to
other things etc. Then there wouldn't be any need for Perl plugins doing
some trivial stuff.
Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by Loren Wilton <lw...@earthlink.net>.
> These kinds of changes just make you wonder what's the point of doing such
> plugins inside SA distribution.. if we ever do get 4.0 released, I really
> doubt if there are enough resources in the project to even release monthly
> updates after that..
Given that plugins are by and large the basis for (some) rules, and rule
updates happen frequently, some thought should be given to treating at least
those plugins called from rules as in fact being rules themselves, at least
as far as packaging and distribution is concerned.
Obviously since interfaces can change from release to release this could get
complicated. But just because something is complicated does not mean it is
either insoluable nor necessarily unmaintainable. I admit though it has been
long enough since I've coded Perl that I'm in no position to suggest a good
method with any authority. The obvious possibilities are subdirectories for
different releases (or at least at points of interface change), or
alternately conditional code.
Loren
Re: svn commit: r1889364 -
/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by Henrik K <he...@hege.li>.
... and what would be the point of those monthly releases, if no distribution
will ever pick them up ...
On Sat, May 01, 2021 at 12:54:42PM +0300, Henrik Krohns wrote:
>
> These kinds of changes just make you wonder what's the point of doing such
> plugins inside SA distribution.. if we ever do get 4.0 released, I really
> doubt if there are enough resources in the project to even release monthly
> updates after that..
>
>
> On Sat, May 01, 2021 at 09:41:28AM -0000, gbechis@apache.org wrote:
> > Author: gbechis
> > Date: Sat May 1 09:41:28 2021
> > New Revision: 1889364
> >
> > URL: http://svn.apache.org/viewvc?rev=1889364&view=rev
> > Log:
> > cope with recent MailUP changes
> >
> > Modified:
> > spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
> >
> > Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
> > URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm?rev=1889364&r1=1889363&r2=1889364&view=diff
> > ==============================================================================
> > --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm (original)
> > +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm Sat May 1 09:41:28 2021
> > @@ -388,20 +388,27 @@ sub esp_sendinblue_check {
> >
> > sub esp_mailup_check {
> > my ($self, $pms) = @_;
> > - my $mailup_id;
> > + my ($mailup_id, $xabuse, $listid);
> >
> > my $rulename = $pms->get_current_eval_rule_name();
> >
> > # All Mailup emails have the X-CSA-Complaints header set to whitelist-complaints@eco.de
> > my $xcsa = $pms->get("X-CSA-Complaints", undef);
> > - if((not defined $xcsa) or ($xcsa !~ /whitelist-complaints\@eco\.de/)) {
> > + if((not defined $xcsa) or ($xcsa !~ /complaints\@eco\.de/)) {
> > return;
> > }
> > # All Mailup emails have the X-Abuse header that must match
> > - $mailup_id = $pms->get("X-Abuse", undef);
> > - return if not defined $mailup_id;
> > - $mailup_id =~ /Please report abuse here: http\:\/\/.*\.musvc([0-9]+)\.net\/p\?c=([0-9]+)/;
> > - $mailup_id = $2;
> > + $xabuse = $pms->get("X-Abuse", undef);
> > + return if not defined $xabuse;
> > + if($xabuse =~ /Please report abuse here: http\:\/\/.*\.musvc([0-9]+)\.net\/p\?c=([0-9]+)/) {
> > + $mailup_id = $2;
> > + }
> > + if(not defined $mailup_id) {
> > + $listid = $pms->get("list-id", undef);
> > + if($listid =~ /\<(\d+)\.\d+\>/) {
> > + $mailup_id = $1;
> > + }
> > + }
> > # if regexp doesn't match it's not Mailup
> > return if not defined $mailup_id;
> > chomp($mailup_id);
> >
Re: Esp module discussion
Posted by Axb <ax...@gmail.com>.
On 5/13/22 19:03, Henrik K wrote:
> On Sun, May 02, 2021 at 10:36:37AM +0200, Giovanni Bechis wrote:
>> On Sat, May 01, 2021 at 12:54:41PM +0300, Henrik Krohns wrote:
>>>
>>> These kinds of changes just make you wonder what's the point of doing such
>>> plugins inside SA distribution.. if we ever do get 4.0 released, I really
>>> doubt if there are enough resources in the project to even release monthly
>>> updates after that..
>>>
>> I have no problem in working out-of-tree but distros will probably
>> never package an external plugin and, in some cases, it may be better
>> an outdated plugin then no plugin at all.
>> Releasing every some months could help, distro will ship outdated packages
>> in any case but I do not think we can do much to improve that situation.
>
> Bumping up this discussion due to:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7989
>
> In current state the Esp.pm likely too vague for any users to use. It
> should be clearly documented where the feed files are found, and what format
> they should be expected to be in. An external github page related to the
> plugin does not constitute as documentation.
>
> Somewhat agree with Michael's comment that the module shouldn't be an
> advertisement for Invaluement either. Then again, at this time Invaluement
> doesn't even seem to be providing the data, so what use is for the module?
>
> For the actual reply to Giovannis comment, "it may be better an outdated
> plugin then no plugin at all": Again, what use is the plugin as users
> actually need knowledge and effort to setup the feed downloads and make sure
> they work. With the same effort they can download up-to-date module from
> Github into /etc/mail/spamassassin.
>
+1 that this can be removed without any loss.
Axb
Re: Esp module discussion
Posted by Michael Storz <Mi...@lrz.de>.
Am 2022-05-13 19:03, schrieb Henrik K:
> On Sun, May 02, 2021 at 10:36:37AM +0200, Giovanni Bechis wrote:
>> On Sat, May 01, 2021 at 12:54:41PM +0300, Henrik Krohns wrote:
>> >
>> > These kinds of changes just make you wonder what's the point of doing such
>> > plugins inside SA distribution.. if we ever do get 4.0 released, I really
>> > doubt if there are enough resources in the project to even release monthly
>> > updates after that..
>> >
>> I have no problem in working out-of-tree but distros will probably
>> never package an external plugin and, in some cases, it may be better
>> an outdated plugin then no plugin at all.
>> Releasing every some months could help, distro will ship outdated
>> packages
>> in any case but I do not think we can do much to improve that
>> situation.
>
> Bumping up this discussion due to:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7989
>
> In current state the Esp.pm likely too vague for any users to use. It
> should be clearly documented where the feed files are found, and what
> format
> they should be expected to be in. An external github page related to
> the
> plugin does not constitute as documentation.
>
> Somewhat agree with Michael's comment that the module shouldn't be an
> advertisement for Invaluement either. Then again, at this time
> Invaluement
> doesn't even seem to be providing the data, so what use is for the
> module?
>
> For the actual reply to Giovannis comment, "it may be better an
> outdated
> plugin then no plugin at all": Again, what use is the plugin as users
> actually need knowledge and effort to setup the feed downloads and make
> sure
> they work. With the same effort they can download up-to-date module
> from
> Github into /etc/mail/spamassassin.
From my point of view, the Esp.pm plugin should not be a standard plugin
of SpamAssassin. Standard plugins should provide tools or language
extensions for the SA meta language. This plugin on the other hand
provides highly specialized functions. Since new functions have to be
programmed for each newly included ESP, the plugin needs a much faster
release cycle than SpamAssassin can provide. Therefore, the maintenance
of this plugin should be done outside SpamAssassin. Announcements about
new releases should definitely be made on the SpamAssassin user list.
The inclusion of new ESPs is certainly necessary, as there are dozens of
ESPs. For example, while I don't have a single spam mail from Maildome,
Mailup or Mdirector, other ESPs like ActiveCampaign, eC-messenger (was
eCircle), Klaviyo, Salesforce and Sparkpost are in my configuration. How
to do the configuration of new ESPs generically with standard
SpamAssassin means I will describe later when I have fully programmed
the changes needed for this.
Michael
Re: Esp module discussion
Posted by John Hardin <jh...@impsec.org>.
On Sat, 14 May 2022, Bill Cole wrote:
> On 2022-05-14 at 06:52:02 UTC-0400 (Sat, 14 May 2022 12:52:02 +0200)
> <gi...@paclan.it>
> is rumored to have said:
>
>> Esp module may be effectively outdated and SpamAssassin releases are not frequent as I would love to, for me there is no problem in removing the module from SpamAssassin src tree and work on it out-of-tree.
>
> +1
+1
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Maxim V: Close air support and friendly fire should be easier to
tell apart.
-----------------------------------------------------------------------
Today: the 74th anniversary of Israel's independence
Re: Esp module discussion
Posted by Bill Cole <bi...@apache.org>.
On 2022-05-14 at 06:52:02 UTC-0400 (Sat, 14 May 2022 12:52:02 +0200)
<gi...@paclan.it>
is rumored to have said:
> Esp module may be effectively outdated and SpamAssassin releases are not frequent as I would love to, for me there is no problem in removing the module from SpamAssassin src tree and work on it out-of-tree.
+1
Re: Esp module discussion
Posted by Henrik K <he...@hege.li>.
On Sat, May 14, 2022 at 10:57:19AM -0700, Matt Corallo wrote:
>
> At the same time, the ESP module was one of the features of SA 4 that I was
> most excited about - the ability to learn on and classify specific senders
> even though they hide behind ESPs sounds like it could, at least in theory,
> be quite effective.
>
> Of course updates for such a thing are going to be a problem, and I don't
> know enough about the SA updates architecture, but if that could provide
> feeds to keep the ESP match-and-decode (regex) rules up-to-date it seems
> like it'd be very powerful for semi-default-install SA, which isn't all that
> uncommon.
AFAIK most of the stuff that Esp.pm does could now be done in plain rules
only with the new 4.0.0 features.
Having the module or not is irrevant. What matters is if Giovanni or
someone can provide the necessary rules for the "Esp" stuff for stock
sa-update. The proper way looking up this stuff would be using a DNS query
anyway, maybe we can enhance askdns to look up the new regex capture stuff.
Re: Esp module discussion
Posted by Matt Corallo <sa...@mattcorallo.com>.
On 5/14/22 3:52 AM, giovanni@paclan.it wrote:
> Esp module may be effectively outdated and SpamAssassin releases are not frequent as I would love to,
> for me there is no problem in removing the module from SpamAssassin src tree and work on it
> out-of-tree.
As a user with no business commenting on dev@ or material knowledge of the SA architecture, this is
a bit disappointing to me. Keeping external plugins up-to-date is somewhat of a pain, at least
unless it comes via distro packages, which would also suffer the update problem.
At the same time, the ESP module was one of the features of SA 4 that I was most excited about - the
ability to learn on and classify specific senders even though they hide behind ESPs sounds like it
could, at least in theory, be quite effective.
Of course updates for such a thing are going to be a problem, and I don't know enough about the SA
updates architecture, but if that could provide feeds to keep the ESP match-and-decode (regex) rules
up-to-date it seems like it'd be very powerful for semi-default-install SA, which isn't all that
uncommon.
- someone asking someone else to (continue) do(ing) work for them
Matt
Re: Esp module discussion
Posted by gi...@paclan.it.
On 5/13/22 19:03, Henrik K wrote:
> On Sun, May 02, 2021 at 10:36:37AM +0200, Giovanni Bechis wrote:
>> On Sat, May 01, 2021 at 12:54:41PM +0300, Henrik Krohns wrote:
>>>
>>> These kinds of changes just make you wonder what's the point of doing such
>>> plugins inside SA distribution.. if we ever do get 4.0 released, I really
>>> doubt if there are enough resources in the project to even release monthly
>>> updates after that..
>>>
>> I have no problem in working out-of-tree but distros will probably
>> never package an external plugin and, in some cases, it may be better
>> an outdated plugin then no plugin at all.
>> Releasing every some months could help, distro will ship outdated packages
>> in any case but I do not think we can do much to improve that situation.
>
> Bumping up this discussion due to:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7989
>
> In current state the Esp.pm likely too vague for any users to use. It
> should be clearly documented where the feed files are found, and what format
> they should be expected to be in. An external github page related to the
> plugin does not constitute as documentation.
>
Atm there are no public feeds afaik, I am using my private feeds.
I will work on improving documentation about how to create feeds and (maybe) publish my feeds.
> Somewhat agree with Michael's comment that the module shouldn't be an
> advertisement for Invaluement either. Then again, at this time Invaluement
> doesn't even seem to be providing the data, so what use is for the module?
>
> For the actual reply to Giovannis comment, "it may be better an outdated
> plugin then no plugin at all": Again, what use is the plugin as users
> actually need knowledge and effort to setup the feed downloads and make sure
> they work. With the same effort they can download up-to-date module from
> Github into /etc/mail/spamassassin.
>
Esp module may be effectively outdated and SpamAssassin releases are not frequent as I would love to,
for me there is no problem in removing the module from SpamAssassin src tree and work on it
out-of-tree.
Giovanni
Esp module discussion
Posted by Henrik K <he...@hege.li>.
On Sun, May 02, 2021 at 10:36:37AM +0200, Giovanni Bechis wrote:
> On Sat, May 01, 2021 at 12:54:41PM +0300, Henrik Krohns wrote:
> >
> > These kinds of changes just make you wonder what's the point of doing such
> > plugins inside SA distribution.. if we ever do get 4.0 released, I really
> > doubt if there are enough resources in the project to even release monthly
> > updates after that..
> >
> I have no problem in working out-of-tree but distros will probably
> never package an external plugin and, in some cases, it may be better
> an outdated plugin then no plugin at all.
> Releasing every some months could help, distro will ship outdated packages
> in any case but I do not think we can do much to improve that situation.
Bumping up this discussion due to:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7989
In current state the Esp.pm likely too vague for any users to use. It
should be clearly documented where the feed files are found, and what format
they should be expected to be in. An external github page related to the
plugin does not constitute as documentation.
Somewhat agree with Michael's comment that the module shouldn't be an
advertisement for Invaluement either. Then again, at this time Invaluement
doesn't even seem to be providing the data, so what use is for the module?
For the actual reply to Giovannis comment, "it may be better an outdated
plugin then no plugin at all": Again, what use is the plugin as users
actually need knowledge and effort to setup the feed downloads and make sure
they work. With the same effort they can download up-to-date module from
Github into /etc/mail/spamassassin.
Re: svn commit: r1889364 -
/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
Posted by Giovanni Bechis <gi...@paclan.it>.
On Sat, May 01, 2021 at 12:54:41PM +0300, Henrik Krohns wrote:
>
> These kinds of changes just make you wonder what's the point of doing such
> plugins inside SA distribution.. if we ever do get 4.0 released, I really
> doubt if there are enough resources in the project to even release monthly
> updates after that..
>
I have no problem in working out-of-tree but distros will probably
never package an external plugin and, in some cases, it may be better
an outdated plugin then no plugin at all.
Releasing every some months could help, distro will ship outdated packages
in any case but I do not think we can do much to improve that situation.
Giovanni
>
> On Sat, May 01, 2021 at 09:41:28AM -0000, gbechis@apache.org wrote:
> > Author: gbechis
> > Date: Sat May 1 09:41:28 2021
> > New Revision: 1889364
> >
> > URL: http://svn.apache.org/viewvc?rev=1889364&view=rev
> > Log:
> > cope with recent MailUP changes
> >
> > Modified:
> > spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
> >
> > Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
> > URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm?rev=1889364&r1=1889363&r2=1889364&view=diff
> > ==============================================================================
> > --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm (original)
> > +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm Sat May 1 09:41:28 2021
> > @@ -388,20 +388,27 @@ sub esp_sendinblue_check {
> >
> > sub esp_mailup_check {
> > my ($self, $pms) = @_;
> > - my $mailup_id;
> > + my ($mailup_id, $xabuse, $listid);
> >
> > my $rulename = $pms->get_current_eval_rule_name();
> >
> > # All Mailup emails have the X-CSA-Complaints header set to whitelist-complaints@eco.de
> > my $xcsa = $pms->get("X-CSA-Complaints", undef);
> > - if((not defined $xcsa) or ($xcsa !~ /whitelist-complaints\@eco\.de/)) {
> > + if((not defined $xcsa) or ($xcsa !~ /complaints\@eco\.de/)) {
> > return;
> > }
> > # All Mailup emails have the X-Abuse header that must match
> > - $mailup_id = $pms->get("X-Abuse", undef);
> > - return if not defined $mailup_id;
> > - $mailup_id =~ /Please report abuse here: http\:\/\/.*\.musvc([0-9]+)\.net\/p\?c=([0-9]+)/;
> > - $mailup_id = $2;
> > + $xabuse = $pms->get("X-Abuse", undef);
> > + return if not defined $xabuse;
> > + if($xabuse =~ /Please report abuse here: http\:\/\/.*\.musvc([0-9]+)\.net\/p\?c=([0-9]+)/) {
> > + $mailup_id = $2;
> > + }
> > + if(not defined $mailup_id) {
> > + $listid = $pms->get("list-id", undef);
> > + if($listid =~ /\<(\d+)\.\d+\>/) {
> > + $mailup_id = $1;
> > + }
> > + }
> > # if regexp doesn't match it's not Mailup
> > return if not defined $mailup_id;
> > chomp($mailup_id);
> >