Posted to dev@spamassassin.apache.org by Henrik Krohns <he...@hege.li> on 2021/05/01 09:54:41 UTC

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

These kinds of changes just make you wonder what's the point of maintaining such
plugins inside the SA distribution.  If we ever do get 4.0 released, I really
doubt there are enough resources in the project to even release monthly
updates after that.


On Sat, May 01, 2021 at 09:41:28AM -0000, gbechis@apache.org wrote:
> Author: gbechis
> Date: Sat May  1 09:41:28 2021
> New Revision: 1889364
> 
> URL: http://svn.apache.org/viewvc?rev=1889364&view=rev
> Log:
> cope with recent MailUP changes
> 
> Modified:
>     spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
> 
> Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
> URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm?rev=1889364&r1=1889363&r2=1889364&view=diff
> ==============================================================================
> --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm (original)
> +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm Sat May  1 09:41:28 2021
> @@ -388,20 +388,27 @@ sub esp_sendinblue_check {
>  
>  sub esp_mailup_check {
>    my ($self, $pms) = @_;
> -  my $mailup_id;
> +  my ($mailup_id, $xabuse, $listid);
>  
>    my $rulename = $pms->get_current_eval_rule_name();
>  
>    # All Mailup emails have the X-CSA-Complaints header set to whitelist-complaints@eco.de
>    my $xcsa = $pms->get("X-CSA-Complaints", undef);
> -  if((not defined $xcsa) or ($xcsa !~ /whitelist-complaints\@eco\.de/)) {
> +  if((not defined $xcsa) or ($xcsa !~ /complaints\@eco\.de/)) {
>      return;
>    }
>    # All Mailup emails have the X-Abuse header that must match
> -  $mailup_id = $pms->get("X-Abuse", undef);
> -  return if not defined $mailup_id;
> -  $mailup_id =~ /Please report abuse here: http\:\/\/.*\.musvc([0-9]+)\.net\/p\?c=([0-9]+)/;
> -  $mailup_id = $2;
> +  $xabuse = $pms->get("X-Abuse", undef);
> +  return if not defined $xabuse;
> +  if($xabuse =~ /Please report abuse here: http\:\/\/.*\.musvc([0-9]+)\.net\/p\?c=([0-9]+)/) {
> +    $mailup_id = $2;
> +  }
> +  if(not defined $mailup_id) {
> +    $listid = $pms->get("list-id", undef);
> +    if($listid =~ /\<(\d+)\.\d+\>/) {
> +      $mailup_id = $1;
> +    }
> +  }
>    # if regexp doesn't match it's not Mailup
>    return if not defined $mailup_id;
>    chomp($mailup_id);
>
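
The header checks in the diff above can be read as one small decision procedure. What follows is a minimal standalone sketch of that logic, not the plugin's actual code: extract_mailup_id is a hypothetical helper that takes header values in a plain hash instead of calling $pms->get, and the sample header values in the usage lines are made up. It also assumes a guard for a missing List-Id header, which the committed diff does not have.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not part of Esp.pm).
# Takes a hash ref of header values; returns the Mailup customer ID or undef.
sub extract_mailup_id {
    my ($headers) = @_;

    # X-CSA-Complaints must point at a complaints address at eco.de
    my $xcsa = $headers->{'X-CSA-Complaints'};
    return unless defined $xcsa && $xcsa =~ /complaints\@eco\.de/;

    # X-Abuse must be present at all
    my $xabuse = $headers->{'X-Abuse'};
    return unless defined $xabuse;

    # Older format: the ID is the c= parameter of the abuse URL
    if ($xabuse =~ m{Please report abuse here: http://.*\.musvc[0-9]+\.net/p\?c=([0-9]+)}) {
        return $1;
    }

    # Newer format: fall back to List-Id, guarding against a missing header
    my $listid = $headers->{'List-Id'};
    if (defined $listid && $listid =~ /<(\d+)\.\d+>/) {
        return $1;
    }

    # Neither pattern matched: not Mailup
    return;
}

# Usage with invented sample values
my $id = extract_mailup_id({
    'X-CSA-Complaints' => 'whitelist-complaints@eco.de',
    'X-Abuse'          => 'Please report abuse here: http://abc.musvc2.net/p?c=12345',
});
print defined $id ? "Mailup ID: $id\n" : "not Mailup\n";
```

Returning early from each branch avoids the shared $mailup_id variable and makes the fallback order explicit.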

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by Loren Wilton <lw...@earthlink.net>.

> I guess the risk is exactly the same as rulenames colliding.. better not
> use very generic names and you can always prepend the rulename yourself. 
> :-)

My other concern is that, as far as I know, SA rules are still limited to a 
single line of text. If the rule name plus item name gets long, the rule 
text using rule_name:item_name starts to become very long and unreadable, 
especially when multiple items are used in a single rule body.

        Loren

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by John Hardin <jh...@impsec.org>.

On Sat, 8 May 2021, Henrik K wrote:

> On Fri, May 07, 2021 at 02:44:48PM -0700, John Hardin wrote:
>> On Fri, 7 May 2021, Loren Wilton wrote:
>>
>>> The only nitpick I'd offer is that I'd prefer that the capture tokens be
>>> at a single level, like rule names. So you might get:
>>>
>>>> $pms->{captured_values}->{NAME} = $+{NAME};
>>>>
>>>> Then use it in a rule:
>>>>
>>>> body MATCHER /My name is ${NAME}/
>>
>> The risk with that is rules from multiple sources using colliding variable
>> names.
>>
>>   body MATCHER /My name is ${FROM_NAME:NAME}/
>>
>> ...is explicit and doesn't carry that risk.
>
> I guess the risk is exactly the same as rulenames colliding.. better not
> use very generic names and you can always prepend the rulename yourself. :-)

heh. True.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org                         pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Tomorrow: the 76th anniversary of VE day

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by Henrik K <he...@hege.li>.

On Fri, May 07, 2021 at 02:44:48PM -0700, John Hardin wrote:
> On Fri, 7 May 2021, Loren Wilton wrote:
> 
> > The only nitpick I'd offer is that I'd prefer that the capture tokens be
> > at a single level, like rule names. So you might get:
> > 
> > > $pms->{captured_values}->{NAME} = $+{NAME};
> > > 
> > > Then use it in a rule:
> > > 
> > > body MATCHER /My name is ${NAME}/
> 
> The risk with that is rules from multiple sources using colliding variable
> names.
> 
>   body MATCHER /My name is ${FROM_NAME:NAME}/
> 
> ...is explicit and doesn't carry that risk.

I guess the risk is exactly the same as rulenames colliding.. better not
use very generic names and you can always prepend the rulename yourself. :-)

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by Henrik K <he...@hege.li>.

On Fri, May 07, 2021 at 10:34:10AM -0700, Loren Wilton wrote:
>
> The only nitpick I'd offer is that I'd prefer that the capture tokens be at
> a single level, like rule names. So you might get:
> > 
> > body MATCHER /My name is ${NAME}/

Yes, probably more flexible this way.

Re: Capturing and reusing strings for matching across rules

Posted by John Hardin <jh...@impsec.org>.

On Sun, 15 May 2022, Michael Storz wrote:

> Just use a different sigil than $. Perl uses $, @, %, & and *. Looking at my 
> keyboard, I see §

Difficult on US keyboards and possibly others, but compose-able.

> and #

Comment start, must be escaped.

> which could be used.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org                         pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  715 days since the first private commercial manned orbital mission (SpaceX)

Re: Capturing and reusing strings for matching across rules

Posted by Michael Storz <Mi...@lrz.de>.

Am 2022-05-14 17:43, schrieb Henrik K:
> On Sat, May 14, 2022 at 04:54:01PM +0200, Michael Storz wrote:
>> 
>> After Henrik has presented his implementation, I guess I have to tell you
>> what I have been working on lately. I am working on a general Tag.pm Plugin.
>> I took the Tagmatch.pm plugin from Paul and rewrote and extended it. With
>> Paul's plugin you can do all kinds of operations on tags (I use tag instead
>> of tagmatch because this looks similar to the header and body keywords). I
>> extended it with a settag command that allows you to extract data from
>> header, body or other tags via regexp and assign it to a tag. These tags can
>> then be used as usual. Coming back to the Esp.pm plugin: for me the
>> definition for an ESP looks like this:
>> 
>> ####################
>> #
>> # Mailchimp
>> #
>> ####################
>> 
>> # header field X-MC-User has the customer-id
>>   settag        _LRZ_MCID_              X-MC-User =~ /^([0-9a-z]{25})$/
> 
> Maybe we can consider tags and regex captures the same in the future..  they
> are simply global variables.  In that case, a separate "settag" command
> wouldn't even be needed, since you could just do the "header FOO
> /(?<LRZMCID>bar)/" stanza.
> 
> Btw we already agreed somewhere that tags are not supposed to contain
> underscores, since it's the tag delimiter itself.  It could be awkward to
> parse and make sense.

I know and I do not agree :-) The underscores around a tag are only needed 
as an explicit representation to distinguish tags from other text. For me 
tags will play a big role in SpamAssassin. Since tags should always be 
in uppercase we need _ to make them more readable. I do not think there 
will be big problems in parsing with templates. But we'll see.

> 
>> thing which I have not done yet, is using tags in regexps like the example
>> above
>> 
>> body MATCHER /My name is ${FROM_NAME:NAME}/
> 
> There should be consensus for a general form that will work well in the future
> for all these cases.  If we consider that there is only one type of global
> variable/tag, it would be simpler.

Yes.

> 
> I really dislike anything resembling $ { } because they are valid regexp
> meta characters, that's asking for some trouble.

Just use a different sigil than $. Perl uses $, @, %, & and *. Looking 
at my keyboard, I see § and # which could be used.

> 
> If we start adding a lot of "tag" stuff in the mix too, there will probably be
> a horrible web of dependencies all around.  Not sure if there is anything we
> can do up front to ease it.  Whole SA with its arcane priority system and
> dozens of plugins doing their thing in independent ways would really need to be
> rewritten from ground up.  And then we can likely forget any backwards
> compatibility for people that are still using years old versions.  Any
> takers?  :-D

I think we are getting the asynchronous stuff working with 4.0. Therefore 
I do not see problems with this approach.

> 
>> However, to fully create this design, I believe more time is needed and such
>> functionality should not be incorporated into SpamAssassin until after the
>> 4.0 release. First the handling of the tags must be improved, which is
>> currently totally broken. I am still writing together where the problems
>> with the tags are and how to fix them.
> 
> Good to see some enthusiasm.  Personally I will be satisfied after 4.0.0 is
> released and will stay lurking and acting on any bugs, but that's probably
> it on my behalf..  there's hobbies and then there's hobbies that start to
> feel like a payless job..

Michael

Re: Capturing and reusing strings for matching across rules

Posted by Henrik K <he...@hege.li>.

On Sat, May 14, 2022 at 04:54:01PM +0200, Michael Storz wrote:
> 
> After Henrik has presented his implementation, I guess I have to tell you
> what I have been working on lately. I am working on a general Tag.pm Plugin.
> I took the Tagmatch.pm plugin from Paul and rewrote and extended it. With
> Paul's plugin you can do all kinds of operations on tags (I use tag instead
> of tagmatch because this looks similar to the header and body keywords). I
> extended it with a settag command that allows you to extract data from
> header, body or other tags via regexp and assign it to a tag. These tags can
> then be used as usual. Coming back to the Esp.pm plugin: for me the
> definition for an ESP looks like this:
> 
> ####################
> #
> # Mailchimp
> #
> ####################
> 
> # header field X-MC-User has the customer-id
>   settag        _LRZ_MCID_              X-MC-User =~ /^([0-9a-z]{25})$/

Maybe we can consider tags and regex captures the same in the future..  they
are simply global variables.  In that case, a separate "settag" command
wouldn't even be needed, since you could just do the "header FOO
/(?<LRZMCID>bar)/" stanza.

Btw we already agreed somewhere that tags are not supposed to contain
underscores, since it's the tag delimiter itself.  It could be awkward to
parse and make sense.

> thing which I have not done yet, is using tags in regexps like the example
> above
> 
> body MATCHER /My name is ${FROM_NAME:NAME}/

There should be consensus for a general form that will work well in the future
for all these cases.  If we consider that there is only one type of global
variable/tag, it would be simpler.

I really dislike anything resembling $ { } because they are valid regexp
meta characters, that's asking for some trouble.

If we start adding a lot of "tag" stuff in the mix too, there will probably be
a horrible web of dependencies all around.  Not sure if there is anything we
can do up front to ease it.  Whole SA with its arcane priority system and
dozens of plugins doing their thing in independent ways would really need to be
rewritten from ground up.  And then we can likely forget any backwards
compatibility for people that are still using years old versions.  Any
takers?  :-D

> However, to fully create this design, I believe more time is needed and such
> functionality should not be incorporated into SpamAssassin until after the
> 4.0 release. First the handling of the tags must be improved, which is
> currently totally broken. I am still writing together where the problems
> with the tags are and how to fix them.

Good to see some enthusiasm.  Personally I will be satisfied after 4.0.0 is
released and will stay lurking and acting on any bugs, but that's probably
it on my behalf..  there's hobbies and then there's hobbies that start to
feel like a payless job..

Re: Capturing and reusing strings for matching across rules

Posted by Henrik K <he...@hege.li>.

On Sat, May 14, 2022 at 04:54:01PM +0200, Michael Storz wrote:
> And the last point is modifier functions, like Henrik
> implemented for the HEADER tag: :addr, :name, :trim, :base64, :domain, :lc,
> :uc, :pop, :first, you name it. It would be best if these modifier functions
> could be registered by a plugin and then used similarly to eval functions,
> which are also registered and then used.

I think you might be a little bit confused about what is going on here.

First of all, no such thing as "tag modifiers" similar to :addr exists as
you imply.

Some tags in the past have indeed been made as if they act as a "function",
and take some parameter, for example:

 _HAMMYTOKENS(N)_  the N most significant hammy tokens (default, 5)
 _TESTS(,)_        tests hit separated by "," (or other separator)
 _HEADER(NAME)_    includes the value of a message header.  value is the same
                   as is found for header rules (see elsewhere in this doc)

All the :addr :name modifiers you refer to, are specific to HEADER RULE
SELECTOR ($pms->get) and have absolutely nothing to do with tags.

Proposing that all tags could accept a generic modifier is a completely
separate issue and the format would need to be specified.

Re: Capturing and reusing strings for matching across rules

Posted by Henrik K <he...@hege.li>.

On Sat, May 14, 2022 at 04:54:01PM +0200, Michael Storz wrote:
> It would be best if these modifier functions could be registered by a
> plugin and then used similarly to eval functions, which are also
> registered and then used.

Tags are a SpamAssassin core (PerMsgStatus) function.  I always wince when I
see a proposal to "pluginize" something.  It means creating more hooks and
stuff that can break.  The fewer public APIs/hooks we offer, the simpler it
is for us to maintain into the future.

Quoting you from recent post:

"Recently the question was asked why Check.pm is a plugin if it is not
optional.  Check.pm is a plugin so that you can implement more than one
check plugin."

You can implement more than one check plugin?  Do you realize what effort it
would take to create your "own check plugin"?  There's a bazillion things in
PerMsgStatus etc that Check depends on and vice versa.  It's almost
impossible for anyone outside of SA developers to understand the internal
dependencies and idiosyncrasies that have been piling up over the years.

In the past there was developer talk for example about all rule types being
plugins, yadda yadda.  I agree that in IDEAL world everything would be
pluginized and wildly customizable by anyone.  That's just not reality with
the resources the project has.  For example a small little change somewhere
can easily break sa-compile users, because that's pluginized and obscured
away doing its own thing with rules etc, yet all that is highly coupled
into how PMS/Check internals work.  All of the SpamAssassin test suite
assumes that sa-compile is not used, it only has a single t/sa_compile.t
which doesn't test much.

> For a good and efficient use of lists a full rewrite of the WLBL.pm plugin
> is needed. E.g. enlist_addrlist can be used for Mailchimp because the
> customer id is a lowercase hex string, whereas the cid for Salesforce uses
> lowercase and uppercase chars. Therefore we need lists where we can specify
> the syntax of the list members.

Addrlist should remain as (email/domain) addrlist and not be expanded into a
generic [key/]value lookup store.

Anything that is a large list, should be looked up from an external database
(redis, sql, DNS etc).  As a last resort a generic FileDB.pm could be
provided that loads things into a memory hash.  But I guess reading a file
is few lines of code in a Plugin, so probably not needed.  As long as it
loads (and reloads?) the stuff pre-fork, so it doesn't waste memory.
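
As a rough illustration of the "few lines of code" claim, here is a minimal sketch of loading a key list into an in-memory hash. The load_keys helper and the file format (one key per line, # comments) are assumptions made for this example; it is not an existing SpamAssassin API, and hooking it into a pre-fork plugin callback is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: read one key per line from an open filehandle
# into a hash used as a set. Blank lines and # comments are skipped.
sub load_keys {
    my ($fh) = @_;
    my %set;
    while (my $line = <$fh>) {
        $line =~ s/#.*//;          # strip trailing comments
        $line =~ s/^\s+|\s+$//g;   # trim whitespace and the newline
        next if $line eq '';
        $set{$line} = 1;
    }
    return \%set;
}

# Usage: an in-memory filehandle stands in for a real list file
my $data = "# Mailchimp IDs\n4ecb620f8ed264d1d84aa0981\n566e95f0930918dfb8d575a40\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my $ids = load_keys($fh);
close $fh;

print "known id\n" if exists $ids->{'566e95f0930918dfb8d575a40'};
```

If this ran once in the parent process before forking, the children would share the hash pages rather than each loading a copy, which is the memory concern raised above.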

Re: Capturing and reusing strings for matching across rules

Posted by Michael Storz <Mi...@lrz.de>.

Am 2022-05-14 14:27, schrieb Henrik K:
> On Fri, May 07, 2021 at 07:23:05PM +0300, Henrik K wrote:
>> On Fri, May 07, 2021 at 09:07:08AM -0700, Loren Wilton wrote:
>> > > > >  header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1
>> > > >
>> > > > Would :capture play well with (e.g.) :addr, :name, :raw, etc?
>> > >
>> > > It might as well be a tflag or something.  Why limit capturing to headers
>> > > only?
>> >
>> > I hadn't intended it to be limited to headers only, but I guess the syntax
>> > would have to be a little different for raw, body, full, etc, since they
>> > don't have a part keyword in the rule syntax.
>> 
>> Perl already has named capture groups as legit syntax, so it would be most
>> simple to actually use them.
>> 
>> https://perldoc.perl.org/perlre#(?%3CNAME%3Epattern)
>> 
>> header FROM_NAME /^From: "(?<NAME>\w+)/
>> 
>> ... just save the matches in the rule code
>> $pms->{captured_values}->{FROM_NAME}->{NAME} = $+{NAME};
>> 
>> Then use it in a rule:
>> 
>> body MATCHER /My name is ${FROM_NAME:NAME}/
>> 
>> Don't nitpick on ${}, could be any similar syntax.  Code adds this rule to
>> FROM_NAME dependency chain.  When FROM_NAME hits, run MATCHER regex
>> (obviously first recompile the regexp).
> 
> Implementation pending:
> 
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7992

After Henrik has presented his implementation, I guess I have to tell 
you what I have been working on lately. I am working on a general Tag.pm 
Plugin. I took the Tagmatch.pm plugin from Paul and rewrote and extended 
it. With Paul's plugin you can do all kinds of operations on tags (I use 
tag instead of tagmatch because this looks similar to the header and 
body keywords). I extended it with a settag command that allows you to 
extract data from header, body or other tags via regexp and assign it to 
a tag. These tags can then be used as usual. Coming back to the Esp.pm 
plugin: for me the definition for an ESP looks like this:

####################
#
# Mailchimp
#
####################

# header field X-MC-User has the customer-id
   settag        _LRZ_MCID_              X-MC-User =~ /^([0-9a-z]{25})$/

# check of tag _LRZ_MCID_, different possibilities
# askdns        __LRZ_MCID_FOUND        _LRZ_MCID_.esp.dnsbl.lrz.de A 127.0.0.5
# tag           __LRZ_MCID_FOUND        _LRZ_MCID_ =~ /^566e95f0930918dfb8d575a40$/
# header        __LRZ_MCID_FOUND        eval:check_in_addrlist('_LRZ_MCID_', Mailchimp)
# tflags        __LRZ_MCID_FOUND        tagify
   header        __LRZ_MCID_FOUND        eval:check_tag_in_addrlist('_LRZ_MCID_', Mailchimp)

# all Mailchimp emails have the X-Mailer header set to "MailChimp Mailer"
   header        __LRZ_XM_MAILCHIMP      X-Mailer =~ /MailChimp\sMailer/

# scoring rule
   meta          LRZ_MCID_FOUND          (__LRZ_MCID_FOUND  && __LRZ_XM_MAILCHIMP)
   score         LRZ_MCID_FOUND          7.2

# list of Mailchimp-IDs
   enlist_addrlist       (Mailchimp)     4ecb620f8ed264d1d84aa0981
   enlist_addrlist       (Mailchimp)     566e95f0930918dfb8d575a40

At the moment I am working on the tflags tagify. This should take a 
normal eval function and automatically allow the usage of tags as 
arguments. With the above example, it takes the eval function 
check_in_addrlist which normally would only allow strings as argument 
and make it work with tags instead. At the moment I have to use the eval 
function check_tag_in_addrlist where the ability to work with a tag is 
coded into the function. The other thing which I have not done yet, is 
using tags in regexps like the example above

body MATCHER /My name is ${FROM_NAME:NAME}/

I had the same idea, instead of the explicit representation _TAG_ for 
the tag TAG, you could use the alternative form ${TAG} in regular 
expressions (and maybe templates). And the last point is modifier 
functions, like Henrik implemented for the HEADER tag: :addr, :name, 
:trim, :base64, :domain, :lc, :uc, :pop, :first, you name it. It would 
be best if these modifier functions could be registered by a plugin and 
then used similarly to eval functions, which are also registered and 
then used.

For a good and efficient use of lists a full rewrite of the WLBL.pm 
plugin is needed. E.g. enlist_addrlist can be used for Mailchimp because 
the customer id is a lowercase hex string, whereas the cid for 
Salesforce uses lowercase and uppercase chars. Therefore we need lists 
where we can specify the syntax of the list members.

However, to fully create this design, I believe more time is needed and 
such functionality should not be incorporated into SpamAssassin until 
after the 4.0 release. First the handling of the tags must be improved, 
which is currently totally broken. I am still writing together where the 
problems with the tags are and how to fix them.

Michael

Capturing and reusing strings for matching across rules

Posted by Henrik K <he...@hege.li>.

On Fri, May 07, 2021 at 07:23:05PM +0300, Henrik K wrote:
> On Fri, May 07, 2021 at 09:07:08AM -0700, Loren Wilton wrote:
> > > > >  header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1
> > > > 
> > > > Would :capture play well with (e.g.) :addr, :name, :raw, etc?
> > > 
> > > It might as well be a tflag or something.  Why limit capturing to headers
> > > only?
> > 
> > I hadn't intended it to be limited to headers only, but I guess the syntax
> > would have to be a little different for raw, body, full, etc, since they
> > don't have a part keyword in the rule syntax.
> 
> Perl already has named capture groups as legit syntax, so it would be most
> simple to actually use them.
> 
> https://perldoc.perl.org/perlre#(?%3CNAME%3Epattern)
> 
> header FROM_NAME /^From: "(?<NAME>\w+)/
> 
> ... just save the matches in the rule code
> $pms->{captured_values}->{FROM_NAME}->{NAME} = $+{NAME};
> 
> Then use it in a rule:
> 
> body MATCHER /My name is ${FROM_NAME:NAME}/
> 
> Don't nitpick on ${}, could be any similar syntax.  Code adds this rule to
> FROM_NAME dependency chain.  When FROM_NAME hits, run MATCHER regex
> (obviously first recompile the regexp).

Implementation pending:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7992

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by John Hardin <jh...@impsec.org>.

On Fri, 7 May 2021, Loren Wilton wrote:

> The only nitpick I'd offer is that I'd prefer that the capture tokens be at a 
> single level, like rule names. So you might get:
>
>> $pms->{captured_values}->{NAME} = $+{NAME};
>> 
>> Then use it in a rule:
>> 
>> body MATCHER /My name is ${NAME}/

The risk with that is rules from multiple sources using colliding 
variable names.

   body MATCHER /My name is ${FROM_NAME:NAME}/

...is explicit and doesn't carry that risk.


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org                         pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Autocorrect is the work of the Devil, and whoever invented it
   should go straight to hello.                        -- Windy Wilson
-----------------------------------------------------------------------
  Tomorrow: the 76th anniversary of VE day
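
The collision risk can be made concrete with a small sketch. This is illustration only: record_capture, %flat and %scoped are hypothetical names, and VENDOR_X_SIG is an invented rule standing in for a rule shipped by another source.

```perl
use strict;
use warnings;

my %flat;     # single-level store, as in ${NAME}
my %scoped;   # per-rule store, as in ${FROM_NAME:NAME}

# Hypothetical helper: record one capture in both layouts
sub record_capture {
    my ($rule, $name, $value) = @_;
    $flat{$name}           = $value;   # last writer wins
    $scoped{$rule}{$name}  = $value;   # one slot per rule
}

# Two unrelated rules both capture something called NAME
record_capture('FROM_NAME',    'NAME', 'Alice');
record_capture('VENDOR_X_SIG', 'NAME', 'Bob');

print "flat NAME:           $flat{NAME}\n";
print "scoped FROM_NAME:    $scoped{FROM_NAME}{NAME}\n";
print "scoped VENDOR_X_SIG: $scoped{VENDOR_X_SIG}{NAME}\n";
```

In the flat layout the second rule silently replaces the first rule's value, so a rule that meant to reuse the From name would see the other rule's capture instead.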

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by Loren Wilton <lw...@earthlink.net>.

> Perl already has named capture groups as legit syntax, so it would be most
> simple to actually use them.
>
> https://perldoc.perl.org/perlre#(?%3CNAME%3Epattern)
>
> header FROM_NAME /^From: "(?<NAME>\w+)/

Good. I thought there was something there, but I didn't remember the exact 
syntax and was too lazy to dig it out. Works for me.

>
> ... just save the matches in the rule code
> $pms->{captured_values}->{FROM_NAME}->{NAME} = $+{NAME};
>
> Then use it in a rule:
>
> body MATCHER /My name is ${FROM_NAME:NAME}/
>
> Don't nitpick on ${}, could be any similar syntax.  Code adds this rule to
> FROM_NAME dependency chain.  When FROM_NAME hits, run MATCHER regex
> (obviously first recompile the regexp).

The only nitpick I'd offer is that I'd prefer that the capture tokens be at 
a single level, like rule names. So you might get:

> $pms->{captured_values}->{NAME} = $+{NAME};
>
> Then use it in a rule:
>
> body MATCHER /My name is ${NAME}/

        Loren

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by Henrik K <he...@hege.li>.

On Fri, May 07, 2021 at 09:07:08AM -0700, Loren Wilton wrote:
> > > >  header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1
> > > 
> > > Would :capture play well with (e.g.) :addr, :name, :raw, etc?
> > 
> > It might as well be a tflag or something.  Why limit capturing to headers
> > only?
> 
> I hadn't intended it to be limited to headers only, but I guess the syntax
> would have to be a little different for raw, body, full, etc, since they
> don't have a part keyword in the rule syntax.

Perl already has named capture groups as legit syntax, so it would be most
simple to actually use them.

https://perldoc.perl.org/perlre#(?%3CNAME%3Epattern)

header FROM_NAME /^From: "(?<NAME>\w+)/

... just save the matches in the rule code
$pms->{captured_values}->{FROM_NAME}->{NAME} = $+{NAME};

Then use it in a rule:

body MATCHER /My name is ${FROM_NAME:NAME}/

Don't nitpick on ${}, could be any similar syntax.  Code adds this rule to
FROM_NAME dependency chain.  When FROM_NAME hits, run MATCHER regex
(obviously first recompile the regexp).
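
A minimal standalone sketch of the capture-and-recompile idea, outside of SpamAssassin: %captured stands in for $pms->{captured_values}, and expand_template is a hypothetical helper, not existing SA code. The ${RULE:KEY} placeholder form is just the example syntax from above. Captured text is passed through quotemeta, on the assumption that it should match literally instead of being interpreted as regex.

```perl
use strict;
use warnings;

my %captured;   # stand-in for $pms->{captured_values}

# Step 1: the "header FROM_NAME" rule saves its named capture
my $from = 'From: "Alice" <alice@example.com>';
if ($from =~ /^From: "(?<NAME>\w+)/) {
    $captured{FROM_NAME}{NAME} = $+{NAME};
}

# Step 2: expand ${RULE:KEY} placeholders and compile the dependent regex.
# Returns undef if any referenced capture is missing, so the dependent
# rule is simply not run.
sub expand_template {
    my ($template, $values) = @_;
    my $missing = 0;
    my $lookup = sub {
        my ($rule, $key) = @_;
        my $value = exists $values->{$rule} ? $values->{$rule}{$key} : undef;
        $missing = 1 unless defined $value;
        return defined $value ? quotemeta($value) : '';
    };
    (my $pattern = $template) =~ s/\$\{(\w+):(\w+)\}/$lookup->($1, $2)/ge;
    return if $missing;
    return qr/$pattern/;
}

# Step 3: the "body MATCHER" rule runs only once FROM_NAME has hit
my $matcher = expand_template('My name is ${FROM_NAME:NAME}', \%captured);
my $body    = 'Hello. My name is Alice and I have an offer for you.';
print "MATCHER hit\n" if defined $matcher && $body =~ $matcher;
```

The template has to be kept as an uncompiled string until the capture exists, which is why the dependent rule needs to sit in the capturing rule's dependency chain.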

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by Loren Wilton <lw...@earthlink.net>.

>> >  header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1
>>
>> Would :capture play well with (e.g.) :addr, :name, :raw, etc?
>
> It might as well be a tflag or something.  Why limit capturing to headers
> only?

I hadn't intended it to be limited to headers only, but I guess the syntax 
would have to be a little different for raw, body, full, etc, since they 
don't have a part keyword in the rule syntax.

Originally I hadn't wanted to have the ":Capture" part, just have the 
capture assignment following the rule body. But then, how do you know if 
there is a capture assignment at the end? I didn't like the idea of trying 
to stick it into the match flags, especially for the (probably rare) case of 
multiple captures in a single rule.

I suppose that the rule scanner probably is looking past the flags that may 
follow a regex closing bracket, so would pick up an assignment if there was 
one there. So, for instance, this should work:

    body    SOME_RULE     /Your (\w+) Order/i $(__COMPANY)=\1

Alternately (which I don't much care for) we could have

    body    SOME_RULE     /Your (\w+) Order for \$(\d+)/i
    assign    __COMPANY,__AMOUNT

or keyworded

    assign    1=__COMPANY,2=__AMOUNT

What worries me about that sort of syntax is there is no real 
juxtapositioning requirement between a rule name definition and any modifier 
flag lines with the same rule name. The capture could be in a completely 
different rule file, and I suppose could even be before the defining rule by 
a thousand lines or so in a single file. But you pretty much need to see 
both the regex and the assignments to know what is happening to what. So 
allowing the assignments to be separated from the regex isn't necessarily 
good.

        Loren
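
For what it's worth, the two "assign" forms above map onto capture groups in a simple way. A minimal sketch follows, assuming a hypothetical apply_assign helper that receives the assign text and the list of captured groups; nothing like it exists in SpamAssassin, and the sample subject text is invented.

```perl
use strict;
use warnings;

# Hypothetical helper: map capture groups to variable names.
# $spec is either positional ("__COMPANY,__AMOUNT") or keyworded
# ("1=__COMPANY,2=__AMOUNT"); @groups holds the values of $1, $2, ...
sub apply_assign {
    my ($spec, @groups) = @_;
    my %vars;
    my $position = 0;
    for my $item (split /\s*,\s*/, $spec) {
        $position++;
        my ($index, $name) = ($position, $item);
        if ($item =~ /^(\d+)=(\w+)$/) {
            ($index, $name) = ($1, $2);   # keyworded form overrides position
        }
        next if $index < 1 || $index > @groups;   # no such capture group
        $vars{$name} = $groups[$index - 1];
    }
    return \%vars;
}

# A match in list context returns the captured groups
my @groups = 'Your Acme Order for $42' =~ /Your (\w+) Order for \$(\d+)/i;

my $positional = apply_assign('__COMPANY,__AMOUNT', @groups);
my $keyworded  = apply_assign('1=__COMPANY,2=__AMOUNT', @groups);
print "$positional->{__COMPANY} / $keyworded->{__AMOUNT}\n";
```

The sketch also shows the readability problem: the meaning of the assign line depends entirely on group numbering in a regex that may be defined far away.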

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by Henrik K <he...@hege.li>.

On Fri, May 07, 2021 at 06:08:00PM +0300, Henrik K wrote:
> 
> All this is petty details compared to the overall logic that is required in
> the background.

I'm mostly interested in tackling the meta-rule dependency mess right now:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735#c7

Variable capturing would be a logical enhancement that follows it, as the
supporting dependency logic would likely be already implemented then.

Any thoughts on implementing it efficiently are welcome.

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by Henrik K <he...@hege.li>.

On Fri, May 07, 2021 at 07:58:18AM -0700, John Hardin wrote:
> On Sun, 2 May 2021, Loren Wilton wrote:
> 
> > Now consider variable capture from the message:
> > 
> >  header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1
> 
> I like this syntax. I was thinking that the capture would be implied - any
> capturing group in a rule would automagically save its (single) match in a
> variable named after the rule (kept separate from the rule's score) for
> later use, but I like the explicit nature of this approach.
> 
> Would :capture play well with (e.g.) :addr, :name, :raw, etc?

It might as well be a tflag or something.  Why limit capturing to headers
only?

Or not a tflag at all.  Just an uncommon enclosure format that is parsed from
_any_ regex anywhere.

All this is petty details compared to the overall logic that is required in
the background.

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by Henrik K <he...@hege.li>.

On Sat, May 08, 2021 at 08:51:29AM -0700, Loren Wilton wrote:
> > An alternative approach is creating new strings from parsed data:
> > 
> > string  TO_BODY = TO:addr ":" BODY(500)
> > 
> > string  TO_BODY ~= /<whatever>/
> > 
> > the advantage of this is that there are no dependencies.
> > 
> > I'm thinking that BODY(500) would be a multi-line string constructed
> > from the first 500 byte of the rendered body. For me having multi-line
> > body matching is more important than any of this.
> 
> Hum, interesting.
> 
> As a small nitpick, maybe it's just my 40 years programming C++, but I'm
> bothered by using 'string' for both the creation operation and rule-parsing
> operation. I realize they are differentiated by the operator, but that just
> seems too easy to screw up, at least for me and my bad eyesight. Maybe
> 'makestring' and 'string' or some other non-identical pair of words,
> whatever seems nice.
> 
> I assume you would want BODY, RAWBODY, FULL, etc. as possibilities. At least
> I would.
> 
> I think that rather than the character count, I'd do a range: BODY(1:500) or
> the like. This lets you capture from an offset location.
> 
> Actually I think I'd prefer a regex there, at least as an alternative:
> BODY/(.{500})/m to get the equivalent first 500 characters. Or BODY/Your
> order number (\d+)/ to get a capture of a specific thing from the body.
> 
> Thoughts?

There's already a bug discussing something like this:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=4691

But really, perl/memory/CPU are not a bottleneck anymore, there is no
problem matching a ~50k text with regexp.  The simplest solution is just
ditching the "body chunking" completely in 4.0, or implementing new methods for
full body matching like I proposed:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7745

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by Loren Wilton <lw...@earthlink.net>.

> An alternative approach is creating new strings from parsed data:
>
> string  TO_BODY = TO:addr ":" BODY(500)
>
> string  TO_BODY ~= /<whatever>/
>
> the advantage of this is that there are no dependencies.
>
> I'm thinking that BODY(500) would be a multi-line string constructed
> from the first 500 byte of the rendered body. For me having multi-line
> body matching is more important than any of this.

Hum, interesting.

As a small nitpick, maybe it's just my 40 years programming C++, but I'm 
bothered by using 'string' for both the creation operation and rule-parsing 
operation. I realize they are differentiated by the operator, but that just 
seems too easy to screw up, at least for me and my bad eyesight. Maybe 
'makestring' and 'string' or some other non-identical pair of words, 
whatever seems nice.

I assume you would want BODY, RAWBODY, FULL, etc. as possibilities. At least 
I would.

I think that rather than the character count, I'd do a range: BODY(1:500) or 
the like. This lets you capture from an offset location.

Actually I think I'd prefer a regex there, at least as an alternative: 
BODY/(.{500})/m to get the equivalent first 500 characters. Or BODY/Your 
order number (\d+)/ to get a capture of a specific thing from the body.

Thoughts?

        Loren

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by RW <rw...@googlemail.com>.

On Fri, 7 May 2021 07:58:18 -0700 (PDT)
John Hardin wrote:

> On Sun, 2 May 2021, Loren Wilton wrote:
> 
> > Now consider variable capture from the message:
> >
> >  header __SUB_CAP Subject:Capture /Your (\w+) Order/i
> > $(__COMPANY)=\1  
> 
> I like this syntax. I was thinking that the capture would be implied
> - any capturing group in a rule would automagically save its (single)
> match in a variable named after the rule (kept separate from the
> rule's score) for later use, but I like the explicit nature of this
> approach.

An alternative approach is creating new strings from parsed data:

string  TO_BODY = TO:addr ":" BODY(500)

string  TO_BODY ~= /<whatever>/

the advantage of this is that there are no dependencies.

I'm thinking that BODY(500) would be a multi-line string constructed
from the first 500 bytes of the rendered body. For me having multi-line
body matching is more important than any of this.
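
As a rough illustration of the "string" idea, here is a minimal Perl sketch of what a declaration such as TO_BODY = TO:addr ":" BODY(500) might compute. The helper name build_to_body and its arguments are hypothetical and not part of SpamAssassin; the sketch assumes the recipient address and rendered body are already available as plain scalars, and substr() counts characters, which equals bytes only for an undecoded string.

```perl
use strict;
use warnings;

# Hypothetical helper: join the recipient address, a ":" separator, and the
# first $limit characters of the rendered body into one multi-line string.
sub build_to_body {
    my ($to_addr, $rendered_body, $limit) = @_;
    $limit = 500 unless defined $limit;
    # substr() stops at the end of the string, so short bodies are kept whole.
    return $to_addr . ':' . substr($rendered_body, 0, $limit);
}

my $body    = "Dear customer,\nyour order has shipped.\n";
my $to_body = build_to_body('me@myhost.com', $body, 500);

# /s lets "." cross newlines, so one pattern can span several body lines.
print "matched\n" if $to_body =~ /^me\@myhost\.com:Dear.*shipped/s;
```

Because the combined string is built once per message, a rule matching against it would not need to wait on any other rule, which is the "no dependencies" advantage mentioned above.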

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by John Hardin <jh...@impsec.org>.

On Sun, 2 May 2021, Loren Wilton wrote:

> Now consider variable capture from the message:
>
>  header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1

I like this syntax. I was thinking that the capture would be implied - any 
capturing group in a rule would automagically save its (single) match in a 
variable named after the rule (kept separate from the rule's score) for 
later use, but I like the explicit nature of this approach.

Would :capture play well with (e.g.) :addr, :name, :raw, etc?

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org                         pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Tomorrow: the 76th anniversary of VE day

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by Loren Wilton <lw...@earthlink.net>.

> Now consider variable capture from the message:
>
>    header __SUB_CAP    Subject:Capture    /Your (\w+) Order/i 
> $(__COMPANY)=\1

The text above was intended to all appear on one line. "$(__COMPANY)=\1" 
followed /i.

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by Loren Wilton <lw...@earthlink.net>.

John Hardin wrote:

>> An awful lot I think could be done simply by having rules that can 
>> capture to named per-message-global variables, and allowing those 
>> variables to be used in other (or the same) rules.
>
> I've been wanting this for years.

Proposal for discussion:

Consider the following rules that could be in a user_prefs on a system that 
allows per-user rules:

    header __TO_ME   To:Addr    /<me\@myhost\.com>/
    meta    NOT_TO_ME    !__TO_ME

This could be useful for a single user, but obviously could not be 
site-wide, even if the site found such a thing useful, as all users have 
their own email addresses. The problem is obviously the hard-coded address 
string.

PMS has a number of per-message variables attached to it that can be used in 
the Perl code for various things. I'm proposing a way to add per-message 
constants and variables to this collection, and a way to access them in rule 
text.

Consider this variation on the above rules:

    variable __ME            /me\@myhost\.com/
    header    __TO_ME    To:Addr    /<$(__ME)>/
    meta    NOT_TO_ME    !__TO_ME

The format of the "variable" declaration is deliberately the simplified form 
of a rule declaration to simplify parsing and help file description 
considerations. Since it is actually defining a constant it could be called 
"constant". I called it "variable" because the thing it defines could vary 
from user to user and message to message.

Now we can put the __ME in each user_prefs and have a global rule in some 
site-global rules file. Each __ME instance would stick the string into a PMS 
variable for the duration of the message being parsed. The name of the PMS 
variable would be some variation on the "rule" name of the variable.

The text of the rules with a $(name) string in them, to be compiled, would 
have to have a way to reach into the relevant PMS variable to resolve that 
part of the string. Perhaps this means that rules containing variables could 
not be compiled. As the number of them is likely to be relatively small 
compared to the number of all rules, that is probably an acceptable 
tradeoff.

There are no execution ordering problems as long as 'variable' declarations 
are parsed before rules are run.
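
A minimal Perl sketch of how the $(name) substitution could be resolved before a rule pattern is compiled. The function expand_rule_pattern and the %vars hash are hypothetical stand-ins for the proposed per-message PMS variables; nothing here is existing SpamAssassin API.

```perl
use strict;
use warnings;

# Hypothetical: replace each $(NAME) in a rule pattern with the stored
# pattern text, then compile the result. Dies on an undefined variable.
sub expand_rule_pattern {
    my ($pattern, $vars) = @_;
    $pattern =~ s{\$\((\w+)\)}{
        exists $vars->{$1}
            ? $vars->{$1}
            : die "undefined variable $1 in rule\n"
    }ge;
    return qr/$pattern/;
}

my %vars  = (__ME => 'me\@myhost\.com');   # as set by: variable __ME /.../
my $to_me = expand_rule_pattern('<$(__ME)>', \%vars);

print "to me\n"     if '<me@myhost.com>'    =~ $to_me;
print "not to me\n" if '<other@myhost.com>' !~ $to_me;
```

Since the pattern is only compiled after substitution, a rule written this way has to be compiled per user (or per message) rather than once at startup, which matches the tradeoff described above.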

Now consider variable capture from the message:

    header __SUB_CAP    Subject:Capture    /Your (\w+) Order/i    $(__COMPANY)=\1

Here we can define a PMS variable and populate it on a rule match. The rule 
can be used as any normal rule, it just additionally captures one or more 
variables while it runs.

We can use this in a match against some other message part with a rule 
similar to the __TO_ME rule above.  Obviously in this case we have rule 
ordering to consider, since we have to capture (or attempt the capture; the 
string may come up null) before we can run the rule depending on the string.
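
The capture-then-match ordering can be sketched in Perl as follows. The %msg_vars store and run_capture_rule are hypothetical names for illustration only; the sample subject and body are made up.

```perl
use strict;
use warnings;

my %msg_vars;   # hypothetical per-message variable store

# Hypothetical: run a capturing rule; on a match, save group 1 under $name.
sub run_capture_rule {
    my ($text, $regex, $name) = @_;
    if ($text =~ $regex) {
        $msg_vars{$name} = $1;
        return 1;
    }
    return 0;
}

my $subject = 'Your Acme Order has shipped';
my $body    = 'Thank you for shopping with acme today.';

run_capture_rule($subject, qr/Your (\w+) Order/i, '__COMPANY');

# The dependent rule can only run after the capture was attempted. The
# captured text is data, not a pattern, so quotemeta() escapes it.
if (defined $msg_vars{__COMPANY}) {
    my $company = quotemeta $msg_vars{__COMPANY};
    print "body mentions $msg_vars{__COMPANY}\n" if $body =~ /\b$company\b/i;
}
```

Escaping the captured text is a design choice of this sketch: a string taken from an untrusted message should probably not be interpreted as regex syntax when it is reused in another rule.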

An alternative to the above capture syntax could be:

    header __COMPANY    Subject    /Your (\w+) Order/

The disadvantages I see to this are that you can only capture one string 
from the match, and you now have to wonder whether a rule name represents 
an integer or a string or both. I'm not in favor of the mess this could 
create.

Note that you could extend this fairly trivially (in a syntactic sense) to 
allow a match against multiple captured strings in a pattern:

    variable    __GROUP    /Order for $(__ME) from $(__ORDER)/
    body        MY_ORDER    /$(__GROUP)/i    # __GROUP exists in the body

This also gets into problems of rule dependencies, since now some 'variable' 
declarations could not be executed until other rules have run. But it might 
be worth considering as an extension. Likely the mechanisms to implement the 
constant declaration, capture, and match code would be almost all that was 
necessary to implement this too.

I think that the above would do most of what people would like to do.

Errors:

    A reference to an undefined variable would be a rule syntax error,
    invalidating that rule.
    A poorly formatted capture would be a rule syntax error.
    A poorly formatted variable would essentially be a rule syntax error.
    Circular references would be a rule syntax error, invalidating the rule
    where it was detected (which could then invalidate other depending rules).
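
One way the undefined-reference and circular-reference checks could work is a depth-first walk over the variable definitions. The Perl sketch below is an assumption about how such a check might look, not anything that exists in SpamAssassin; invalid_variables is a hypothetical name, and the sketch only considers 'variable' declarations, not variables populated by capturing rules.

```perl
use strict;
use warnings;

# Hypothetical: %$defs maps a variable name to its pattern text, which may
# reference other variables as $(NAME). Returns the names found to be
# invalid: undefined references, cycles, and anything depending on them.
sub invalid_variables {
    my ($defs) = @_;
    my (%state, %bad);   # state: 1 = being visited, 2 = done

    my $visit;
    $visit = sub {
        my ($name) = @_;
        return 0 unless exists $defs->{$name};   # undefined reference
        return !$bad{$name} if ($state{$name} // 0) == 2;
        return 0 if ($state{$name} // 0) == 1;   # back edge: a cycle
        $state{$name} = 1;
        my $ok = 1;
        for my $dep ($defs->{$name} =~ /\$\((\w+)\)/g) {
            $ok = 0 unless $visit->($dep);
        }
        $state{$name} = 2;
        $bad{$name} = 1 unless $ok;
        return $ok;
    };

    $visit->($_) for sort keys %$defs;
    return sort keys %bad;
}

my %defs = (
    __ME    => 'me\@myhost\.com',
    __GROUP => 'Order for $(__ME) from $(__ORDER)',   # __ORDER is undefined
    __A     => 'x$(__B)',
    __B     => 'y$(__A)',                             # cycle with __A
);
print join(', ', invalid_variables(\%defs)), "\n";
```

Every name returned would then be treated as a syntax error, and any rule using one of those names would be invalidated in turn.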

A dependency on text from a net rule would push other depending rule 
evaluation to after the net rules returned results. I assume this is already 
done for meta rules that depend on net rules. But I could see this 
potentially being a pain point.

I'm not married to any of the above suggested syntax; it just seemed like a 
reasonable starting point, and simple to describe and to use. Discussion and 
suggestions on the formats are welcome.

I don't know what it would mean to put a 'variable' name in a meta. Likely 
it would be meaningless and probably disallowed. Likewise I don't see much 
point in assigning a score or a description to a variable, since they aren't 
really rules themselves.

Discussion is open!

        Loren

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by John Hardin <jh...@impsec.org>.

On Sat, 1 May 2021, Loren Wilton wrote:

>> Ideally rules could be written with some pseudo-language that could do
>> complex things, grabbing things into variables, modifying, comparing to
>> other things etc.  Then there wouldn't be any need for Perl plugins doing
>> some trivial stuff.
>
> An awful lot I think could be done simply by having rules that can capture to 
> named per-message-global variables, and allowing those variables to be used 
> in other (or the same) rules. The Perl RE syntax almost allows for this 
> as-is, so it shouldn't be a great stretch to modify the rules parser to allow 
> such things and capture the names of the variables and create the variables.

I've been wanting this for years.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org                         pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The yardstick you should use when considering whether to support a
   given piece of legislation is "what if my worst enemy is chosen to
   administer this law?"
-----------------------------------------------------------------------
  Today: May Day - Remember 110 million people murdered by Communism

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by Loren Wilton <lw...@earthlink.net>.

> Ideally rules could be written with some pseudo-language that could do
> complex things, grabbing things into variables, modifying, comparing to
> other things etc.  Then there wouldn't be any need for Perl plugins doing
> some trivial stuff.

An awful lot I think could be done simply by having rules that can capture 
to named per-message-global variables, and allowing those variables to be 
used in other (or the same) rules. The Perl RE syntax almost allows for this 
as-is, so it shouldn't be a great stretch to modify the rules parser to 
allow such things and capture the names of the variables and create the 
variables.

Obviously though this gets into complexities of rule dependencies and 
creates possible circular dependencies. I'd just treat any such as rule 
errors and let the user worry about it. Usage of undefined (ungenerated) 
variables could also just be treated as errors. But it would still take 
evaluating the dependency chains, and that could reorder rule priorities and 
the like, so it would take some thought.

There should also be some way to simply define variable values without 
parsing anything. For instance, I have a bunch of rules that are dependent 
on knowing my email address and the name string I expect in a proper To 
address, and seeing if the mail is sent to me or the like. On a system that 
allows user_prefs it would be trivial to have these strings placed there for 
use by various rules. Currently I have the strings hard-coded in a bunch of 
different rules, making them unuseful for anyone but me.

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by Henrik K <he...@hege.li>.

On Sat, May 01, 2021 at 04:01:05AM -0700, Loren Wilton wrote:
>
> Given that plugins are by and large the basis for (some) rules, and rule
> updates happen frequently, some thought should be given to treating at least
> those plugins called from rules as in fact being rules themselves, at least
> as far as packaging and distribution is concerned.

The problem is that distributing Perl code with sa-update is inherently
dangerous and should be considered deprecated (which is why I renamed
--allowplugins too).  Only official SA version releases go through proper
scrutiny to release code that can run with root permissions around the
world.  I don't think having a separate "plugins-release" would make any
difference; the same problems remain with vendors' OS packaging etc.

Ideally rules could be written with some pseudo-language that could do
complex things, grabbing things into variables, modifying, comparing to
other things etc.  Then there wouldn't be any need for Perl plugins doing
some trivial stuff.

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by Loren Wilton <lw...@earthlink.net>.

> These kinds of changes just make you wonder what's the point of doing such
> plugins inside SA distribution..  if we ever do get 4.0 released, I really
> doubt if there are enough resources in the project to even release monthly
> updates after that..

Given that plugins are by and large the basis for (some) rules, and rule 
updates happen frequently, some thought should be given to treating at least 
those plugins called from rules as in fact being rules themselves, at least 
as far as packaging and distribution is concerned.

Obviously since interfaces can change from release to release this could get 
complicated. But just because something is complicated does not mean it is 
either insoluble or necessarily unmaintainable. I admit though it has been 
long enough since I've coded Perl that I'm in no position to suggest a good 
method with any authority. The obvious possibilities are subdirectories for 
different releases (or at least at points of interface change), or 
alternately conditional code.

        Loren

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by Henrik K <he...@hege.li>.

... and what would be the point of those monthly releases, if no distribution
will ever pick them up ...

On Sat, May 01, 2021 at 12:54:42PM +0300, Henrik Krohns wrote:
> 
> These kinds of changes just make you wonder what's the point of doing such
> plugins inside SA distribution..  if we ever do get 4.0 released, I really
> doubt if there are enough resources in the project to even release monthly
> updates after that..
> 
> 
> On Sat, May 01, 2021 at 09:41:28AM -0000, gbechis@apache.org wrote:
> > Author: gbechis
> > Date: Sat May  1 09:41:28 2021
> > New Revision: 1889364
> > 
> > URL: http://svn.apache.org/viewvc?rev=1889364&view=rev
> > Log:
> > cope with recent MailUP changes
> > 
> > Modified:
> >     spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
> > 
> > Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
> > URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm?rev=1889364&r1=1889363&r2=1889364&view=diff
> > ==============================================================================
> > --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm (original)
> > +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm Sat May  1 09:41:28 2021
> > @@ -388,20 +388,27 @@ sub esp_sendinblue_check {
> >  
> >  sub esp_mailup_check {
> >    my ($self, $pms) = @_;
> > -  my $mailup_id;
> > +  my ($mailup_id, $xabuse, $listid);
> >  
> >    my $rulename = $pms->get_current_eval_rule_name();
> >  
> >    # All Mailup emails have the X-CSA-Complaints header set to whitelist-complaints@eco.de
> >    my $xcsa = $pms->get("X-CSA-Complaints", undef);
> > -  if((not defined $xcsa) or ($xcsa !~ /whitelist-complaints\@eco\.de/)) {
> > +  if((not defined $xcsa) or ($xcsa !~ /complaints\@eco\.de/)) {
> >      return;
> >    }
> >    # All Mailup emails have the X-Abuse header that must match
> > -  $mailup_id = $pms->get("X-Abuse", undef);
> > -  return if not defined $mailup_id;
> > -  $mailup_id =~ /Please report abuse here: http\:\/\/.*\.musvc([0-9]+)\.net\/p\?c=([0-9]+)/;
> > -  $mailup_id = $2;
> > +  $xabuse = $pms->get("X-Abuse", undef);
> > +  return if not defined $xabuse;
> > +  if($xabuse =~ /Please report abuse here: http\:\/\/.*\.musvc([0-9]+)\.net\/p\?c=([0-9]+)/) {
> > +    $mailup_id = $2;
> > +  }
> > +  if(not defined $mailup_id) {
> > +    $listid = $pms->get("list-id", undef);
> > +    if($listid =~ /\<(\d+)\.\d+\>/) {
> > +      $mailup_id = $1;
> > +    }
> > +  }
> >    # if regexp doesn't match it's not Mailup
> >    return if not defined $mailup_id;
> >    chomp($mailup_id);
> >

Re: Esp module discussion

Posted by Axb <ax...@gmail.com>.

On 5/13/22 19:03, Henrik K wrote:
> On Sun, May 02, 2021 at 10:36:37AM +0200, Giovanni Bechis wrote:
>> On Sat, May 01, 2021 at 12:54:41PM +0300, Henrik Krohns wrote:
>>>
>>> These kinds of changes just make you wonder what's the point of doing such
>>> plugins inside SA distribution..  if we ever do get 4.0 released, I really
>>> doubt if there are enough resources in the project to even release monthly
>>> updates after that..
>>>
>> I have no problem in working out-of-tree but distros will probably
>> never package an external plugin and, in some cases, it may be better
>> an outdated plugin then no plugin at all.
>> Releasing every some months could help, distro will ship outdated packages
>> in any case but I do not think we can do much to improve that situation.
> 
> Bumping up this discussion due to:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7989
> 
> In current state the Esp.pm likely too vague for any users to use.  It
> should be clearly documented where the feed files are found, and what format
> they should be expected to be in.  An external github page related to the
> plugin does not constitute as documentation.
> 
> Somewhat agree with Michael's comment that the module shouldn't be an
> advertisement for Invaluement either.  Then again, at this time Invaluement
> doesn't even seem to be providing the data, so what use is for the module?
> 
> For the actual reply to Giovannis comment, "it may be better an outdated
> plugin then no plugin at all": Again, what use is the plugin as users
> actually need knowledge and effort to setup the feed downloads and make sure
> they work.  With the same effort they can download up-to-date module from
> Github into /etc/mail/spamassassin.
> 

+1 that this can be removed without any loss.

Axb

Re: Esp module discussion

Posted by Michael Storz <Mi...@lrz.de>.

Am 2022-05-13 19:03, schrieb Henrik K:
> On Sun, May 02, 2021 at 10:36:37AM +0200, Giovanni Bechis wrote:
>> On Sat, May 01, 2021 at 12:54:41PM +0300, Henrik Krohns wrote:
>> >
>> > These kinds of changes just make you wonder what's the point of doing such
>> > plugins inside SA distribution..  if we ever do get 4.0 released, I really
>> > doubt if there are enough resources in the project to even release monthly
>> > updates after that..
>> >
>> I have no problem in working out-of-tree but distros will probably
>> never package an external plugin and, in some cases, it may be better
>> an outdated plugin then no plugin at all.
>> Releasing every some months could help, distro will ship outdated 
>> packages
>> in any case but I do not think we can do much to improve that 
>> situation.
> 
> Bumping up this discussion due to:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7989
> 
> In current state the Esp.pm likely too vague for any users to use.  It
> should be clearly documented where the feed files are found, and what 
> format
> they should be expected to be in.  An external github page related to 
> the
> plugin does not constitute as documentation.
> 
> Somewhat agree with Michael's comment that the module shouldn't be an
> advertisement for Invaluement either.  Then again, at this time 
> Invaluement
> doesn't even seem to be providing the data, so what use is for the 
> module?
> 
> For the actual reply to Giovannis comment, "it may be better an 
> outdated
> plugin then no plugin at all": Again, what use is the plugin as users
> actually need knowledge and effort to setup the feed downloads and make 
> sure
> they work.  With the same effort they can download up-to-date module 
> from
> Github into /etc/mail/spamassassin.

From my point of view, the Esp.pm plugin should not be a standard plugin 
of SpamAssassin. Standard plugins should provide tools or language 
extensions for the SA meta language. This plugin on the other hand 
provides highly specialized functions. Since new functions have to be 
programmed for each newly included ESP, the plugin needs a much faster 
release cycle than SpamAssassin can provide. Therefore, the maintenance 
of this plugin should be done outside SpamAssassin. Announcements about 
new releases should definitely be made on the SpamAssassin user list.

The inclusion of new ESPs is certainly necessary, as there are dozens of 
ESPs. For example, while I don't have a single spam mail from Maildome, 
Mailup or Mdirector, other ESPs like ActiveCampaign, eC-messenger (was 
eCircle), Klaviyo, Salesforce and Sparkpost are in my configuration. I 
will describe how to configure new ESPs generically with standard 
SpamAssassin means later, once I have fully programmed the changes 
needed for this.

Michael

Re: Esp module discussion

Posted by John Hardin <jh...@impsec.org>.

On Sat, 14 May 2022, Bill Cole wrote:

> On 2022-05-14 at 06:52:02 UTC-0400 (Sat, 14 May 2022 12:52:02 +0200)
> <gi...@paclan.it>
> is rumored to have said:
>
>> Esp module may be effectively outdated and SpamAssassin releases are not frequent as I would love to, for me there is no problem in removing the module from SpamAssassin src tree and work on it out-of-tree.
>
> +1

+1


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org                         pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Maxim V: Close air support and friendly fire should be easier to
            tell apart.
-----------------------------------------------------------------------
  Today: the 74th anniversary of Israel's independence

Re: Esp module discussion

Posted by Bill Cole <bi...@apache.org>.

On 2022-05-14 at 06:52:02 UTC-0400 (Sat, 14 May 2022 12:52:02 +0200)
 <gi...@paclan.it>
is rumored to have said:

> Esp module may be effectively outdated and SpamAssassin releases are not frequent as I would love to, for me there is no problem in removing the module from SpamAssassin src tree and work on it out-of-tree.

+1

Re: Esp module discussion

Posted by Henrik K <he...@hege.li>.

On Sat, May 14, 2022 at 10:57:19AM -0700, Matt Corallo wrote:
> 
> At the same time, the ESP module was one of the features of SA 4 that I was
> most excited about - the ability to learn on and classify specific senders
> even though they hide behind ESPs sounds like it could, at least in theory,
> be quite effective.
> 
> Of course updates for such a thing are going to be a problem, and I don't
> know enough about the SA updates architecture, but if that could provide
> feeds to keep the ESP match-and-decode (regex) rules up-to-date it seems
> like it'd be very powerful for semi-default-install SA, which isn't all that
> uncommon.

AFAIK most of the stuff that Esp.pm does could now be done in plain rules
only with the new 4.0.0 features.

Having the module or not is irrelevant.  What matters is whether Giovanni or
someone can provide the necessary rules for the "Esp" stuff for stock
sa-update.  The proper way of looking up this stuff would be using a DNS query
anyway; maybe we can enhance askdns to look up the new regex capture stuff.

Re: Esp module discussion

Posted by Matt Corallo <sa...@mattcorallo.com>.

On 5/14/22 3:52 AM, giovanni@paclan.it wrote:
> Esp module may be effectively outdated and SpamAssassin releases are not frequent as I would love to,
> for me there is no problem in removing the module from SpamAssassin src tree and work on it
> out-of-tree.

As a user with no business commenting on dev@ or material knowledge of the SA architecture, this is 
a bit disappointing to me. Keeping external plugins up-to-date is somewhat of a pain, at least 
unless it comes via distro packages, which would also suffer the update problem.

At the same time, the ESP module was one of the features of SA 4 that I was most excited about - the 
ability to learn on and classify specific senders even though they hide behind ESPs sounds like it 
could, at least in theory, be quite effective.

Of course updates for such a thing are going to be a problem, and I don't know enough about the SA 
updates architecture, but if that could provide feeds to keep the ESP match-and-decode (regex) rules 
up-to-date it seems like it'd be very powerful for semi-default-install SA, which isn't all that 
uncommon.

- someone asking someone else to (continue) do(ing) work for them
Matt

Re: Esp module discussion

Posted by gi...@paclan.it.

On 5/13/22 19:03, Henrik K wrote:
> On Sun, May 02, 2021 at 10:36:37AM +0200, Giovanni Bechis wrote:
>> On Sat, May 01, 2021 at 12:54:41PM +0300, Henrik Krohns wrote:
>>>
>>> These kinds of changes just make you wonder what's the point of doing such
>>> plugins inside SA distribution..  if we ever do get 4.0 released, I really
>>> doubt if there are enough resources in the project to even release monthly
>>> updates after that..
>>>
>> I have no problem in working out-of-tree but distros will probably
>> never package an external plugin and, in some cases, it may be better
>> an outdated plugin then no plugin at all.
>> Releasing every some months could help, distro will ship outdated packages
>> in any case but I do not think we can do much to improve that situation.
> 
> Bumping up this discussion due to:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7989
> 
> In current state the Esp.pm likely too vague for any users to use.  It
> should be clearly documented where the feed files are found, and what format
> they should be expected to be in.  An external github page related to the
> plugin does not constitute as documentation.
>
Atm there are no public feeds afaik, I am using my private feeds.
I will work on improving documentation about how to create feeds and (maybe) publish my feeds.

 
> Somewhat agree with Michael's comment that the module shouldn't be an
> advertisement for Invaluement either.  Then again, at this time Invaluement
> doesn't even seem to be providing the data, so what use is for the module?
> 
> For the actual reply to Giovannis comment, "it may be better an outdated
> plugin then no plugin at all": Again, what use is the plugin as users
> actually need knowledge and effort to setup the feed downloads and make sure
> they work.  With the same effort they can download up-to-date module from
> Github into /etc/mail/spamassassin.
> 
The Esp module may effectively be outdated, and SpamAssassin releases are not as frequent as I would like;
for me there is no problem in removing the module from the SpamAssassin src tree and working on it
out-of-tree.

 Giovanni

Esp module discussion

Posted by Henrik K <he...@hege.li>.

On Sun, May 02, 2021 at 10:36:37AM +0200, Giovanni Bechis wrote:
> On Sat, May 01, 2021 at 12:54:41PM +0300, Henrik Krohns wrote:
> > 
> > These kinds of changes just make you wonder what's the point of doing such
> > plugins inside SA distribution..  if we ever do get 4.0 released, I really
> > doubt if there are enough resources in the project to even release monthly
> > updates after that..
> > 
> I have no problem in working out-of-tree but distros will probably
> never package an external plugin and, in some cases, it may be better
> an outdated plugin then no plugin at all.
> Releasing every some months could help, distro will ship outdated packages
> in any case but I do not think we can do much to improve that situation.

Bumping up this discussion due to:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7989

In its current state, Esp.pm is likely too vague for any users to use.  It
should be clearly documented where the feed files are found, and what format
they are expected to be in.  An external github page related to the
plugin does not constitute documentation.

Somewhat agree with Michael's comment that the module shouldn't be an
advertisement for Invaluement either.  Then again, at this time Invaluement
doesn't even seem to be providing the data, so what use is the module?

For the actual reply to Giovanni's comment, "it may be better an outdated
plugin than no plugin at all": Again, what use is the plugin when users
actually need knowledge and effort to set up the feed downloads and make sure
they work?  With the same effort they can download an up-to-date module from
Github into /etc/mail/spamassassin.

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

Posted by Giovanni Bechis <gi...@paclan.it>.

On Sat, May 01, 2021 at 12:54:41PM +0300, Henrik Krohns wrote:
> 
> These kinds of changes just make you wonder what's the point of doing such
> plugins inside SA distribution..  if we ever do get 4.0 released, I really
> doubt if there are enough resources in the project to even release monthly
> updates after that..
> 
I have no problem in working out-of-tree but distros will probably
never package an external plugin and, in some cases, it may be better
an outdated plugin than no plugin at all.
Releasing every few months could help; distros will ship outdated packages
in any case, but I do not think we can do much to improve that situation.

 Giovanni


> 
> On Sat, May 01, 2021 at 09:41:28AM -0000, gbechis@apache.org wrote:
> > Author: gbechis
> > Date: Sat May  1 09:41:28 2021
> > New Revision: 1889364
> > 
> > URL: http://svn.apache.org/viewvc?rev=1889364&view=rev
> > Log:
> > cope with recent MailUP changes
> > 
> > Modified:
> >     spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
> > 
> > Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm
> > URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm?rev=1889364&r1=1889363&r2=1889364&view=diff
> > ==============================================================================
> > --- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm (original)
> > +++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm Sat May  1 09:41:28 2021
> > @@ -388,20 +388,27 @@ sub esp_sendinblue_check {
> >  
> >  sub esp_mailup_check {
> >    my ($self, $pms) = @_;
> > -  my $mailup_id;
> > +  my ($mailup_id, $xabuse, $listid);
> >  
> >    my $rulename = $pms->get_current_eval_rule_name();
> >  
> >    # All Mailup emails have the X-CSA-Complaints header set to whitelist-complaints@eco.de
> >    my $xcsa = $pms->get("X-CSA-Complaints", undef);
> > -  if((not defined $xcsa) or ($xcsa !~ /whitelist-complaints\@eco\.de/)) {
> > +  if((not defined $xcsa) or ($xcsa !~ /complaints\@eco\.de/)) {
> >      return;
> >    }
> >    # All Mailup emails have the X-Abuse header that must match
> > -  $mailup_id = $pms->get("X-Abuse", undef);
> > -  return if not defined $mailup_id;
> > -  $mailup_id =~ /Please report abuse here: http\:\/\/.*\.musvc([0-9]+)\.net\/p\?c=([0-9]+)/;
> > -  $mailup_id = $2;
> > +  $xabuse = $pms->get("X-Abuse", undef);
> > +  return if not defined $xabuse;
> > +  if($xabuse =~ /Please report abuse here: http\:\/\/.*\.musvc([0-9]+)\.net\/p\?c=([0-9]+)/) {
> > +    $mailup_id = $2;
> > +  }
> > +  if(not defined $mailup_id) {
> > +    $listid = $pms->get("list-id", undef);
> > +    if($listid =~ /\<(\d+)\.\d+\>/) {
> > +      $mailup_id = $1;
> > +    }
> > +  }
> >    # if regexp doesn't match it's not Mailup
> >    return if not defined $mailup_id;
> >    chomp($mailup_id);
> >