Posted to dev@spamassassin.apache.org by Adam Katz <an...@khopis.com> on 2011/06/08 01:22:00 UTC
EXTRA_MPART_TYPE has frivolous piece in regex
From SA's perspective, since we don't use $& or its kin, a regex that
starts with an optional portion is the same as not having it. Therefore:
header EXTRA_MPART_TYPE Content-Type =~ /(?:\s*multipart\/)?.* type=/i
is functionally equivalent to the faster/simpler:
header EXTRA_MPART_TYPE Content-Type =~ / type=/i
I could just remove that frivolous portion, but I wonder if something
else was intended...
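The equivalence claimed above can be checked empirically. The following is a minimal sketch in Python rather than Perl, assuming the rule does an unanchored search over the header value; the sample Content-Type values are made up for illustration and are not taken from any corpus.

```python
import re

# The original rule pattern and the simplified one from the message above.
ORIGINAL = re.compile(r"(?:\s*multipart\/)?.* type=", re.IGNORECASE)
SIMPLIFIED = re.compile(r" type=", re.IGNORECASE)

# Hypothetical Content-Type header values, invented for this sketch.
SAMPLES = [
    'multipart/related; type="text/html"; boundary="x"',
    'multipart/mixed; boundary="x"',
    'text/plain; charset="us-ascii"',
    'application/octet-stream; TYPE=foo',
    'multipart/related;type="text/html"',  # no space before "type="
]


def hits(pattern, value):
    """Return True if the pattern matches anywhere in value (unanchored)."""
    return pattern.search(value) is not None


def compare(samples):
    """Return (value, original_hit, simplified_hit) for each sample."""
    return [(v, hits(ORIGINAL, v), hits(SIMPLIFIED, v)) for v in samples]


if __name__ == "__main__":
    for value, original_hit, simplified_hit in compare(SAMPLES):
        print(f"{original_hit!s:5} {simplified_hit!s:5} {value}")
```

Both columns should agree on every input: the leading group is optional and `.*` can match zero characters, so an unanchored search succeeds exactly when the literal " type=" occurs somewhere in the value.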
Re: [SA-dev] EXTRA_MPART_TYPE has frivolous piece in regex
Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
> Well, we are a FreeBSD environment mostly so it is my familiar territory,
> but I have no idea on how Jenkins works or where it lives.
Here's my crash course on Jenkins:
You go to builds.apache.org
You log in with your ASF LDAP credentials.
The PMC chair can add you to the admin group needed to access the build
(instructions at http://wiki.apache.org/general/Hudson but Daryl, I
think you log in to Minotaur/People and run
modify_appgroups.pl hudson-jobadmin --add=mmartinec).
There are currently 3 builds: one for 3.3.x (solaris1 slave), one for
trunk (solaris1 slave) and one for trunk (freebsd slave).
The freebsd one is a test slave.
https://builds.apache.org/job/SpamAssassin-trunk-FreeBSD/
If you go to
https://builds.apache.org/job/SpamAssassin-trunk-FreeBSD/53/consoleText
you can see the output. To me, it looks like we get a lot of spamd
errors and that perhaps something on the box is interfering with spamd
launching?
In the trunk, the script that runs on the freebsd slave is in
build/jenkins/run_build_freebsd
They dislike giving people shell but might be ok with giving out a shell
to try and figure out why it's not working. Eventually solaris1 zone
will be ramped down and we will have to use freebsd. So this is an
eventual thing to resolve.
I've updated http://wiki.apache.org/spamassassin/ContinuousTesting with
more of this information.
Regards,
KAM
Re: [SA-dev] EXTRA_MPART_TYPE has frivolous piece in regex
Posted by Mark Martinec <Ma...@ijs.si>.
> I agree completely but as of right now, jenkins is believed to be
> accurate and has been fixed. We do need a FreeBSD expert to weigh in on
> the freebsd slave for jenkins and why it is failing, though. Any
> Freebsd experts out there?
Well, we are a FreeBSD environment mostly so it is my familiar territory,
but I have no idea on how Jenkins works or where it lives.
Mark
Re: [SA-dev] EXTRA_MPART_TYPE has frivolous piece in regex
Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
> Sorry, Jenkins has cried wolf so often that I didn't notice.
I agree completely but as of right now, jenkins is believed to be
accurate and has been fixed. We do need a FreeBSD expert to weigh in on
the freebsd slave for jenkins and why it is failing, though. Any
Freebsd experts out there?
> Also, it
> passed make test TEST_FILES='t/basic_lint.t t/basic_meta.t'
It's considered bad form not to do a full make test before a commit.
http://wiki.apache.org/spamassassin/DevelopmentMode
Regards,
KAM
Re: [SA-dev] EXTRA_MPART_TYPE has frivolous piece in regex
Posted by Adam Katz <an...@khopis.com>.
>> On 6/7/2011 7:22 PM, Adam Katz wrote:
>>>> header EXTRA_MPART_TYPE Content-Type =~ / type=/i
>>
>> Its s/o is 0.086 on my data. Nothing depends on it, so I've just
>> cleared it out altogether (svn r1133500).
On 06/10/2011 05:37 AM, Mark Martinec wrote:
> This broke Jenkins two days ago, please remove the remains of the rule:
Sorry, Jenkins has cried wolf so often that I didn't notice. Also, it
passed make test TEST_FILES='t/basic_lint.t t/basic_meta.t'
Fixed now.
Re: [SA-dev] EXTRA_MPART_TYPE has frivolous piece in regex
Posted by Mark Martinec <Ma...@ijs.si>.
> On 6/7/2011 7:22 PM, Adam Katz wrote:
> >> header EXTRA_MPART_TYPE Content-Type =~ / type=/i
>
> On 06/07/2011 08:00 PM, Matt Kettler wrote:
> > A lot of times rules start off with the writer thinking about the
> > whole line, and after they put it down as a rule, there may be
> > frivolous patterns... I've caught myself doing it dozens of times. I
> > suspect (without really knowing) that happened here.
> >
> > Nice catch.
> >
> > +1 on making the change for efficiency.
> >
> > That said, I propose we should remove this rule entirely, or zero its
> > score.
> >
> > The S/O of this rule is now 0.100 in ruleqa, and was 0.283 in the
> > 3.3.0 corpus for set0. Clearly it is a lousy indicator of spam, but
> > has a force-assigned score of 1.0.
>
> Its s/o is 0.086 on my data. Nothing depends on it, so I've just
> cleared it out altogether (svn r1133500).
This broke Jenkins two days ago, please remove the remains of the rule:
$ perl -T -w ../spamassassin.raw -C log/test_rules_copy --siteconfigpath log/localrules.tmp -p log/test_default.cf -L --lint
Jun 10 14:30:08.727 [13978] warn: config: warning: description exists for non-existent rule EXTRA_MPART_TYPE
$ grep EXTRA_MPART_TYPE rules/*.cf
rules/30_text_de.cf:lang de describe EXTRA_MPART_TYPE Unnötige Parameter in "Content-Type"-Kopfzeile ("...type=")
rules/30_text_fr.cf:lang fr describe EXTRA_MPART_TYPE En-tête "Content-type:...type=" surnuméraire
rules/30_text_nl.cf:lang nl describe EXTRA_MPART_TYPE Header heeft een overbodige Content-type:...type=
rules/30_text_pl.cf:lang pl describe EXTRA_MPART_TYPE Nagłówek zawiera nadmiarowy wpis Content-type:...type=
rules/50_scores.cf:score EXTRA_MPART_TYPE 1.0
Mark
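The grep above can be generalized into a check that runs before committing a rule removal. This is only a sketch: the directive lists are an illustrative assumption rather than SpamAssassin's full config grammar, `load_rules` is a hypothetical helper, and `HYPOTHETICAL_RULE` and `rules/20_example.cf` are invented names. It does not replace `--lint` or a full `make test`.

```python
from pathlib import Path

# Directives that define a rule versus ones that only refer to it.
# Assumption: this short list is illustrative, not SpamAssassin's full grammar.
DEFINING = {"header", "body", "rawbody", "uri", "full", "meta", "mimeheader"}
REFERRING = {"describe", "score"}


def parse_line(line):
    """Return (directive, rule_name) for a relevant config line, else None."""
    tokens = line.strip().split()
    if not tokens or tokens[0].startswith("#"):
        return None
    # Localized descriptions look like: lang de describe RULE_NAME text...
    if tokens[0] == "lang" and len(tokens) >= 3:
        tokens = tokens[2:]
    if len(tokens) >= 2 and tokens[0] in DEFINING | REFERRING:
        return tokens[0], tokens[1]
    return None


def find_orphans(lines_by_file):
    """Return (file, line_number, directive, rule) for undefined rules."""
    defined = set()
    references = []
    for filename, lines in lines_by_file.items():
        for number, line in enumerate(lines, start=1):
            parsed = parse_line(line)
            if parsed is None:
                continue
            directive, rule = parsed
            if directive in DEFINING:
                defined.add(rule)
            else:
                references.append((filename, number, directive, rule))
    return [ref for ref in references if ref[3] not in defined]


def load_rules(directory):
    """Hypothetical helper: read every *.cf file into {filename: [lines]}."""
    return {
        str(path): path.read_text(errors="replace").splitlines()
        for path in sorted(Path(directory).glob("*.cf"))
    }


# In-memory stand-in for the rules directory, based on the grep output above.
# HYPOTHETICAL_RULE and rules/20_example.cf are invented for this sketch.
SAMPLE = {
    "rules/20_example.cf": ["header HYPOTHETICAL_RULE Subject =~ /test/"],
    "rules/30_text_de.cf": [
        'lang de describe EXTRA_MPART_TYPE Unnötige Parameter in "Content-Type"-Kopfzeile ("...type=")'
    ],
    "rules/50_scores.cf": [
        "score EXTRA_MPART_TYPE 1.0",
        "score HYPOTHETICAL_RULE 2.0",
    ],
}

if __name__ == "__main__":
    for filename, number, directive, rule in find_orphans(SAMPLE):
        print(f"{filename}:{number}: {directive} refers to undefined rule {rule}")
```

To run it against a real checkout, pass `load_rules("rules")` instead of `SAMPLE`. Rules defined by plugins or in other directories would show up as false positives under this simple model.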
Re: [SA-dev] EXTRA_MPART_TYPE has frivolous piece in regex
Posted by Adam Katz <an...@khopis.com>.
On 6/7/2011 7:22 PM, Adam Katz wrote:
>> header EXTRA_MPART_TYPE Content-Type =~ / type=/i
On 06/07/2011 08:00 PM, Matt Kettler wrote:
> A lot of times rules start off with the writer thinking about the
> whole line, and after they put it down as a rule, there may be
> frivolous patterns... I've caught myself doing it dozens of times. I
> suspect (without really knowing) that happened here.
>
> Nice catch.
>
> +1 on making the change for efficiency.
>
> That said, I propose we should remove this rule entirely, or zero its
> score.
>
> The S/O of this rule is now 0.100 in ruleqa, and was 0.283 in the
> 3.3.0 corpus for set0. Clearly it is a lousy indicator of spam, but
> has a force-assigned score of 1.0.
Its s/o is 0.086 on my data. Nothing depends on it, so I've just
cleared it out altogether (svn r1133500).
Re: EXTRA_MPART_TYPE has frivolous piece in regex
Posted by Matt Kettler <mk...@verizon.net>.
On 6/7/2011 7:22 PM, Adam Katz wrote:
> From SA's perspective, since we don't use $& or its kin, a regex that
> starts with an optional portion is the same as not having it. Therefore:
>
> header EXTRA_MPART_TYPE Content-Type =~ /(?:\s*multipart\/)?.* type=/i
>
> is functionally equivalent to the faster/simpler:
>
> header EXTRA_MPART_TYPE Content-Type =~ / type=/i
>
> I could just remove that frivolous portion, but I wonder if something
> else was intended...
I agree the two are equivalent. The starting "multipart" clause is made
optional with a ?, therefore can be eliminated. Once it is removed the
.* is now a leading optional clause, and can be dropped too.
A lot of times rules start off with the writer thinking about the whole
line, and after they put it down as a rule, there may be frivolous
patterns... I've caught myself doing it dozens of times. I suspect
(without really knowing) that happened here.
Nice catch.
+1 on making the change for efficiency.
That said, I propose we should remove this rule entirely, or zero its score.
The S/O of this rule is now 0.100 in ruleqa, and was 0.283 in the 3.3.0
corpus for set0. Clearly it is a lousy indicator of spam, but has a
force-assigned score of 1.0.
It was force-set to 1.0 due to FP problems back in the 3.1.x days, but
clearly the S/O of this rule is well under 0.5, so any significant
positive score is counter-productive to overall accuracy. Sure, 1.0 isn't
much of a score, but with that low an S/O it is still very misplaced.
See history at:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5110
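
For readers unfamiliar with the S/O figures quoted in this message, here is a minimal sketch of the arithmetic. It assumes S/O is the rule's spam hit rate divided by the sum of its spam and ham hit rates, which is my understanding of how the ratio is reported and is not confirmed by this thread. The corpus counts are hypothetical and chosen only to produce a value near the 0.100 mentioned above.

```python
def so_ratio(spam_hits, spam_total, ham_hits, ham_total):
    """Spam/overall ratio from per-corpus hit rates (assumed definition)."""
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    if spam_rate + ham_rate == 0:
        # No hits at all: treat the rule as uninformative (a choice made here).
        return 0.5
    return spam_rate / (spam_rate + ham_rate)


if __name__ == "__main__":
    # Hypothetical counts: the rule hits 10 of 10000 spam and 90 of 10000 ham.
    ratio = so_ratio(spam_hits=10, spam_total=10000, ham_hits=90, ham_total=10000)
    print(f"S/O = {ratio:.3f}")
```

Under this definition, a value below 0.5 means the rule fires on a larger share of ham than of spam, which is why a positive score works against accuracy.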