Posted to ruleqa@spamassassin.apache.org by David Jones <dj...@ena.com.INVALID> on 2017/06/07 03:46:39 UTC

ruleqa status

I fixed a minor issue in a script so we got new rule scores
moments ago.  Hopefully my fix will make it run fine next
time in about 23 hours.

Jarif, it looks like you are running your masscheck a little
early so check your cron entry.  It may need to be moved
forward one hour.

It looks like KAM got his masscheck running today so we
should be up to 11 or 12 masscheckers.

My 'ena' corpus is growing pretty fast -- about 8,000 to
10,000 messages per day -- so we should stay above the
minimum ham/spam requirements going forward.  I am
up to 65,000 messages and have had to throttle back the
growth of ham to only a subset to keep it under control.

Does anyone have any ideas on how to remove duplicate
emails in Maildir?  I tried this:

https://github.com/kdeldycke/maildir-deduplicate

but it sees all of the messages as different due to common
headers.  I was trying to trim down the duplicates based
on Subject and similar/close body content.
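
One way to approach "same Subject, similar body" is sketched below. It is a minimal illustration, not a tested tool: the function names, the 0.9 similarity threshold, and the choice to compare only the first text/plain part are all assumptions. It uses only the Python standard library (email, difflib) and reports candidate pairs rather than deleting anything.

```python
import difflib
from collections import defaultdict
from email.message import Message
from typing import Iterable


def body_text(msg: Message) -> str:
    """Return the first text/plain part decoded to str, or '' if there is none."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label in the message
            return payload.decode("utf-8", errors="replace")
    return ""


def is_near_duplicate(a: Message, b: Message, threshold: float = 0.9) -> bool:
    """True if Subjects match exactly and bodies are at least `threshold` similar.

    The 0.9 default is an assumed cutoff, not a recommended value.
    """
    if str(a.get("Subject", "")).strip() != str(b.get("Subject", "")).strip():
        return False
    matcher = difflib.SequenceMatcher(None, body_text(a), body_text(b), autojunk=False)
    return matcher.ratio() >= threshold


def find_near_duplicates(
    messages: Iterable[tuple[str, Message]], threshold: float = 0.9
) -> list[tuple[str, str]]:
    """Return (kept_key, duplicate_key) pairs; the first message seen is kept."""
    kept_by_subject: dict[str, list[tuple[str, Message]]] = defaultdict(list)
    pairs: list[tuple[str, str]] = []
    for key, msg in messages:
        subject = str(msg.get("Subject", "")).strip()
        # Only compare against messages that share the Subject, to limit work.
        for kept_key, kept_msg in kept_by_subject[subject]:
            if is_near_duplicate(kept_msg, msg, threshold):
                pairs.append((kept_key, key))
                break
        else:
            kept_by_subject[subject].append((key, msg))
    return pairs


# Hypothetical usage against a real Maildir (the path is a placeholder):
#   import mailbox
#   box = mailbox.Maildir("/path/to/Maildir", create=False)
#   for kept, dup in find_near_duplicates(box.iteritems()):
#       print(kept, dup)
```

Comparing bodies pairwise is slow on a large corpus, so grouping by Subject first keeps the number of comparisons down. The output is a list of candidates to review, which leaves the decision to delete with the corpus owner.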

Dave


Re: DKIM signature stripped (WAS: RE: ruleqa status)

Posted by Benny Pedersen <me...@junc.eu>.
Tom Hendrikx wrote on 2017-06-08 10:21:

> Authentication-Results: spamd2-us-west.apache.org (amavisd-new);
> 	dkim=pass (1024-bit key) header.d=ena.com
> 
> But as I can't trust that Authentication-Results header and am unable to
> verify DKIM myself without the stripped DKIM header, your message would
> result in a DMARC reject. The address change is a valid, but not a nice
> solution.

To be tested: what happens if OpenDMARC is set to trust that
Authentication-Results header?

Will it still reject on DMARC failures?

> I thought there was some traffic about missing DKIM headers on this list
> a short while ago? Good that infra is working on this :)

Is apache.org removing DKIM headers when the DKIM signature is invalid?
Even if it is invalid, apache.org should not remove it :/

Something is not as it should be here.

DKIM signature stripped (WAS: RE: ruleqa status)

Posted by Tom Hendrikx <to...@whyscream.net>.

On 07-06-17 22:25, David Jones wrote:
> On 06/07/2017 03:17 PM, Kevin A. McGrail wrote:
>> On 6/7/2017 4:15 PM, David Jones wrote:
>>> That's interesting.  I can easily comply with doing nothing. :)
>>>
>>> So the volume/number of even the same message can help with accurate
>>> scoring?  That makes sense.
>> That would be my theory.  Messing with the dataset otherwise sounds
>> fishy...
>>
>>> P.S. Anyone know what is adding the ".INVALID" to my From: header?
>>> It's not happening anywhere else that I know of so it might be the
>>> apache.org ezmlm.  It's not in the headers when it leaves my mail
>>> platform.
>> It's a DMARC compliance thingy somewhere in one of the systems
>> involved in the process.
>>
>> Regards,
>> KAM
>>
>>
> My emails should have SPF pass and alignment plus DKIM pass and
> alignment on ena.com.  I did enable p=reject a few months ago for
> ena.com but we should have perfect DMARC before the email gets into the
> ezmlm server.  Coming out it should fail SPF but DKIM should still pass
> and align if it didn't have the ".INVALID".
> 

I don't see your DKIM signature when receiving mail, but I do see that
the apache.org platform saw your DKIM signature and thought it was fine:

Authentication-Results: spamd2-us-west.apache.org (amavisd-new);
	dkim=pass (1024-bit key) header.d=ena.com

But as I can't trust that Authentication-Results header and am unable to
verify DKIM myself without the stripped DKIM header, your message would
result in a DMARC reject. The address change is a valid, but not a nice
solution.

I thought there was some traffic about missing DKIM headers on this list
a short while ago? Good that infra is working on this :)

Kind regards,

	Tom

Re: ruleqa status

Posted by David Jones <dj...@ena.com.INVALID>.
On 06/07/2017 03:37 PM, Kevin A. McGrail wrote:
> On 6/7/2017 4:25 PM, David Jones wrote:
>>>
>> My emails should have SPF pass and alignment plus DKIM pass and 
>> alignment on ena.com.  I did enable p=reject a few months ago for 
>> ena.com but we should have perfect DMARC before the email gets into 
>> the ezmlm server.  Coming out it should fail SPF but DKIM should still 
>> pass and align if it didn't have the ".INVALID". 
> 
> Really a question for Infra as we don't run the mailing lists, etc.
> 
> Regards,
> 
> KAM
> 
> 
I have opened an issue with Infra on this matter.

-- 
Dave

Re: ruleqa status

Posted by "Kevin A. McGrail" <ke...@mcgrail.com>.
On 6/7/2017 4:25 PM, David Jones wrote:
>>
> My emails should have SPF pass and alignment plus DKIM pass and 
> alignment on ena.com.  I did enable p=reject a few months ago for 
> ena.com but we should have perfect DMARC before the email gets into 
> the ezmlm server.  Coming out it should fail SPF but DKIM should still 
> pass and align if it didn't have the ".INVALID". 

Really a question for Infra as we don't run the mailing lists, etc.

Regards,

KAM


Re: ruleqa status

Posted by David Jones <dj...@ena.com.INVALID>.
On 06/07/2017 03:17 PM, Kevin A. McGrail wrote:
> On 6/7/2017 4:15 PM, David Jones wrote:
>> That's interesting.  I can easily comply with doing nothing. :)
>>
>> So the volume/number of even the same message can help with accurate 
>> scoring?  That makes sense.
> That would be my theory.  Messing with the dataset otherwise sounds 
> fishy...
> 
>> P.S. Anyone know what is adding the ".INVALID" to my From: header? 
>> It's not happening anywhere else that I know of so it might be the 
>> apache.org ezmlm.  It's not in the headers when it leaves my mail 
>> platform.
> It's a DMARC compliance thingy somewhere in one of the systems involved 
> in the process.
> 
> Regards,
> KAM
> 
> 
My emails should have SPF pass and alignment plus DKIM pass and 
alignment on ena.com.  I did enable p=reject a few months ago for 
ena.com but we should have perfect DMARC before the email gets into the 
ezmlm server.  Coming out it should fail SPF but DKIM should still pass 
and align if it didn't have the ".INVALID".
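
The alignment reasoning above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not how any particular DMARC implementation works: the function names are made up, the organizational domain is approximated as the last two labels (real implementations consult the Public Suffix List), and "list.example" is a hypothetical stand-in for the mailing list's envelope domain.

```python
def org_domain(domain: str) -> str:
    """Crude organizational domain: the last two labels.

    Assumption for illustration only; real code uses the Public Suffix List.
    """
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def aligned(from_domain: str, auth_domain: str, strict: bool = False) -> bool:
    """Check identifier alignment between the From: domain and an authenticated domain."""
    f = from_domain.lower().rstrip(".")
    d = auth_domain.lower().rstrip(".")
    if strict:
        return f == d
    # Relaxed mode: the organizational domains must match.
    return org_domain(f) == org_domain(d)


def dmarc_passes(
    from_domain: str,
    spf_pass: bool,
    spf_domain: str,
    dkim_pass: bool,
    dkim_domain: str,
) -> bool:
    """DMARC passes if SPF or DKIM passes with a domain aligned to From:."""
    spf_ok = spf_pass and aligned(from_domain, spf_domain)
    dkim_ok = dkim_pass and aligned(from_domain, dkim_domain)
    return spf_ok or dkim_ok


# After the list: SPF is tied to the list's envelope domain (hypothetical
# "list.example"), so only an intact DKIM signature with d=ena.com can align.
print(dmarc_passes("ena.com", True, "list.example", True, "ena.com"))
# If the DKIM signature was stripped, the receiver cannot verify it.
print(dmarc_passes("ena.com", True, "list.example", False, "ena.com"))
# A From: domain of ena.com.INVALID does not align with d=ena.com.
print(aligned("ena.com.INVALID", "ena.com"))
```

Under these assumptions the first call reports a pass and the other two do not, which matches the expectation that DKIM alone would carry the message through the list if the signature and the From: domain were left untouched.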

-- 
Dave

Re: ruleqa status

Posted by "Kevin A. McGrail" <ke...@mcgrail.com>.
On 6/7/2017 4:15 PM, David Jones wrote:
> That's interesting.  I can easily comply with doing nothing. :)
>
> So the volume/number of even the same message can help with accurate 
> scoring?  That makes sense.
That would be my theory.  Messing with the dataset otherwise sounds fishy...

> P.S. Anyone know what is adding the ".INVALID" to my From: header?  
> It's not happening anywhere else that I know of so it might be the 
> apache.org ezmlm.  It's not in the headers when it leaves my mail 
> platform.
It's a DMARC compliance thingy somewhere in one of the systems involved 
in the process.

Regards,
KAM



Re: ruleqa status

Posted by David Jones <dj...@ena.com.INVALID>.
On 06/07/2017 02:39 PM, Kevin A. McGrail wrote:
> On 6/7/2017 3:38 PM, Dave Jones wrote:
>> These are not exact file duplicates.  I get "waves" of both spam and 
>> ham that have different message IDs and recipients and a minor 
>> difference in the body for the unsubscribe/opt-out link.  I was hoping 
>> there would be an easy way to remove these very similar Maildir files. 
> 
> Likely no.  If they are indicative of real-world traffic, I would not 
> say they need to be deduped.  Is there something I am missing?
> 

That's interesting.  I can easily comply with doing nothing. :)

So the volume/number of even the same message can help with accurate 
scoring?  That makes sense.

P.S. Anyone know what is adding the ".INVALID" to my From: header?  It's 
not happening anywhere else that I know of so it might be the apache.org 
ezmlm.  It's not in the headers when it leaves my mail platform.

-- 
Dave

Re: ruleqa status

Posted by "Kevin A. McGrail" <ke...@mcgrail.com>.
On 6/7/2017 3:38 PM, Dave Jones wrote:
> These are not exact file duplicates.  I get "waves" of both spam and 
> ham that have different message IDs and recipients and a minor 
> difference in the body for the unsubscribe/opt-out link.  I was hoping 
> there would be an easy way to remove these very similar Maildir files. 

Likely no.  If they are indicative of real-world traffic, I would not 
say they need to be deduped.  Is there something I am missing?


Re: ruleqa status

Posted by Dave Jones <da...@apache.org>.
On 06/07/2017 11:16 AM, Jari Fredriksson wrote:
> 
>> David Jones <djones@ena.com.INVALID> wrote on 7.6.2017 at 6.46:
>>
>> Does anyone have any ideas on how to remove duplicate
>> emails in Maildir?  I tried this:
>>
>> https://github.com/kdeldycke/maildir-deduplicate
>>
>> but it sees all of the messages as different due to common
>> headers.  I was trying to trim down the duplicates based
>> on Subject and similar/close body content.
>>
>> Dave
>>
> 
> Hello!
> 
> If they are really duplicates (copies of each other), then fdupes is your best friend! I use that.
> 
These are not exact file duplicates.  I get "waves" of both spam and ham 
that have different message IDs and recipients and a minor difference in 
the body for the unsubscribe/opt-out link.  I was hoping there would be 
an easy way to remove these very similar Maildir files.
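
For "waves" like these, one possible approach is to fingerprint only the parts that stay constant. The sketch below is an assumption-laden illustration: the function name and the normalization rules (replace every URL with a placeholder, collapse whitespace, lowercase) are choices made here, not an established method. Message-ID and recipients are ignored simply because only the Subject and body text are fed in.

```python
import hashlib
import re

# Any http(s) URL, e.g. a per-recipient unsubscribe/opt-out link.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
WS_RE = re.compile(r"\s+")


def fingerprint(subject: str, body: str) -> str:
    """Return a SHA-256 hex digest of the Subject plus a normalized body.

    Messages that differ only in URLs, whitespace, or letter case
    get the same fingerprint.
    """
    normalized_body = URL_RE.sub("<url>", body)
    text = WS_RE.sub(" ", f"{subject}\n{normalized_body}").strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Messages with the same fingerprint could then be grouped in a dictionary keyed by the digest. Anything beyond a URL difference, such as a recipient name in the greeting, would still produce a different fingerprint, so this catches only the simplest variations.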

Dave

Re: ruleqa status

Posted by Jari Fredriksson <ja...@iki.fi>.
> David Jones <djones@ena.com.INVALID> wrote on 7.6.2017 at 6.46:
> 
> I fixed a minor issue in a script so we got new rule scores
> moments ago.  Hopefully my fix will make it run fine next
> time in about 23 hours.
> 
> Jarif, it looks like you are running your masscheck a little
> early so check your cron entry.  It may need to be moved
> forward one hour.
> 
> It looks like KAM got his masscheck running today so we
> should be up to 11 or 12 masscheckers.
> 
> My 'ena' corpus is growing pretty fast -- about 8,000 to
> 10,000 messages per day -- so we should stay above the
> minimum ham/spam requirements going forward.  I am
> up to 65,000 messages and have had to throttle back the
> growth of ham to only a subset to keep it under control.
> 
> Does anyone have any ideas on how to remove duplicate
> emails in Maildir?  I tried this:
> 
> https://github.com/kdeldycke/maildir-deduplicate
> 
> but it sees all of the messages as different due to common
> headers.  I was trying to trim down the duplicates based
> on Subject and similar/close body content.
> 
> Dave
> 

Hello!

If they are really duplicates (copies of each other), then fdupes is your best friend! I use that.
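
For readers without fdupes at hand, the idea of finding byte-for-byte copies can be sketched in a few lines of Python. This is only a rough analogue written for illustration, with made-up function names; it does not claim to reproduce how fdupes works internally. It groups files by size first, then by SHA-256 digest, and only reports groups.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def exact_duplicate_groups(root: Path) -> list[list[Path]]:
    """Return groups of files under `root` whose contents are identical."""
    by_size: dict[int, list[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    groups: list[list[Path]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have an exact copy
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Pointing exact_duplicate_groups at a Maildir root would list identical files across cur/, new/, and tmp/. It reports nothing for messages that differ in any byte, such as a different Message-ID.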

Jari Fredriksson
Bitwell Oy
+358 400 779 440
jarif@bitwell.fi
Dev: https://www.bitwell.fi
Ops: https://www.bitwell.biz