Posted to users@spamassassin.apache.org by Arthur Dent <sa...@troodos.demon.co.uk> on 2008/01/30 16:20:59 UTC

Help with SA / Procmail regex [OT]

Hello all,

Please forgive me for consuming off-topic bandwidth with this question but I
don't really want to subscribe to the Procmail list for what is, I hope, a
very simple question.

I get a lot of spam that has a series of numbers in the "To" address, either
in the form To: 283840@mydomain.com or 46cf5cdr.762399400028872@mydomain.com

I have written an SA rule in my local.cf which quite effectively deals with
this (header MY_ANTI_NUMERICAL_TO To =~ /\d{4,}\@mydomain\.com/) I score it at
3.5 and it works very well.

I am so pleased with this rule that I decided to give my poor old SA a
well-deserved rest from this rubbish and take these spams out at Procmail
time.

Trying to get a Procmail match using the same regex however has completely
eluded me and will quite possibly drive me mad...

I have tried:
:0:
* ^TO_/\d{4,}\@mydomain\.com/
In-SpaM

Or variants such as:
* ^TO_\d{4,}\@mydomain\.com
* ^TO_[0-9]{4,}\@mydomain\.com
* ^TO_.*\d{4,}\@mydomain\.com
* ^To: \d{4,}\@mydomain\.com
* ^TO().*[0-9]{4,}.*

Or about a thousand different combinations of the above.

None of them seem to match.

Regex is not my thing (as you can clearly see!) but I know this list is awash
with regex gurus (some of your stuff makes my eyes water) not to mention
Procmail experts so I hope someone can give me a fix...

Thanks in advance.

AD
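
A note on why those variants fail, as far as I can tell: procmail's regex
dialect has no \d shorthand and no {4,} interval, and conditions are not
wrapped in /.../ delimiters. The sketch below is only an illustration in
Python, not procmail code. expand_for_procmail is a made-up helper that spells
a Perl-style pattern out with explicit character classes, which is the form a
procmail condition needs.

```python
import re


def expand_for_procmail(pattern: str) -> str:
    """Spell out \\d and {n} / {n,} by hand (hypothetical helper, illustration only)."""
    # \d becomes an explicit character class; \@ needs no escaping.
    pattern = pattern.replace(r"\d", "[0-9]").replace(r"\@", "@")

    def repeat(match):
        atom = match.group(1)
        count = int(match.group(2))
        open_ended = match.group(3) is not None
        # X{n,} -> X repeated n times, then X*;  X{n} -> X repeated n times
        return atom * count + (atom + "*" if open_ended else "")

    # An "atom" here is a bracket class or a single (possibly escaped) character.
    return re.sub(r"(\[[^\]]+\]|\\?.)\{(\d+)(,)?\}", repeat, pattern)


expanded = expand_for_procmail(r"\d{4,}\@mydomain\.com")
print(expanded)  # [0-9][0-9][0-9][0-9][0-9]*@mydomain\.com

# Sanity check with Python's own engine: same hits as the original SA rule.
for header in ("To: 283840@mydomain.com",
               "To: 46cf5cdr.762399400028872@mydomain.com",
               "To: bob@mydomain.com"):
    print(header, bool(re.search(expanded, header)))
```

The condition that eventually worked in this thread has the same shape:
* ^TO_.*[0-9][0-9][0-9][0-9]@mydomain\.com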


Re: Help with SA / Procmail regex [OT]

Posted by mouss <mo...@netoyen.net>.
Larry Nedry wrote:
> On 1/30/08 at 3:20 PM +0000 Arthur Dent wrote:
>   
>> I am so pleased with this rule that I decided to give my poor old SA a
>> well-deserved rest from this rubbish and take these spams out at Procmail
>> time.
>>     
>
> Keep in mind that there are a lot of mobile phones out there that have
> email addresses that begin with the phone number, i.e.
> 6165551212@domain.com.
>   

and who cares except mobile operators (and BlackBerry servers...)? My 
understanding is that we are talking about recipients here, not about 
senders. Otherwise, such addresses are indeed valid at other domains.

Re: Help with SA / Procmail regex [OT]

Posted by Larry Nedry <sp...@bluestreak.net>.
On 1/30/08 at 3:20 PM +0000 Arthur Dent wrote:
>I am so pleased with this rule that I decided to give my poor old SA a
>well-deserved rest from this rubbish and take these spams out at Procmail
>time.

Keep in mind that there are a lot of mobile phones out there that have
email addresses that begin with the phone number, i.e.
6165551212@domain.com.

Nedry

Re: Help with SA / Procmail regex [OT] - Back On-Topic (Almost!)

Posted by John Hardin <jh...@impsec.org>.
On Thu, 31 Jan 2008, Arthur Dent wrote:

> http://www.issociate.de/board/post/232336/Lock_failure_on_%22spamc.lock%22.html
> and
> http://www.ii.com/internet/robots/procmail/qs/#SA
>
> which tend to suggest that one should NOT put a lock on for SA 
> processing...

A lock file is not *needed* for spamc (or, strictly, for spamassassin 
either). Using one was recommended to you as a load-management mechanism, 
as you were worried about surges. I do that on my hosted virtual server 
MTA - it's just a little tight on memory.

The problem noted in your first link is probably permissions-related. If 
you use an explicit lockfile AFTER a DROPPRIVS directive, you could have 
permissions problems accessing the lockfile, particularly if procmail 
crashes with the lockfile in place.

--
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   You are in a maze of twisty little protocols,
   all written by Microsoft.
----------------------------------------------------------------------
  2 days until the 5th anniversary of the loss of STS-107 Columbia

Re: Help with SA / Procmail regex - Back On-Topic

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Thu, 2008-01-31 at 22:57 +0000, Arthur Dent wrote:
> On Thu, Jan 31, 2008 at 11:35:17PM +0100, Karsten Bräckelmann wrote:

> Thanks for this. I appreciate you taking the time to explain this. I respect
> your opinion so I have the locks in place just as you suggest.
> 
> Incidentally, with those locks in place (including on the copy section, and
> incorporating the "numerical" filter just as you wrote it, I have now had a
> range of real mail. This includes some regular mail, some "normal" spam and some
> "numerical" spam. None of them triggered that lock error.
> 
> As well as my test message, I piped an old "real" spam message through and
> they *both* caused the error.
> 
> It seems therefore that there is only a problem when Procmail tries to copy a message 
> that already exists into the backup mbox(?) Could this be?

No.  Procmail does by no means check the content. It merely appends.

Also, the contents of your test messages are entirely irrelevant, as
long as procmail happily processes them and the recipe matches.

> It's strange because there 
> is no such problem with the "IN-Spam" mbox. There can be as many copies as I like in there...
> I have checked and the permissions are identical (and same owner/group too). The *only* 
> difference is that the backup mbox lives on a different partition (ext3 fs mounted with 
> fstab). Could that be significant?

Nope. So this locking issue actually happens with any test message?
Still looks like some kind of permission problem the way you call
procmail for testing.

> Anyway, test messages aside, real mail (spam or otherwise) seems to be
> processed without problem, so, once again...
> 
> Thanks!

You're welcome.

  guenther


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Help with SA / Procmail regex - Back On-Topic

Posted by Arthur Dent <sa...@troodos.demon.co.uk>.
On Thu, Jan 31, 2008 at 11:35:17PM +0100, Karsten Bräckelmann wrote:
> 
> > > > If there is even the slightest chance, your MTA might flood your MDA
> > > > with mail during a peek -- add some explicit locking here, even though
> > > > this is not a delivery receipt (explicit, because it is a filter, and
> > > > procmail can't lock the target file). IIRC the SA docs do have a lock
> > > > there, too.
> > > > 
> > > > :0 fw: spamassassin.lock
> > > > * < 512000
> > > > | spamc
> > > > 
> > > > If you don't lock, you might end up with more mail being piped to spamc
> > > > consecutively, than your spamd can generate children.
> > 
> > I've just been doing a little reading...
> > http://www.issociate.de/board/post/232336/Lock_failure_on_%22spamc.lock%22.html
> 
> Well, Nancy is wrong. :)  Anyway, she claims a lock file is not
> necessary, and refers to a page by her, discussing this this:
> 
> > http://www.ii.com/internet/robots/procmail/qs/#SA
> > 
> > which tend to suggest that one should NOT put a lock on for SA processing...
> 
> Even though the "discussion" is rather sparse, there are some comments
> outlining this. And while she indeed suggest in the post that one
> "should not" lock here, her discussion merely mentions that is is "not
> needed". The latter is correct, from a procmail POV.
> 
> 
> Also, here reasoning is slightly inconsistent. Again, the only comment
> regarding locking when filtering through spamc is, that it is not
> necessary. However, she *does* use locking with the spamassassin filter
> example. The reason being, that this comes with a penalty of drastically
> increased CPU usage. Right, again. And right to use a lock.
> 
> Now, as far as procmail is concerned, there is no difference between a
> slow, CPU intensive filter with a few messages -- and a speedy filter
> that is being called *much* more often, because you got more mail to
> process.
> 
> 
> I stand to what I said. Use locking. See the top-most part of this post.
> Locking can be useful, if you expect spikes [1] of messages. Simply to
> prevent concurrent calls to spamc from exhausting your spamd max child
> limit.
> 
> And yes, I do have seen this myself. Mail will go through unfiltered.
> 
>   guenther
> 
> 
> [1] In particular, running fetchmail will result in spikes. Every
>     $interval seconds.
> 
Hi Guenther,

Thanks for this. I appreciate you taking the time to explain this. I respect
your opinion so I have the locks in place just as you suggest.

Incidentally, with those locks in place (including on the copy section), and
incorporating the "numerical" filter just as you wrote it, I have now had a
range of real mail. This includes some regular mail, some "normal" spam and some
"numerical" spam. None of them triggered that lock error.

As well as my test message, I piped an old "real" spam message through and
they *both* caused the error.

It seems therefore that there is only a problem when Procmail tries to copy a message 
that already exists into the backup mbox(?) Could this be? It's strange because there 
is no such problem with the "IN-Spam" mbox. There can be as many copies as I like in there...
I have checked and the permissions are identical (and same owner/group too). The *only* 
difference is that the backup mbox lives on a different partition (ext3 fs mounted with 
fstab). Could that be significant?

Anyway, test messages aside, real mail (spam or otherwise) seems to be
processed without problem, so, once again...

Thanks!

Mark


Re: Help with SA / Procmail regex - Back On-Topic

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
> > > If there is even the slightest chance, your MTA might flood your MDA
> > > with mail during a peek -- add some explicit locking here, even though
> > > this is not a delivery receipt (explicit, because it is a filter, and
> > > procmail can't lock the target file). IIRC the SA docs do have a lock
> > > there, too.
> > > 
> > > :0 fw: spamassassin.lock
> > > * < 512000
> > > | spamc
> > > 
> > > If you don't lock, you might end up with more mail being piped to spamc
> > > consecutively, than your spamd can generate children.
> 
> I've just been doing a little reading...
> http://www.issociate.de/board/post/232336/Lock_failure_on_%22spamc.lock%22.html

Well, Nancy is wrong. :)  Anyway, she claims a lock file is not
necessary, and refers to a page by her, discussing this:

> http://www.ii.com/internet/robots/procmail/qs/#SA
> 
> which tend to suggest that one should NOT put a lock on for SA processing...

Even though the "discussion" is rather sparse, there are some comments
outlining this. And while she indeed suggests in the post that one
"should not" lock here, her discussion merely mentions that it is "not
needed". The latter is correct, from a procmail POV.


Also, her reasoning is slightly inconsistent. Again, the only comment
regarding locking when filtering through spamc is, that it is not
necessary. However, she *does* use locking with the spamassassin filter
example. The reason being, that this comes with a penalty of drastically
increased CPU usage. Right, again. And right to use a lock.

Now, as far as procmail is concerned, there is no difference between a
slow, CPU intensive filter with a few messages -- and a speedy filter
that is being called *much* more often, because you got more mail to
process.


I stand by what I said. Use locking. See the top-most part of this post.
Locking can be useful, if you expect spikes [1] of messages. Simply to
prevent concurrent calls to spamc from exhausting your spamd max child
limit.

And yes, I have seen this myself. Mail will go through unfiltered.
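
A toy model of that effect, purely as an illustration and not a measurement of
spamc or spamd: every "delivery" is a thread, fake_spamc stands in for the
filter call, and an optional threading.Lock plays the role of
spamassassin.lock. All names here are hypothetical. The function reports how
many filter calls were running at the same moment.

```python
import threading
import time


def run_filters(n_messages: int, lock=None) -> int:
    """Run n fake filter calls in parallel; return the peak number running at once."""
    state = {"running": 0, "peak": 0}
    guard = threading.Lock()  # protects the counters only, not the "filter"

    def fake_spamc():
        with guard:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(0.01)  # stand-in for the time a scan takes
        with guard:
            state["running"] -= 1

    def deliver():
        if lock is None:
            fake_spamc()  # no lockfile: calls may overlap freely
        else:
            with lock:  # plays the role of spamassassin.lock
                fake_spamc()

    threads = [threading.Thread(target=deliver) for _ in range(n_messages)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["peak"]


print("peak with lock:   ", run_filters(20, threading.Lock()))
print("peak without lock:", run_filters(20))
```

With the lock the peak is always 1. Without it the peak depends on timing and
can climb towards the number of messages, which is the situation where a child
limit gets exhausted.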

  guenther


[1] In particular, running fetchmail will result in spikes. Every
    $interval seconds.


Re: Help with SA / Procmail regex [OT] - Back On-Topic (Almost!)

Posted by Arthur Dent <sa...@troodos.demon.co.uk>.
On Thu, Jan 31, 2008 at 09:18:38PM +0000, Arthur Dent wrote:

I've just been doing a little reading..

> > > # Spam filter
> > > 
> > > :0fw
> > > * < 256000
> > > | /usr/bin/spamc --username=mark
> > 
> > If there is even the slightest chance, your MTA might flood your MDA
> > with mail during a peek -- add some explicit locking here, even though
> > this is not a delivery receipt (explicit, because it is a filter, and
> > procmail can't lock the target file). IIRC the SA docs do have a lock
> > there, too.
> > 
> > :0 fw: spamassassin.lock
> > * < 512000
> > | spamc
> > 
> > If you don't lock, you might end up with more mail being piped to spamc
> > consecutively, than your spamd can generate children.
> 
I've just been doing a little reading...

http://www.issociate.de/board/post/232336/Lock_failure_on_%22spamc.lock%22.html
and
http://www.ii.com/internet/robots/procmail/qs/#SA

which tend to suggest that one should NOT put a lock on for SA processing...

I am no expert (that much should be obvious!) so I am happy to take your
advice on this.

Thanks again.

Mark


Re: Help with SA / Procmail regex [OT]

Posted by Arthur Dent <sa...@troodos.demon.co.uk>.
On Thu, Jan 31, 2008 at 09:48:20PM +0100, Karsten Bräckelmann wrote:
> Since I was about to hit the send button on this one, here is a shorter
> version of my original thoughts. Partially on-topic (yay!) again. ;)

Oops - I was busy replying to your last message and didn't see this one come
in...

> 
> Since the receipt did work during your testing -- assuming, you fed it
> some of the numerical spams, too -- the problem CAN NOT be with the
> receipt to catch the numerical To header spam early. The problem must be
> somewhere else with your live procmailrc file.
> 
> Did you try commenting out that block?
Ahh.. Interesting! I commented out that block and my test message still caused
the error message.... BUT... I think it's only the test message that causes
the error. (I'm testing with "procmail < tmp/testmail" where testmail is a
made-up header + body with the right address.) I've got a tail running on the
logfile so I'll just have to wait for one of those spams to come in (there's
never one when you need one!).


> 
> > <PMRC EXTRACT>
> > #Start processing section
> > # Put a copy of ALL mail into a backup folder
> > :0c:
> > *
> > /mnt/backup/mail/rawmail
> 
> Get rid of the superfluous, empty condition. I don't know if this might
> trigger the issue, and frankly doubt it. However, it is useless and
> irritating. ;)
Oops - cut + paste error, sorry.

> 
> 
> > # Spam filter
> > 
> > :0fw
> > * < 256000
> > | /usr/bin/spamc --username=mark
> 
> If there is even the slightest chance, your MTA might flood your MDA
> with mail during a peek -- add some explicit locking here, even though
> this is not a delivery receipt (explicit, because it is a filter, and
> procmail can't lock the target file). IIRC the SA docs do have a lock
> there, too.
> 
> :0 fw: spamassassin.lock
> * < 512000
> | spamc
> 
> If you don't lock, you might end up with more mail being piped to spamc
> consecutively, than your spamd can generate children.

Ahhh that's very helpful! I have in the past had problems where SA has been
overwhelmed by a sudden rush of spam and has literally locked up the PC. My
solution was to limit my fetchmail (which runs every 3 minutes) to collecting
only 20 mails at a time - which seems to be manageable. But your solution
should be better. Thanks.

> 
> > The key thing is that first element. I am so paranoid about losing mail that the
> > very first thing I do is make a copy of each and every raw-unprocessed mail
> > and store it in a backup partition (which I archive to disk every month).
> > 
> > This now seems to cause a lock problem on these number spams (and only these)
> > in a way that it never did before:
> 
> I don't see why this should be limited to these particular messages.

No - nor me. But maybe it's just a problem with my test message (I'm still
waiting... Perhaps I'll just try an old "real" one out of my corpus).
> 
> 
> > procmail: Lock failure on "/mnt/backup/mail/rawmail.lock"
> > From imspammer@spamsrus.com  Thu Jan 31 18:00:43 2008
> >  Subject: [SPAM (XXX)] This is a test
> >   Folder: IN-Spam
> 
> The failing lock clearly is related to the backup copy receipt. However,
> it should *not* be requesting the lock itself.
> 
>   guenther

Your help is much appreciated.

Mark


Re: Help with SA / Procmail regex [OT]

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
Since I was about to hit the send button on this one, here is a shorter
version of my original thoughts. Partially on-topic (yay!) again. ;)


> Fantastic! This worked perfectly "out of the box"! (just edited mydomain).

Good. :)

> Thank you Guenther!
> 
> When I moved it from my test rig to the live server however I ran into a
> couple of problems. I still have a record locking problem (though not the same
> as the one you fixed). I have put below a more complete extract of my
> /etc/procmailrc file.

Since the recipe did work during your testing -- assuming, you fed it
some of the numerical spams, too -- the problem CAN NOT be with the
recipe to catch the numerical To header spam early. The problem must be
somewhere else with your live procmailrc file.

Did you try commenting out that block?


> <PMRC EXTRACT>
> #Start processing section
> # Put a copy of ALL mail into a backup folder
> :0c:
> *
> /mnt/backup/mail/rawmail

Get rid of the superfluous, empty condition. I don't know if this might
trigger the issue, and frankly doubt it. However, it is useless and
irritating. ;)


> # Spam filter
> 
> :0fw
> * < 256000
> | /usr/bin/spamc --username=mark

If there is even the slightest chance, your MTA might flood your MDA
with mail during a peak -- add some explicit locking here, even though
this is not a delivery recipe (explicit, because it is a filter, and
procmail can't lock the target file). IIRC the SA docs do have a lock
there, too.

:0 fw: spamassassin.lock
* < 512000
| spamc

If you don't lock, you might end up with more mail being piped to spamc
consecutively, than your spamd can generate children.


> The key thing is that first element. I am so paranoid about losing mail that the
> very first thing I do is make a copy of each and every raw-unprocessed mail
> and store it in a backup partition (which I archive to disk every month).
> 
> This now seems to cause a lock problem on these number spams (and only these)
> in a way that it never did before:

I don't see why this should be limited to these particular messages.


> procmail: Lock failure on "/mnt/backup/mail/rawmail.lock"
> From imspammer@spamsrus.com  Thu Jan 31 18:00:43 2008
>  Subject: [SPAM (XXX)] This is a test
>   Folder: IN-Spam

The failing lock clearly is related to the backup copy recipe. However,
it should *not* be requesting the lock itself.

  guenther



Re: Help with SA / Procmail regex [OT]

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
NOTE  I just realized I used "consecutive" a couple times in my previous
      posts, whereas I did mean to say "concurrent". Bad, bad screw up.
      Sorry for any confusion caused.


On Thu, 2008-01-31 at 20:57 +0000, Arthur Dent wrote:
> On Thu, Jan 31, 2008 at 09:36:29PM +0100, Karsten Bräckelmann wrote:
> > On Thu, 2008-01-31 at 19:31 +0000, Arthur Dent wrote:
> > > Apologies to everyone for wasting OT bandwidth. I have just re-read man
> > > procmailrc and realised that a "copy" recipe is not considered to be a
> > > delivery action and therefore does not need a lock.
> > 
> > Uhm, where did you read that?  Clearly, even a copy can deliver mail.
> 
> From man procmailrc:
> 
> "You can tell procmail to treat a delivering recipe as if it were a
>  non-delivering recipe by specifying the ‘c’ flag on such a  recipe. 
>  This will make procmail generate a carbon copy of  the mail by
>  delivering it to this recipe, yet continue processing the rcfile."
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is my understanding, that the above does *not* alter the meaning of
the procmail action. Delivery to a file still is delivering. IMHO this
merely tries to verbosely express the meaning of the phrase "copy".

Again, consider the basic copy example:

:0 c:
backup

I do insist on locking, because this recipe will append the message to
the mbox file. Which is a simple write modification, and needs to be
atomic. While one process (procmail) is writing to that file, no other
process (a second procmail, your IMAP server, etc) may work with that
file. Hence, locking.
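
To make the idea concrete, here is a rough sketch of dot-locking around an
mbox append. It is not how procmail itself implements its lockfiles, only the
general technique. append_with_dotlock and its retries/delay parameters are
invented for this example.

```python
import os
import time


def append_with_dotlock(mbox_path: str, message: str,
                        retries: int = 50, delay: float = 0.1) -> None:
    """Append to an mbox while holding '<mbox>.lock' (illustrative sketch only)."""
    lock_path = mbox_path + ".lock"
    for _ in range(retries):
        try:
            # O_EXCL makes creation fail if another writer already holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            break
        except FileExistsError:
            time.sleep(delay)  # someone else is writing; wait and retry
    else:
        raise TimeoutError(f"lock failure on {lock_path}")
    try:
        os.close(fd)
        with open(mbox_path, "a") as mbox:
            mbox.write(message)  # the append happens only while the lock is held
    finally:
        os.remove(lock_path)  # always release, even if the write fails
```

Note that the lock file lives next to the mbox, so the writer needs permission
to create files in that directory, not just to write to the mbox itself. That
would fit a "Lock failure" on rawmail.lock while the mbox still gets written.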

Oh, and please note that the "identical" example from 'man procmailex'
explicitly assumes Maildir, and thus does not need locking...


> > > Removing the lock from my backup copy solves the problem.
> > 
> > As per the log snippet from your previous post: Procmail can't acquire
> > the lock for some reason. But it does not complain that it couldn't
> > deliver. The mail should have been appended to the backup mbox
> > regardless.
> > 
> > So why does the lock fail?  Wrong permissions for the dir?
> 
> Well, good question. All I can say is that *every* mail gets written to this
> backup file (and always has for about the last 2 years). It is only those
> mails that match the "numerical" spams that cause this error.

According to your other post, this probably is related to the "sample"
you used to check.

The backup copy still should be protected by a lock. Even more so given
your use of fetchmail, expecting to process up to 20 mails within a
short amount of time.

  guenther



Re: Help with SA / Procmail regex [OT]

Posted by Arthur Dent <sa...@troodos.demon.co.uk>.
On Thu, Jan 31, 2008 at 09:36:29PM +0100, Karsten Bräckelmann wrote:
> On Thu, 2008-01-31 at 19:31 +0000, Arthur Dent wrote:
> > Apologies to everyone for wasting OT bandwidth. I have just re-read man
> > procmailrc and realised that a "copy" recipe is not considered to be a
> > delivery action and therefore does not need a lock.
> 
> Uhm, where did you read that?  Clearly, even a copy can deliver mail.
> 
From man procmailrc:

"You can tell procmail to treat a delivering recipe as if it were a
non-delivering recipe by specifying the ‘c’ flag on such a recipe. This
will make procmail generate a carbon copy of the mail by delivering it
to this recipe, yet continue processing the rcfile."

> 
> > Removing the lock from my backup copy solves the problem.
> 
> As per the log snippet from your previous post: Procmail can't acquire
> the lock for some reason. But it does not complain that it couldn't
> deliver. The mail should have been appended to the backup mbox
> regardless.
> 
> So why does the lock fail?  Wrong permissions for the dir?

Well, good question. All I can say is that *every* mail gets written to this
backup file (and always has for about the last 2 years). It is only those
mails that match the "numerical" spams that cause this error.

What I don't understand is that by the time the numerical rule is being
evaluated, the copy should already have been written. - Surely?

Hmmm....


Thanks again.

Mark
 

Re: Help with SA / Procmail regex [OT]

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Thu, 2008-01-31 at 19:31 +0000, Arthur Dent wrote:
> Apologies to everyone for wasting OT bandwidth. I have just re-read man
> procmailrc and realised that a "copy" recipe is not considered to be a
> delivery action and therefore does not need a lock.

Uhm, where did you read that?  Clearly, even a copy can deliver mail.

:0 c:
backup


> Removing the lock from my backup copy solves the problem.

As per the log snippet from your previous post: Procmail can't acquire
the lock for some reason. But it does not complain that it couldn't
deliver. The mail should have been appended to the backup mbox
regardless.

So why does the lock fail?  Wrong permissions for the dir?

  guenther



Re: Help with SA / Procmail regex [OT]

Posted by Arthur Dent <sa...@troodos.demon.co.uk>.
Apologies to everyone for wasting OT bandwidth. I have just re-read man
procmailrc and realised that a "copy" recipe is not considered to be a
delivery action and therefore does not need a lock.

Removing the lock from my backup copy solves the problem.

I just want to thank everyone for their patience on this one...

Thanks also to Mouss, Larry and others who replied to this thread with advice
or suggestions about my reasons for doing this.

Just to clarify:

I am a home user and I have (amongst others) a demon.co.uk account. This
allows me to create my own subdomain - mydomain.demon.co.uk which allows me
tremendous flexibility to create an unlimited number of email addresses (e.g.
myworkname@mydomain.demon.co.uk & myhomepersona@demon.co.uk &
wifey@mydomain.demon.co.uk etc.) The downside of this of course is that it is
a "catchall" account and, in any case, rejecting at MTA is not possible
because the mail has already been accepted by my ISP.

SA has been VERY effective at stopping most of the rubbish, and now this
Procmail recipe takes some of the strain off.

The only FP I have had was where a round-robin email that included someone
called something like fredbloggs2000@hotmail.com in the distribution list
triggered. I fixed that by looking for 5 or more consecutive digits (my old SA
rule will suffice for any that are only 4...).
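
For anyone wondering how a hotmail address could trip a rule aimed at
mydomain: the .* between the digits and @mydomain happily spans other
recipients on the same header line. A small Python sketch of that follows.
The plain ^To: prefix is a simplification of procmail's ^TO_ macro, and the
recipient mark@mydomain.demon.co.uk is made up for the example.

```python
import re

# Simplified stand-ins for the 4-digit and 5-digit procmail conditions.
four_digits = re.compile(r"^To:.*[0-9]{4}.*@mydomain", re.M)
five_digits = re.compile(r"^To:.*[0-9]{5}.*@mydomain", re.M)

round_robin = "To: fredbloggs2000@hotmail.com, mark@mydomain.demon.co.uk"
numeric_spam = "To: 283840@mydomain.demon.co.uk"

for header in (round_robin, numeric_spam):
    print(header)
    print("  4 digits:", bool(four_digits.search(header)))
    print("  5 digits:", bool(five_digits.search(header)))
```

The 4-digit pattern hits both headers, because "2000" followed later by
"@mydomain" is enough. The 5-digit pattern only hits the numeric spam.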

Thanks to all (especially Guenther) again.

Back on-topic now... (Promise!)

Mark


Re: Help with SA / Procmail regex [OT]

Posted by Arthur Dent <sa...@troodos.demon.co.uk>.
On Wed, Jan 30, 2008 at 09:56:49PM +0100, Karsten Bräckelmann wrote:
> On Wed, 2008-01-30 at 20:12 +0000, Arthur Dent wrote:
> > On Wed, Jan 30, 2008 at 08:22:55PM +0100, Karsten Bräckelmann wrote:

Sorry Chaps, I had no idea this topic would grow so much.

> 
> 
> Do file-locking, when delivering to mbox files. Don't for Maildir.
I do use mbox not maildir.
> 
> When delivering, let procmail figure out the appropriate lock. In cases
> where you don't deliver (filters, variable setting, pure nested logic,
> whatever), either don't lock, or give a specific lock file IFF you want
> it. For example, when filtering through SpamAssassin, you want a lock,
> to prevent too many consecutive SA running.

I still don't quite get it (see below).

> > This is what I have so far:
> > :0:
> 
> Don't lock here. :)
> 
> > * ^TO_.*[0-9][0-9][0-9][0-9].*@mydomain.*
> 
> It is useless to continue matching, if you don't care about the content.
> The trailing '.*' does nothing, but keep procmail busy.
> 
> > {
> >   :0 fhw
> >   * ^Subject:\/.*
> >   | formail -i "Subject: [SPAM (XXX)] $MATCH"
> > }
> > 
> > :0:
> > * ^Subject:.*\[SPAM \(XXX\)\].*
> > IN-Spam 
> > 
> > I could not find a way to combine the matching, subject rewriting and moving
> > into one step (is this possible?).
> 
> No. There may be exactly one action. Either a filter, or delivering.
> However, unconditional receipts pretty much do exactly that.
> 
> > The above method works but give me the following error message in my Procmail
> > log:
> > 
> > procmail: Extraneous locallockfile ignored
> > From imspammer@spamsrus.com  Wed Jan 30 19:47:28 2008
> >  Subject: [SPAM (XXX)]  This is a test
> >   Folder: IN-Spam
> 
> procmail is right there. ;)  No need for locking...
> 
> Below is a receipt, that should do what you want. Again: This is NOT
> tested. The comments should be self-explanatory, as to when we want or
> need locking. Also, it should add the spam tag to the subject even if
> the original message does not have a Subject header.
> 
> Basically, we don't need to check for anything inside the loop. The one
> condition exists only, to grab the existing Subject.
> 
> 
> # catch bad numerical To: headers
> # no locking here, since this block does not deliver
> :0
> * ^TO_.*[0-9][0-9][0-9][0-9].*@mydomain
> {
>   # grab the subject, if any
>   :0
>   * ^Subject: \/.*
>   { SUBJECT = "${MATCH}" }
> 
>   # add the subject spam tag in either case
>   :0 fhw
>   | formail -i "Subject: [SPAM (XXX)] ${SUBJECT}"
> 
>   # deliver unconditionally (since we're here, we know we want to treat
>   # it as spam), DO LOCK here
>   :0 :
>   spam
> }
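
As a cross-check of what the subject part of that block is meant to do, here
is a rough Python equivalent, assuming formail -i behaves as its man page
describes (add the new field and rename any existing one with an "Old-"
prefix). tag_subject is a hypothetical helper, and stripping the trailing
space for a missing Subject is my own choice, not something taken from
formail.

```python
from email import message_from_string


def tag_subject(raw: str, tag: str = "[SPAM (XXX)]") -> str:
    """Prefix the Subject with a tag; works with or without an existing Subject."""
    msg = message_from_string(raw)
    old = msg["Subject"]  # None when the header is missing
    if old is not None:
        # Keep the original around, roughly like formail's Old- renaming.
        del msg["Subject"]
        msg["Old-Subject"] = old
    subject = old if old is not None else ""
    msg["Subject"] = f"{tag} {subject}".rstrip()
    return msg.as_string()


print(tag_subject("From: imspammer@spamsrus.com\nSubject: This is a test\n\nbody\n"))
print(tag_subject("From: imspammer@spamsrus.com\n\nbody\n"))
```

The point mirrored from the recipe is that grabbing the subject and adding the
tag are two separate steps, so a message without a Subject header still gets
tagged.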

Fantastic! This worked perfectly "out of the box"! (just edited mydomain).

Thank you Guenther!

When I moved it from my test rig to the live server however I ran into a
couple of problems. I still have a record locking problem (though not the same
as the one you fixed). I have put below a more complete extract of my
/etc/procmailrc file.

<PMRC EXTRACT>
#Start processing section
# Put a copy of ALL mail into a backup folder
:0c:
*
/mnt/backup/mail/rawmail

# First remove some known spam patterns
# catch bad numerical To: headers
# no locking here, since this block does not deliver
:0
* ^TO_.*[0-9][0-9][0-9][0-9][0-9].*@mydomain
{
  # grab the subject, if any
  :0
  * ^Subject: \/.*
  { SUBJECT = "${MATCH}" }

  # add the subject spam tag in either case
  :0 fhw
  | formail -i "Subject: [SPAM (XXX)] ${SUBJECT}"

  # deliver unconditionally (since we're here, we know we want to treat
  # it as spam), DO LOCK here
  :0 :
  IN-Spam
}

# Virus filter

:0fw
| /usr/local/bin/clamassassin
:0:
* ^X-Virus-Status: Yes
IN-virus

# Spam filter

:0fw
* < 256000
| /usr/bin/spamc --username=mark

:0: 
* ^X-Spam-Status: Yes 
IN-Spam

### Sort Mail in various folders ###
...
</PMRC EXTRACT>

The key thing is that first element. I am so paranoid about losing mail that the
very first thing I do is make a copy of each and every raw-unprocessed mail
and store it in a backup partition (which I archive to disk every month).

This now seems to cause a lock problem on these number spams (and only these)
in a way that it never did before:

procmail: Lock failure on "/mnt/backup/mail/rawmail.lock"
From imspammer@spamsrus.com  Thu Jan 31 18:00:43 2008
 Subject: [SPAM (XXX)] This is a test
  Folder: IN-Spam

I know I am doing something wrong, but what? To me, the copy-to-backup should
be locked while it is delivering (no?) and then the normal processing should
continue as if that never happened... or am I missing something?

Sorry to continue this OT thread and take it yet further OT, but I am soooo
nearly there...

Thanks again!

Mark


Re: Help with SA / Procmail regex [OT]

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Wed, 2008-01-30 at 20:12 +0000, Arthur Dent wrote:
> On Wed, Jan 30, 2008 at 08:22:55PM +0100, Karsten Bräckelmann wrote:

> > :0 :
> > * ^TO_.*[0-9][0-9][0-9][0-9]@mydomain\.com
> > spam/to-numerical
> 
> Brilliant! It works! Thank you so much Guenther (and others who have replied
> off-list to help me with this).
> 
> If I can have your assistance with just one other thing however. I'm afraid I
> really don't understand file-locking in Procmail (I have read the man and web
> pages) but I'm still baffled.

Do file-locking, when delivering to mbox files. Don't for Maildir.

When delivering, let procmail figure out the appropriate lock. In cases
where you don't deliver (filters, variable setting, pure nested logic,
whatever), either don't lock, or give a specific lock file if (and only if)
you want one. For example, when filtering through SpamAssassin, you want a
lock to prevent too many SA instances from running at the same time.
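
As a rough illustration of those three cases, here is a minimal sketch. It
is untested; the folder names and the lock file name spamassassin.lock are
placeholders chosen for the example, not anything procmail requires.

```
# mbox delivery: the second colon lets procmail derive the lock file
# name itself (IN-Spam plus the default lock extension)
:0 :
* ^X-Spam-Status: Yes
IN-Spam

# Maildir delivery (note the trailing slash): no lock needed
:0
* ^X-Spam-Status: Yes
IN-Spam/

# filter, no delivery: procmail cannot guess a lock file here, so name
# one explicitly if you want the filter runs serialized
:0 fw : spamassassin.lock
* < 256000
| spamassassin
```

Only one of the two delivering recipes would be used in practice, depending
on the mailbox format.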


> Now that I have it matching on the appropriate spam I want to rewrite the
> subject (in the same style as I have SA doing) and place it in the same spam
> folder as SA does so that I have it present for my nightly cron sa-learn job.
> 
> This is what I have so far:
> :0:

Don't lock here. :)

> * ^TO_.*[0-9][0-9][0-9][0-9].*@mydomain.*

There is no point in continuing to match if you don't care about the content.
The trailing '.*' does nothing but keep procmail busy.

> {
>   :0 fhw
>   * ^Subject:\/.*
>   | formail -i "Subject: [SPAM (XXX)] $MATCH"
> }
> 
> :0:
> * ^Subject:.*\[SPAM \(XXX\)\].*
> IN-Spam 
> 
> I could not find a way to combine the matching, subject rewriting and moving
> into one step (is this possible?).

No. A recipe may have exactly one action: either a filter or a delivery.
However, a block of unconditional recipes pretty much does exactly that.

> The above method works but gives me the following error message in my Procmail
> log:
> 
> procmail: Extraneous locallockfile ignored
> From imspammer@spamsrus.com  Wed Jan 30 19:47:28 2008
>  Subject: [SPAM (XXX)]  This is a test
>   Folder: IN-Spam

procmail is right there. ;)  No need for locking...

Below is a recipe that should do what you want. Again: this is NOT
tested. The comments should make clear when we want or need locking.
It should also add the spam tag to the subject even if the original
message does not have a Subject header.

Basically, we don't need to check for anything inside the nesting block.
The one condition exists only to grab the existing Subject.


# catch bad numerical To: headers
# no locking here, since this block does not deliver
:0
* ^TO_.*[0-9][0-9][0-9][0-9].*@mydomain
{
  # grab the subject, if any
  :0
  * ^Subject: \/.*
  { SUBJECT = "${MATCH}" }

  # add the subject spam tag in either case
  :0 fhw
  | formail -i "Subject: [SPAM (XXX)] ${SUBJECT}"

  # deliver unconditionally (since we're here, we know we want to treat
  # it as spam), DO LOCK here
  :0 :
  spam
}



> Help me fix this and I promise I'll be right back on-topic with all future
> posts (honest!).
> 
> > When writing procmail recipes, it often helps a lot to remind oneself
> > of procmail's supported REs by checking 'man procmailrc' or some docs on
> > the net. This one is a nice reference:
> >   http://partmaps.org/era/procmail/quickref.html
> > 
> > Hope the above helps. :)
> 
> Oh it surely has!
> 
> Many thanks

You're welcome. Hope this helps, too. ;)

  guenther


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Help with SA / Procmail regex [OT]

Posted by Arthur Dent <sa...@troodos.demon.co.uk>.
On Wed, Jan 30, 2008 at 08:22:55PM +0100, Karsten Bräckelmann wrote:
> On Wed, 2008-01-30 at 15:20 +0000, Arthur Dent wrote:
> 
> The // are matched literally, they are not used as an RE delimiter. The
> entire string after the asterisk is a regex anyway. Lose the slashes.
> 
> Procmail does not know this kind of bounded repetition. {4,} again will
> be matched literally. Also, procmail does not know classes like \d. You
> will have to write it, just as you mean it.
>   [0-9][0-9][0-9][0-9]
> 
> Oh, and the @ must be escaped in SA, because it denotes a LIST in Perl.
> This is Perl specific, not RE related. Don't escape it in procmail. :)
> 
> Also, this matches the entire header line -- as opposed to SA header
> rules, where the RE matches on the headers value only. Thus, since this
> is bound at the beginning of the line, you will need to add wildcards
> for optional stuff between the To: and the numbers.
> 
> 
> Something like this should work.  NOTE: Untested.
> 
> :0 :
> * ^TO_.*[0-9][0-9][0-9][0-9]@mydomain\.com
> spam/to-numerical
> 
Brilliant! It works! Thank you so much Guenther (and others who have replied
off-list to help me with this).

If I can have your assistance with just one other thing however. I'm afraid I
really don't understand file-locking in Procmail (I have read the man and web
pages) but I'm still baffled.

Now that I have it matching on the appropriate spam I want to rewrite the
subject (in the same style as I have SA doing) and place it in the same spam
folder as SA does so that I have it present for my nightly cron sa-learn job.

This is what I have so far:
:0:
* ^TO_.*[0-9][0-9][0-9][0-9].*@mydomain.*
{
  :0 fhw
  * ^Subject:\/.*
  | formail -i "Subject: [SPAM (XXX)] $MATCH"
}

:0:
* ^Subject:.*\[SPAM \(XXX\)\].*
IN-Spam 

I could not find a way to combine the matching, subject rewriting and moving
into one step (is this possible?).

The above method works but gives me the following error message in my Procmail
log:

procmail: Extraneous locallockfile ignored
From imspammer@spamsrus.com  Wed Jan 30 19:47:28 2008
 Subject: [SPAM (XXX)]  This is a test
  Folder: IN-Spam

Help me fix this and I promise I'll be right back on-topic with all future
posts (honest!).

> When writing procmail recipes, it often helps a lot to remind oneself
> of procmail's supported REs by checking 'man procmailrc' or some docs on
> the net. This one is a nice reference:
>   http://partmaps.org/era/procmail/quickref.html
> 
> Hope the above helps. :)

Oh it surely has!

Many thanks


AD

Re: Help with SA / Procmail regex [OT]

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Wed, 2008-01-30 at 15:20 +0000, Arthur Dent wrote:
> Please forgive me for consuming off-topic bandwidth with this question but I
> don't really want to subscribe to the Procmail list for what is, I hope, a
> very simple question.
> 
> I get a lot of spam that has a series of numbers in the "To" address, either
> in the form To: 283840@mydomain.com or 46cf5cdr.762399400028872@mmydomain.com
> 
> I have written an SA rule in my local.cf which quite effectively deals with
> this (header MY_ANTI_NUMERICAL_TO To =~ /\d{4,}\@mydomain\.com/) I score it at
> 3.5 and it works very well.
> 
> I am so pleased with this rule that I decided to give my poor old SA a
> well-deserved rest from this rubbish and take these spams out at Procmail
> time.
> 
> Trying to get a Procmail match using the same regex however has completely
> eluded me and will quite possibly drive me mad...
> 
> I have tried:
> :0:
> * ^TO_/\d{4,}\@mydomain\.com/
> In-SpaM

The slashes are matched literally; they are not used as RE delimiters. The
entire string after the asterisk is a regex anyway. Lose the slashes.

Procmail does not support this kind of bounded repetition, so {4,} will
again be matched literally. Nor does it support shorthand classes like \d.
You will have to spell it out, just as you mean it:
  [0-9][0-9][0-9][0-9]

Oh, and the @ must be escaped in SA because Perl would otherwise treat it
as the start of an array variable and interpolate it into the regex. This
is Perl-specific, not RE-related. Don't escape it in procmail. :)

Also, this matches the entire header line, as opposed to SA header
rules, where the RE matches on the header's value only. Thus, since this
is anchored at the beginning of the line, you will need to add wildcards
for optional stuff between the To: and the numbers.


Something like this should work.  NOTE: Untested.

:0 :
* ^TO_.*[0-9][0-9][0-9][0-9]@mydomain\.com
spam/to-numerical
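
To put the translation side by side, here is a sketch of how each piece of
the SA rule maps onto the procmail condition. It is the same untested recipe
as above with the reasoning written out as comments; the folder name is
still just a placeholder.

```
# SA rule (Perl regex, applied to the To: header value only):
#   header MY_ANTI_NUMERICAL_TO To =~ /\d{4,}\@mydomain\.com/
#
# procmail condition (egrep-style regex, applied to the whole header line):
#   /.../   ->  dropped; everything after "* " is already the regex
#   \d      ->  [0-9]
#   {4,}    ->  four [0-9] atoms in a row; "four or more" needs nothing
#               extra, since the preceding .* absorbs any further digits
#   \@      ->  @
#   (none)  ->  ^TO_.* to get from the start of the line to the digits
:0 :
* ^TO_.*[0-9][0-9][0-9][0-9]@mydomain\.com
spam/to-numerical
```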


> Or variants such as:

> Or about a thousand different combinations of the above.
> 
> None of them seem to match.
> 
> Regex is not my thing (as you can clearly see!) but I know this list is awash
> with regex gurus (some of your stuff makes my eyes water) not to mention
> Procmail experts so I hope someone can give me a give fix...

When writing procmail recipes, it often helps a lot to remind oneself
of procmail's supported REs by checking 'man procmailrc' or some docs on
the net. This one is a nice reference:
  http://partmaps.org/era/procmail/quickref.html

Hope the above helps. :)

  guenther


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Help with SA / Procmail regex [OT]

Posted by mouss <mo...@netoyen.net>.
Arthur Dent wrote:
> Hello all,
>
> Please forgive me for consuming off-topic bandwidth with this question but I
> don't really want to subscribe to the Procmail list for what is, I hope, a
> very simple question.
>
> I get a lot of spam that has a series of numbers in the "To" address, either
> in the form To: 283840@mydomain.com or 46cf5cdr.762399400028872@mmydomain.com
>   


Are you accepting mail to all addresses (aka catchall)? I see a lot of
such mail, but the pattern is also in the envelope recipient, so it can
easily be stopped in the MTA (I am currently delivering it to a spamtrap
just for experimentation).
> [snip]
>   

Re: Help with SA / Procmail regex [OT]

Posted by mouss <mo...@netoyen.net>.
jp wrote:
> Another option, if you are using postfix, is to set up mydomain.com as a
> virtual domain. Then in /etc/postfix/virtuals, you can put:
>
> mydomain.com virtual
> @mydomain.com junk@realmailbox.com
> myreadaddress@mydomain.com you@realmailbox.com
> myreadaddress2@mydomain.com you@realmailbox.com
>
>  and so on... You can omit the wildcard one if you just want to not
> accept unspecified addresses at your domain.
>   

This is bad advice. People do mistype addresses, so do not trap every
invalid address. The Right Option (TM) is to not use a catchall alias at
all. If a catchall alias is needed, then be prepared to get a lot of
junk. You can trap a few addresses or address forms, but make sure you
understand the risks.


Now this is becoming off-topic IMHO. Better suicide($this->thread) ;-p


Re: Help with SA / Procmail regex [OT]

Posted by jp <jp...@saucer.midcoast.com>.
Another option, if you are using postfix, is to set up mydomain.com as a
virtual domain. Then in /etc/postfix/virtuals, you can put:

mydomain.com virtual
@mydomain.com junk@realmailbox.com
myreadaddress@mydomain.com you@realmailbox.com
myreadaddress2@mydomain.com you@realmailbox.com

 and so on... You can omit the wildcard one if you just want to not
accept unspecified addresses at your domain.



-- 
/*
Jason Philbrook   |   Midcoast Internet Solutions - Wireless and DSL
    KB1IOJ        |   Broadband Internet Access, Dialup, and Hosting 
 http://f64.nu/   |   for Midcoast Maine    http://www.midcoast.com/
*/