Posted to users@spamassassin.apache.org by Martin Gregorie <ma...@gregorie.org> on 2013/06/01 01:28:00 UTC

Re: Rule to scan for .html attachments?

On Fri, 2013-05-31 at 17:48 -0400, Andrew Talbot wrote:
> Thank you for your response. The original test was using a file
> arbitrarily named aa.html .. It still doesn't work with the rewrite
> you provided :/ 
> 
I did wonder. It's absolutely essential to have at least one genuine
message to test your rule against. This would preferably be a spam
message you've recently received. Failing that it would be a message
you've written and then sent to yourself. Either way *you must not
modify the format of the message*[1] that arrived in your mailbox
because that's the only way to guarantee that your test data is as close
as possible to what will be passing through your production SA
installation.

If you merely lash up a fake message with a text editor, any resemblance
between it and genuine spam is purely coincidental because it's been
subject to your (mis)understandings and prejudices about the formatting
of a real spam message.

[1] anonymising e-mail addresses, etc. is OK provided the format of the
message isn't changed.

If you can post an anonymised message we may be able to provide more
help. However, without sight of the relevant parts of the messages that
you're trying to recognise with the rule, it's impossible to know why the
rule doesn't match the MIME header and very difficult to reliably
diagnose your problem.
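
One way to see exactly which MIME headers a rule has to match is to dump
them from a saved, unmodified message. The following is only a minimal
Python sketch and is not part of SA: the function name list_mime_parts
and the file name sample.eml are made up for illustration.

```python
import email


def list_mime_parts(raw_message: bytes):
    """Return (content type, disposition, filename) for each leaf MIME part."""
    msg = email.message_from_bytes(raw_message)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers only group other parts
        parts.append((
            part.get_content_type(),
            part.get_content_disposition(),  # None if the header is absent
            part.get_filename(),             # None if no filename/name parameter
        ))
    return parts


# Hypothetical usage: "sample.eml" stands for a message saved unmodified
# from your mailbox.
# with open("sample.eml", "rb") as fh:
#     for row in list_mime_parts(fh.read()):
#         print(row)
```

Note that these are the headers of the individual MIME parts, not the
top-level message headers, so it is worth checking which of the two the
rule type you are using actually inspects.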


Martin

 
> 
> 
> 
> > -----Original Message-----
> > From: Martin Gregorie [mailto:martin@gregorie.org]
> > Sent: Friday, May 31, 2013 3:38 PM
> > To: users@spamassassin.apache.org
> > Subject: Re: Rule to scan for .html attachments?
> > 
> > On Fri, 2013-05-31 at 14:45 -0400, Andrew Talbot wrote:
> > > I need it to fire on any HTML attachment. The modules are enabled. I
> > > can get it to pick up text/html, remember, but the problem is that it
> > > detects messages sent as HTML when it's set up like that. It doesn't
> > > detect plain-text messages, but it will flag plain-text messages with
> > > HTML files attached.
> > >
> > Well, that's exactly what your second rule won't do: it will only fire on the
> > header of an html attachment for a file that has one of a very restricted set
> > of filenames. As you haven't posted any example MIME header sets I can
> > only guess, but my guess is that none of the messages you've tried it against
> > have attachments with names that match the restriction.
> > 
> > As I said before, the rule can't work with the '^' in place, because that says
> > that the 'filename=....' string must be at the beginning of a line and NOT
> > preceded by any white space. That's a harmful restriction because you never
> > see MIME headers like that. With the '^' removed the rule
> > becomes:
> > 
> > header HTML_ATTACH_RULE_2 Content-Disposition =~ /filename\=\"[a-z]{2}\.html\"/i
> > 
> > which has a better chance of working. This version will only fire if the
> > filename associated with the attachment has precisely two alphabetic
> > characters plus a .html extension, i.e. it will fire on filename="aa.html" or
> > filename="ZZ.HTML" because the trailing 'i' makes it a caseless match, but it
> > won't fire on filename="cat.html" or filename="x.html" because these don't
> > have two character names and it won't fire if the attachment follows the
> > common Windows convention of using a .htm extension.
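
The behaviour described above can be checked outside SA. This is a
minimal sketch, assuming Python's re module as a stand-in for Perl
regexes (the two agree for patterns this simple). The sample header
values are invented, and the '^'-anchored variant is reconstructed from
the description rather than copied from the original rule.

```python
import re

# The rule's pattern, with Perl's trailing /i expressed as re.IGNORECASE.
two_letter = re.compile(r'filename\=\"[a-z]{2}\.html\"', re.IGNORECASE)

# Assumed form of the earlier '^'-anchored version (illustrative only).
anchored = re.compile(r'^filename\=\"[a-z]{2}\.html\"', re.IGNORECASE)

# Invented Content-Disposition values, not taken from real mail.
two_letter_samples = [
    'attachment; filename="aa.html"',
    'attachment; filename="ZZ.HTML"',
    'attachment; filename="cat.html"',
    'attachment; filename="x.html"',
    'attachment; filename="aa.htm"',
]

for value in two_letter_samples:
    hit = bool(two_letter.search(value))
    hit_anchored = bool(anchored.search(value))
    print(f"{value!r}: unanchored={hit} anchored={hit_anchored}")
```

With the anchor in place nothing matches here, because 'filename=' is
never the first thing in the header value.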
> > 
> > If you want the rule to fire on *any* HTML attachment it should be:
> > 
> > header HTML_ATTACH_RULE_2 Content-Disposition =~ /filename\=\".{0,30}\.html{0,1}\"/i
> > 
> > which will match any filename of up to 30 characters followed by a .html or
> > .htm extension (including the bare names ".html" and ".htm").
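
The more general pattern can be exercised the same way. Again this is
only a sketch, assuming Python's re module behaves like Perl for this
pattern; the helper name fires and the sample values are hypothetical.

```python
import re

# The general rule's pattern, with /i expressed as re.IGNORECASE.
any_html = re.compile(r'filename\=\".{0,30}\.html{0,1}\"', re.IGNORECASE)


def fires(header_value: str) -> bool:
    """True if the pattern is found anywhere in the header value."""
    return any_html.search(header_value) is not None


# Invented Content-Disposition values, not taken from real mail.
general_samples = [
    'attachment; filename="cat.html"',
    'attachment; filename="x.htm"',
    'attachment; filename=".html"',
    'attachment; filename="report.pdf"',
    'attachment; filename=aa.html',                     # unquoted name
    'attachment; filename="' + "a" * 31 + '.html"',     # 31-character name
]

for value in general_samples:
    print(f"{fires(value)}: {value}")
```

Two limits are visible here: the pattern needs the filename to be in
double quotes, and the {0,30} quantifier stops it matching names longer
than 30 characters before the extension.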
> > 
> > Could I respectfully suggest that you learn about Perl regular expressions
> > before you try writing any more SA rules? SA rules are all based on using the
> > Perl flavour of regular expressions to match character strings in headers and
> > the message body.
> > 
> > You could do a lot worse than getting a copy of "Programming Perl" by Larry
> > Wall, Tom Christiansen & Jon Orwant, published by O'Reilly. If there isn't one
> > in the firm's technical library, they should be willing to buy a copy. It's a brick
> > of a book, but you only need to read "Chapter 5: Pattern Matching" to write
> > SA rules and in any case the rest of its contents will come in handy in
> > future if anybody needs to write Perl programs or SA extension modules.
> > 
> > 
> > Martin
> > 