Posted to users@spamassassin.apache.org by patrickbaer <p....@tvwerk.de> on 2008/09/12 14:56:04 UTC

Searching for a list of strings

Hi all,

I'm looking for a regex to find a list of strings in the body,
wherever they appear.

Example:

i am Nice Girl good looking girl who is looking to chat with you.
email me back at szCic@officiallam.com

i will reply back with some really nice pics or skype realtime videos.

The most common phrases are: nice girl, good looking, chat with you, nice
pics, videos

So if at least three of them hit, the rule should match.

Sorry for bothering again.
-- 
View this message in context: http://www.nabble.com/Searching-for-a-list-of-strings-tp19455236p19455236.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Searching for a list of strings

Posted by Martin Gregorie <ma...@gregorie.org>.
On Fri, 2008-09-12 at 05:56 -0700, patrickbaer wrote:
> So if at least three of them hit, the rule should match.
>
I'm currently using this type of rule set plus a combining meta rule:

#
# Fake degrees
#
describe MG_DEGREE Mail-order degree offers
body     __MG_D1 /(Bachelors|Bacheelor|Bachellor)/i
body     __MG_D2 /(Masters|Masteer|MasteerMBA|MassterMBA)/i
body     __MG_D3 /(Doctorate|Doctoraate|Doctoorate|Doctor)/i
body     __MG_D4 /weeks.*college graduate/i
body     __MG_D5 /(diploma|Diiploma|Certiificates)/i
meta     MG_DEGREE ((__MG_D1+__MG_D2+__MG_D3+__MG_D4+__MG_D5)>2)
score    MG_DEGREE 4.5

which is easily extended and adapted to other sets of key words. Each of
the subordinate rules scores 1 or TRUE when hit, so you can use either
arithmetic or boolean logic in the meta rule.
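To illustrate the arithmetic-versus-boolean point, here is a small Python sketch. It is not SpamAssassin code, and the function names are hypothetical. It models the MG_DEGREE rule set on the assumption stated above: each subrule counts as 1 when it hits and 0 otherwise. It runs plain `re.search` on a single string, whereas SpamAssassin matches against its own rendering of the message body, so treat it as a model of the counting logic only.

```python
import re

# Hypothetical Python stand-ins for the __MG_D1..__MG_D5 body rules above.
# The patterns are copied from those rules; re.IGNORECASE mirrors the /i flag.
SUBRULES = {
    "__MG_D1": r"(Bachelors|Bacheelor|Bachellor)",
    "__MG_D2": r"(Masters|Masteer|MasteerMBA|MassterMBA)",
    "__MG_D3": r"(Doctorate|Doctoraate|Doctoorate|Doctor)",
    "__MG_D4": r"weeks.*college graduate",
    "__MG_D5": r"(diploma|Diiploma|Certiificates)",
}


def subrule_hits(body):
    """Return 1 for each subrule whose pattern matches the body, else 0."""
    return {
        name: 1 if re.search(pattern, body, re.IGNORECASE) else 0
        for name, pattern in SUBRULES.items()
    }


def mg_degree(body):
    """Arithmetic meta: fire when more than two subrules hit."""
    return sum(subrule_hits(body).values()) > 2


def mg_degree_boolean(body):
    """A boolean meta for comparison, e.g. (__MG_D1 && __MG_D2) || __MG_D4."""
    h = subrule_hits(body)
    return bool((h["__MG_D1"] and h["__MG_D2"]) or h["__MG_D4"])


print(mg_degree("Get a Bachelors, Masters or Doctorate diploma in weeks."))  # True
print(mg_degree("Ask your doctor about it."))  # False: only __MG_D3 hits
```

The arithmetic form expresses "any N of M" in one line. The boolean form is more convenient when particular combinations matter.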

Debugging may be easier if you omit the double leading underscore in the
body rules since they will then show up in message headers when hit
(and add a score of 1.0 to the spam total). In this case the meta rule
would add emphasis to the scores accumulated by the body rules.

NOTE: The names are important. If any body rule's name is the same as
the meta rule's name, or is an exact substring of it (MG_DEG is a
substring of MG_DEGREE, but MG_DEG1 is not), the rule set will not work
as you expect: the meta rule will not fire.


Martin



Re: Searching for a list of strings

Posted by patrickbaer <p....@tvwerk.de>.
This is the email that went through. Nothing about razor though?

Return-Path: <fg...@foggia-activewear.de>
Received: from medusa.tvwerk.de ([unix socket])
	 by medusa2 (Cyrus v2.2.13-Debian-2.2.13-10.cb1.1) with LMTPA;
	 Fri, 12 Sep 2008 14:33:23 +0200
X-Sieve: CMU Sieve 2.2
Received: from proxy.tvwerk.de (proxy1 [10.10.10.2])
	by medusa.tvwerk.de (Postfix) with ESMTP id 0B6D51BD7F23
	for <it...@tvwerk.de>; Fri, 12 Sep 2008 14:33:23 +0200 (CEST)
Received: from localhost (unknown [10.10.10.66])
	by proxy.tvwerk.de (Postfix) with ESMTP id 0313F304012
	for <it...@tvwerk.de>; Fri, 12 Sep 2008 14:33:23 +0200 (CEST)
X-Virus-Scanned: amavisd-new at animoto.intern
X-Spam-Flag: NO
X-Spam-Score: 4.427
X-Spam-Level: ****
X-Spam-Status: No, score=4.427 tagged_above=-999 required=5
	tests=[BAYES_50=0.001, HTML_MESSAGE=0.001, RCVD_FORGED_WROTE2=4.325,
	RDNS_NONE=0.1]
Received: from proxy.tvwerk.de ([10.10.10.2])
	by localhost (voodoo.animoto.intern [10.10.10.66]) (amavisd-new, port 10024)
	with ESMTP id ZBT1zGMIqj33 for <it...@tvwerk.de>;
	Fri, 12 Sep 2008 14:33:04 +0200 (MEST)
Received: from furtmair.com (unknown [79.165.217.243])
	by proxy.tvwerk.de (Postfix) with SMTP id 04966304031
	for <it...@tvwerk.de>; Fri, 12 Sep 2008 14:32:11 +0200 (CEST)
Received: from 212.203.9.120 (HELO mail3.servernation.nl)
     by tvwerk.de with esmtp ({nChar[8-12]} {nChar[4-6]})
     id 8secZp-Vw7spY-Ee
     for itk@tvwerk.de; Fri, 12 Sep 2008 16:32:12 +0400
Message-ID: <45...@Rowena>
From: "Rowena Hyatt" <Ro...@foggia-activewear.de>
To: "Cherry Zapata" <it...@tvwerk.de>
Subject: i need you
Date: Fri, 12 Sep 2008 16:32:12 +0400
MIME-Version: 1.0
Content-Type: multipart/alternative;
        boundary="----=_NextPart_17862_4630_01C914F5.1BF90F20"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2180
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180

This is a multi-part message in MIME format.

------=_NextPart_17862_4630_01C914F5.1BF90F20
Content-Type: text/plain;
        charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

i am Nice Girl good looking girl who is looking to chat with you=2E=20
email me back at 8TclS@officiallam=2Ecom=20

i will reply back with some really nice pics or skype realtime videos=2E
------=_NextPart_17862_4630_01C914F5.1BF90F20
Content-Type: text/html;
        charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4=2E0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; charset=3Diso-8859-=
1">
<META content=3D"MSHTML 6=2E00=2E2900=2E2180" name=3DGENERATOR>
</HEAD>
<BODY>i am Nice Girl good looking girl who is looking to chat with you=2E=
 <br>

email me back at  3D"mailto:Cp35U@officiallam=2Ecom" szCic@offi=
ciallam=2Ecom 
<br><br>
i will reply back with some really nice pics or skype realtime videos=2E<=
/b></BODY></HTML>

------=_NextPart_17862_4630_01C914F5.1BF90F20--
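One detail in the message above deserves attention. Its parts are quoted-printable encoded: `=2E` is a full stop and `=20` is a space. The Mail::SpamAssassin::Conf documentation describes `body` rules as matching the text parts after they have been decoded from quoted-printable or base64, so the patterns never see these escape sequences. The minimal Python sketch below is not SpamAssassin code, and `decoded_text` and `phrase_hits` are illustrative names. It decodes the text/plain part with the standard `quopri` module and checks the phrases listed in the original question.

```python
import quopri
import re

# The text/plain part of the message above, still quoted-printable encoded.
QP_PART = (
    b"i am Nice Girl good looking girl who is looking to chat with you=2E=20\n"
    b"email me back at 8TclS@officiallam=2Ecom=20\n"
    b"\n"
    b"i will reply back with some really nice pics or skype realtime videos=2E\n"
)

# The phrases named in the original question.
PHRASES = ["nice girl", "good looking", "chat with you", "nice pics", "videos"]


def decoded_text(qp):
    """Undo quoted-printable encoding (=2E -> '.', =20 -> ' ')."""
    return quopri.decodestring(qp).decode("iso-8859-1")


def phrase_hits(text):
    """Return the phrases that occur as whole words, ignoring case."""
    return [p for p in PHRASES
            if re.search(r"\b" + re.escape(p) + r"\b", text, re.IGNORECASE)]


text = decoded_text(QP_PART)
hits = phrase_hits(text)
print(len(hits), hits)
# 5 ['nice girl', 'good looking', 'chat with you', 'nice pics', 'videos']
```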



Robert Schetterer wrote:
> 
> On my side, this "nice girl" stuff is mostly caught
> by Pyzor, Razor, DCC, iXhash, the FreeMail plugin, etc.,
> so phrase matches aren't that important. Not many
> mails of this kind get past the RBLs and clamav-milter
> at the SMTP level.



Re: Searching for a list of strings

Posted by Robert Schetterer <ro...@schetterer.org>.
patrickbaer wrote:
> The most common phrases are: nice girl, good looking, chat with you, nice
> pics, videos
> 
> So if at least three of them hit, the rule should match.

On my side, this "nice girl" stuff is mostly caught
by Pyzor, Razor, DCC, iXhash, the FreeMail plugin, etc.,
so phrase matches aren't that important. Not many
mails of this kind get past the RBLs and clamav-milter
at the SMTP level.

-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria

Re: Searching for a list of strings

Posted by Jonas Eckerman <jo...@frukt.org>.
patrickbaer wrote:

> I'm looking for some regex to find a list of strings in the body,
> independent where they are and so on. 

Sounds more like you're looking for a meta rule.

perldoc Mail::SpamAssassin::Conf
and search for "meta"

> The most common phrases are: nice girl, good looking, chat with you, nice
> pics, videos

> So if at least three of them hit, the rule should match.

Do you mean something like the below (untested) rules?

body __TB_1 /\bnice girl\b/i
body __TB_2 /\bgood looking\b/i
body __TB_3 /\bchat with you\b/i
body __TB_4 /\bnice pics\b/i
body __TB_5 /\bvideos\b/i
meta TB (__TB_1+__TB_2+__TB_3+__TB_4+__TB_5)>2
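The sample message writes "Nice Girl" with capitals, so letter case matters for these patterns. The small Python sketch below is a hypothetical translation of the __TB_* rules, not SpamAssassin itself. It counts hits with and without case-insensitive matching; `re.IGNORECASE` plays the role of the `/i` regex modifier.

```python
import re

# The sample text from the original question, joined into one string.
SAMPLE = ("i am Nice Girl good looking girl who is looking to chat with you. "
          "email me back at szCic@officiallam.com "
          "i will reply back with some really nice pics or skype realtime videos.")

# Hypothetical Python equivalents of __TB_1 .. __TB_5.
TB_PATTERNS = [r"\bnice girl\b", r"\bgood looking\b", r"\bchat with you\b",
               r"\bnice pics\b", r"\bvideos\b"]


def tb_count(text, flags=0):
    """Count how many TB subrules match; pass re.IGNORECASE to mimic /i."""
    return sum(1 for p in TB_PATTERNS if re.search(p, text, flags))


def tb_fires(text, flags=0):
    """Mirror of 'meta TB (__TB_1+__TB_2+__TB_3+__TB_4+__TB_5)>2'."""
    return tb_count(text, flags) > 2


print(tb_count(SAMPLE))                 # 4: "Nice Girl" is missed without /i
print(tb_count(SAMPLE, re.IGNORECASE))  # 5
```

The meta rule still fires on this sample either way. A shorter spam that relies on the capitalised phrase would drop below the threshold without `/i`.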

Regards
/Jonas

-- 
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/


Re: Searching for a list of strings

Posted by mouss <mo...@netoyen.net>.
patrickbaer wrote:
> So if at least three of them hit, the rule should match.

"at least three of them" can't be implemented without explicit 
repetition or without a plugin. you can however do

body __BODY_1 /foo/
body __BODY_2 /bar/
body __BODY_3 /blah/
meta BODY_FOOBAR (__BODY_1 && __BODY_2 && __BODY_3)
score BODY_FOOBAR 0.01
describe BODY_FOOBAR contains foo and bar and blah
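A short Python sketch makes "explicit repetition" concrete. It is an illustration only; `at_least_k_expression` and `eval_at_least_k` are hypothetical helpers, not part of SpamAssassin. The sketch writes "at least 3 of 5 subrules" as a purely boolean expression by OR-ing every three-rule AND-combination, which for five subrules already takes ten terms. It then checks that this agrees with the sum-based meta rules posted earlier in the thread, assuming, as those posts state, that each subrule counts as 1 when it hits.

```python
from itertools import combinations, product


def at_least_k_expression(names, k):
    """Build a purely boolean meta expression that is true when at least k of
    the named subrules hit, by OR-ing every k-sized AND-combination."""
    terms = ["(" + " && ".join(combo) + ")" for combo in combinations(names, k)]
    return " || ".join(terms)


def eval_at_least_k(hits, k):
    """Evaluate the same condition: some k-combination has every rule hit."""
    return any(all(hits[n] for n in combo) for combo in combinations(hits, k))


names = ["__TB_1", "__TB_2", "__TB_3", "__TB_4", "__TB_5"]
expr = at_least_k_expression(names, 3)
print(expr.count("||") + 1)  # 10 AND-terms for "3 of 5"

# Sanity check: the expanded boolean form agrees with the arithmetic form
# (sum of hits >= 3, i.e. > 2) for all 2**5 possible hit patterns.
for bits in product([0, 1], repeat=len(names)):
    hits = dict(zip(names, bits))
    assert eval_at_least_k(hits, 3) == (sum(bits) >= 3)
```

The number of terms grows combinatorially with the number of subrules. An arithmetic meta such as `(__A+__B+__C+__D+__E)>2` avoids that growth.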

