Posted to users@spamassassin.apache.org by "Ron E." <re...@thunderstar.net> on 2005/01/22 22:35:36 UTC

detecting duplicate messages

Dear All,

	I have spent some time searching for something I assumed existed,
which is a method of detecting when large quantities of the same message,
or messages with nearly the same content (body & subject) are passing
through an MTA within a specified time period. It seems to me this could
be a useful way to detect not only spam but other types of problems as
well - error conditions, mail bombs, etc. Clearly, one could not block
such messages until a certain number of them had already been delivered,
but still it could be useful. A whitelist function would obviously be
used to allow legitimate traffic, but otherwise a threshold could be set
and when enough messages with the same or mostly the same body content
are detected, any further ones could be quarantined or tagged.
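A minimal Python sketch of the threshold idea described above, assuming messages are keyed by an exact hash of subject plus body, counted over a sliding time window, with a sender whitelist that bypasses the check. The class, method names and parameters are hypothetical illustrations, not a feature of SpamAssassin or DCC, and exact hashing does not catch "nearly the same" content.

```python
import hashlib
import time
from collections import defaultdict, deque


class DuplicateDetector:
    """Hypothetical sketch: once more than `threshold` copies with the same
    subject+body hash arrive within `window_seconds`, later copies are held.
    Senders in `whitelist` are never counted or held."""

    def __init__(self, threshold, window_seconds, whitelist=()):
        self.threshold = threshold
        self.window = window_seconds
        self.whitelist = set(whitelist)
        self.seen = defaultdict(deque)  # digest -> arrival times of recent copies

    @staticmethod
    def digest(subject, body):
        # Exact match only; near-duplicates would need a fuzzier key.
        return hashlib.sha256((subject + "\n" + body).encode("utf-8")).hexdigest()

    def check(self, sender, subject, body, now=None):
        """Return "deliver" or "quarantine" for one incoming message."""
        if sender in self.whitelist:
            return "deliver"
        now = time.time() if now is None else now
        times = self.seen[self.digest(subject, body)]
        # Forget copies that have fallen out of the time window.
        while times and now - times[0] > self.window:
            times.popleft()
        times.append(now)
        # The first `threshold` copies get through; further ones are held.
        return "quarantine" if len(times) > self.threshold else "deliver"


if __name__ == "__main__":
    d = DuplicateDetector(threshold=3, window_seconds=600,
                          whitelist={"news@example.com"})
    for t in range(5):
        print(t, d.check("a@example.net", "Hi", "same body", now=t))
```

As the post notes, the first few copies are necessarily delivered before the threshold trips. A real deployment would also need to expire idle digests, which this sketch never removes from `seen`.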

	Just curious if anyone knows of any method of doing this, either
with spamassassin or with another tool.

Thanks.



Re: detecting duplicate messages

Posted by Matt Kettler <mk...@comcast.net>.
At 02:49 AM 1/23/2005, Ron E. wrote:
>Sounds interesting, but I thought DCC, like pyzor/razor simply checks
>message content against known spam databases? How would one use it to
>detect duplicate messages? Any idea?

Technically, DCC does not detect the "spamminess" of a message; it just
measures how common a message is, with no consideration of whether it's
common because it is spam or because it is common nonspam. Whitelists are
used to reduce some obvious nonspam hits, but in general DCC does
occasionally fire on high-volume nonspam mailings, such as vendor
announcements from Intel... (Then again, despite its claims of being
spam-only, razor seems to FP much more often, at least over the past couple
of years that I've been using both.)

Every non-whitelisted message that goes into the DCC client for checking is
also reported to the DCC server by default. You actually need to pass a
parameter to dccproc in order for it NOT to report the message.

The server provides a count of messages with the same hash.

All you need to do is run a local DCC server and configure the DCC client 
to have a pretty low count threshold and you'll be detecting duplicate 
messages.
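To make the "count of messages with the same hash" idea concrete, here is a toy Python model of a local counting server. It assumes a crude normalization (lowercase, strip digits and punctuation, collapse whitespace) as a stand-in for a fuzzy checksum; this is NOT DCC's actual checksum algorithm, and the names `fuzzy_checksum`, `LocalCountServer` and `is_duplicate` are hypothetical. It mirrors two points from the reply: checking also reports by default, and a low local threshold turns the count into a duplicate detector.

```python
import hashlib
import re
from collections import Counter


def fuzzy_checksum(body):
    """Crude stand-in for a fuzzy checksum (not DCC's algorithm): lowercase,
    drop digits and punctuation, collapse whitespace, then hash."""
    text = re.sub(r"[^a-z\s]", "", body.lower())  # strip digits/punctuation
    text = " ".join(text.split())                 # collapse runs of whitespace
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class LocalCountServer:
    """Toy local server: it only counts how often each checksum was reported,
    with no notion of spam vs. nonspam."""

    def __init__(self):
        self.counts = Counter()

    def check(self, body, report=True):
        key = fuzzy_checksum(body)
        if report:  # reporting is the default; report=False is query-only
            self.counts[key] += 1
        return self.counts[key]


def is_duplicate(server, body, threshold=5, report=True):
    """Flag a message once its checksum count reaches a (low) local threshold."""
    return server.check(body, report=report) >= threshold


if __name__ == "__main__":
    server = LocalCountServer()
    for i in range(4):
        print(is_duplicate(server, f"Meeting at {i} pm", threshold=4))
```

Normalizing before hashing is what lets messages that differ only in numbers, case or spacing land on the same count; real fuzzy checksums are considerably more careful about what they ignore.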




Re: detecting duplicate messages

Posted by "Ron E." <re...@thunderstar.net>.

On Sat, 22 Jan 2005, Matt Kettler wrote:

> At 04:35 PM 1/22/2005, Ron E. wrote:
> >         I have spent some time searching for something I assumed existed,
> >which is a method of detecting when large quantities of the same message,
> >or messages with nearly the same content (body & subject) are passing
> >through an MTA within a specified time period. It seems to me this could
> >be a useful way to detect not only spam but other types of problems as
> >well - error conditions, mail bombs, etc.
>
> Hmm, DCC works very well for this kind of thing, and you could run it as a
> local-only server.
>
>
Sounds interesting, but I thought DCC, like pyzor/razor simply checks
message content against known spam databases? How would one use it to
detect duplicate messages? Any idea?

Thanks



Re: detecting duplicate messages

Posted by Matt Kettler <mk...@comcast.net>.
At 04:35 PM 1/22/2005, Ron E. wrote:
>         I have spent some time searching for something I assumed existed,
>which is a method of detecting when large quantities of the same message,
>or messages with nearly the same content (body & subject) are passing
>through an MTA within a specified time period. It seems to me this could
>be a useful way to detect not only spam but other types of problems as
>well - error conditions, mail bombs, etc.

Hmm, DCC works very well for this kind of thing, and you could run it as a 
local-only server.