Posted to users@spamassassin.apache.org by Alex <my...@gmail.com> on 2011/10/18 05:24:36 UTC

One-line URI body spam

Hi,

I'm having difficulty with figuring out how to tag spam where the body
is only one line with a URL in it. Here is an example:

http://pastebin.com/Y9mX1DRV

I'd appreciate any ideas of what I may be missing to catch these.

Thanks,
Alex

Re: One-line URI body spam

Posted by Martin Gregorie <ma...@gregorie.org>.

On Wed, 2011-10-19 at 20:47 +0200, Karsten Bräckelmann wrote:
> > Has anybody tried this and/or shown a worthwhile correlation between
> > failing reverse IP lookup / aliasing and appearance of the URL in spammy
> > body text?
> 
> Not useful.
> 
OK. Just askin'

Martin

Re: One-line URI body spam

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

On Fri, 2011-10-21 at 11:08 -0400, Alex wrote:
> guenther, thanks for spending the time to help with this. Back to the
> books to learn more about REs.

Frankly, the RE part was not that complicated, with the exception of
the /s modifier in my solution. Your REs were not bad either. The most
important issue with your RE is hardly taught in books -- properly
anchoring your RE.

The crucial parts to the solution as outlined are  (a) SA rule types,
their specifics and peculiarities, and  (b) a method to develop and test
your rules, and see the match.

Thus, I suggest carefully re-reading my full explanation. Try to
understand every part of it, and play around with rules to see the
effect of each part for yourself.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

Re: One-line URI body spam

Posted by Alex <my...@gmail.com>.

Hi,

>> > body            __BODY_URI      m{https?://.{1,50}$}
>>
>> That will match any email that ends with http:// followed by 1 to 50
>> characters of anythings, including spaces and other stuff not part of the
>> url.  "$" is not "I want stuff to stop matching here."  It's the end.
>> Either of the line, or of the email, depending on how SA handles newlines.
>
> Depends on the type of rule. (And the type of RE modifiers.) The
> obscure, old-school definition of a paragraph in this case. See my
> previous post.
>
> And, again, for the URI matching case, the uri rule is the one to go for
> anyway, ensuring the RE to be applied to URIs only.
>
>
>> Some quick untested thoughts:
>>
>> body            __LONG_BODY     /.{151}/
>> describe        __LONG_BODY     Has a body of more than 150 characters
>                                        ^^^^
>
> Has a *paragraph* of more than 150 chars. Again, see my previous post.
>
> These three very short paragraphs sum up to more than 150 chars.
>
> However, that __LONG_BODY body rule would not match on these three
> paragraphs alone, only the other stuff.

guenther, thanks for spending the time to help with this. Back to the
books to learn more about REs.

Thanks,
Alex

Re: One-line URI body spam

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

On Wed, 2011-10-19 at 22:21 -0400, darxus@chaosreigns.com wrote:
> > body            __BODY_URI      m{https?://.{1,50}$}
> 
> That will match any email that ends with http:// followed by 1 to 50
> characters of anythings, including spaces and other stuff not part of the
> url.  "$" is not "I want stuff to stop matching here."  It's the end.
> Either of the line, or of the email, depending on how SA handles newlines.

Depends on the type of rule. (And the type of RE modifiers.) The
obscure, old-school definition of a paragraph in this case. See my
previous post.

And, again, for the URI matching case, the uri rule is the one to go for
anyway, ensuring the RE to be applied to URIs only.

> Some quick untested thoughts:
> 
> body            __LONG_BODY     /.{151}/
> describe        __LONG_BODY     Has a body of more than 150 characters
                                        ^^^^

Has a *paragraph* of more than 150 chars. Again, see my previous post.

These three very short paragraphs sum up to more than 150 chars.

However, that __LONG_BODY body rule would not match on these three
paragraphs alone, only the other stuff.

> You might be able to do:
> body            __SHORT_BODY    /(?!.{1,150})/
> But I'm new to this "negative look-ahead assertion" thing.

See perlre. That is a *zero-width* negative look-ahead assertion. Since
there is nothing before the look-ahead, *any* place in the string would
do, with less than 1 char following it, as per the look-ahead assertion.
(And in this case, it really is just a waste of cycles trying to not
match more than a single char...)

By definition of the body rule, the end of the first paragraph.
Coincidentally, the end of the Subject (which is the first paragraph of
the "body" for body rules), regardless of the mail body.
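
To see the zero-width behaviour outside of SpamAssassin, here is a minimal
plain-Perl sketch. The sample strings and the helper name first_match_pos
are made up for illustration; it only exercises the regex itself, not SA's
rule handling.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Returns the offset at which the zero-width assertion first succeeds,
# or -1 if it never does.
sub first_match_pos {
  my ($str) = @_;
  return ($str =~ /(?!.{1,150})/) ? $-[0] : -1;
}

my $subject = "Some subject line";   # made-up sample text
my $long    = "x" x 400;             # far longer than 150 chars

printf "subject: length %d, assertion succeeds at offset %d\n",
       length($subject), first_match_pos($subject);
printf "long:    length %d, assertion succeeds at offset %d\n",
       length($long), first_match_pos($long);
```

In both cases the reported offset equals the string length: the assertion
only succeeds at the very end, where no further char follows, so the length
of the string plays no role at all.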

And yes, I verified this. Using ad-hoc rules and faked, specially
crafted messages. My previous post might really be educating...

Don't forget to grab a beer, though, and take your time reading it. :)

Re: One-line URI body spam

Posted by da...@chaosreigns.com.

On 10/19, Alex wrote:
> body            __SHORT_BODY    /.{1,150}$/

That will match anything that ends in 1 to 150 characters of anything.  So
it'll match any email that has 1 or more characters.

> describe        __SHORT_BODY    Short email body
> body            __BODY_URI      m{https?://.{1,50}$}

That will match any email that ends with http:// followed by 1 to 50
characters of anything, including spaces and other stuff not part of the
url.  "$" is not "I want stuff to stop matching here."  It's the end.
Either of the line, or of the email, depending on how SA handles newlines.

> describe        __BODY_URI      Message body contains URI
> meta            LOC_SHORT       (__SHORT_BODY && __BODY_URI)
> describe        LOC_SHORT       Contains short body and URI
> score           LOC_SHORT       0.2
> 
> I'd appreciate it if someone could help me create rules to identify a
> message body less than 150 chars and contains URL less than 50 chars.

Some quick untested thoughts:

body            __LONG_BODY     /.{151}/
describe        __LONG_BODY     Has a body of more than 150 characters
body            __BODY_URI      m{https?://\S{1,49}(\s|$)}
describe        __BODY_URI      Message body contains a URI
meta            LOC_SHORT       ( ! __LONG_BODY && __BODY_URI)
describe        LOC_SHORT       Contains short body and short URI
score           LOC_SHORT       0.2
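
A quick way to check what the __BODY_URI pattern above accepts is to run it
in plain Perl against a few made-up strings. This is only a sketch of the
regex behaviour; the helper name has_short_uri and the samples are
assumptions for illustration, and it says nothing about how SA splits the
body.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Same pattern as the __BODY_URI rule: scheme, then 1 to 49 non-space
# chars, then whitespace or the end of the string.
sub has_short_uri {
  my ($text) = @_;
  return ($text =~ m{https?://\S{1,49}(\s|$)}) ? 1 : 0;
}

my @samples = (
  "http://www.example.com/foo",               # short URL at end of string
  "see https://example.org/a now",            # short URL followed by a space
  "http://www.example.com/" . ("a" x 60),     # more than 49 chars after ://
  "no link here",
);

printf "%d  %s\n", has_short_uri($_), $_ for @samples;
```

The first two samples match and the last two do not: the long URL fails
because no whitespace or end of string shows up within 49 chars of the
scheme.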

You might be able to do:
body            __SHORT_BODY    /(?!.{1,150})/
But I'm new to this "negative look-ahead assertion" thing.

Happy to work on this more.

Regexes can be some scary dense logic.  I recommend creating a tiny perl
script, with a sample bit of text to match, and working up the regex 1
character at a time.

Start with:

#!/usr/bin/perl
use strict; use warnings;
my $body = "http://www.example.com";
if ($body =~ m{http}) {
  print "Matched.\n";
} else {
  print "Didn't match.\n";
}

And work up from there.  I often have to do stuff like this when working
with regexes.  And don't forget testing on an example string that the regex
shouldn't match.

-- 
"...and he that hath no sword, let him sell his garment, and buy one."
- Luke 22:36, King James Bible
http://www.ChaosReigns.com

Developing Rules, clarifying Body, and the Original Topic (was: Re: One-line URI body spam)

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

Sorry, this might be a bit long, but I hope it's worth reading. Not only
for the OP...

On Wed, 2011-10-19 at 19:28 -0400, Alex wrote:
> >> > >> http://pastebin.com/P0cJdf2V

> I was hoping to be able to write a rule based on a short message body
> that also simply contained a URL. I thought this would be a good basis
> for a meta, perhaps with RDNS_NONE or BAYES_99. However, I've fallen
> far short in my attempt:
> 
> body            __SHORT_BODY    /.{1,150}$/

Ouch. First thing, read that RE out loud, describing it in words. That's
any char, at least one, up to 150, followed by the end of the line. Can
you see it? The last char of *any* mail with at least one char matches,
so this pretty much matches *always*.

What did you forget? To anchor your RE at the beginning!
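
A minimal plain-Perl sketch of the difference the anchor makes. The two
sample strings and the helper names are made up for illustration; this
tests the regex on a single string, not SA's paragraph handling.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical samples: one short string, one far longer than 150 chars.
my $short = "Check this out: http://www.example.com/";
my $long  = "x" x 400;

# Unanchored: the last chars before the end always satisfy it.
sub unanchored { return ($_[0] =~ /.{1,150}$/)  ? 1 : 0 }
# Anchored at both ends: the whole string must be 1 to 150 chars long.
sub anchored   { return ($_[0] =~ /^.{1,150}$/) ? 1 : 0 }

for my $s ($short, $long) {
  printf "length %3d: unanchored=%d anchored=%d\n",
         length($s), unanchored($s), anchored($s);
}
```

Both strings hit the unanchored version; only the short one hits the
anchored version.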

That much for the obvious, now on to the more subtle problems. I
strongly encourage anyone writing rules to have a look again at the
relevant parts of the M::SA::Conf docs. In this case, it clearly states
that the Subject becomes the first paragraph for body rules. Does your
150 char limit include the Subject?

But wait, it gets even more subtle. The body rule docs are talking about
rendered, normalized body parts and paragraphs. What does that mean?

The part about rendering should be obvious for HTML, stripping markup,
but the overall meaning is more complicated. Basically, for body rules,
the textual parts are rendered and treated in an old-school UNIX kind of
way. Multiple, consecutive lines of text are concatenated, forming a
paragraph. Like this one. With ^ and $ matching the beginning and end of
a string -- or rather, paragraph.

The following demonstrates this, and exercises a rule writing debug
technique. Ad-hoc rules! :)

  echo -e "\n\none \n two \n three \n\n four \n" | \
    spamassassin --cf="use_bayes 0" --cf="use_auto_whitelist 0" \
                 --cf="body PARAGRAPH /^one.+/" \
                 -D 2>&1 | grep PARAGRAPH
  dbg: rules: ran body rule PARAGRAPH ======> got hit: "one two three "

The 'echo' quickly forges a mail with no headers, by starting with the
\n\n body separator. Grepping the debug output will show the RE match.

Have a close look at the original string, and what the rule matches.
One, two and three are on separate lines, but matched in full due to the
greedy /.+/. Four is not matched, because there is a blank line in the
original string before it. The paragraph!

Just like paragraphs in this very post. Moreover, the match also shows
that multiple consecutive whitespace in a paragraph gets normalized to a
single, ordinary space.

So, now you're armed to refine your rules during development, and
observe the actual match. Of course, you can also feed a real spample to
spamassassin, rather than faking one.
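
For experimenting without SA at hand, the paragraph behaviour seen in the
debug output can be roughly imitated in plain Perl. This is an
approximation written for illustration only, NOT SpamAssassin's actual
rendering code; the helper name paragraphs and its splitting logic are
assumptions.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Rough imitation of the rendering described above: blank lines separate
# paragraphs, and whitespace runs inside one collapse to a single space.
sub paragraphs {
  my ($text) = @_;
  my @paras;
  for my $chunk (split /\n\s*\n/, $text) {
    (my $p = $chunk) =~ s/\s+/ /g;   # newlines and runs become one space
    $p =~ s/^ //;                    # drop a leading space, if any
    push @paras, $p if $p =~ /\S/;   # skip empty chunks
  }
  return @paras;
}

# Same faked body as in the echo command, minus the header separator.
my $body = "one \n two \n three \n\n four \n";

for my $p (paragraphs($body)) {
  print "got hit: \"$1\"\n" if $p =~ /^(one.+)/;
}
```

With this input the sketch reports the same string as the debug line,
"one two three ", and "four" ends up in a paragraph of its own.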

Now let's think further about this. What does that paragraph style mean
to your RE?

It means that /^.{1,150}$/ (note the anchor at the beginning!) matches
any *paragraph* with at least one and up to 150 chars. Regardless how
long the mail is, a single short paragraph will trigger it. (Remember
the part about the Subject being the first paragraph for body rules?
Most likely what will satisfy this RE already.)

Noteworthy in this context, as far as RE matching is concerned: such a
rendered paragraph is a single line (no whitespace other than ordinary
spaces).

Now, on to a solution for this?  Grab a beer! I did.

I wrote a rule for similar patterns (very short body, URI with a specific
pattern) two years ago. Without further ado...

  rawbody __KB_RAWBODY_200       /^.{0,200}$/s

Grabbed straight from my old rules archive. A non-scoring sub-rule I
wrote to match on short messages with no more than 200 chars in total.
Ignoring the Subject, no rendering, no HTML stripping, just a very short
body -- that is all textual parts -- after decoding from base64 or
quoted-printable.

The /s modifier means to treat the string as a single line, so "."
matches any char whatsoever, even a newline. Necessary to even match
newlines in the *raw* body, between the ^ beginning and $ end of the
string -- the merely decoded rawbody of all textual parts.
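
The effect of /s is easy to check in plain Perl on a made-up multi-line
string. This sketch only shows the regex behaviour; the sample text and
helper names are assumptions, and what SA hands to a rawbody rule is as
described above.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Made-up two-line raw body, well under 200 chars.
my $raw = "Hi,\nhttp://www.example.com/foo\n";

# Without /s, "." stops at the first newline, so ^...$ cannot span lines.
sub short_without_s { return ($_[0] =~ /^.{0,200}$/)  ? 1 : 0 }
# With /s, "." also matches newlines, so the whole string is measured.
sub short_with_s    { return ($_[0] =~ /^.{0,200}$/s) ? 1 : 0 }

printf "without /s: %d\n", short_without_s($raw);
printf "with /s:    %d\n", short_with_s($raw);

# A multi-line body longer than 200 chars fails even with /s.
my $big = "some line of text\n" x 20;
printf "long body with /s: %d\n", short_with_s($big);
```

Only the /s version matches the short two-line body, and it still rejects
the long one.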

> body            __BODY_URI      m{https?://.{1,50}$}

With a total of less than 150 chars, does it really matter how long the
URI is? And, well, you really should use a uri rule here, not body...

  uri __HAS_HTTP_URI  m~^https?://~

Clickable links only, no mailto: URIs please.

Finished your beer already? If not, you probably should read this again,
following even closer and trying it yourself. And grab a fresh one, when
you reach the point I told you to... ;)

Re: One-line URI body spam

Posted by Alex <my...@gmail.com>.

Hi,

>> > >> http://pastebin.com/P0cJdf2V
>>
>> The URLs in the body of these messages don't give consistent results for
>> a domain lookup and a reverse lookup on the IP:

I was hoping to be able to write a rule based on a short message body
that also simply contained a URL. I thought this would be a good basis
for a meta, perhaps with RDNS_NONE or BAYES_99. However, I've fallen
far short in my attempt:

body            __SHORT_BODY    /.{1,150}$/
describe        __SHORT_BODY    Short email body
body            __BODY_URI      m{https?://.{1,50}$}
describe        __BODY_URI      Message body contains URI
meta            LOC_SHORT       (__SHORT_BODY && __BODY_URI)
describe        LOC_SHORT       Contains short body and URI
score           LOC_SHORT       0.2

I'd appreciate it if someone could help me create rules to identify a
message body less than 150 chars and contains URL less than 50 chars.

Would it make sense to parse the interpreted HTML or analyze the
rawbody directly? Many times the spam doesn't contain any HTML at all.

Thanks,
Alex

Re: One-line URI body spam

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

On Wed, 2011-10-19 at 12:05 +0100, Martin Gregorie wrote:
> > >> http://pastebin.com/P0cJdf2V
> 
> The URLs in the body of these messages don't give consistent results for
> a domain lookup and a reverse lookup on the IP: 
> 
> $ host guiadoagito.com.br
> guiadoagito.com.br has address 69.163.138.150
> guiadoagito.com.br mail is handled by 0 aspmx.l.google.com.
> $ host 69.163.138.150
> 150.138.163.69.in-addr.arpa domain name pointer apache2-yak.sparks.dreamhost.com.

Which is entirely common for web-hosting. In this case DreamHost shared
hosting. You'll notice aliases and "inconsistent" forward and reverse
lookups in *web-hosting* all over the place.

Try spamassassin.org. And pastebin.com, since you quoted that URI.

Also try with google.com, not due to shared hosting, but load balancing,
the other end of the spectrum.

For laughs, try with *your* domain... :D

> Has anybody tried this and/or shown a worthwhile correlation between
> failing reverse IP lookup / aliasing and appearance of the URL in spammy
> body text?

Not useful.

Re: One-line URI body spam

Posted by Martin Gregorie <ma...@gregorie.org>.

On Tue, 2011-10-18 at 20:54 -0400, Alex wrote:
> >>>> http://pastebin.com/Y9mX1DRV
> >> http://pastebin.com/P0cJdf2V
>
The URLs in the body of these messages don't give consistent results for
a domain lookup and a reverse lookup on the IP: 

$ host guiadoagito.com.br
guiadoagito.com.br has address 69.163.138.150
guiadoagito.com.br mail is handled by 0 aspmx.l.google.com.
$ host 69.163.138.150
150.138.163.69.in-addr.arpa domain name pointer apache2-yak.sparks.dreamhost.com.

$ host graphique-com.fr
graphique-com.fr has address 213.186.33.19
graphique-com.fr mail is handled by 5 mx2.ovh.net.
graphique-com.fr mail is handled by 100 mxb.ovh.net.
graphique-com.fr mail is handled by 1 mx1.ovh.net.
$ host 213.186.33.19
19.33.186.213.in-addr.arpa domain name pointer cluster010.ovh.net.

> They aren't legitimate sites. I'm not talking about blocking
> google.com in this case. I'm talking about blocking graphique-com.fr
> or mikeyjetadore.free.fr. Unless I'm missing something?
>
and this is merely an alias:

$ host mikeyjetadore.free.fr
mikeyjetadore.free.fr is an alias for perso101-g5.free.fr.
perso101-g5.free.fr has address 212.27.63.101
$ host 212.27.63.101
101.63.27.212.in-addr.arpa domain name pointer perso101-g5.free.fr.

> I also was thinking it would be possible to generate a rule not
> necessarily relying on identifying a blacklisted URI, no?
> 
I don't know about a rule, but a plugin that can recognise aliases and
check that a domain name lookup and a reverse lookup give mutually
consistent results should recognise this type of body URI. 

Remember that these URLs often fall into two categories:

- 'tasters' that some registrars (GoDaddy, I'm looking at you!) let you
  try out to "see if it's a domain that suits you" - all very new-ageist,
  but a gift for spammers

- cheap domains bought, used for hours or days and discarded.

In both cases the spammer isn't going to do more than the minimum work
to acquire and use them, hence the shortcuts and lack of valid reverse
lookup.

Has anybody tried this and/or shown a worthwhile correlation between
failing reverse IP lookup / aliasing and appearance of the URL in spammy
body text?

> Perhaps on originating IP, or lack or real content in the body?
> 
Lack of real content is a real pain: it's very hard indeed to write rules
that match it but don't trigger on legitimate mail. About the best I've
managed is to use a meta rule that requires matches from very low
scoring rules (score=0.01) that recognise sales phrases, product names
and generic names, that the message is from a technical mailing list: if
all three sub-rules fire, the message is very high probability spam and
gets a good, high score. However, this type of rule almost never hits
one-liners.

Martin

Re: One-line URI body spam

Posted by Alex <my...@gmail.com>.

Hi,

>>>> I'm having difficulty with figuring out how to tag spam where the body
>>>> is only one line with a URL in it. Here is an example:
>>>>
>>>> http://pastebin.com/Y9mX1DRV
>>>
>>> It would be more helpful if you provided several examples.  It would be
>>> easy enough to write a rule that matched just this example.
>>
>> Yes, I thought that might happen. I've included some others here:
>>
>> http://pastebin.com/P0cJdf2V
>>
>> Great example from Paul Graham. The URI filters apparently can't
>> respond quickly enough.
>
> The problem with URI-RBL filters and those particular spams is not
> necessarily speed but a philosophical quandary. Those spamvertized URLs are
> hacked legitimate sites with spammer pages injected (kind of like a
> parasite).

They aren't legitimate sites. I'm not talking about blocking
google.com in this case. I'm talking about blocking graphique-com.fr
or mikeyjetadore.free.fr. Unless I'm missing something?

I also was thinking it would be possible to generate a rule not
necessarily relying on identifying a blacklisted URI, no?

Perhaps on originating IP, or lack of real content in the body?

Thanks,
Alex

Re: One-line URI body spam

Posted by Noel Butler <no...@ausics.net>.

On Tue, 2011-10-18 at 17:27 -0500, David B Funk wrote:

> So if you black-list those hosts you are generating FPs on any legit mails 
> that link to those sites. Would you black-list google.com because 
> somebody puts 'phish' forms in a google-docs spread-sheet and then

Absolutely yes, size doesn't matter.

Google has been blocked here 6 times in total, Yahoo 9 times, Hotmail 3
times... avg block duration 30 days
(It's one thing I'll give Microsoft credit for: they do more than just
auto-responders with spammers and idiots, they actually do the walk as
well as the talk, *unlike* Google)

Why do people assume that because someone's "big" it's taboo to block them?
Jesus H C, come out of your shell. How do you think other "big" players
of yesteryear, e.g. AOL, twtelecom, comcast... eventually changed their
ways?

> 
> Most reputable RBLs want to avoid FPs and thus are reluctant to list such 
> sites.
> 

Therein lies the problem... It's also why I rate Spamhaus last in our
tests, because they as good as give in to the likes of Google, saying
they would never block them. IIRC it was brought up on this list not too
long ago.

Re: One-line URI body spam

Posted by Walter Hurry <wa...@lavabit.com>.

On Tue, 18 Oct 2011 17:27:17 -0500, David B Funk wrote:

> Would you black-list google.com

Yes, happily.

Re: One-line URI body spam

Posted by Benny Pedersen <me...@junc.org>.

On Tue, 18 Oct 2011 17:27:17 -0500 (CDT), David B Funk wrote:
> sends out spams with that as the payload? (I see lots of 'phish'
> spam with that tactic on a regular basis).

If Google accepts links to any URIBL-listed sites then yes, I would block
Google. If Google just has a phish page, that's OK with me; those URL
redirectors are helping make it happen.

Re: Responsibility of sites that hold user-created documents (was Re: One-line URI body spam)

Posted by SM <sm...@resistor.net>.

At 13:03 19-10-2011, David F. Skoll wrote:
>In my dream world, people would blacklist Google.  I made a suggestion

The approach would also be applicable for pastebin (which is 
generally suggested on this mailing list) and any other free 
service.  The subject could be rewritten as "responsibility of free 
services that hold user-created documents".

Regards,
-sm

Re: Responsibility of sites that hold user-created documents (was Re: One-line URI body spam)

Posted by Benny Pedersen <me...@junc.org>.

On Wed, 19 Oct 2011 21:37:56 +0100, Martin Gregorie wrote:
> On Wed, 2011-10-19 at 16:03 -0400, David F. Skoll wrote:
>
>> Such a change would be trivial for Google, but my suggestion was ignored.
>> Maybe it will take some blacklisting to get Google's attention.
>>
> The problem with Google is that each time they say "Don't be evil" you
> can't see their lips move.

I have seen some sites that sell security and at the same time store
one's Visa card info for renewal. It's about security, right? How long
will Visa accept it? If I see my Visa card being misused one more time,
then I'll say goodbye to all places that want my money. Sad, but maybe
security changes if there is no money?

Re: Responsibility of sites that hold user-created documents (was Re: One-line URI body spam)

Posted by Martin Gregorie <ma...@gregorie.org>.

On Wed, 2011-10-19 at 16:03 -0400, David F. Skoll wrote:

> Such a change would be trivial for Google, but my suggestion was ignored.
> Maybe it will take some blacklisting to get Google's attention.
> 
The problem with Google is that each time they say "Don't be evil" you
can't see their lips move.


Martin

Responsibility of sites that hold user-created documents (was Re: One-line URI body spam)

Posted by "David F. Skoll" <df...@roaringpenguin.com>.

On Wed, 19 Oct 2011 14:59:40 -0500 (CDT)
David B Funk <db...@engineering.uiowa.edu> wrote:

> BTW, you've totally misinterpreted my goole comment, I was talking
> about the insanity of blacklisting "google.com" in a URI-RBL because
> there was a "phish" page being hosted via docs.google.com.

In my dream world, people would blacklist Google.  I made a suggestion
to Google months ago:  When displaying a docs.google.com document marked
"public", include a disclaimer right at the top:

   "If this document requests personal information such as passwords,
    credit card numbers, banking information, etc. do not enter the
    information.  Instead contact the Google abuse team."

Such a change would be trivial for Google, but my suggestion was ignored.
Maybe it will take some blacklisting to get Google's attention.

Regards,

David.

Re: One-line URI body spam

Posted by David B Funk <db...@engineering.uiowa.edu>.

On Tue, 18 Oct 2011, Michael Scheidell wrote:

> On 10/18/11 6:27 PM, David B Funk wrote:
>> So if you black-list those hosts you are generating FPs on any legit mails 
>> that link to those sites. Would you black-list google.com because somebody 
>> puts 'phish' forms in a google-docs spread-sheet and then
>> sends out spams with that as the payload? (I see lots of 'phish'
>> spam with that tactic on a regular basis).
> google will.  its the safebrowsing list, clamav uses their list also.
>
> if an innocent site gets hacked, and drive by crud installed on it, google 
> will list them.
> In fact, on a security site, that might show examples of hack's, you must 
> prevent google from indexing those pages.
> you might need to have the reader sign up, log in to view them.  if google 
> sees them, they will blacklist you.

There is a world of difference between a URI-RBL and the safebrowsing
list. The Google safebrowsing list (and its associated ClamAV db) is
based upon a whole URL; a URI-RBL only contains the host/domain name.

So safebrowsing can target one specific page on a site, URI-RBL hits
the whole site/domain. (sniper rifle, vs shot-gun).
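
The difference in granularity can be sketched in a few lines of plain
Perl. The list entries, URLs and helper names below are all hypothetical
and only illustrate whole-URL matching versus host-only matching; real
lists are queried quite differently.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical list entries, for illustration only.
my %url_list  = ("http://docs.example.com/phish-form" => 1);  # whole URL
my %host_list = ("docs.example.com" => 1);                    # host only

# Pull the host part out of an http(s) URL; undef if there is none.
sub host_of {
  my ($url) = @_;
  return ($url =~ m{^https?://([^/:?#]+)}i) ? lc($1) : undef;
}

sub hit_by_url { return exists $url_list{ $_[0] } ? 1 : 0 }

sub hit_by_host {
  my $host = host_of($_[0]);
  return (defined($host) && exists $host_list{$host}) ? 1 : 0;
}

for my $url ("http://docs.example.com/phish-form",
             "http://docs.example.com/legit-doc") {
  printf "%-36s url-list=%d host-list=%d\n",
         $url, hit_by_url($url), hit_by_host($url);
}
```

The whole-URL list only flags the one bad page, while the host list flags
every document on that host, legitimate ones included.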

I'm all for safebrowsing ClamAV db, I use it here.
However the OP specifically talked about URI-RBLs not hitting those
phish URLs.

BTW, you've totally misinterpreted my google comment, I was talking about
the insanity of blacklisting "google.com" in a URI-RBL because there was a
"phish" page being hosted via docs.google.com.

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: One-line URI body spam

Posted by Michael Scheidell <mi...@secnap.com>.

On 10/18/11 6:27 PM, David B Funk wrote:
> So if you black-list those hosts you are generating FPs on any legit
> mails that link to those sites. Would you black-list google.com
> because somebody puts 'phish' forms in a google-docs spread-sheet and then
> sends out spams with that as the payload? (I see lots of 'phish'
> spam with that tactic on a regular basis).
Google will. It's the safebrowsing list; ClamAV uses their list also.

If an innocent site gets hacked, and drive-by crud is installed on it,
Google will list them.
In fact, on a security site that might show examples of hacks, you
must prevent Google from indexing those pages.
You might need to have the reader sign up and log in to view them. If
Google sees them, they will blacklist you.

-- 
Michael Scheidell, CTO
o: 561-999-5000
d: 561-948-2259
 >*| *SECNAP Network Security Corporation

    * Best Mobile Solutions Product of 2011
    * Best Intrusion Prevention Product
    * Hot Company Finalist 2011
    * Best Email Security Product
    * Certified SNORT Integrator

Re: One-line URI body spam

Posted by David B Funk <db...@engineering.uiowa.edu>.

On Tue, 18 Oct 2011, Alex wrote:

> Hi,
>
>>> I'm having difficulty with figuring out how to tag spam where the body
>>> is only one line with a URL in it. Here is an example:
>>>
>>> http://pastebin.com/Y9mX1DRV
>>
>> It would be more helpful if you provided several examples.  It would be
>> easy enough to write a rule that matched just this example.
>
> Yes, I thought that might happen. I've included some others here:
>
> http://pastebin.com/P0cJdf2V
>
> Great example from Paul Graham. The URI filters apparently can't
> respond quickly enough.

The problem with URI-RBL filters and those particular spams is not 
necessarily speed but a philosophical quandary. Those spamvertized URLs 
are hacked legitimate sites with spammer pages injected (kind of like a 
parasite).

So if you black-list those hosts you are generating FPs on any legit mails 
that link to those sites. Would you black-list google.com because 
somebody puts 'phish' forms in a google-docs spread-sheet and then
sends out spams with that as the payload? (I see lots of 'phish'
spam with that tactic on a regular basis).

Most reputable RBLs want to avoid FPs and thus are reluctant to list such 
sites.

Re: One-line URI body spam

Posted by Alex <my...@gmail.com>.

Hi,

>> I'm having difficulty with figuring out how to tag spam where the body
>> is only one line with a URL in it. Here is an example:
>>
>> http://pastebin.com/Y9mX1DRV
>
> It would be more helpful if you provided several examples.  It would be
> easy enough to write a rule that matched just this example.

Yes, I thought that might happen. I've included some others here:

http://pastebin.com/P0cJdf2V

Great example from Paul Graham. The URI filters apparently can't
respond quickly enough.

Thanks again,
Alex

Re: One-line URI body spam

Posted by da...@chaosreigns.com.

On 10/17, Alex wrote:
> I'm having difficulty with figuring out how to tag spam where the body
> is only one line with a URL in it. Here is an example:
> 
> http://pastebin.com/Y9mX1DRV

It would be more helpful if you provided several examples.  It would be
easy enough to write a rule that matched just this example.

Not helpful, but just interesting:
In 2002, Paul Graham wrote A Plan for Spam, which included:

  Assuming they could solve the problem of the headers, the spam of the
  future will probably look something like this:

    Hey there.  Thought you should check out the following:
    http://www.27meg.com/foo

  because that is about as much sales pitch as content-based filtering
  will leave the spammer room to make. (Indeed, it will be hard even
  to get this past filters, because if everything else in the email is
  neutral, the spam probability will hinge on the url, and it will take
  some effort to make that look neutral.)

- http://www.paulgraham.com/spam.html

I guess he thought spammers wouldn't think that would be worth sending.

-- 
"A ship in a port is safe, but that's not what ships are built for."
-Grace Murray Hopper
http://www.ChaosReigns.com