Posted to users@spamassassin.apache.org by Kenneth Porter <sh...@sewingwitch.com> on 2006/03/09 01:25:40 UTC

Re: [Mimedefang] Re: [SURBL-Discuss] Fw: Interesting Phishing Trick

--On Wednesday, March 08, 2006 2:24 PM -0800 Jeff Chan <je...@surbl.org> 
wrote:

> It's an interesting use, but I don't believe it would confuse
> SpamAssassin, etc.  The second URI should be visible enough to be
> checked, and I added the IP to ph.surbl.org.

Is there an SA rule that checks for nested anchors? (Either in 3.1 or 
SARE.) Any signs of this idiom in ham corpuses?



Re: [OT] Fw: Interesting Phishing Trick

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Mar 09, 2006 at 10:05:15AM -0500, Kevin A. McGrail wrote:
> However, this rule does trigger on the technique I sent.  I want to work on 
> the nested anchor idea as well but in the meantime, I'd like to hear 
> feedback on this trigger.  It seemed REALLY spammy to me.  Anyone get any 
> hits with this against their HAM or SPAM corpuses?

I have no hits for that.  Interestingly, my spamtraps have no mention of
cursor:, but my personal spam corpus has a ton of "CURSOR: hand".

-- 
Randomly Generated Tagline:
"The frame comes in your choice of colors, so long as your choice is black."
         - From Amazon.com about the Ceiva picture frame

[OT] Fw: Interesting Phishing Trick

Posted by "Kevin A. McGrail" <km...@pccc.com>.
I ran the rule below through the NightlyMassCheck with a 0 HAM hit and a 0 
SPAM hit on those corpuses so the technique might not be very prevalent.

However, this rule does trigger on the technique I sent.  I want to work on 
the nested anchor idea as well but in the meantime, I'd like to hear 
feedback on this trigger.  It seemed REALLY spammy to me.  Anyone get any 
hits with this against their HAM or SPAM corpuses?

#PHISHING TEST
rawbody         KAM_PHISH1      /u style="cursor: pointer"/
describe        KAM_PHISH1      Test for PHISH that changes the cursor
score           KAM_PHISH1      0.01
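
To try the pattern outside SpamAssassin, the same regular expression can be run over raw HTML in a few lines of Python. This is only a sketch: the function name and the sample markup are made up for illustration and are not taken from the phish under discussion. It also shows that the pattern is case- and whitespace-exact, so a variant such as "CURSOR: hand" would not hit.

```python
import re

# Same pattern as the KAM_PHISH1 rawbody rule, expressed as a Python regex.
KAM_PHISH1 = re.compile(r'u style="cursor: pointer"')


def hits_kam_phish1(raw_html: str) -> bool:
    """Return True if the raw HTML body contains the rule's pattern."""
    return KAM_PHISH1.search(raw_html) is not None


# Hypothetical samples, invented for this sketch.
sample = '<a href="http://a.example/"><u style="cursor: pointer">Log in</u></a>'
variant = '<u style="CURSOR: hand">Log in</u>'

print(hits_kam_phish1(sample))   # True
print(hits_kam_phish1(variant))  # False: different case and cursor value
```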

Regards,
KAM

> Is there an SA rule that checks for nested anchors? (Either in 3.1 or 
> SARE.) Any signs of this idiom in ham corpuses?


Re: [Mimedefang] Re: [SURBL-Discuss] Fw: Interesting Phishing Trick

Posted by Gene Heskett <ge...@verizon.net>.
On Wednesday 08 March 2006 21:57, jdow wrote:
>From: "Kenneth Porter" <sh...@sewingwitch.com>
>
>> --On Wednesday, March 08, 2006 8:40 PM -0500 Theo Van Dinter
>>
>> <fe...@apache.org> wrote:
>>> Not in SA proper.  For curiosity sake, I wrote up a quick rule to
>>> test it out:
>>>
>>>  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME
>>>      0    27920     4940    0.850   0.00    0.00  (all messages)
>>>  1.400   1.0852   3.1781    0.255   0.00    1.00  TVD_NESTED_ANCHOR
>>>
>>> ie: it's pretty horrible.
>>
>> What MUA generates all the FP's?
>>
>> Makes me wonder about installing outbound filters that run a
>> validator and reject anything that fails. I often see flame wars on
>> mailing lists about allowing HTML posts to the list, but I wonder
>> how the arguments would change if one allowed only *validated* HTML.
>> I'll bet most who insist on using HTML would immediately be rejected
>> by the validator. "Sorry, your message was rejected because your MUA
>> vendor writes garbage that we can't parse, and makes you look like a
>> spammer." ;)
>
>I'd still take part in the HTML only user's lynching.
>{o.o}

You might have to get in line there, Joanne.  I'm the guy with the rope, 
lemme through please.

-- 
Cheers, Gene
People having trouble with vz bouncing email to me should add the word
'online' between the 'verizon', and the dot which bypasses vz's
stupid bounce rules.  I do use spamassassin too. :-)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.

Re: HTML Validator

Posted by Kenneth Porter <sh...@sewingwitch.com>.
--On Friday, March 10, 2006 5:08 PM -0800 Kenneth Porter 
<sh...@sewingwitch.com> wrote:

> Anyone know of a good validator that can be run over a MIME part to
> report on the quality of the HTML? This might be used as a go/no-go
> filter at milter level, or it could be used as an SA plugin to assign a
> variable score based on the quality of the HTML.
>
> For mailing lists catering to newbies who love HTML and can't understand
> why us old-timers hate it, we can set the list to exclude all invalid
> HTML. "Sure, we'll accept your HTML. But only if it's really HTML. Not
> that crap that most MUA's write."

I was trying to remember a web page I found that counseled not to use 
DOCTYPE and HTML tags around email to escape spam filters (pretty weird 
advice IMO) and I ran across indications that AOL is rejecting mail that 
fails to pass validation:

<http://www.petefreitag.com/item/307.cfm>
<http://info.aol.co.uk/about/spam/mailer-daemon.adp>
<http://postmaster.info.aol.com/errors/554hvufo.html>
<http://www.clickz.com/showPage.html?page=3490146>

Re: HTML Validator

Posted by Philip Prindeville <ph...@redfish-solutions.com>.
Eric W. Bates wrote:
> I have never used it in a mail context; but tidy (from our friends at w3
> http://www.w3.org/People/Raggett/tidy/) is a very nice validator. Might
> be too big a load for SA, tho.  I think you will also find that M$ html
> output from OE is probably full of errors anyway...

All the better.  Maybe they can be shamed into fixing it.  ;-)

And maybe pigs will grow wings...  Sigh.

-Philip



Re: HTML Validator

Posted by "Eric W. Bates" <er...@vineyard.net>.
Kenneth Porter wrote:
> On Wednesday, March 08, 2006 6:46 PM -0800 Kenneth Porter
> <sh...@sewingwitch.com> wrote:
> 
>> Makes me wonder about installing outbound filters that run a validator
>> and reject anything that fails. I often see flame wars on mailing lists
>> about allowing HTML posts to the list, but I wonder how the arguments
>> would change if one allowed only *validated* HTML. I'll bet most who
>> insist on using HTML would immediately be rejected by the validator.
>> "Sorry, your message was rejected because your MUA vendor writes garbage
>> that we can't parse, and makes you look like a spammer." ;)
> 
> 
> Anyone know of a good validator that can be run over a MIME part to
> report on the quality of the HTML? This might be used as a go/no-go
> filter at milter level, or it could be used as an SA plugin to assign a
> variable score based on the quality of the HTML.
> 
> For mailing lists catering to newbies who love HTML and can't understand
> why us old-timers hate it, we can set the list to exclude all invalid
> HTML. "Sure, we'll accept your HTML. But only if it's really HTML. Not
> that crap that most MUA's write."

I have never used it in a mail context; but tidy (from our friends at w3
http://www.w3.org/People/Raggett/tidy/) is a very nice validator. Might
be too big a load for SA, tho.  I think you will also find that M$ html
output from OE is probably full of errors anyway...
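
A rough sketch of the "run something over the HTML MIME part" idea, using only the Python standard library. It walks a message, pulls out each text/html part, and applies a crude tag-balance check. The balance check is a stand-in and is nowhere near real validation (it ignores DOCTYPEs, optional end tags, attributes and so on); a real validator such as tidy would replace html_problems(). All names here are hypothetical.

```python
from email import message_from_string
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not tracked.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class BalanceChecker(HTMLParser):
    """Crude stand-in for a validator: reports unclosed and stray tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag not in self.stack:
            self.problems.append(f"stray </{tag}>")
            return
        # Anything opened after the matching start tag was never closed.
        while self.stack[-1] != tag:
            self.problems.append(f"unclosed <{self.stack.pop()}>")
        self.stack.pop()


def html_problems(html: str) -> list:
    checker = BalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.problems + [f"unclosed <{t}>" for t in checker.stack]


def check_message(raw_message: str) -> list:
    """Collect problems from every text/html MIME part of a message."""
    problems = []
    for part in message_from_string(raw_message).walk():
        if part.get_content_type() == "text/html":
            charset = part.get_content_charset() or "utf-8"
            body = part.get_payload(decode=True).decode(charset, errors="replace")
            problems += html_problems(body)
    return problems
```

A milter could reject when the list is non-empty, and an SA-style score could scale with its length. Both policies are left open here.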

> 
> 


Re: HTML Validator

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Mar 15, 2006 at 08:13:48PM -0700, Philip Prindeville wrote:
> I'm wondering what would be involved in putting in an HTML parser
> that could call various rules to check things, like the case of:

Well, you wouldn't "call various rules", you'd look for a behavior while
parsing and flag it for later detection by a rule.  The current code
means modifications have to be made to HTML.pm.

> <a href="http://www.foo.com/xyzzy">http://www.bar.com/aardvark</a>

This kind of rule actually doesn't need to be in the HTML parser,
you could easily write a plugin that uses the already parsed anchor
information.

FWIW though, this rule has previously been discussed and dismissed as
being non-useful (too many FPs).  Earlier today on this list even. ;)

-- 
Randomly Generated Tagline:
"You can lead a bigot to water, but if you don't tie him up you can't
 make him drown." - The Psychodots

Re: HTML Validator

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Mar 16, 2006 at 12:50:34PM -0700, Philip Prindeville wrote:
> Hmm.  Thanks.  Trying out the attachment, but having issues.  Using 3.1.0
> on FC3 Linux.
> 
> Updated the bug.

In general, it's bad to have the same conversation in multiple locations.
I'd prefer to discuss issues with the plugin here as opposed to bugzilla since
the plugin was put there so that people in the future can easily access it.
Debugging problems and such I'd prefer to talk about here.

I also responded to your issue in the ticket.  It essentially came down to:
yes, the plugin works fine with 3.1.0.  The errors you saw indicate that
you're not using 3.1.x.

-- 
Randomly Generated Tagline:
Diversity is God's way of amusing himself.

Re: HTML Validator

Posted by Philip Prindeville <ph...@redfish-solutions.com>.
Theo Van Dinter wrote:

>On Wed, Mar 15, 2006 at 09:58:52PM -0700, Philip Prindeville wrote:
>
>>Ok, does anyone have *recent* statistical analysis (i.e. not almost a
>>year old) on this?  It could be that the people using this "boneheaded"
>>construct have realized the error of their ways, and stopped doing it.
>
>Unfortunately not.  I updated the ticket
>(http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4255) with new
>stats and a plugin that implements the check so people can play with it.
>The best version was comparing domains:
>
>  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME
>      0    28446     5023    0.850   0.00    0.00  (all messages)
>0.00000  84.9921  15.0079    0.850   0.00    0.00  (all messages as %)
>  0.302   0.3340   0.1195    0.737   0.00    0.01  T_HTTPS_HTTP_MISMATCH
>
>If people want to play with the plugin and can improve the hit rate to
>a usable level (or if you find a bug in the code), please let us know!
>But otherwise this rule sucks pretty badly.  :(
>
>  
>
Hmm.  Thanks.  Trying out the attachment, but having issues.  Using 3.1.0
on FC3 Linux.

Updated the bug.

-Philip


Re: HTML Validator

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Mar 15, 2006 at 09:58:52PM -0700, Philip Prindeville wrote:
> Ok, does anyone have *recent* statistical analysis (i.e. not almost a
> year old) on this?  It could be that the people using this "boneheaded"
> construct have realized the error of their ways, and stopped doing it.

Unfortunately not.  I updated the ticket
(http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4255) with new
stats and a plugin that implements the check so people can play with it.
The best version was comparing domains:

  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME
      0    28446     5023    0.850   0.00    0.00  (all messages)
0.00000  84.9921  15.0079    0.850   0.00    0.00  (all messages as %)
  0.302   0.3340   0.1195    0.737   0.00    0.01  T_HTTPS_HTTP_MISMATCH

If people want to play with the plugin and can improve the hit rate to
a usable level (or if you find a bug in the code), please let us know!
But otherwise this rule sucks pretty badly.  :(
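
For anyone who wants to experiment without the plugin, here is a minimal Python sketch of the kind of check being discussed. It is not the plugin attached to the ticket, whose code is not shown in this thread. The class and function names are invented, and "comparing domains" is approximated by comparing the full host names of the href and of URL-looking anchor text.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class AnchorCollector(HTMLParser):
    """Collect (href, visible text) pairs for each <a>...</a> element."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.pairs.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_anchors(html: str) -> list:
    """Return anchors whose text looks like a URL on a different host, or
    whose text claims https while the href is plain http."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    hits = []
    for href, text in collector.pairs:
        if not text.lower().startswith(("http://", "https://")):
            continue  # ordinary descriptive anchor text: ignore it
        target = urlsplit(href)
        shown = urlsplit(text.split()[0])
        host_differs = target.hostname != shown.hostname
        downgraded = shown.scheme == "https" and target.scheme == "http"
        if host_differs or downgraded:
            hits.append((href, text))
    return hits
```

Anchors with descriptive text ("Click here") are skipped on purpose, since only text that itself looks like a URL is treated as suspicious.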

-- 
Randomly Generated Tagline:
 Fry: Whoah. Check out that guy. He makes Speedy Gonzales look like 
  Regular Gonzalez.

Re: HTML Validator

Posted by Philip Prindeville <ph...@redfish-solutions.com>.
Theo Van Dinter wrote:

>On Wed, Mar 15, 2006 at 08:40:51PM -0700, Philip Prindeville wrote:
>  
>
>>Does anyone have a way of doing a statistical analysis of ham that contains
>>http(s?):// as the beginning of the anchor text?
>>    
>>
>
>So for the second time today:
>
>http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4255
>
>  
>

Ok, does anyone have *recent* statistical analysis (i.e. not almost a
year old) on this?  It could be that the people using this "boneheaded"
construct have realized the error of their ways, and stopped doing it.

-Philip


Re: HTML Validator

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Mar 15, 2006 at 08:40:51PM -0700, Philip Prindeville wrote:
> Does anyone have a way of doing a statistical analysis of ham that contains
> http(s?):// as the beginning of the anchor text?

So for the second time today:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4255

-- 
Randomly Generated Tagline:
We are what we pretend to be.
 		-- Kurt Vonnegut, Jr.

Re: HTML Validator

Posted by Philip Prindeville <ph...@redfish-solutions.com>.
Craig Morrison wrote:

>Philip Prindeville wrote:
>  
>
>>I'm wondering what would be involved in putting in an HTML parser
>>that could call various rules to check things, like the case of:
>>
>><a href="http://www.foo.com/xyzzy">http://www.bar.com/aardvark</a>
>>
>>where the link disagrees with the text between the anchor tags (yeah, you
>>could limit it to partial matches on the host-portion)...
>>    
>>
>
>This is the functional equivalent of pissing in the wind. If you are 
>downwind, you are going to get wet.
>
>Anchor text in too many/most cases will not match the HREF. grep is 
>good, but it isn't good enough to catch all cases without significant 
>overhead. Anchor text is a descriptor, nothing more than that. It is not 
>a regurgitation of the link HREF.
>
>  
>

Usually it's not.  That's the point.  It's when the anchor text tries to
look like a URL that one needs to be suspicious.  At the very least, if
the anchor text starts with "https://" but the anchor URL looks like
"http://", I'd say that this is definitely spam.

Does anyone have a way of doing a statistical analysis of ham that contains
http(s?):// as the beginning of the anchor text?

-Philip




Re: HTML Validator

Posted by Craig Morrison <cr...@2cah.com>.
Philip Prindeville wrote:
> I'm wondering what would be involved in putting in an HTML parser
> that could call various rules to check things, like the case of:
> 
> <a href="http://www.foo.com/xyzzy">http://www.bar.com/aardvark</a>
> 
> where the link disagrees with the text between the anchor tags (yeah, you
> could limit it to partial matches on the host-portion)...

This is the functional equivalent of pissing in the wind. If you are 
downwind, you are going to get wet.

Anchor text in too many/most cases will not match the HREF. grep is 
good, but it isn't good enough to catch all cases without significant 
overhead. Anchor text is a descriptor, nothing more than that. It is not 
a regurgitation of the link HREF.


Re: HTML Validator

Posted by Philip Prindeville <ph...@redfish-solutions.com>.
Kenneth Porter wrote:

>On Friday, March 10, 2006 9:43 PM -0700 Philip Prindeville 
><ph...@redfish-solutions.com> wrote:
>
>  
>
>>Do you mean:
>>
>>http://validator.w3.org/source/
>>    
>>
>
>I thought that was just a web form-based validator. I'll have to look at it 
>to see if the validator can be run over an attachment (ie. an HTML MIME 
>part) from a separate mail filter (eg. MIMEDefang).
>  
>

I'm wondering what would be involved in putting in an HTML parser
that could call various rules to check things, like the case of:

<a href="http://www.foo.com/xyzzy">http://www.bar.com/aardvark</a>

where the link disagrees with the text between the anchor tags (yeah, you
could limit it to partial matches on the host-portion)...

This seems to be the Korean Chase issue that Chris encountered.

-Philip


Re: HTML Validator

Posted by Kenneth Porter <sh...@sewingwitch.com>.
On Friday, March 10, 2006 9:43 PM -0700 Philip Prindeville 
<ph...@redfish-solutions.com> wrote:

> Do you mean:
>
> http://validator.w3.org/source/

I thought that was just a web form-based validator. I'll have to look at it 
to see if the validator can be run over an attachment (ie. an HTML MIME 
part) from a separate mail filter (eg. MIMEDefang).

Re: HTML Validator

Posted by Philip Prindeville <ph...@redfish-solutions.com>.
Kenneth Porter wrote:

> Anyone know of a good validator that can be run over a MIME part to report 
> on the quality of the HTML? This might be used as a go/no-go filter at 
> milter level, or it could be used as an SA plugin to assign a variable 
> score based on the quality of the HTML.
> 
> For mailing lists catering to newbies who love HTML and can't understand 
> why us old-timers hate it, we can set the list to exclude all invalid HTML. 
> "Sure, we'll accept your HTML. But only if it's really HTML. Not that crap 
> that most MUA's write."

Do you mean:

http://validator.w3.org/source/

-Philip



HTML Validator (was: Interesting Phishing Trick)

Posted by Kenneth Porter <sh...@sewingwitch.com>.
On Wednesday, March 08, 2006 6:46 PM -0800 Kenneth Porter 
<sh...@sewingwitch.com> wrote:

> Makes me wonder about installing outbound filters that run a validator
> and reject anything that fails. I often see flame wars on mailing lists
> about allowing HTML posts to the list, but I wonder how the arguments
> would change if one allowed only *validated* HTML. I'll bet most who
> insist on using HTML would immediately be rejected by the validator.
> "Sorry, your message was rejected because your MUA vendor writes garbage
> that we can't parse, and makes you look like a spammer." ;)

Anyone know of a good validator that can be run over a MIME part to report 
on the quality of the HTML? This might be used as a go/no-go filter at 
milter level, or it could be used as an SA plugin to assign a variable 
score based on the quality of the HTML.

For mailing lists catering to newbies who love HTML and can't understand 
why us old-timers hate it, we can set the list to exclude all invalid HTML. 
"Sure, we'll accept your HTML. But only if it's really HTML. Not that crap 
that most MUA's write."

Re: [Mimedefang] Re: [SURBL-Discuss] Fw: Interesting Phishing Trick

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Mar 08, 2006 at 06:46:41PM -0800, Kenneth Porter wrote:
> > 1.400   1.0852   3.1781    0.255   0.00    1.00  TVD_NESTED_ANCHOR
> What MUA generates all the FP's?

I already deleted the results, but there were a lot of newsletters.
People are sloppy when they write html, leave an anchor tag open
somewhere, and there you go.
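
For readers unfamiliar with the columns, the S/O value in the quoted row is consistent with spam% / (spam% + ham%), so 0.255 means the rule hit a larger share of ham than of spam. The small Python sketch below reproduces the figures. The formula is an assumption that happens to match the numbers quoted in this thread, and the function name is made up.

```python
def spam_over_overall(spam: float, ham: float) -> float:
    """S/O taken as spam / (spam + ham).

    Assumed formula: it reproduces the values quoted in the thread.
    """
    return spam / (spam + ham)


# TVD_NESTED_ANCHOR row: SPAM% 1.0852, HAM% 3.1781
print(f"{spam_over_overall(1.0852, 3.1781):.3f}")  # 0.255

# "(all messages)" row: 27920 spam and 4940 ham messages
print(f"{spam_over_overall(27920, 4940):.3f}")     # 0.850
```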

-- 
Randomly Generated Tagline:
"To teach responsibility you must give responsibility."
                 - Geoff Halprin at LISA '99 (quoting from unknown source)

Re: [Mimedefang] Re: [SURBL-Discuss] Fw: Interesting Phishing Trick

Posted by jdow <jd...@earthlink.net>.
From: "Kenneth Porter" <sh...@sewingwitch.com>
> --On Wednesday, March 08, 2006 8:40 PM -0500 Theo Van Dinter 
> <fe...@apache.org> wrote:
> 
>> Not in SA proper.  For curiosity sake, I wrote up a quick rule to test
>> it out:
>>
>>  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME
>>      0    27920     4940    0.850   0.00    0.00  (all messages)
>>  1.400   1.0852   3.1781    0.255   0.00    1.00  TVD_NESTED_ANCHOR
>>
>> ie: it's pretty horrible.
> 
> What MUA generates all the FP's?
> 
> Makes me wonder about installing outbound filters that run a validator and 
> reject anything that fails. I often see flame wars on mailing lists about 
> allowing HTML posts to the list, but I wonder how the arguments would 
> change if one allowed only *validated* HTML. I'll bet most who insist on 
> using HTML would immediately be rejected by the validator. "Sorry, your 
> message was rejected because your MUA vendor writes garbage that we can't 
> parse, and makes you look like a spammer." ;)

I'd still take part in the HTML only user's lynching.
{o.o}

Re: [Mimedefang] Re: [SURBL-Discuss] Fw: Interesting Phishing Trick

Posted by Kenneth Porter <sh...@sewingwitch.com>.
--On Wednesday, March 08, 2006 8:40 PM -0500 Theo Van Dinter 
<fe...@apache.org> wrote:

> Not in SA proper.  For curiosity sake, I wrote up a quick rule to test
> it out:
>
>  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME
>      0    27920     4940    0.850   0.00    0.00  (all messages)
>  1.400   1.0852   3.1781    0.255   0.00    1.00  TVD_NESTED_ANCHOR
>
> ie: it's pretty horrible.

What MUA generates all the FP's?

Makes me wonder about installing outbound filters that run a validator and 
reject anything that fails. I often see flame wars on mailing lists about 
allowing HTML posts to the list, but I wonder how the arguments would 
change if one allowed only *validated* HTML. I'll bet most who insist on 
using HTML would immediately be rejected by the validator. "Sorry, your 
message was rejected because your MUA vendor writes garbage that we can't 
parse, and makes you look like a spammer." ;)

Re: [Mimedefang] Re: [SURBL-Discuss] Fw: Interesting Phishing Trick

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Mar 08, 2006 at 04:25:40PM -0800, Kenneth Porter wrote:
> >It's an interesting use, but I don't believe it would confuse
> >SpamAssassin, etc.  The second URI should be visible enough to be
> >checked, and I added the IP to ph.surbl.org.
> 
> Is there an SA rule that checks for nested anchors? (Either in 3.1 or 
> SARE.) Any signs of this idiom in ham corpuses?

Not in SA proper.  For curiosity sake, I wrote up a quick rule to test
it out:

 MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME
     0    27920     4940    0.850   0.00    0.00  (all messages)
 1.400   1.0852   3.1781    0.255   0.00    1.00  TVD_NESTED_ANCHOR

ie: it's pretty horrible.

I also tried changing the rule to look for nested anchors where the
nested href goes https? to an IP:

 0.009   0.0107   0.0000    1.000   1.00    1.00  TVD_NESTED_ANCHOR

but the 3 mails that got hit are nailed via other rules.  Average score
of 11 for set0 in 3.2.0 (scores are defaults, no score generation
mass-check yet).
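
The two variants can be sketched in Python as follows. This is not the TVD_NESTED_ANCHOR rule itself, which is not shown in this thread. The names and sample markup are hypothetical, and "an IP" is taken to mean a dotted-quad IPv4 host.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Loose dotted-quad match; octet ranges are not checked in this sketch.
IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


class NestedAnchorFinder(HTMLParser):
    """Record the href of every <a> opened while another <a> is still open."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.nested_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        if self.depth > 0:
            self.nested_hrefs.append(dict(attrs).get("href") or "")
        self.depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.depth > 0:
            self.depth -= 1


def nested_anchor_hrefs(html: str) -> list:
    """First variant: any anchor nested inside another anchor."""
    finder = NestedAnchorFinder()
    finder.feed(html)
    finder.close()
    return finder.nested_hrefs


def nested_to_ip(html: str) -> list:
    """Second variant: nested anchors whose href is http(s) to an IPv4 host."""
    hits = []
    for href in nested_anchor_hrefs(html):
        parts = urlsplit(href)
        if parts.scheme in ("http", "https") and IPV4.match(parts.hostname or ""):
            hits.append(href)
    return hits


# Hypothetical samples: a phish-style nesting and a merely unclosed anchor.
phish = ('<a href="http://www.bank.example/">'
         '<a href="http://192.0.2.10/login">http://www.bank.example/</a></a>')
sloppy = ('<a href="http://news.example/a">Story one '
          '<a href="http://news.example/b">Story two</a>')
```

The sloppy sample, with an anchor simply left open, trips the first variant but not the second, which matches the difference in ham hits between the two runs.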

-- 
Randomly Generated Tagline:
"This is my sandbox, I'm not allowed to go in the deep end."
 
 	--Ralph Wiggum
 	  This Little Wiggy (Episode 5F13)

[OT] Re:Fw: Interesting Phishing Trick

Posted by "Kevin A. McGrail" <km...@pccc.com>.
I put a rule in for testing just for this part of the process but a nested 
<a> tag inside another <a> tag is a good idea as well.  I want to see what 
the corpus view is on this issue as well.

rawbody         KAM_PHISH1      /u style="cursor: pointer"/
describe        KAM_PHISH1      Test for PHISH that changes the cursor
score           KAM_PHISH1      0.01


Regards,
KAM

> Is there an SA rule that checks for nested anchors? (Either in 3.1 or 
> SARE.) Any signs of this idiom in ham corpuses?