Posted to users@spamassassin.apache.org by Philip Prindeville <ph...@redfish-solutions.com> on 2014/06/25 03:07:29 UTC

Dubious hyperlinks

I’ve been seeing spam with <A HREF="#" …> such as:

<A href="#" philipp&nbsp;2014-06-25 01:20:00;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7><SPAN style="VISIBILITY: hidden"></SPAN></A>

and the style="VISIBILITY: hidden" is also dubious (why would normal mail have hidden text???).

Anyone have rules to catch these they could point me at?  Or any empirical evidence about how successful they’ve been with such?

Thanks,

-Philip


Re: Dubious hyperlinks

Posted by Axb <ax...@gmail.com>.
On 06/25/2014 11:35 PM, Philip Prindeville wrote:
>
> On Jun 25, 2014, at 3:00 PM, Axb <ax...@gmail.com> wrote:
>
>> On 06/25/2014 10:37 PM, Philip Prindeville wrote:
>>>
>>> On Jun 25, 2014, at 3:09 AM, Axb <ax...@gmail.com> wrote:
>>>
>>>> On 06/25/2014 03:07 AM, Philip Prindeville wrote:
>>>>
>>>>> Anyone have rules to catch these they could point me at?  Or any empirical evidence about how successful they’ve been with such?
>>>>
>>>> Wouldn't use this for a rule unless you meta it with lots of other traits
>>>>
>>>> the rawbody /href\=\"#\"/ plus other traits could be combined.
>>>>
>>>> Can you pastebin a sample ?
>>>>
>>>
>>>
>>> Sure:
>>>
>>> http://pastebin.com/4QFUZ6vd
>>
>>
>> the href template bork + the Base8 hashes are giveaways.
>> meta those rawbody traits together and you're rocking (for a while)
>>
>
> Sorry, which base8 hashes?

F1B9215E, etc


> Also, I’m noticing the tracking info following the href…
>
> F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7
>
> Including 6 distinct UUID’s would seem to be useful.  Including the same UUID 6 times seems broken.
>
> Perhaps a pattern like:
>
> body /((;[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})){4,}/
>
> would be… no, wait… we’d need to save the first one, and then check for 3 or more recurrences of the exact same literal string.
>
> rawbody L_REPEATING_UUIDS       /<a href="\#" .*(;[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12}){4,}>/i
> describe L_REPEATING_UUIDS      Seeing the same tracking info repeated
> score L_REPEATING_UUIDS         0.1


I'd do a less specific:

rawbody HREF_TP_BORK_HASH   /<a\s+href\=\"\#\"/i
score   HREF_TP_BORK_HASH  1.5

rawbody BASE812C_DASHS  /\;?\-?[A-F0-9]{12}\-?\;?/
score   BASE812C_DASHS  1.5

meta    META_DASHES_URIHASH  (BASE812C_DASHS && HREF_TP_BORK_HASH)
score   META_DASHES_URIHASH   3.5
tflags	META_DASHES_URIHASH   autolearn_force
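
For illustration, a rough Python emulation of how the meta above combines two weak traits. This is only a sketch: the function and sample names are hypothetical, the href pattern is written here to match the `<A href="#"` seen in the sample (an assumption), and Python's `re` stands in for the Perl regex engine SpamAssassin actually uses.

```python
import re

# Assumed stand-ins for the two rawbody traits; names are illustrative only.
HREF_HASH = re.compile(r'<a\s+href="#"', re.I)   # anchor pointing at "#"
HEX12 = re.compile(r';?-?[A-F0-9]{12}-?;?')      # 12 uppercase hex digits

def meta_hits(raw: str) -> bool:
    """Emulate (BASE812C_DASHS && HREF_TP_BORK_HASH): both traits must hit."""
    return bool(HREF_HASH.search(raw)) and bool(HEX12.search(raw))

SPAM = ('<A href="#" philipp&nbsp;2014-06-25 01:20:00;'
        'F1B9215E-B1D0-40BC-92D1-F13D501596B7>'
        '<SPAN style="VISIBILITY: hidden"></SPAN></A>')
PLAIN_ANCHOR = '<a href="#">Back to top</a>'      # made-up legitimate markup
HEX_ONLY = 'Order ref 0123456789AB shipped'       # made-up legitimate text

for name, text in [("spam", SPAM), ("anchor", PLAIN_ANCHOR), ("hex", HEX_ONLY)]:
    print(name, meta_hits(text))
```

Either trait alone also appears in ordinary mail, which is why the higher score sits on the meta rather than on the individual rules.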





Re: Dubious hyperlinks

Posted by John Hardin <jh...@impsec.org>.
On Thu, 26 Jun 2014, John Hardin wrote:

> On Thu, 26 Jun 2014, Philip Prindeville wrote:
>
>>  On Jun 25, 2014, at 3:47 PM, John Hardin <jh...@impsec.org> wrote:
>> 
>> >  That still doesn't hit *only* the same GUID repeated. Try this:
>> > 
>> >  rawbody L_REPEATING_UUIDS  /<a href="\#" [^\s>]+(;[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})\1\1\1/i
>>
>>  Sorry, that got dropped along the way.  I had tested:
>>
>>  rawbody L_REPEATING_UUIDS       /<a href="\#" .*(;[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})(\1){4,}>/i
>>
>>  and indeed that works correctly.
>
> OK, that's certainly another valid way to code it.
>
> Note that you do not need parens around the \1. That captures it again, which 
> just wastes processing.  \1{4,} should work.
>
> Also, .* in a rawbody rule is a **really** bad idea. Note my suggested 
> alternative, which won't run wild scanning the entire message.

Actually, one small modification:

   rawbody L_REPEATING_UUIDS  /<a href="\#"\s+[^\s>]+(;[A-F0-9].....

The original version would not match if there was more than one space 
after the href="\#" bit.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Maxim IX: Never turn your back on an enemy.
-----------------------------------------------------------------------
  8 days until the 238th anniversary of the Declaration of Independence

Re: Dubious hyperlinks

Posted by John Hardin <jh...@impsec.org>.
On Thu, 26 Jun 2014, Philip Prindeville wrote:

> The [^\s] wouldn’t work because there is space in there…
>
> <A href="#" philipp&nbsp;2014-06-25 01:20:00;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7><SPAN style="VISIBILITY: hidden"></SPAN></A>
>
> note the name, non-breaking space, and the timestamp before the UUID’s…

The nonbreaking space wouldn't have any effect, that's not converted 
before the RE scan; but the space in the date I did miss - apologies.

   rawbody L_REPEATING_UUIDS  /<a href="\#" [^>]+(;[A-F0-9].....
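
A small Python check of the point about the space in the timestamp, using the sample tag from this thread. It is a sketch under assumptions: Python's `re` stands in for Perl, the variable names are made up, and `#` is left unescaped because the `\#` in the rules is only needed to stop the SpamAssassin config parser treating it as a comment.

```python
import re

UUID = r"[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12}"
GUID = "F1B9215E-B1D0-40BC-92D1-F13D501596B7"

# The sample tag from the thread: name, &nbsp;, timestamp, then six copies.
SAMPLE = ('<A href="#" philipp&nbsp;2014-06-25 01:20:00'
          + (";" + GUID) * 6
          + '><SPAN style="VISIBILITY: hidden"></SPAN></A>')

# Earlier form: [^\s>]+ cannot cross the space inside the timestamp.
no_space = re.compile(r'<a href="#" [^\s>]+(;' + UUID + r')\1\1\1', re.I)
# Revised form: [^>]+ allows spaces but still stops at the end of the tag.
any_but_gt = re.compile(r'<a href="#" [^>]+(;' + UUID + r')\1\1\1', re.I)

print(no_space.search(SAMPLE))             # None: blocked by the space
print(any_but_gt.search(SAMPLE).group(1))  # the repeated ";GUID" chunk
```

The `&nbsp;` is plain text at this stage, so it is the literal space between the date and the time that defeats `[^\s>]+`.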


Re: Dubious hyperlinks

Posted by Philip Prindeville <ph...@redfish-solutions.com>.
On Jun 26, 2014, at 7:31 PM, John Hardin <jh...@impsec.org> wrote:

> On Thu, 26 Jun 2014, Philip Prindeville wrote:
> 
>> On Jun 25, 2014, at 3:47 PM, John Hardin <jh...@impsec.org> wrote:
>> 
>>> That still doesn't hit *only* the same GUID repeated. Try this:
>>> 
>>> rawbody L_REPEATING_UUIDS  /<a href="\#" [^\s>]+(;[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})\1\1\1/i
>> 
>> Sorry, that got dropped along the way.  I had tested:
>> 
>> rawbody L_REPEATING_UUIDS       /<a href="\#" .*(;[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})(\1){4,}>/i
>> 
>> and indeed that works correctly.
> 
> OK, that's certainly another valid way to code it.
> 
> Note that you do not need parens around the \1. That captures it again, which just wastes processing.  \1{4,} should work.
> 
> Also, .* in a rawbody rule is a **really** bad idea. Note my suggested alternative, which won't run wild scanning the entire message.


The [^\s] wouldn’t work because there is space in there…

<A href="#" philipp&nbsp;2014-06-25 01:20:00;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7><SPAN style="VISIBILITY: hidden"></SPAN></A>

note the name, non-breaking space, and the timestamp before the UUID’s…



Re: Dubious hyperlinks

Posted by John Hardin <jh...@impsec.org>.
On Thu, 26 Jun 2014, Philip Prindeville wrote:

> On Jun 25, 2014, at 3:47 PM, John Hardin <jh...@impsec.org> wrote:
>
>> That still doesn't hit *only* the same GUID repeated. Try this:
>>
>> rawbody L_REPEATING_UUIDS  /<a href="\#" [^\s>]+(;[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})\1\1\1/i
>
> Sorry, that got dropped along the way.  I had tested:
>
> rawbody L_REPEATING_UUIDS       /<a href="\#" .*(;[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})(\1){4,}>/i
>
> and indeed that works correctly.

OK, that's certainly another valid way to code it.

Note that you do not need parens around the \1. That captures it again, 
which just wastes processing.  \1{4,} should work.

Also, .* in a rawbody rule is a **really** bad idea. Note my suggested 
alternative, which won't run wild scanning the entire message.
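
One concrete way the two forms differ can be sketched in Python. Assumptions: Python's `re` in place of Perl, made-up variable names, and an invented `unrelated` line of markup. The `.*` form can start at one tag and finish inside a later one on the same line, while `[^>]+` has to stay within the opening tag. The sketch also shows that `(\1)` adds a second capture group and `\1{4,}` does not.

```python
import re

UUID = r"[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12}"
G = ";F1B9215E-B1D0-40BC-92D1-F13D501596B7"

# The tested form: unbounded .* plus a redundant capture around \1.
greedy = re.compile(r'<a href="#" .*(;' + UUID + r')(\1){4,}>', re.I)
# Bounded form: stays inside the tag, no extra capture group.
bounded = re.compile(r'<a href="#" [^>]+(;' + UUID + r')\1{4,}>', re.I)

spam_tag = '<A href="#" philipp&nbsp;2014-06-25 01:20:00' + G * 6 + '>'
# Invented example: a harmless anchor followed by a different tag that
# happens to carry the repeated GUIDs.
unrelated = '<a href="#" title="top">top</a> <span data' + G * 6 + '>'

print(bool(greedy.search(spam_tag)), bool(bounded.search(spam_tag)))
print(bool(greedy.search(unrelated)), bool(bounded.search(unrelated)))
print(greedy.groups, bounded.groups)  # number of capture groups in each
```

Both patterns hit the real sample; only the `.*` version also hits the invented line, because it is free to skip past the first `>`.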


Re: Dubious hyperlinks

Posted by Philip Prindeville <ph...@redfish-solutions.com>.
On Jun 25, 2014, at 3:47 PM, John Hardin <jh...@impsec.org> wrote:

> On Wed, 25 Jun 2014, Philip Prindeville wrote:
> 
>> Including 6 distinct UUID’s would seem to be useful.  Including the same UUID 6 times seems broken.
>> 
>> Perhaps a pattern like:
>> 
>> body /((;[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})){4,}/
>> 
>> would be… no, wait… we’d need to save the first one, and then check for 3 or more recurrences of the exact same literal string.
>> 
>> rawbody L_REPEATING_UUIDS       /<a href="\#" .*(;[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12}){4,}>/i
>> describe L_REPEATING_UUIDS      Seeing the same tracking info repeated
>> score L_REPEATING_UUIDS         0.1
> 
> That still doesn't hit *only* the same GUID repeated. Try this:
> 
> rawbody L_REPEATING_UUIDS  /<a href="\#" [^\s>]+(;[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})\1\1\1/i
> 


Sorry, that got dropped along the way.  I had tested:

rawbody L_REPEATING_UUIDS       /<a href="\#" .*(;[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})(\1){4,}>/i


and indeed that works correctly.



Re: Dubious hyperlinks

Posted by John Hardin <jh...@impsec.org>.
On Wed, 25 Jun 2014, Philip Prindeville wrote:

> Including 6 distinct UUID’s would seem to be useful.  Including the same UUID 6 times seems broken.
>
> Perhaps a pattern like:
>
> body /((;[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})){4,}/
>
> would be… no, wait… we’d need to save the first one, and then check for 3 or more recurrences of the exact same literal string.
>
> rawbody L_REPEATING_UUIDS       /<a href="\#" .*(;[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12}){4,}>/i
> describe L_REPEATING_UUIDS      Seeing the same tracking info repeated
> score L_REPEATING_UUIDS         0.1

That still doesn't hit *only* the same GUID repeated. Try this:

rawbody L_REPEATING_UUIDS  /<a href="\#" [^\s>]+(;[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})\1\1\1/i
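
To see why the quantifier alone is not enough, here is a minimal Python sketch of just the repetition part of the pattern. Python's `re` is assumed to behave like Perl for these constructs, and the `distinct` values are fabricated test data, not taken from any real message.

```python
import re

UUID = r"[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12}"

# {4,} repeats the *pattern*: any four GUID-shaped chunks will do.
quantified = re.compile(r"(;" + UUID + r"){4,}")
# \1 repeats the *captured text*: the same literal GUID four times.
backref = re.compile(r"(;" + UUID + r")\1\1\1")

same = ";F1B9215E-B1D0-40BC-92D1-F13D501596B7" * 4
# Four different, made-up GUIDs (only the first field varies).
distinct = "".join(";%08X-B1D0-40BC-92D1-F13D501596B7" % n for n in range(4))

print(bool(quantified.search(same)), bool(quantified.search(distinct)))
print(bool(backref.search(same)), bool(backref.search(distinct)))
```

The quantified form accepts both inputs, while the backreference form accepts only the one where the identical string recurs.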


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The United States has become a place where entertainers and
   professional athletes are mistaken for people of importance.
                                         -- Maureen Johnson Smith Long
-----------------------------------------------------------------------
  9 days until the 238th anniversary of the Declaration of Independence

Re: Dubious hyperlinks

Posted by Philip Prindeville <ph...@redfish-solutions.com>.
On Jun 25, 2014, at 3:00 PM, Axb <ax...@gmail.com> wrote:

> On 06/25/2014 10:37 PM, Philip Prindeville wrote:
>> 
>> On Jun 25, 2014, at 3:09 AM, Axb <ax...@gmail.com> wrote:
>> 
>>> On 06/25/2014 03:07 AM, Philip Prindeville wrote:
>>> 
>>>> Anyone have rules to catch these they could point me at?  Or any empirical evidence about how successful they’ve been with such?
>>> 
>>> Wouldn't use this for a rule unless you meta it with lots of other traits
>>> 
>>> the rawbody /href\=\"#\"/ plus other traits could be combined.
>>> 
>>> Can you pastebin a sample ?
>>> 
>> 
>> 
>> Sure:
>> 
>> http://pastebin.com/4QFUZ6vd
> 
> 
> the href template bork + the Base8 hashes are giveaways.
> meta those rawbody traits together and you're rocking (for a while)
> 

Sorry, which base8 hashes?

Also, I’m noticing the tracking info following the href…

F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7

Including 6 distinct UUID’s would seem to be useful.  Including the same UUID 6 times seems broken.

Perhaps a pattern like:

body /((;[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})){4,}/

would be… no, wait… we’d need to save the first one, and then check for 3 or more recurrences of the exact same literal string.

rawbody L_REPEATING_UUIDS       /<a href="\#" .*(;[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12}){4,}>/i
describe L_REPEATING_UUIDS      Seeing the same tracking info repeated
score L_REPEATING_UUIDS         0.1


Re: Dubious hyperlinks

Posted by Axb <ax...@gmail.com>.
On 06/25/2014 10:37 PM, Philip Prindeville wrote:
>
> On Jun 25, 2014, at 3:09 AM, Axb <ax...@gmail.com> wrote:
>
>> On 06/25/2014 03:07 AM, Philip Prindeville wrote:
>>
>>> Anyone have rules to catch these they could point me at?  Or any empirical evidence about how successful they’ve been with such?
>>
>> Wouldn't use this for a rule unless you meta it with lots of other traits
>>
>> the rawbody /href\=\"#\"/ plus other traits could be combined.
>>
>> Can you pastebin a sample ?
>>
>
>
> Sure:
>
> http://pastebin.com/4QFUZ6vd


the href template bork + the Base8 hashes are giveaways.
meta those rawbody traits together and you're rocking (for a while)




Re: Dubious hyperlinks

Posted by Philip Prindeville <ph...@redfish-solutions.com>.
On Jun 25, 2014, at 3:09 AM, Axb <ax...@gmail.com> wrote:

> On 06/25/2014 03:07 AM, Philip Prindeville wrote:
> 
>> Anyone have rules to catch these they could point me at?  Or any empirical evidence about how successful they’ve been with such?
> 
> Wouldn't use this for a rule unless you meta it with lots of other traits
> 
> the rawbody /href\=\"#\"/ plus other traits could be combined.
> 
> Can you pastebin a sample ?
> 


Sure:

http://pastebin.com/4QFUZ6vd



Re: Dubious hyperlinks

Posted by Axb <ax...@gmail.com>.
On 06/25/2014 03:07 AM, Philip Prindeville wrote:
> I’ve been seeing spam with <A HREF=“#” …> such as:
>
> <A href="#" philipp&nbsp;2014-06-25 01:20:00;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7;F1B9215E-B1D0-40BC-92D1-F13D501596B7><SPAN style="VISIBILITY: hidden"></SPAN></A>
>
> and the style=“VISIBILITY: hidden” is also dubious (why would normal mail have hidden text???).

Lots of legitimate bulk mail uses this for tracking purposes.

> Anyone have rules to catch these they could point me at?  Or any empirical evidence about how successful they’ve been with such?

Wouldn't use this for a rule unless you meta it with lots of other traits

the rawbody /href\=\"#\"/ plus other traits could be combined.

Can you pastebin a sample ?