Posted to users@spamassassin.apache.org by Alex <my...@gmail.com> on 2018/11/15 01:02:02 UTC

googleapis hosted phish

Hi,
Anyone have any further ideas for blocking these? Google really should
be doing better to prevent these.

https://pastebin.com/XumEjHc1

Re: googleapis hosted phish

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 15 Nov 2018, at 7:52, RW wrote:

> On Thu, 15 Nov 2018 01:22:00 -0500
> Bill Cole wrote:
>
>> On 14 Nov 2018, at 20:11, Alex wrote:
>>
>>> Where is it getting these long hostname strings from?
>>
>> There's a bunch of garbage HTML using invisible text (font-size: 0)
>> between tiny bits of visible text to break Bayes and/or specific word
>> detection.
>
> That particular example is actually HTML in a text/plain MIME
> section.

The mess in the text/plain part is a result of a botched 
rendering/tag-stripping of the insane text/html part, but yes: the 
specific misidentified domain name is in the plain part and is a result 
of a line-breaking artifact inside the rendered HTML.


-- 
Bill Cole

Re: googleapis hosted phish

Posted by RW <rw...@googlemail.com>.
On Thu, 15 Nov 2018 01:22:00 -0500
Bill Cole wrote:

> On 14 Nov 2018, at 20:11, Alex wrote:
> 
> > Where is it getting these long hostname strings from?  
> 
> There's a bunch of garbage HTML using invisible text (font-size: 0) 
> between tiny bits of visible text to break Bayes and/or specific word 
> detection. 

That particular example is actually HTML in a text/plain MIME section.

Re: googleapis hosted phish

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 14 Nov 2018, at 20:11, Alex wrote:

> Where is it getting these long hostname strings from?

There's a bunch of garbage HTML using invisible text (font-size: 0) 
between tiny bits of visible text to break Bayes and/or specific word 
detection. The overly-thirsty "URI" parser strings this junk together 
and is seeing <longstring>.az\b somewhere in it, and picks it up as a 
domain name. It's noisy in debug output but in this case harmless 
because what it is seeing includes a label that's too long to be a 
valid DNS label (the limit is 63 octets).
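
To make the length point concrete, here is a minimal sketch in Python 
(not SpamAssassin code) that checks each dot-separated label against the 
63-octet limit from RFC 1035. The function name is_plausible_hostname 
and the two constants are hypothetical names chosen for illustration; 
the junk host is rebuilt from the repeating pattern shown in the debug 
output.

```python
MAX_LABEL_OCTETS = 63   # RFC 1035 limit for a single DNS label
MAX_NAME_OCTETS = 253   # practical limit for a full name in dotted form


def is_plausible_hostname(name: str) -> bool:
    """Return True if the name and each of its labels fit DNS length limits.

    Hypothetical helper for illustration; it checks lengths only, not
    character sets or whether the TLD exists.
    """
    name = name.rstrip(".")
    if not name or len(name) > MAX_NAME_OCTETS:
        return False
    return all(0 < len(label) <= MAX_LABEL_OCTETS for label in name.split("."))


# The host from the debug output: "z2eqcx" ten times, joined by "a".
junk_host = "a".join(["z2eqcx"] * 10) + ".az"

print(len(junk_host.split(".")[0]))                     # first label length
print(is_plausible_hostname(junk_host))                 # over-long label
print(is_plausible_hostname("storage.googleapis.com"))  # ordinary host
```

The first label comes out at 69 characters, so a name like that cannot 
exist in DNS and a lookup for it could never return a listing.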

FWIW, that junk can be detected with rawbody rules looking for 
idiosyncratic HTML. I don't publish my local rules which do that sort of 
thing because they are very useful but very evadable and I suspect that 
if the precise rules were broadcast, they'd stop being useful in a 
matter of days. Instead, it would be really good if everyone maintaining 
their own local rules would take that hint and devise an invisible 
forest of slightly different rules to catch HTML structures with no 
legitimate purpose, making it impossible for spammers to get around a 
single rule published in the default channel or KAM.cf or anything else 
known to be under spammers' watch.

(CAVEAT: For some reason, a lot of opt-in political bulk mail also 
catches on such rules.)
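
As a rough illustration of the kind of pattern meant here (and 
deliberately not anyone's production rule), the sketch below uses a 
Python regex to count inline styles that set font-size to zero in raw 
HTML. The names ZERO_FONT_STYLE and count_invisible_spans, the sample 
markup, and the exact pattern are all assumptions made up for this 
example; a real local rule would want its own variations and FP testing, 
per the caveat above.

```python
import re

# Matches an inline style attribute that sets font-size to zero, e.g.
# style="font-size:0" or style='color:red; font-size: 0px;'.
# Hypothetical pattern for illustration only.
ZERO_FONT_STYLE = re.compile(
    r"""style\s*=\s*["'][^"']*\bfont-size\s*:\s*0(?:\.0+)?(?:px|pt|em|%)?\s*[;"'!]""",
    re.IGNORECASE,
)


def count_invisible_spans(raw_html: str) -> int:
    """Count style attributes in raw_html that declare a zero font size."""
    return len(ZERO_FONT_STYLE.findall(raw_html))


# Made-up sample: junk text hidden between fragments of a visible word.
sample = (
    '<span style="font-size:0">xq7</span>Pay'
    '<span style="font-size: 0px;">zz</span>pal '
    '<span style="font-size:10px">ok</span>'
)

print(count_invisible_spans(sample))
```

Counting rather than matching once leaves room to score on density, 
since a single zero-size element is more likely to be innocent than 
dozens of them.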

> Should we be rethinking whether googleapis.com should be in the DNSBL 
> skip list?

I think it may deserve a special rule all its own (with extensive FP 
shielding), but I suspect that you will never see it in a URIDNSBL that 
is safe to use. So it would do no good to keep resolving 
storage.googleapis.com and other such names, which have short-TTL CNAME 
records pointing to shorter-TTL A records, on a frequent basis only to 
determine that it will never get listed OR that you're using a URIDNSBL 
which intends to generate widespread collateral damage.

Of course, I could be wrong. You could test how wrong I might be with 
this:

clear_uridnsbl_skip_domain  googleapis.com



-- 
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole

Re: googleapis hosted phish

Posted by Alex <my...@gmail.com>.
Hi,

> Anyone have any further ideas for blocking these? Google really should
> be doing better to prevent these.
>
> https://pastebin.com/XumEjHc1

I ran this through with debug, and it's producing some weird messages
I don't understand:

Nov 14 20:05:43.654 [28187] dbg: uridnsbl: domain googleapis.com in skip list, host storage.googleapis.com
Nov 14 20:05:43.654 [28187] dbg: check: tagrun - tag URIHOSTS is now ready, value: ARY:[z2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcx.az,example.com]
Nov 14 20:05:43.654 [28187] dbg: check: tagrun - tag URIDOMAINS is now ready, value: ARY:[z2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcx.az,example.com]
Nov 14 20:05:43.654 [28187] dbg: uridnsbl: considering host=z2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcx.az, domain=z2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcxaz2eqcx.az

Where is it getting these long hostname strings from?

Should we be rethinking whether googleapis.com should be in the DNSBL skip list?