Posted to dev@spamassassin.apache.org by John Hardin <jh...@impsec.org> on 2020/09/12 16:55:38 UTC

Re: Zero-point garbage text that isn't caught by the small-font rules

On Fri, 21 Aug 2020, John Hardin wrote:

> On Fri, 21 Aug 2020, Matus UHLAR - fantomas wrote:
>
>> On 20.08.20 09:13, Loren Wilton wrote:
>>> I've started receiving a bunch of spam or more likely phish mails that 
>>> contain the following sort of trash in large quantities between almost 
>>> every word of the visible text. The invisible font rules don't seem to 
>>> catch this.
>>>
>>>   <span style=3D"font-size: 0vw;">lzdtec</span>
>> 
>> I have noticed those some time ago.
>> I wonder what's the point of sending such mail.
>
> It's an attempt to obstruct spam detection via naïve text matching in the raw 
> HTML. It has no effect (beyond being a fairly good spam indicator) if the 
> text is rendered before being scanned.
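
A rough sketch of the "fairly good spam indicator" idea: count inline styles that set a zero font size. The regex, the function name, and the sample markup are assumptions made for illustration, not an existing SpamAssassin rule, and the input is assumed to be already QP-decoded.

```python
import re

# Matches a style attribute that sets font-size to zero in any unit
# (0, 0vw, 0px, 0.0em, ...) but not to values such as 0.5em or 10px.
ZERO_FONT_STYLE = re.compile(
    r'style\s*=\s*"[^"]*\bfont-size:\s*0(?:\.0+)?(?![\d.])[a-z%]*',
    re.IGNORECASE,
)

def count_zero_font_styles(html: str) -> int:
    """Return how many style attributes declare a zero font size."""
    return len(ZERO_FONT_STYLE.findall(html))

# Hypothetical sample modeled on the span quoted earlier in the thread.
SAMPLE = (
    'Claim<span style="font-size: 0vw;">lzdtec</span> your'
    '<span style="color:red; font-size:0px">qpwxrn</span> prize'
    '<span style="font-size: 0.5em">small print</span>'
)

print(count_zero_font_styles(SAMPLE))
```

A real rule would need a threshold, since a single zero-size element can be legitimate (tracking pixels, preheader text).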

OK, I just found another reason for doing this that I don't want to put on 
the users list quite yet.

If you put sufficient invisible text in the *middle* of the body, then the 
visible text following it may be omitted from the text for BODY rules...

See attached spample.

The spammy text after the hidden div is visible in __ALL_RAWBODY but not 
__ALL_BODY (local utility subrules). I was wondering why in the hell my 
rules for it weren't hitting, then I took a closer look at __ALL_BODY...

It's possible that the HTML parser needs some work to exclude HTML-hidden 
text from the BODY text.
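
To illustrate what "exclude HTML-hidden text" could mean, here is a minimal sketch using Python's standard html.parser. It is not how SpamAssassin's HTML parser works; the class name, the list of "hidden" styles, and the sample markup are all assumptions. It also assumes well-nested markup and treats every descendant of a hidden element as hidden, which is a simplification of real CSS behaviour.

```python
import re
from html.parser import HTMLParser

# Inline styles treated as "hidden" in this sketch (an assumed, incomplete list).
HIDDEN_STYLE = re.compile(
    r'font-size:\s*0(?:\.0+)?(?![\d.])|display:\s*none|visibility:\s*hidden',
    re.IGNORECASE,
)
# Void elements never get an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

class VisibleTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._hidden_stack = []  # one bool per currently open element
        self.chunks = []         # visible text fragments, in order

    def _in_hidden(self) -> bool:
        return bool(self._hidden_stack and self._hidden_stack[-1])

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        self._hidden_stack.append(
            self._in_hidden() or bool(HIDDEN_STYLE.search(style))
        )

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._hidden_stack:
            self._hidden_stack.pop()

    def handle_data(self, data):
        if not self._in_hidden():
            self.chunks.append(data)

def extract_visible(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)

print(extract_visible(
    'Buy <div style="display:none">junk <b>more junk</b></div>now<br>today'
))
```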


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org                         pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Maxim XI: Everything is air-droppable at least once.
-----------------------------------------------------------------------
  5 days until the 233rd anniversary of the signing of the U.S. Constitution

Re: Zero-point garbage text that isn't caught by the small-font rules

Posted by John Hardin <jh...@impsec.org>.
On Sat, 12 Sep 2020, John Hardin wrote:

> It's possible that the HTML parser needs some work to exclude HTML-hidden 
> text from the BODY text.

For this particular message that wouldn't help - the big block of 
"invisible" text is explicitly included in the plaintext message part; it 
looks to me like the spammer screwed things up by dumping a large block of 
visible garbage in the middle of the plaintext body. It impedes scanning 
but the recipient probably isn't going to scroll all the way to the bottom 
to read the end of the pitch.

However, if I manually fix the plaintext part the "invisible" text is 
still rendered into the body text from the HTML part, so that's still a 
potentially viable mechanism to impede scanning, assuming the plaintext 
version is not the same as the visible portion of the HTML version (i.e. 
the plaintext is missing or isn't spammy).



Re: Zero-point garbage text that isn't caught by the small-font rules

Posted by John Hardin <jh...@impsec.org>.
On Sat, 12 Sep 2020, Loren Wilton wrote:

>> It's properly formed. Compare the plaintext part to the HTML part, note 
>> that the base64 block is QP'd base64, and note that there's some more QP 
>> spam pitch text after the base64 block.
>
> Ah. I completely missed the division boundary a third of the way thru, or for 
> that matter the pdf attachment at the end.
>
> I fairly commonly see plaintext versions that include some of the hidden or 
> small-font obfuscation from the HTML part. My assumption is there is some 
> tool that generates the plaintext from the spam-built HTML and does a 
> suboptimal rendering job. I'm guessing this isn't generally a problem since I 
> think most mail programs suppress the plaintext part when there is an HTML 
> part present.

It's a problem for SA because enough embedded "invisible" text can push 
the suspicious text out of the "body" buffer, thus hiding it from rules.
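
A toy model of that effect, assuming rules only ever see the first N characters of the rendered text. The limit below is hypothetical and chosen only for illustration; it is not SpamAssassin's actual buffer size, and the function and variable names are made up.

```python
BODY_BUFFER_LIMIT = 1000  # hypothetical size, for illustration only

def scanned_body(rendered_text: str, limit: int = BODY_BUFFER_LIMIT) -> str:
    """Return the portion of the rendered text that body rules would see."""
    return rendered_text[:limit]

intro = "Hello, "
pitch = "wire the processing fee today"
filler = "lzdtec " * 200  # 1400 characters of "invisible" garbage

clean = intro + pitch            # pitch sits well inside the buffer
padded = intro + filler + pitch  # pitch is pushed past the limit

print(pitch in scanned_body(clean))   # the rule can see the pitch
print(pitch in scanned_body(padded))  # the pitch never reaches the rule
```

If the filler is dropped before the text is truncated, the pitch lands back inside the buffer, which is why excluding hidden text at render time matters here.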


Re: Zero-point garbage text that isn't caught by the small-font rules

Posted by Loren Wilton <lw...@earthlink.net>.
> It's properly formed. Compare the plaintext part to the HTML part, note 
> that the base64 block is QP'd base64, and note that there's some more QP 
> spam pitch text after the base64 block.

Ah. I completely missed the division boundary a third of the way thru, or 
for that matter the pdf attachment at the end.

I fairly commonly see plaintext versions that include some of the hidden or 
small-font obfuscation from the HTML part. My assumption is there is some 
tool that generates the plaintext from the spam-built HTML and does a 
suboptimal rendering job. I'm guessing this isn't generally a problem since 
I think most mail programs suppress the plaintext part when there is an HTML 
part present.


Re: Zero-point garbage text that isn't caught by the small-font rules

Posted by John Hardin <jh...@impsec.org>.
On Sat, 12 Sep 2020, Loren Wilton wrote:

>> See attached spample.
>
> Is there a boundary missing in that spample? It seems to go from a couple 
> lines of QP text into base64 with no intervening boundary.

It's properly formed. Compare the plaintext part to the HTML part, note 
that the base64 block is QP'd base64, and note that there's some more QP 
spam pitch text after the base64 block.
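
For anyone wanting to look inside a block like that, a minimal sketch of undoing the two layers with Python's standard quopri and base64 modules. The sample payload is made up for the demonstration and is not taken from the spample; the function name is hypothetical.

```python
import base64
import quopri

def decode_qp_then_base64(part_body: bytes) -> bytes:
    """Undo quoted-printable first, then decode the base64 it was wrapping."""
    qp_decoded = quopri.decodestring(part_body)
    # b64decode ignores the line breaks left between base64 lines by default.
    return base64.b64decode(qp_decoded)

# Build a made-up payload the same way: base64 first, then QP on top.
original = b"hidden filler text " * 5
as_base64 = base64.encodebytes(original)
as_qp_base64 = quopri.encodestring(as_base64)

print(as_qp_base64.decode("ascii"))
print(decode_qp_then_base64(as_qp_base64))
```

In the QP layer the base64 padding character "=" shows up as "=3D", which is one quick way to spot this kind of double encoding by eye.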

I think the base64 "hidden" text was pasted into the plaintext part by the 
spammer. Whether intentionally (to impede scanning) or by accident isn't 
clear, but I think it was probably by accident.

Perhaps the spammer's message composing tool isn't hiding invisible text 
when it generates the plaintext body part from the HTML part?



Re: Zero-point garbage text that isn't caught by the small-font rules

Posted by Loren Wilton <lw...@earthlink.net>.
> See attached spample.

Is there a boundary missing in that spample? It seems to go from a couple 
lines of QP text into base64 with no intervening boundary.