Posted to users@spamassassin.apache.org by Mike Cardwell <sp...@lists.grepular.com> on 2009/04/23 17:53:27 UTC

False positive?

Hi,

I received a legitimate marketing email from a service I signed up to.
A family member works for that company and, because I'm an email admin,
asked me to take a look at the content of the email, as he's had
problems delivering mail recently.

It has two parts, a plain text part and an XHTML part. The XHTML part starts:

========================================================================
Content-Type: text/html; charset = "utf-8"
Content-Transfer-Encoding: 8bit

<html xml:lang="en">
   <head/>
   <body>
========================================================================

One of the things spamassassin reported as a problem was:

========================================================================
1.3 HTML_TAG_BALANCE_HEAD  BODY: HTML has unbalanced "head" tags
========================================================================

That isn't really correct. <head/> is equivalent to <head></head> and is
properly balanced. The XHTML is being generated automatically using XSLT
and a third-party script. There are several obvious short-term
workarounds for this case, such as:

1.) Output as HTML4 rather than XHTML
2.) Put some tags inside the <head>
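The equivalence claim is easy to check with any XML parser. The Python sketch below (assuming Python 3.9+; SpamAssassin itself is written in Perl and this is not its code) compares the Canonical XML form of both spellings, then contrasts that with a hypothetical tag counter that does not understand the XML empty-element "/>" syntax. The name naive_tag_balance is made up for illustration; it is not how HTML_TAG_BALANCE_HEAD is actually implemented, it only shows how a counter of that kind would see <head/> as an opening tag with no close.

```python
import re
import xml.etree.ElementTree as ET


def xml_equivalent(a: str, b: str) -> bool:
    """True if two XML documents have the same Canonical XML (C14N 2.0) form."""
    return ET.canonicalize(a) == ET.canonicalize(b)


def naive_tag_balance(markup: str, tag: str) -> int:
    """Hypothetical check: count <tag ...> openings minus </tag> closings.

    It does not recognise the empty-element form <tag/>, so that form is
    counted as an opening tag that is never closed.
    """
    opens = re.findall(rf"<{tag}\b[^>]*>", markup, re.IGNORECASE)
    closes = re.findall(rf"</{tag}\s*>", markup, re.IGNORECASE)
    return len(opens) - len(closes)


if __name__ == "__main__":
    print(xml_equivalent("<head/>", "<head></head>"))  # True
    print(naive_tag_balance("<head/>", "head"))        # 1, looks unbalanced
    print(naive_tag_balance("<head></head>", "head"))  # 0
```

An XML-aware parser treats both forms identically, so any false positive here would come from a check that only counts literal start and end tags.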

However, I still think this should be sorted within SpamAssassin
itself. Opinions?

-- 
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)

Re: [SA] False positive?

Posted by Adam Katz <an...@khopis.com>.
Karsten Bräckelmann wrote:
> According to the W3C validator [1], the <head/> on its own is invalid
> and incomplete. The root element is also missing a mandatory xmlns
> attribute. Also, according to the compatibility guidelines [2], empty
> elements should have a space before the closing slash. To name but a few
> issues I found with some brief digging.

More specifically, <head/> is disallowed because the [X]HTML spec
requires <head> sections to include a <title>, so the briefest you can
get is <head><title/></head>.
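The two structural points raised so far (the missing xmlns on the root element and the <title> that <head> requires) can be checked roughly as sketched below in Python. This is a sketch under stated assumptions: the test documents are hypothetical completions of the truncated snippet, the function name is made up for illustration, and it is no substitute for the W3C validator.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"  # XHTML 1.0 namespace URI


def xhtml_skeleton_problems(doc: str) -> list[str]:
    """List two of the problems discussed in this thread, if present.

    Checks only that the root is <html> in the XHTML namespace and that
    <head> contains a <title>; it is not a full validator.
    """
    problems = []
    root = ET.fromstring(doc)
    if root.tag != f"{{{XHTML_NS}}}html":
        problems.append("root is not <html> in the XHTML namespace (missing xmlns?)")
    # Look for head/title with and without the namespace, so the <title>
    # check still runs when xmlns is missing.
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None:
        head = root.find("head")
    if head is None:
        problems.append("no <head> element")
    elif head.find(f"{{{XHTML_NS}}}title") is None and head.find("title") is None:
        problems.append("<head> has no <title>")
    return problems


if __name__ == "__main__":
    # Hypothetical completions of the truncated snippet from the original post.
    bad = '<html xml:lang="en"><head/><body/></html>'
    good = ('<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">'
            '<head><title/></head><body/></html>')
    print(xhtml_skeleton_problems(bad))   # both problems reported
    print(xhtml_skeleton_problems(good))  # []
```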

The space is optional but recommended as a best practice, especially
for compatibility with non-XML HTML4 user agents (the top of Guenther's
link #2 reads: "This appendix summarizes design guidelines for authors
who wish their XHTML documents to render on existing HTML user agents").

-khopesh

-- 
Adam Katz
http://khopesh.com/Anti-spam

Re: False positive?

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Thu, 2009-04-23 at 16:53 +0100, Mike Cardwell wrote:
> I received a legitimate marketing email from a service I signed up to. 
> A family member works for that company and asked me to take a look at 
> the content of the email as he's had problems delivering mail recently 
> and because I'm an email admin.
> 
> It has two parts, a plain text part and an XHTML part. The XHTML part starts:
> 
> ========================================================================
> Content-Type: text/html; charset = "utf-8"
> Content-Transfer-Encoding: 8bit
> 
> <html xml:lang="en">
>    <head/>
>    <body>
> ========================================================================

This doesn't directly address the original report, but there are
already lots of issues in that tiny snippet.

According to the W3C validator [1], the <head/> on its own is invalid
and incomplete. The root element is also missing a mandatory xmlns
attribute. Also, according to the compatibility guidelines [2], empty
elements should have a space before the closing slash. To name but a few
issues I found with some brief digging.


> One of the things spamassassin reported as a problem was:
> 
> ========================================================================
> 1.3 HTML_TAG_BALANCE_HEAD  BODY: HTML has unbalanced "head" tags
> ========================================================================
> 
> That isn't really correct. <head/> is equivalent to <head></head> and is 
> properly balanced. The XHTML is being automatically generated using XSLT 
> and a third party script.

Frankly, I don't know for sure whether this is correct, or whether it
needs to be fixed, but the issues above suggest there's something broken
about it.


> There are several obvious short-term
> workarounds for this case, such as:
> 
> 1.) Output as HTML4 rather than XHTML
> 2.) Put some tags inside the <head>
> 
> However, I still think this should be sorted within SpamAssassin
> itself. Opinions?

First, get all the rules straight and verify this really is allowed and
correct. Then file a bug if the rule truly misfires.

However, there's another aspect to the rules and their scores: the
ratio of hits on spam to hits overall. A rule occasionally triggering on
ham doesn't necessarily undermine its validity for detecting spam, or
its score.
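Taken literally, that ratio is the share of a rule's hits that land on spam. A minimal Python sketch with hypothetical hit counts follows; SpamAssassin's own rule-statistics tooling may normalise by corpus size, so treat this as an illustration of the idea rather than its exact formula.

```python
def spam_to_overall(spam_hits: int, ham_hits: int) -> float:
    """Share of a rule's hits that were on spam (1.0 means it never hit ham)."""
    total = spam_hits + ham_hits
    if total == 0:
        raise ValueError("rule never hit, so the ratio is undefined")
    return spam_hits / total


if __name__ == "__main__":
    # Hypothetical counts for a rule that occasionally triggers on ham.
    print(spam_to_overall(990, 10))  # 0.99
```

With 990 spam hits and 10 ham hits the ratio is 0.99: the rule still fires on some ham, yet a hit remains a strong spam indicator.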

Just some first thoughts...


[1] http://validator.w3.org/
[2] http://www.w3.org/TR/xhtml1/#guidelines

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: False positive?

Posted by Justin Mason <jm...@jmason.org>.
hi Mike -- could you open a bug on the bugzilla? definitely sounds
like a false positive.

--j.
