Posted to users@spamassassin.apache.org by Warren Togami <wt...@redhat.com> on 2009/09/29 20:37:50 UTC

DNSWL and JMF White false positives, what to do exactly?

I scanned my spam folders and found a few false positives that hit on 
either DNSWL or JMF (HOSTKARMA?  See how confusing it is not knowing 
what to call it?)

Is there an easy automated way we can forward FP's to DNSWL and JMF so 
their maintainers can decide what to do about the offending senders? 
I'd attach it to mail but it might get caught in the spam filter...

Warren Togami
wtogami@redhat.com

Re: DNSWL and JMF White false positives, what to do exactly?

Posted by Marc Perkel <ma...@perkel.com>.


Charles Gregory wrote:
> On Fri, 2 Oct 2009, RW wrote:
>>> However, if you want to be understood you need to speak the Lingua
>>> Franca. If you choose to use a term differently than everyone else
>>> you WILL be misunderstood and corrected.
>
> If everyone calls an apple an orange, then yeah, it's an orange.
>
>> A false match on a test is a false-positive. It doesn't reverse for a
>> ham test, simply because you're more used to thinking about spam tests.
>
> The distinction is whether the 'false positive' refers to the overall 
> scoring of the message (FP=ham flagged as spam) or an individual test 
> (FP=test triggered incorrectly). I consider *both* usages correct in 
> this group. And as I vaguely recall, the OP did use sufficient context 
> for even a lame-brain like myself to realize he meant the latter.
>
> The FP on the named rule had the potential to cause an FN.
>
>> Do you apply the same usage to anything else? For example, do you
>> reverse the meaning of "off" and "on" for air-conditioning to make it
>> consistent with heating, so "on" always means "make hotter"?
>
> Do you TURN UP or TURN DOWN your air-conditioning?
> Depends on whether someone has a simple numerical control
> or is adjusting a thermostat. Plus colloquial usage, of course. :)
> But yeah, you hit pretty close with your analogy. Just chose
> the wrong words. :)
>
> - Charles
>

Q. Do I make a left at the next intersection?
A. Right!

Re: DNSWL and JMF White false positives, what to do exactly?

Posted by Charles Gregory <cg...@hwcn.org>.

On Fri, 2 Oct 2009, RW wrote:
>> However, if you want to be understood you need to speak the Lingua
>> Franca. If you choose to use a term differently than everyone else
>> you WILL be misunderstood and corrected.

If everyone calls an apple an orange, then yeah, it's an orange.

> A false match on a test is a false-positive. It doesn't reverse for a
> ham test, simply because you're more used to thinking about spam tests.

The distinction is whether the 'false positive' refers to the overall 
scoring of the message (FP=ham flagged as spam) or an individual test 
(FP=test triggered incorrectly). I consider *both* usages correct in this 
group. And as I vaguely recall, the OP did use sufficient context for even 
a lame-brain like myself to realize he meant the latter.

The FP on the named rule had the potential to cause an FN.
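To make the rule-level vs. message-level distinction concrete, here is a minimal Python sketch of how one rule FP can turn into a classification FN. It assumes SpamAssassin's default required_score of 5.0 and that a message scoring at or above it is tagged as spam. The rule names other than RCVD_IN_DNSWL, and every score shown, are hypothetical and are not the shipped defaults.

```python
# Minimal sketch: SpamAssassin-style score summing against a threshold.
# All scores below are hypothetical; they are not SpamAssassin's defaults.
REQUIRED_SCORE = 5.0  # assumed default required_score


def classify(rule_hits: dict[str, float]) -> str:
    """Sum the scores of the rules that hit and compare to the threshold."""
    total = sum(rule_hits.values())
    return "spam" if total >= REQUIRED_SCORE else "ham"


# A spam message hit by two (hypothetical) spam rules: correctly tagged.
spam_hits = {"HYPOTHETICAL_SPAM_A": 3.5, "HYPOTHETICAL_SPAM_B": 2.5}
print(classify(spam_hits))  # spam (total 6.0)

# The same spam when the ham rule RCVD_IN_DNSWL also fires on it, which is
# a false positive for that rule. Its negative score drags the total to
# 2.0, below the threshold: a false negative for the message as a whole.
with_dnswl = {**spam_hits, "RCVD_IN_DNSWL": -4.0}
print(classify(with_dnswl))  # ham (total 2.0)
```

Whether the rule FP actually causes an FN depends on the rest of the score: had the spam rules totalled 9.0 or more, the message would still have been tagged despite the DNSWL hit.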

> Do you apply the same usage to anything else? For example, do you
> reverse the meaning of "off" and "on" for air-conditioning to make it
> consistent with heating, so "on" always means "make hotter"?

Do you TURN UP or TURN DOWN your air-conditioning?
Depends on whether someone has a simple numerical control
or is adjusting a thermostat. Plus colloquial usage, of course. :)
But yeah, you hit pretty close with your analogy. Just chose
the wrong words. :)

- Charles

Re: DNSWL and JMF White false positives, what to do exactly?

Posted by RW <rw...@googlemail.com>.

On Thu, 1 Oct 2009 18:54:40 -0600
LuKreme <kr...@kreme.com> wrote:

> On Oct 1, 2009, at 18:36, Karsten Bräckelmann
> <gu...@rudersport.de> wrote:
> 
> > Same for RCVD_IN_DNSWL. If it positively matches, either it is
> > correct or wrong. A false positive is a match that is wrong, no
> > matter the score you assign the test.
> 
> Like others have said, you can make the words mean whatever you want.
> However, if you want to be understood you need to speak the Lingua
> Franca. If you choose to use a term differently than everyone else
> you WILL be misunderstood and corrected.

Except that so far the lunatics haven't taken over the asylum and you
are in a 3 to 2 minority, so please don't claim to be speaking for
everyone.

A false match on a test is a false-positive. It doesn't reverse for a
ham test, simply because you're more used to thinking about spam tests. 

Do you apply the same usage to anything else? For example, do you
reverse the meaning of "off" and "on" for air-conditioning to make it
consistent with heating, so "on" always means "make hotter"?

Re: DNSWL and JMF White false positives, what to do exactly?

Posted by LuKreme <kr...@kreme.com>.

On Oct 1, 2009, at 18:36, Karsten Bräckelmann <gu...@rudersport.de>  
wrote:

> Same for RCVD_IN_DNSWL. If it positively matches, either it is
> correct or wrong. A false positive is a match that is wrong, no
> matter the score you assign the test.

Like others have said, you can make the words mean whatever you want.
However, if you want to be understood you need to speak the Lingua
Franca. If you choose to use a term differently than everyone else you
WILL be misunderstood and corrected.

Saying everyone else is wrong isn't going to help.

Re: DNSWL and JMF White false positives, what to do exactly?

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

On Sat, 2009-10-03 at 00:25 +0200, mouss wrote:
> Karsten Bräckelmann wrote:

> > > > False positive. Something that matches (positive) the criterion for a
> > > > certain test, but should not (false).
> > 
> > I stand by what I said.
> 
> I'm not surprised:)

;)

> > IFF you are talking about the black box that spam detection is, that is
> > true.
> > 
> > If you are talking about a rule like FORGED_MUA_OUTLOOK, it appears to
> > be that simple. However, it is not. You are looking at a single test,
> > which -- if positive -- either is correct or wrong.
> 
> I understand the rationale, but I find this too abstract for "common"
> discussions.

*shrug* You're not obliged to participate in a thread if it is
confusing to you. That's the wonders of open discussion and diverse
input. You might stumble upon something you didn't know before... ;)

> > Same for RCVD_IN_DNSWL. If it positively matches, either it is
> > correct or wrong. A false positive is a match that is wrong, no matter
> > the score you assign the test.
> 
> except that it depends what the test really means. dnswl doesn't mean
> the listed hosts never send spam. I am happy that it lists debian list
> servers, Orange, ... etc.

Exactly, in the context of a single rule (as opposed to "detecting
spam"), it depends on what the rule really means. Or in short, its
score's sign...

> > This concept is NOT specific to spam detection, or even computer
> > science. As a matter of fact, when I first really grasped the concept, a
> > medical scientist explained it to me.
> 
> now that you say it, this is true. I too believe that medical science has
> precedence in this area.
> 
> > Yes, a FP for a rule that identifies *ham* actually evaluated positive
> > on a spam. It only appears to be spam-centric on this list, because it is
> > mainly dedicated to identifying spam, not ham.
> > 
> > You might want to ask wikipedia as well. And don't focus on the spam
> > filtering *example*, which again exclusively talks about a rule
> > identifying spam. Not ham.
> 
> my point was that in a spam oriented forum, the meaning of some words is
> what "most of us" (yes, this is hard to define) think they mean. the
> principle of least astonishment.

Of course, these terms mostly come up WRT the overall score of a message,
which applies to "detecting spam".

However, on this very list, the terms are also commonly applied to single
rules FP'ing, *without* pushing the ham above the required_score threshold.

The only aspect new and obviously confusing to some regulars on this
list is the negative sign of the rule's score. Inverting the "is spam"
test logic also inverts the meaning of F[PN]. Whether one likes this or
not.

It's all about context.

And FWIW, it is wrong to base your definitions on what the majority
thinks is correct. The majority and what's believed to be "common
knowledge" too often is wrong. You can observe this in real life, too...
I prefer to educate the masses instead.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

Re: DNSWL and JMF White false positives, what to do exactly?

Posted by mouss <mo...@ml.netoyen.net>.

Karsten Bräckelmann wrote:
> On Fri, 2009-10-02 at 00:08 +0200, mouss wrote:
>> Karsten Bräckelmann wrote:
>>> False positive. Something that matches (positive) the criterion for a
>>> certain test, but should not (false).
> 
> I stand by what I said.
> 

I'm not surprised:)

>> you can certainly devise a system to detect alpha(foo) where alpha is a
>> function mapping a Banach space to a Hilbert Space, and define what FP,
>> FN, FX mean in the context you consider. you can also say "let PI=69,
>> ... ". but conventions are here for a reason. they allow us to
>> understand each other more easily. the fact that children of today can
>> solve computation problems that "great scientists" of the old times
>> couldn't handle is thanks to conventions (think of a/b * c/d =
>> (a*c)/(b*d), which looks trivial today, but wasn't before).
>>
>> when talking about spam or intrusion detection, FN means "missing" and
>> FP means "false alarm". if we allow defining FN and FP differently, then
>> we'll need to rewrite a lot of books, reports, articles, ...
> 
> IFF you are talking about the black box that spam detection is, that is
> true.
> 
> If you are talking about a rule like FORGED_MUA_OUTLOOK, it appears to
> be that simple. However, it is not. You are looking at a single test,
> which -- if positive -- either is correct or wrong.
> 

I understand the rationale, but I find this too abstract for "common"
discussions.

> Same for RCVD_IN_DNSWL. If it positively matches, either it is
> correct or wrong. A false positive is a match that is wrong, no matter
> the score you assign the test.
> 

except that it depends what the test really means. dnswl doesn't mean
the listed hosts never send spam. I am happy that it lists debian list
servers, Orange, ... etc.

> 
> This concept is NOT specific to spam detection, or even computer
> science. As a matter of fact, when I first really grasped the concept, a
> medical scientist explained it to me.
> 

now that you say it, this is true. I too believe that medical science has
precedence in this area.

> Yes, a FP for a rule that identifies *ham* actually evaluated positive
> on a spam. It only appears to be spam-centric on this list, because it is
> mainly dedicated to identifying spam, not ham.
> 
> You might want to ask wikipedia as well. And don't focus on the spam
> filtering *example*, which again exclusively talks about a rule
> identifying spam. Not ham.
> 

my point was that in a spam oriented forum, the meaning of some words is
what "most of us" (yes, this is hard to define) think they mean. the
principle of least astonishment.


anyway, I'm sorry for bringing the discussion to this sand. so I will
stop here (of course, offlist is ok for any discussion, including
garbage without collection:)

Re: DNSWL and JMF White false positives, what to do exactly?

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

On Fri, 2009-10-02 at 00:08 +0200, mouss wrote:
> Karsten Bräckelmann wrote:
> > False positive. Something that matches (positive) the criterion for a
> > certain test, but should not (false).

I stand by what I said.

> you can certainly devise a system to detect alpha(foo) where alpha is a
> function mapping a Banach space to a Hilbert Space, and define what FP,
> FN, FX mean in the context you consider. you can also say "let PI=69,
> ... ". but conventions are here for a reason. they allow us to
> understand each other more easily. the fact that children of today can
> solve computation problems that "great scientists" of the old times
> couldn't handle is thanks to conventions (think of a/b * c/d =
> (a*c)/(b*d), which looks trivial today, but wasn't before).
> 
> when talking about spam or intrusion detection, FN means "missing" and
> FP means "false alarm". if we allow defining FN and FP differently, then
> we'll need to rewrite a lot of books, reports, articles, ...

IFF you are talking about the black box that spam detection is, that is
true.

If you are talking about a rule like FORGED_MUA_OUTLOOK, it appears to
be that simple. However, it is not. You are looking at a single test,
which -- if positive -- either is correct or wrong.

Same for RCVD_IN_DNSWL. If it positively matches, either it is
correct or wrong. A false positive is a match that is wrong, no matter
the score you assign the test.

This concept is NOT specific to spam detection, or even computer
science. As a matter of fact, when I first really grasped the concept, a
medical scientist explained it to me.

Yes, a FP for a rule that identifies *ham* actually evaluated positive
on a spam. It only appears to be spam-centric on this list, because it is
mainly dedicated to identifying spam, not ham.

You might want to ask wikipedia as well. And don't focus on the spam
filtering *example*, which again exclusively talks about a rule
identifying spam. Not ham.

Re: DNSWL and JMF White false positives, what to do exactly?

Posted by mouss <mo...@ml.netoyen.net>.

Karsten Bräckelmann wrote:
> On Wed, 2009-09-30 at 23:35 +0200, mouss wrote:
>> Warren Togami wrote:
>>> I scanned my spam folders and found a few false positives that hit on
>>> either DNSWL 
>> FP with DNSWL?????
>>
>> FP = False Positive = legitimate mail tagged as spam
>> DNSWL = Whitelist
> 
> False positive. Something that matches (positive) the criterion for a
> certain test, but should not (false).
> 
>> if your system adds points because of dnswl, you have a serious problem...
>>
>> or do you mean FN (false negative)?
> 
> Granted, the wording ("FPs that hit ham rules") could use some polish,
> but I believe Warren was talking about spam that falsely hits ham rules.
> 
> 

you can certainly devise a system to detect alpha(foo) where alpha is a
function mapping a Banach space to a Hilbert Space, and define what FP,
FN, FX mean in the context you consider. you can also say "let PI=69,
... ". but conventions are here for a reason. they allow us to
understand each other more easily. the fact that children of today can
solve computation problems that "great scientists" of the old times
couldn't handle is thanks to conventions (think of a/b * c/d =
(a*c)/(b*d), which looks trivial today, but wasn't before).

when talking about spam or intrusion detection, FN means "missing" and
FP means "false alarm". if we allow defining FN and FP differently, then
we'll need to rewrite a lot of books, reports, articles, ...

Re: DNSWL and JMF White false positives, what to do exactly?

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

On Wed, 2009-09-30 at 23:35 +0200, mouss wrote:
> Warren Togami wrote:
> > I scanned my spam folders and found a few false positives that hit on
> > either DNSWL 
> 
> FP with DNSWL?????
> 
> FP = False Positive = legitimate mail tagged as spam
> DNSWL = Whitelist

False positive. Something that matches (positive) the criterion for a
certain test, but should not (false).

> if your system adds points because of dnswl, you have a serious problem...
> 
> or do you mean FN (false negative)?

Granted, the wording ("FPs that hit ham rules") could use some polish,
but I believe Warren was talking about spam that falsely hits ham rules.


Re: DNSWL and JMF White false positives, what to do exactly?

Posted by Henrik K <he...@hege.li>.

On Wed, Sep 30, 2009 at 11:35:31PM +0200, mouss wrote:
> 
> yes, you can report offending IPs, if that makes sense. for example, if
> the offending IP is that of an ISP relay, then don't report it: ISPs do
> relay spam.

Ehm... surely you should report spam-sending ISP relays if they are
miscategorized as low or higher.

Re: DNSWL and JMF White false positives, what to do exactly?

Posted by RW <rw...@googlemail.com>.

On Sat, 03 Oct 2009 00:12:37 +0200
mouss <mo...@ml.netoyen.net> wrote:

> RW wrote:
> > On Fri, 02 Oct 2009 00:14:52 +0200
> > mouss <mo...@ml.netoyen.net> wrote:
> > 

> > The source of your confusion is that you are mixing up the
> > terminology of the overall classification and individual test
> > results. Think of it this way: in a fingerprint comparison the
> > meanings of TP, TN, FP and FN are obvious and intrinsic to the
> > test; it would be absurd to switch them around depending on whether
> > it's evidence for the defence or prosecution.
> 
> let's take it more easily: Please explain to me what was an FP in this
> thread.

A test intended for identifying ham was being hit on spam.

A hit on a rule is a positive result. When a rule hits something it's
intended to identify, it's a "true positive". When a rule hits something
it's not intended to identify, it's a "false positive", and so on.

The same terminology can be used for SpamAssassin's overall spam
classification, but that's a different matter. If you talk about a rule
hit being an FN because it might contribute to a classification FN, then
you are using the terminology like a cargo-cultist.
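RW's definition can be written as a tiny function: a hit is a positive result, and whether it is true or false depends only on the class the rule is meant to identify, not on the sign of its score. This is a minimal Python sketch; the `Label` enum and the `rule_outcome` name are made up for illustration and are not part of SpamAssassin.

```python
from enum import Enum


class Label(Enum):
    HAM = "ham"
    SPAM = "spam"


def rule_outcome(target: Label, hit: bool, actual: Label) -> str:
    """Label one rule result as TP, FP, TN or FN.

    target: the class the rule is meant to identify (HAM for a whitelist
            rule such as RCVD_IN_DNSWL, SPAM for most other rules).
    hit:    whether the rule matched the message (a "positive" result).
    actual: the true class of the message.
    """
    matches_target = target == actual
    if hit:
        return "TP" if matches_target else "FP"
    return "FN" if matches_target else "TN"


# A ham rule hitting spam, as in the original report: an FP for that rule.
print(rule_outcome(Label.HAM, hit=True, actual=Label.SPAM))    # FP
# A ham rule hitting ham: a TP.
print(rule_outcome(Label.HAM, hit=True, actual=Label.HAM))     # TP
# Recast the same event as a spam test ("not listed"): that test did not
# fire on a spam, so the FP becomes an FN. Inverting the test logic
# inverts the meaning of FP and FN.
print(rule_outcome(Label.SPAM, hit=False, actual=Label.SPAM))  # FN
```

The last call shows why both camps in this thread can describe the same event differently: the outcome label follows from which class the test is framed as detecting.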

Re: DNSWL and JMF White false positives, what to do exactly?

Posted by mouss <mo...@ml.netoyen.net>.

RW wrote:
> On Fri, 02 Oct 2009 00:14:52 +0200
> mouss <mo...@ml.netoyen.net> wrote:
> 
>> RW wrote:
> 
>>> The term false-positive can apply to any test. A test for ham
>>> that matches a spam is a false-positive, it's a matter of context.
>> spam too can be (re)defined. and actually any term. but it is assumed
>> here that we talk about spam detection. so false negative means "miss"
>> and false positive means "false alarm". this is the common terminology
>> inherited from intrusion detection.
> 
> The term comes from statistics, not intrusion detection. I don't
> know much about the latter; perhaps people in that field are a little
> sloppy in their usage, but more likely all the tests are expressed as
> tests for intrusion, so the same kind of issue doesn't arise.
> 
> The source of your confusion is that you are mixing up the terminology
> of the overall classification and individual test results. Think of
> it this way: in a fingerprint comparison the meanings of TP, TN, FP and FN
> are obvious and intrinsic to the test; it would be absurd to switch
> them around depending on whether it's evidence for the defence or
> prosecution.

let's take it more easily: Please explain to me what was an FP in this
thread.

Re: DNSWL and JMF White false positives, what to do exactly?

Posted by RW <rw...@googlemail.com>.

On Fri, 02 Oct 2009 00:14:52 +0200
mouss <mo...@ml.netoyen.net> wrote:

> RW wrote:

> > The term false-positive can apply to any test. A test for ham
> > that matches a spam is a false-positive, it's a matter of context.
> 
> spam too can be (re)defined. and actually any term. but it is assumed
> here that we talk about spam detection. so false negative means "miss"
> and false positive means "false alarm". this is the common terminology
> inherited from intrusion detection.

The term comes from statistics, not intrusion detection. I don't
know much about the latter; perhaps people in that field are a little
sloppy in their usage, but more likely all the tests are expressed as
tests for intrusion, so the same kind of issue doesn't arise.

The source of your confusion is that you are mixing up the terminology
of the overall classification and individual test results. Think of
it this way: in a fingerprint comparison the meanings of TP, TN, FP and FN
are obvious and intrinsic to the test; it would be absurd to switch
them around depending on whether it's evidence for the defence or
prosecution.

Re: DNSWL and JMF White false positives, what to do exactly?

Posted by mouss <mo...@ml.netoyen.net>.

RW wrote:
> On Wed, 30 Sep 2009 23:35:31 +0200
> mouss <mo...@ml.netoyen.net> wrote:
> 
>> Warren Togami wrote:
>>> I scanned my spam folders and found a few false positives that hit
>>> on either DNSWL 
>> FP with DNSWL?????
>>
>> FP = False Positive = legitimate mail tagged as spam
>> DNSWL = Whitelist
> 
> The term false-positive can apply to any test. A test for ham
> that matches a spam is a false-positive, it's a matter of context.

spam too can be (re)defined. and actually any term. but it is assumed
here that we talk about spam detection. so false negative means "miss"
and false positive means "false alarm". this is the common terminology
inherited from intrusion detection.

I used to have a clock that was anti-clockwise. but it was for fun. I
always understood what "clockwise" meant.

Re: DNSWL and JMF White false positives, what to do exactly?

Posted by RW <rw...@googlemail.com>.

On Wed, 30 Sep 2009 23:35:31 +0200
mouss <mo...@ml.netoyen.net> wrote:

> Warren Togami wrote:
> > I scanned my spam folders and found a few false positives that hit
> > on either DNSWL 
> 
> FP with DNSWL?????
> 
> FP = False Positive = legitimate mail tagged as spam
> DNSWL = Whitelist

The term false-positive can apply to any test. A test for ham
that matches a spam is a false-positive, it's a matter of context.

Re: DNSWL and JMF White false positives, what to do exactly?

Posted by mouss <mo...@ml.netoyen.net>.

Warren Togami wrote:
> I scanned my spam folders and found a few false positives that hit on
> either DNSWL 

FP with DNSWL?????

FP = False Positive = legitimate mail tagged as spam
DNSWL = Whitelist

if your system adds points because of dnswl, you have a serious problem...

or do you mean FN (false negative)?

> or JMF (HOSTKARMA?  See how confusing it is not knowing
> what to call it?)
> 
> Is there an easy automated way we can forward FP's to DNSWL and JMF so
> their maintainers can decide what to do about the offending senders?

offending? then you probably mean FN.

yes, you can report offending IPs, if that makes sense. for example, if
the offending IP is that of an ISP relay, then don't report it: ISPs do
relay spam. if on the other hand you see FNs from paypal or bank of
blahblah, then do submit.

> I'd
> attach it to mail but it might get caught in the spam filter...
> 

post the s(p)ample on a web site instead. you can use pastebin for example.