Posted to users@spamassassin.apache.org by Marc Perkel <su...@junkemailfilter.com> on 2016/08/16 05:22:47 UTC

I have some bad news

Well, this is kind of hard to say so just going to say it. I have stage 
4 lung cancer and the probability spectrum is not good. I've been fighting 
spam for the last 15 years and I'd like to keep fighting spam from the 
grave. So I'm willing to share my technology with anyone interested.

Several months ago I talked about a new trick I came up with to fight 
spam and also positively identify good email as good. I've been running 
it now for 7 months and it is a breakthrough. At the time I had intended 
to patent it just to get enough protection to license it to the big 
boys, but now it is unlikely I'll be around long enough for that. I have 
however noticed that because of my condition people are paying attention 
to me more now that there's a deadline.

Here's my spam filtering trick. It's something that can be easily 
integrated into SpamAssassin. Being that my programming is somewhat 
sloppy at times it can probably be done even better than what I did. The 
thing to keep in mind when reading this is that it's not bayesian 
filtering. Many people in the spam filtering community make that 
mistake. This is done with set operations using Redis. Here's the link.

http://wiki.junkemailfilter.com/index.php/The_Evolution_Spam_Filter

I'm still doing well for now and if not for this diagnosis I wouldn't 
know I was sick. And I want to get as much done in this window as 
possible. Since I live in Gilroy California I'm thinking I'd like to 
contact the spam filtering person at Google and let them continue to 
really develop what I started. So if someone could hook me up with the 
right person(s) there I would appreciate it. And I'm willing to work 
with anyone else that can make use of my work. (My way of cheating death.)

Below is a letter I wrote to EFF staff where I used to work. It 
summarizes my situation. I'm still doing well considering.


Hi Cindy,

Hate to ruin your Monday morning but I have some bad news. I have stage 
4 lung cancer and the odds are not with me. I'm slowly telling the world 
and realizing that the problem with having so many friends is that I'm 
making a lot of people very sad. And that is very difficult for me to do.

I'm dealing with it about as well as can be expected, maybe a little 
better than that. My needs are covered for now, but dealing with rolling 
out the information. Please pass this email on to the staff there. I'm 
somewhat concerned about getting too much response at once. There is no 
specific time frame for me yet but stage 4 lung is almost always fatal 
and it's more likely months and not years.

I have a lot of friends who are offering to take care of me. I have a 
paid for house, some savings, and I'm still doing well off my spam 
filtering business. I am going to be looking for someone to take over my 
small techno empire in the hopes of keeping my web sites and the people 
who I host for online. While I plan to put up a good fight if I get 2 
years that would be considered a win. Taking over my empire would be a 
great opportunity for the right person and I need to find someone to do 
that. I am unfortunately really good at what I do and might be tricky 
getting someone to take that over.

I have lived a good life. I have done more than most people have done in 
100 lifetimes. At the age of 60 I was already down to my last 1/4 tank 
so if I don't get the last 20 years I really have little to complain 
about. At this point my goals are to upload what's left of me to the 
web, which is the afterlife in my world. I have to finish up certain 
philosophical projects with my Church of Reality, which, interestingly 
enough might lead to a solution for the control problem for Artificial 
Intelligence. (Something I need to finish writing up.)

Oddly enough the idea of being dead doesn't worry me. And that might be 
the denial speaking. However the process of getting there is going to be 
overwhelming. And it's been just a week since I found out. And I'm 
exploring the idea that there might even be an upside to being terminal. 
Maybe new opportunities will open up.

I do want to say that working at EFF was some of the best times of my 
life and I really appreciate having had that opportunity. The internet 
is the new nervous system of humanity and is therefore sacred space, not 
just in a religious sense, but in a Reality based sense. To protect it 
is to protect the essence of humanity itself. The Internet is our common 
mind and it is the core of who we are as a human species. (Note to legal 
team, I think there is a legal argument opportunity in this statement.)

A person's story is everything they do from the moment they are born to 
the moment they die. And then your story is the effect you had on 
advancing the evolution of life from what we were, to what we are, to 
what we will become. So my story will become part of the story of 
humanity, which is part of the story of life on this planet, and part of 
the story of the universe. And with the internet the essence of who I am 
and what makes my existence have meaning will be preserved.

I have always believed that if a person decides to "own their story" and 
choose to live a life worth living that when they are faced with the end 
of their personal existence it would be much easier. And now that I am 
there I can say it is definitely true. I have not lived a perfect life 
and looking back there are quite a few things where I could have made a 
better choice. But at this point I'm feeling unusually positive about my 
situation as my last adventures unfold.

While I have spent much of my life writing software for cyberspace I 
have also written quite a bit of software for meat space. This email is 
an example of that. Meat space is coded in ideas and philosophies and 
I'm hoping in the time I have left to see what else I can accomplish. 
Facing death definitely sharpens the mind so I'm going to take advantage 
of that.

I suppose I'll wrap this up here as I can ramble on forever. And forever 
isn't quite as long as it used to be.

Marc Perkel
/root

-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400

Re: I have some bad news

Posted by Marc Perkel <su...@junkemailfilter.com>.


On 08/17/16 03:51, Antony Stone wrote:
> On Wednesday 17 August 2016 at 05:06:50, Marc Perkel wrote:
>
>> What I'm doing is looking for fingerprints in email that intersect HAM
>> and not in SPAM - which would be a HAM result.
>> If it matches SPAM and does NOT match HAM - then it's SPAM.
>>
>> The magic is in the NOT matching on the other side.
>>
>> So if I say to you, "Let's get some lunch" that's ham because spammers
>> never say that, but normal people do. So the way to test what "spammers
>> never say" is to store what they do say and see if it's NOT in the list.
>> (Thus the infinite set)
> What length are the tokens you store in the list?  Single words (so the above
> lunch example would contain 4 tokens)?  Entire phrases (so the above would be
> just 1 token)?  Also how do you deal with spam which contains random cuttings
> from legitimate texts (generally along with a graphic attachment and/or a URL
> to get across the "real" message)?

I tokenize a lot of different things but the fingerprints are at most 3 
to 4 tokens long. If you go more then you get a database that's too big. 
And in the body I'm just looking at the first 50 words, and a "concept 
parser" that looks at the whole body.

http://wiki.junkemailfilter.com/index.php/Concept_Parsing_Spam_Filter
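
A minimal sketch of the fingerprint generation described above, assuming
whitespace tokenizing and lowercasing (neither is stated in the thread).
The function name and parameter names are hypothetical; the limits of 4
tokens per fingerprint and the first 50 body words come from the message.

```python
def fingerprints(text, max_len=4, max_words=50):
    """Return every run of 1..max_len consecutive words taken from the
    first max_words words of the text, as a set of strings."""
    # Assumption: plain whitespace splitting and lowercasing.
    words = text.lower().split()[:max_words]
    found = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            found.add(" ".join(words[i:i + n]))
    return found


prints = fingerprints("Let's get some lunch")
print(sorted(prints))
```

Capping the fingerprint length keeps the number of stored items per message
roughly linear in the word count, which is the database-size concern raised
above.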

>
>> Similarly, there's only so many ways to misspell viagra, and good email
>> wouldn't have it spelled wrong.
> Does this mean that people with bad spelling will more likely get classified as
> spam, because they do not match the 'ham' group very well?
No - unless they misspell a lot of words the same way spammers misspell 
it. If a spammer isn't misspelling the same way and normal people are - 
it can count as ham - or be ignored.

>
> Also, what happens to mail that contains lots of tokens which match neither set
> (for example, perfectly legitimate email which happens to be in a language the
> system hasn't been trained with)?
Mail that doesn't match either side produces no score.

>
>
> Antony.
>

-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400

Re: I have some bad news

Posted by sh...@shanew.net.

I'm finding this discussion interesting, because I've been trying to
wrap my head around the theoretical basis of this system.  As such,
I've noticed that several questions have been asked now that are
explained in the document Marc initially pointed to
(http://wiki.junkemailfilter.com/index.php/The_Evolution_Spam_Filter).
Given Marc's situation, it seems reasonable to read that document
before asking too many questions.

As a way to (maybe) save Marc some time, test my own knowledge and
perhaps help move the conversation forward, I'm going to summarize
the questions I've seen so far and, as much as possible, the answers
to those questions (and Marc, correct me if I'm getting anything wrong
here):

- How do you classify an email that has tokens from both the ham and
spam set?
Whichever set (out of "only found in ham" and "only found in spam") is
larger (or "better") determines the final classification.

- What length are the tokens?
Marc's examples use multiple length tokens, capturing everything
between 1 and 4 "words", but I suspect the exact maximum token
length might be adjustable.

- What happens when spammers use "hammy" text to avoid detection?
I don't see this directly addressed, but I would guess there are
several things that mitigate against this.  Multi-word tokens
prevent the truly random word salad attempts at poisoning, and
probably help with "cuttings" from other texts because the transition
from one cutting to the next probably doesn't appear in ham, leaving
the "spam-only" aspects of the mail to push it towards a spam
classification.  The unlearning and expiration of fingerprints would
mean that such cuttings would have to appear repeatedly over time in
legitimate mail to tip an email toward a ham classification.

- Will bad spellers (or typists) be seen as spammier?
Again, I don't see this addressed specifically, but I don't think so,
unless they are such tremendously bad spellers that nearly every word
is misspelled.  To take the "let's get some lunch" example, even if I
accidentally mis-type "some" as "som", I still have other tokens to
compare against, and the tokens "som", "get som", "som lunch", "let's
get som", etc. would have to have appeared in spam (and only spam) to
pull the classification toward spam.  So I'd say the occasional typo
or misspelling would come up neutral.

- What happens to messages that have a lot of neutral tokens?
Now I'm really speculating, but unless every token is neutral, there's
still something to decide on, though it does seem that detection
becomes less reliable as the number of non-neutral tokens approaches
zero.  A similar question that I thought of is what happens to
messages where the final sets "only found in spam" and "only found
in ham" are nearly (or exactly) the same size.  If you're using this
filter as part of SA scoring, the answer would seem to be that you
have an appropriately small score for "undetermined" (like bogofilter
does), but if it's acting as a separate filter, I don't know.
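
To make the typo and "undetermined" cases above concrete, here is a small
sketch using plain Python sets. Everything in it is an assumption for
illustration: the helper names, the one-message toy corpora, and the
`margin` parameter are hypothetical and do not come from Marc's
implementation.

```python
def ngrams(text, max_len=4):
    """All runs of 1..max_len consecutive words, as a set of strings."""
    words = text.lower().split()
    return {" ".join(words[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(words) - n + 1)}


def classify(message, spam, ham, margin=1):
    """Compare the 'only found in spam' and 'only found in ham' sets."""
    spam_only = (message & spam) - ham
    ham_only = (message & ham) - spam
    diff = len(spam_only) - len(ham_only)
    if diff >= margin:
        return "spam"
    if diff <= -margin:
        return "ham"
    # Hypothetical neutral band for ties and all-neutral mail.
    return "undetermined"


ham_corpus = ngrams("let's get some lunch")      # toy ham corpus
spam_corpus = ngrams("cheap pills click here")   # toy spam corpus

# "som" and every fingerprint containing it match neither corpus, so
# they are neutral; the remaining ham-only fingerprints still decide.
typo_verdict = classify(ngrams("let's get som lunch"), spam_corpus, ham_corpus)

# Mail in an untrained language matches nothing on either side.
unknown_verdict = classify(ngrams("bonjour tout le monde"), spam_corpus, ham_corpus)

print(typo_verdict, unknown_verdict)
```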

On Wed, 17 Aug 2016, Antony Stone wrote:

> On Wednesday 17 August 2016 at 05:06:50, Marc Perkel wrote:
>
>> What I'm doing is looking for fingerprints in email that intersect HAM
>> and not in SPAM - which would be a HAM result.
>> If it matches SPAM and does NOT match HAM - then it's SPAM.
>>
>> The magic is in the NOT matching on the other side.
>>
>> So if I say to you, "Let's get some lunch" that's ham because spammers
>> never say that, but normal people do. So the way to test what "spammers
>> never say" is to store what they do say and see if it's NOT in the list.
>> (Thus the infinite set)
>
> What length are the tokens you store in the list?  Single words (so the above
> lunch example would contain 4 tokens)?  Entire phrases (so the above would be
> just 1 token)?  Also how do you deal with spam which contains random cuttings
> from legitimate texts (generally along with a graphic attachment and/or a URL
> to get across the "real" message)?
>
>> Similarly, there's only so many ways to misspell viagra, and good email
>> wouldn't have it spelled wrong.
>
> Does this mean that people with bad spelling will more likely get classified as
> spam, because they do not match the 'ham' group very well?
>
> Also, what happens to mail that contains lots of tokens which match neither set
> (for example, perfectly legitimate email which happens to be in a language the
> system hasn't been trained with)?
>
>
> Antony.
>
>

-- 
Public key #7BBC68D9 at            |                 Shane Williams
http://pgp.mit.edu/                |      System Admin - UT CompSci
=----------------------------------+-------------------------------
All syllogisms contain three lines |              shanew@shanew.net
Therefore this is not a syllogism  | www.ischool.utexas.edu/~shanew

Re: I have some bad news

Posted by Antony Stone <An...@spamassassin.open.source.it>.

On Wednesday 17 August 2016 at 05:06:50, Marc Perkel wrote:

> What I'm doing is looking for fingerprints in email that intersect HAM
> and not in SPAM - which would be a HAM result.
> If it matches SPAM and does NOT match HAM - then it's SPAM.
> 
> The magic is in the NOT matching on the other side.
> 
> So if I say to you, "Let's get some lunch" that's ham because spammers
> never say that, but normal people do. So the way to test what "spammers
> never say" is to store what they do say and see if it's NOT in the list.
> (Thus the infinite set)

What length are the tokens you store in the list?  Single words (so the above 
lunch example would contain 4 tokens)?  Entire phrases (so the above would be 
just 1 token)?  Also how do you deal with spam which contains random cuttings 
from legitimate texts (generally along with a graphic attachment and/or a URL 
to get across the "real" message)?

> Similarly, there's only so many ways to misspell viagra, and good email
> wouldn't have it spelled wrong.

Does this mean that people with bad spelling will more likely get classified as 
spam, because they do not match the 'ham' group very well?

Also, what happens to mail that contains lots of tokens which match neither set 
(for example, perfectly legitimate email which happens to be in a language the 
system hasn't been trained with)?

Antony.

-- 
Wanted: telepath.   You know where to apply.

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: I have some bad news

Posted by Marc Perkel <su...@junkemailfilter.com>.

On 08/17/16 03:43, Matus UHLAR - fantomas wrote:
> On 16.08.16 20:06, Marc Perkel wrote:
>> What I'm doing is looking for fingerprints in email that intersect 
>> HAM and not in SPAM - which would be a HAM result.
>> If it matches SPAM and does NOT match HAM - then it's SPAM.
>>
>> The magic is in the NOT matching on the other side.
>
> so, if mail matches both hammy and spammy tokens (or token sets), you 
> don't
> classify at all?
>

On that fingerprint, if it matches both it creates no score on that item. 
The idea is to generate a lot of fingerprints so that something scores. 
If you look at enough stuff to generate hundreds of fingerprints and you 
have big reference corpora then you will usually get a result on 
something. Usually a big result in one direction.

But ignoring if it's in both makes it more immune to poisoning.
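
A rough sketch of that per-fingerprint rule, assuming the corpora are
held as plain Python sets. The function name, the +1/-1 vote values, and
the toy data are hypothetical.

```python
def vote(fp, spam, ham):
    """Score one fingerprint: +1 spam-only, -1 ham-only, 0 otherwise."""
    in_spam = fp in spam
    in_ham = fp in ham
    if in_spam and not in_ham:
        return 1
    if in_ham and not in_spam:
        return -1
    return 0  # seen on both sides, or never seen at all


spam = {"click here", "get some"}
ham = {"get some", "some lunch"}

before = vote("some lunch", spam, ham)   # ham-only, so it counts as ham

# A spammer pastes "some lunch" into spam and that spam gets learned.
# The fingerprint is now on both sides and stops counting for anyone.
poisoned_spam = spam | {"some lunch"}
after = vote("some lunch", poisoned_spam, ham)

print(before, after)
```

Under these assumptions, copying hammy text into spam only neutralizes
those fingerprints once the spam is learned; it does not turn them into
spam evidence against legitimate mail.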

-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400

Re: I have some bad news

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

On 17.08.16 11:02, Marc Perkel wrote:
>For what it's worth I have noticed that people who are familiar with 
>Bayesian filtering seem to have a mental block when it comes to 
>understanding this. People who know nothing about bayesian get it 
>instantly. Here's the actual formula.
>
>card(Test_message intersect Spam diff Ham) minus card(Test_message intersect Ham diff Spam)

I guess it's because people who are familiar with bayesian filtering say
"this is the same as bayes, just tweaked"

while people who are not think it's really a new idea.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
We are but packets in the Internet of life (userfriendly.org)

Re: I have some bad news

Posted by Marc Perkel <su...@junkemailfilter.com>.

For what it's worth I have noticed that people who are familiar with 
Bayesian filtering seem to have a mental block when it comes to 
understanding this. People who know nothing about bayesian get it 
instantly. Here's the actual formula.

card(Test_message intersect Spam diff Ham) minus card(Test_message intersect Ham diff Spam)
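
A minimal sketch of that formula with Python's built-in sets, reading it
left to right as (message intersect spam) diff ham. The function name and
the toy corpora are hypothetical, and treating a positive result as
"leans spam" is an assumption. In a Redis-backed setup the same steps
would presumably map onto set commands such as SINTER, SDIFF and SCARD.

```python
def evolution_score(message, spam, ham):
    """card(M & S - H) minus card(M & H - S) over fingerprint sets."""
    spam_only = (message & spam) - ham   # in the message and spam, never in ham
    ham_only = (message & ham) - spam    # in the message and ham, never in spam
    return len(spam_only) - len(ham_only)


spam = {"click here", "free pills", "get some"}
ham = {"get some", "some lunch", "let's get"}
message = {"let's get", "get some", "some lunch"}

# "get some" is in both corpora and drops out of both terms, leaving
# two ham-only fingerprints and no spam-only ones.
score = evolution_score(message, spam, ham)
print(score)  # -2
```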



On 08/17/16 09:16, Shawn Bakhtiar wrote:
>
>> On Aug 17, 2016, at 3:43 AM, Matus UHLAR - fantomas 
>> <uhlar@fantomas.sk> wrote:
>>
>> On 16.08.16 20:06, Marc Perkel wrote:
>>> What I'm doing is looking for fingerprints in email that intersect 
>>> HAM and not in SPAM - which would be a HAM result.
>>> If it matches SPAM and does NOT match HAM - then it's SPAM.
>>>
>>> The magic is in the NOT matching on the other side.
>>
>> so, if mail matches both hammy and spammy tokens (or token sets), you 
>> don't
>> classify at all?
>>
>
> I guess what is confusing me (and I imagine others, as alluded to by 
> Matus) is the fact that you are describing a special condition 
> of Bayes' probability theorem. You are testing two variables (match 
> SPAM and match HAM) (not matching is simply the negation of matching) 
> thus giving you four conditions:
>
> 1) SPAM && HAM
> 2) SPAM && ~HAM
> 3) ~SPAM && HAM
> 4) ~SPAM && ~HAM
>
> Here is a great diagram to show the four probable conditions:
> https://en.wikipedia.org/wiki/Bayes%27_theorem#/media/File:Bayes%27_Theorem_2D.svg
>
> So (if I am correct) Matus is asking what if condition 1 is true? How 
> are you classifying an email then? Which is often the state of most 
> emails, and thus why the use of Naive Bayes spam filtering, which 
> generates a probability based on Bayes' probability theorem and is the 
> conventional methodology to date. A Rose by any other name....
>
> Condition 4 is obvious it's nothing you have ever seen so classifying 
> it anything other than HAM would be a huge mistake (IMHO), and fully 
> covered by the aforementioned theorem as the probability of SPAM would 
> (should) be 0. Same with Condition 3, obviously it never hits SPAM so 
> whether it matches HAM or not you're going to treat it as HAM anyway 
> same as condition 4.
>
> That leaves condition 2. Which (if I'm not mistaken) is "... it 
> matches SPAM and does NOT match HAM - then it's SPAM.". Which brings 
> us back to Matus question, what if the email contains a single HAM 
> token? Two HAM tokens? This is exactly what Bayes' probability theorem 
> is designed for. All you are doing is defining a special condition in 
> which the HAM probability is ZERO.
>
> I think that's where I need to understand a bit more about what HAM 
> means in this solution, does getting a hit on HAM somehow negate it 
> being SPAM completely? In other words if the email contains some set 
> of tokens that are SPAM, yet only one HAM token, that single HAM token 
> makes it not SPAM? If so, you have a long way to go in convincing me 
> that this is a good solution.
>
>>> So if I say to you, "Let's get some lunch" that's ham because 
>>> spammers never say that, but normal people do. So the way to test 
>>> what "spammers never say" is to store what they do say and see if 
>>> it's NOT in the list. (Thus the infinite set)
>>>
>
> Actually I get SPAM with that very set of tokens in it. If somehow the 
> HAM rating of it overrides the SPAM, I don't believe it would have a 
> desirable effect.
>
> I get plenty of:
>
> "
> Hay Shawn,
>
> Hope you have time to do some lunch, click on this link and check out 
> my new pictures!
>
> Wannabe Phisher
> "
>
> Based on your example there's plenty of HAM and SPAM tokens in there, 
> "Click on this link" high probability of SPAM-e-ness, would it get 
> HAMed based on "hope you have time to do lunch". Or am I missing 
> something?
>
>
>>> Similarly, there's only so many ways to misspell viagra, and good 
>>> email wouldn't have it spelled wrong.
>>>
>>> Does that make sense?
>>
>
> Again, what you are saying makes sense in that it is special condition 
> of the probability theory, What does not make sense is why would you 
> not simply use the probability theory, that already encompasses that 
> condition?
>
>> -- 
>> Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
>> Warning: I wish NOT to receive e-mail advertising to this address.
>> Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
>> Linux - It's now safe to turn on your computer.
>> Linux - Teraz mozete pocitac bez obav zapnut.
>

-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400

Re: I have some bad news

Posted by Shawn Bakhtiar <sh...@hotmail.com>.

On Aug 17, 2016, at 3:43 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:

On 16.08.16 20:06, Marc Perkel wrote:
What I'm doing is looking for fingerprints in email that intersect HAM and not in SPAM - which would be a HAM result.
If it matches SPAM and does NOT match HAM - then it's SPAM.

The magic is in the NOT matching on the other side.

so, if mail matches both hammy and spammy tokens (or token sets), you don't
classify at all?

I guess what is confusing me (and I imagine others, as alluded to by Matus) is the fact that you are describing a special condition of Bayes' probability theorem. You are testing two variables (match SPAM and match HAM) (not matching is simply the negation of matching) thus giving you four conditions:

1) SPAM && HAM
2) SPAM && ~HAM
3) ~SPAM && HAM
4) ~SPAM && ~HAM

Here is a great diagram to show the four probable conditions:
https://en.wikipedia.org/wiki/Bayes%27_theorem#/media/File:Bayes%27_Theorem_2D.svg

So (if I am correct) Matus is asking what if condition 1 is true? How are you classifying an email then? Which is often the state of most emails, and thus why the use of Naive Bayes spam filtering, which generates a probability based on Bayes' probability theorem and is the conventional methodology to date. A Rose by any other name....

Condition 4 is obvious it's nothing you have ever seen so classifying it anything other than HAM would be a huge mistake (IMHO), and fully covered by the aforementioned theorem as the probability of SPAM would (should) be 0. Same with Condition 3, obviously it never hits SPAM so whether it matches HAM or not you're going to treat it as HAM anyway same as condition 4.

That leaves condition 2. Which (if I'm not mistaken) is "... it matches SPAM and does NOT match HAM - then it's SPAM.". Which brings us back to Matus question, what if the email contains a single HAM token? Two HAM tokens? This is exactly what Bayes' probability theorem is designed for. All you are doing is defining a special condition in which the HAM probability is ZERO.

I think that's where I need to understand a bit more about what HAM means in this solution, does getting a hit on HAM somehow negate it being SPAM completely? In other words if the email contains some set of tokens that are SPAM, yet only one HAM token, that single HAM token makes it not SPAM? If so, you have a long way to go in convincing me that this is a good solution.

So if I say to you, "Let's get some lunch" that's ham because spammers never say that, but normal people do. So the way to test what "spammers never say" is to store what they do say and see if it's NOT in the list. (Thus the infinite set)

Actually I get SPAM with that very set of tokens in it. If somehow the HAM rating of it overrides the SPAM, I don't believe it would have a desirable effect.

I get plenty of:

"
Hay Shawn,

Hope you have time to do some lunch, click on this link and check out my new pictures!

Wannabe Phisher
"

Based on your example there's plenty of HAM and SPAM tokens in there, "Click on this link" high probability of SPAM-e-ness, would it get HAMed based on "hope you have time to do lunch". Or am I missing something?

Similarly, there's only so many ways to misspell viagra, and good email wouldn't have it spelled wrong.

Does that make sense?

Again, what you are saying makes sense in that it is special condition of the probability theory, What does not make sense is why would you not simply use the probability theory, that already encompasses that condition?

--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux - It's now safe to turn on your computer.
Linux - Teraz mozete pocitac bez obav zapnut.

Re: I have some bad news

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

On 16.08.16 20:06, Marc Perkel wrote:
>What I'm doing is looking for fingerprints in email that intersect 
>HAM and not in SPAM - which would be a HAM result.
>If it matches SPAM and does NOT match HAM - then it's SPAM.
>
>The magic is in the NOT matching on the other side.

so, if mail matches both hammy and spammy tokens (or token sets), you don't
classify at all?

>So if I say to you, "Let's get some lunch" that's ham because 
>spammers never say that, but normal people do. So the way to test 
>what "spammers never say" is to store what they do say and see if 
>it's NOT in the list. (Thus the infinite set)
>
>Similarly, there's only so many ways to misspell viagra, and good 
>email wouldn't have it spelled wrong.
>
>Does that make sense?

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux - It's now safe to turn on your computer.
Linux - Teraz mozete pocitac bez obav zapnut.

Re: I have some bad news

Posted by Marc Perkel <su...@junkemailfilter.com>.

Hi Shawn,

What I'm doing is looking for fingerprints in email that intersect HAM 
and not in SPAM - which would be a HAM result.
If it matches SPAM and does NOT match HAM - then it's SPAM.

The magic is in the NOT matching on the other side.

So if I say to you, "Let's get some lunch" that's ham because spammers 
never say that, but normal people do. So the way to test what "spammers 
never say" is to store what they do say and see if it's NOT in the list. 
(Thus the infinite set)

Similarly, there's only so many ways to misspell viagra, and good email 
wouldn't have it spelled wrong.

Does that make sense?


On 08/16/16 12:57, Shawn Bakhtiar wrote:
> Marc,
>
> Let me first say I am truly sorry to hear about your cancer. I lost my 
> father to cancer just over a decade ago, after a long battle with 
> sarcoma of the throat and tongue. So I pray and wish you the best.
>
> I sent this to you in January 2016 (don't recall if I ever got a reply 
> to it) but based on your document:
>
> Set theory is not my strongest suit, but your diagram looks incorrect:
> http://www.junkemailfilter.com/patent/patent5.pdf
>
> Let:
>
> H be ham
> S be spam
> E be an email
>
> Then you state that:
> HE = (H u E)
> SE = (S u E)
>
> But then the next diagram shows that there is some solution in which 
> (HE u SE) and thus there may be some set which is (HE / SE). Even 
> though in the first diagram S and H do not intersect.
>
> This is not logical. Either (H u S) in which there are tokens common 
> to the ham and spam token sets, or it does not, so which is it?? in 
> other words, if a token is both ham and spam how are you calculating 
> its weight?? Is it spam or ham?
>
> Clearly it's the latter (they do not intersect) as described in this:
> http://www.junkemailfilter.com/patent/patent2.pdf
>
> In which case you are simply looking to see if (H u E) > (S u E) and 
> has nothing to do with what is not in the set, and there is indeed no 
> (H u S) or the negation or NOT which is (H / S), so as everyone has 
> been trying to explain it has NOTHING to do with what is NOT matched.
>
> By the way, you can't match an infinite set (well theoretically but 
> not actually).
> https://en.wikipedia.org/wiki/Intersection_(set_theory)
>
> Since the current Bayes learns both SPAM and HAM I imagine that it 
> does a very similar thing, other than perhaps the larger multi word 
> token sets, which seems a trivial thing to add, and available in other 
> tool sets.
>
>
> I'll only add this, if you believe that your SPAM has been greatly 
> reduced. That's awesome! But have you really isolated it to this "new 
> technique" or in playing around have you inadvertently changed 
> something else that may have changed your results?
>
> I am also not saying that you have not developed some "new technique", 
> but that if you have, your description of it does not line up 
> logically with the technique itself. Back in January you were looking 
> to patent it, today you simply want it to live on. I suggest that if 
> it is indeed the latter, then perhaps it's time to release the source 
> code/scripts and let a few more eyes look at the logic to see exactly 
> what is it doing, that you believe is so different than what is out there.
>
> Again, I pray and hope the best for you,
> Shawn
>
>
>
>
>> On Aug 16, 2016, at 6:45 AM, Marc Perkel <support@junkemailfilter.com> wrote:
>>
>> Thanks for the encouragement Ted. Unfortunately I know way too much 
>> about mathematics and I have a deep understanding of probability 
>> spectrums. There's a curve and I'm going to be somewhere on it. If 
>> I'm lucky I might be here for some time. But my life is a casino 
>> right now. And yes - there is also a probability spectrum for any of 
>> us getting hit by a bus tomorrow as well. SpamAssassin is based on 
>> statistical probabilities.
>>
>> I have to have a dual track strategy. On one hand I need to do what 
>> I can to move the curve into the future. But at the same time I need 
>> to accomplish things that are important within a limited time slot as 
>> well.
>>
>> Spam filtering isn't just another job to me. I actually have a 
>> passion for it. On a philosophical basis I look at the internet as 
>> the new nervous system for humanity and is now core to who we are as 
>> a species. And email is a very key technology in that nervous system.
>>
>> In that context spam is like poison where predators suck some of the 
>> life out of humanity, and my real life has always been about the 
>> progress of the human race.
>>
>> I am somewhat of a spam fighting savant. I actually run very little 
>> of my email through SpamAssassin, truth be told. Over the years I've 
>> thrown some ideas into the mix and sometimes they have been adopted 
>> to make SA better. Sometimes I just get shouted down by trolls and 
>> the ideas go nowhere.
>>
>> At this point however there's a deadline and I have ideas that could 
>> be implemented in SA very very easily. In fact it was through SA that 
>> I discovered Redis, and SA already talks to redis.
>>
>> Although my innovation is excellent, as a programmer I'm mediocre. 
>> Never worked as a team. Easily frustrated. Probably somewhat autistic 
>> and somewhat arrogant. So mostly living in my own world doing my own 
>> development. I have my little online empire. I work from home. I make 
>> a great living. And I really like (most of) my customers and enjoy 
>> doing tech support. And it's allowed me a lot of free time to do 
>> things that I'm really interested in.
>>
>> But my ideas are now my immortality, so I'm now releasing this to the 
>> world. And mostly this simple AI method that SA could easily implement.
>>
>> This new spam filtering trick is not only extremely effective, it's 
>> extremely simple. I had it working in 2 days. The developers here 
>> could probably implement it in 1 day. (At least the core 
>> functionality) And with a team of better programmers probably do a 
>> better job and get an even better result than I get. In fact you don't 
>> need or even want my sloppy code (not in Perl). All you need is to 
>> read the description of how it works and once you get it - coding it 
>> is trivial.
>>
>> So - this is an opportunity to milk the mind of the dying spam 
>> savant. It works, it's easy, and I'm just handing it to you all. 
>> There is no reason I would be making this up. All you all need to do 
>> is accept this gift.
>>
>>
>> On 08/16/16 01:03, Ted Mittelstaedt wrote:
>>> Hi Marc,
>>>
>>>  Back in 1994 I was diagnosed with testicular cancer, it was 
>>> essentially "stage 4" as it had metastasized throughout my body.
>>>
>>>  But, it responded to chemo and here I am today.  In fact ironically
>>> my original oncologist died a few years ago - on a fishing trip he had
>>> an accident and drowned.
>>>
>>>  The Universe has an interesting sense of humor and likes to throw
>>> curve balls.  Take what you have been told about your "probability
>>> spectrum" and toss it in the trash - hakuna matata.   You could 
>>> accidentally step in front of a bus tomorrow and be dead.   You could
>>> live another 20 years.   Statistics on people only have meaning on
>>> large groups of people - they are irrelevant when it comes to the
>>> individual.
>>>
>>>  I've met a number of people who had serious cancers.  And I learned
>>> one thing from that.   The people who survived - every one of them,
>>> fighters.  And everyone fights differently.  Some get on the food 
>>> bandwagon and try overdosing on green tea and every alleged 
>>> anti-cancer food out there.  Others jump into yoga, and I knew one 
>>> guy who went out and binge-watched Monty Python to spend as much 
>>> time laughing as possible.  Me, I fought on a more mental approach. 
>>>  I dropped everything in my life that I was not completely satisfied 
>>> with - I turned my back on my job, my apartment, etc. - every burden 
>>> or responsibility that I had which I didn't like and didn't really 
>>> want - and dove into the treatment, and I never let myself believe I 
>>> was in any danger of dying.
>>>
>>>  Of course, not all who fight, survive.  But I will say with absolute
>>> conviction that everyone I ever met who had a serious cancer and had
>>> that "attitude of acceptance", later died.  You are a fighter or you
>>> wouldn't even be here.  Now, fight to win.
>>>
>>> Ted
>>>
>>>
>>
>> -- 
>> Marc Perkel - Sales/Support
>> support@junkemailfilter.com
>> http://www.junkemailfilter.com
>> Junk Email Filter dot com
>> 415-992-3400
>>
>

-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400

Re: Matching infinite sets

Posted by Antony Stone <An...@spamassassin.open.source.it>.

On Sunday 21 August 2016 at 21:22:38, Damian wrote:

> Am 21.08.2016 um 18:47 schrieb Marc Perkel:
> > Actually - you can match an infinite set. And maybe this is what's
> > hard for some people to wrap their heads around.
> > 
> > Suppose set A contains 2 items, apples and oranges.
> > So we define set B as everything in the universe that is not in set A.
> > So set B is an infinite set, everything in the universe EXCEPT apples
> > and oranges.
> 
> There is no such set B, as it would contain itself.

In that case try the definition: "B contains all possible email tokens which 
are not in set A", thus excluding sets themselves from being members of B.


Antony.

-- 
This sentence contains exacly three erors.

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: Matching infinite sets

Posted by Martin Gregorie <ma...@gregorie.org>.

On Sun, 2016-08-21 at 16:56 -0400, Dianne Skoll wrote:
> On Sun, 21 Aug 2016 21:22:38 +0200
> Damian <sp...@arcsin.de> wrote:
> 
> > 
> > > 
> > > So we define set B as everything in the universe that is not in
> > > set
> > > A. So set B is an infinite set, everything in the universe EXCEPT
> > > apples and oranges.
> > 
> > There is no such set B, as it would contain itself.
> And... why can't a set contain itself?
> 
Because recursive sets are off topic.

At least, I assume that if Marc had meant to include recursion he would
have said so.


Martin

Re: Matching infinite sets

Posted by Joe Quinn <jq...@pccc.com>.

On 8/21/2016 5:55 PM, Sidney Markowitz wrote:
> Dianne Skoll wrote on 22/08/16 8:56 AM:
>> And... why can't a set contain itself?
>>
> It can't in standard modern set theory (ZFC), because of the axiom of foundation,
> also known as the axiom of regularity
>    https://en.wikipedia.org/wiki/Axiom_of_regularity
> which is a formulation that allows set theory to avoid Russell's Paradox.
> (see also https://en.wikipedia.org/wiki/ZFC)
>
> Just like Euclidean Geometry has the axiom that parallel lines never meet, and
> you get various non-euclidean geometries by changing that axiom, there are
> non-standard set theories that do not include the axiom of regularity, in
> which there can be sets that include themselves.
>
> None of that is relevant to the discussion of Marc Perkel's ideas because he
> is talking about sets of tokens from email (or sets of potential tokens?) not
> sets that contain sets. And all he needs to do with his infinite sets is be
> able to test if a token is in it, which is easy to do since the set is defined
> as the complement of a finite set. (I'm not saying this to agree with the
> method as good or to argue against it. I'm one of those people he mentions who
> understands how Bayesian spam filtering works who has yet to wrap my head
> around what he is presenting - For now I'm staying agnostic about it until I
> do understand it better).
>
>   Sidney
This is a good summary. As a fun theoretical side-note, ZFC can be 
interpreted as a type theory and then used as a way to reason about the 
behavior of programs. One of its major weaknesses is that it's possible 
to formulate exactly this sort of issue where a set can contain other 
sets of unknown depth. This corresponds to untyped programming languages 
and is almost always resolved by formalizations that correspond to 
adding a type system (as your last paragraph does).

But back to discussing Bayes... ;)

Re: Matching infinite sets

Posted by RW <rw...@googlemail.com>.

On Mon, 22 Aug 2016 09:55:10 +1200
Sidney Markowitz wrote:

>  I'm one of those people he mentions who understands
> how Bayesian spam filtering works who has yet to wrap my head around
> what he is presenting - For now I'm staying agnostic about it until I
> do understand it better).

What it amounts to is:

Training: 

- tokenize a corpus of spam and ham 
- compile a list of tokens that occur only in spam and a list of
  tokens that only occur in ham

Classification:

- Tokenize the email
- count how many of the tokens are in each of the two lists
- compare the two counts
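
A minimal sketch of the training and classification steps listed above.
It assumes Python's built-in sets as a stand-in for the Redis sets Marc
mentions, and plain whitespace splitting as the tokenizer. The names
train and classify and the sample messages are illustrative assumptions;
none of this is taken from Marc's code.

```python
def train(spam_messages, ham_messages, tokenize=str.split):
    """Return (spam_only, ham_only): tokens seen in exactly one corpus."""
    spam_tokens = set()
    ham_tokens = set()
    for message in spam_messages:
        spam_tokens.update(tokenize(message))
    for message in ham_messages:
        ham_tokens.update(tokenize(message))
    # Tokens seen in both corpora carry no signal and are dropped.
    return spam_tokens - ham_tokens, ham_tokens - spam_tokens


def classify(message, spam_only, ham_only, tokenize=str.split):
    """Compare how many distinct tokens fall in each one-sided list."""
    tokens = set(tokenize(message))
    spam_hits = len(tokens & spam_only)
    ham_hits = len(tokens & ham_only)
    if spam_hits > ham_hits:
        return "spam"
    if ham_hits > spam_hits:
        return "ham"
    return "unsure"


# Made-up training data, for illustration only.
spam_only, ham_only = train(
    ["cheapest viagra online", "buy viagra now"],
    ["my doctor mentioned viagra", "see you online tonight"],
)

print(classify("buy the cheapest pills now", spam_only, ham_only))
print(classify("see my doctor", spam_only, ham_only))
print(classify("viagra online", spam_only, ham_only))
```

In this toy corpus "viagra" and "online" occur on both sides, so a message
made up only of those words has no hits in either list and comes back as
"unsure".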

In Bayes, if you set Robinson's S parameter to 0, then tokens that only
occur in spam or ham get a token probability of exactly 1 and 0
respectively. 

Tokens that have been seen in both spam and ham get a probability
between 0 and 1. So if you then set MIN_PROB_STRENGTH to 0.5 you can
discard all of these. 

All of the remaining tokens have probabilities of 0 or 1 so running
them through the chi-squared calculation (or any sensible symmetric
combining algorithm) and then comparing the result to 0.5  gives the
same result as comparing the number of spam-only and ham-only tokens.
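
To make the argument above concrete, here is a hedged sketch. It assumes
the commonly cited form of Robinson's formula,
f(w) = (s*x + n*p(w)) / (s + n), and models MIN_PROB_STRENGTH as a cutoff
on |f(w) - 0.5|. The function names, token counts and corpus sizes are
made up for illustration; this is not SpamAssassin's code.

```python
def robinson_f(spam_count, ham_count, n_spam, n_ham, s=0.0, x=0.5):
    """Robinson-adjusted token probability; with s=0 it reduces to p(w)."""
    spam_ratio = spam_count / n_spam
    ham_ratio = ham_count / n_ham
    p = spam_ratio / (spam_ratio + ham_ratio)
    n = spam_count + ham_count
    return (s * x + n * p) / (s + n)


def surviving_tokens(counts, n_spam, n_ham, min_prob_strength=0.5):
    """Keep only tokens whose probability is at least the cutoff from 0.5."""
    kept = {}
    for token, (spam_count, ham_count) in counts.items():
        f = robinson_f(spam_count, ham_count, n_spam, n_ham)
        if abs(f - 0.5) >= min_prob_strength:
            kept[token] = f
    return kept


# Hypothetical (spam_count, ham_count) pairs for three tokens.
counts = {
    "viagra": (40, 3),                  # seen on both sides
    "cheapest viagra online": (12, 0),  # spam only
    "of the lung": (0, 5),              # ham only
}
kept = surviving_tokens(counts, n_spam=100, n_ham=100)

spam_votes = sum(1 for f in kept.values() if f == 1.0)
ham_votes = sum(1 for f in kept.values() if f == 0.0)
print(kept, spam_votes, ham_votes)
```

With s = 0 and a cutoff of 0.5, the only tokens left are the one-sided
ones, so the final comparison is a count of spam-only tokens against
ham-only tokens. The chi-squared step is left out of the sketch because
it takes logarithms of the probabilities, which are exactly 0 or 1 here.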

In short it's mathematically equivalent to Bayes with different
tokenization and different constants; and on the face of it
the values of S and MIN_PROB_STRENGTH are very sub-optimal. 

OTOH it wouldn't surprise me if the tokenization is much better.

Re: Matching infinite sets

Posted by Sidney Markowitz <si...@sidney.com>.

Dianne Skoll wrote on 22/08/16 8:56 AM:
> And... why can't a set contain itself?
> 

It can't in standard modern set theory (ZFC), because of the axiom of foundation,
also known as the axiom of regularity
  https://en.wikipedia.org/wiki/Axiom_of_regularity
which is a formulation that allows set theory to avoid Russell's Paradox.
(see also https://en.wikipedia.org/wiki/ZFC)

Just like Euclidean Geometry has the axiom that parallel lines never meet, and
you get various non-euclidean geometries by changing that axiom, there are
non-standard set theories that do not include the axiom of regularity, in
which there can be sets that include themselves.

None of that is relevant to the discussion of Marc Perkel's ideas because he
is talking about sets of tokens from email (or sets of potential tokens?) not
sets that contain sets. And all he needs to do with his infinite sets is be
able to test if a token is in it, which is easy to do since the set is defined
as the complement of a finite set. (I'm not saying this to agree with the
method as good or to argue against it. I'm one of those people he mentions who
understands how Bayesian spam filtering works who has yet to wrap my head
around what he is presenting - For now I'm staying agnostic about it until I
do understand it better).
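
The membership test described above can be sketched as follows. The class
name ComplementSet is hypothetical; the point is only that the "infinite"
set is never stored, because a lookup against the finite set answers the
question.

```python
class ComplementSet:
    """Every token that is NOT in a given finite set."""

    def __init__(self, excluded):
        self._excluded = frozenset(excluded)

    def __contains__(self, item):
        # Membership in the complement is non-membership in the finite set.
        return item not in self._excluded


set_a = {"apples", "oranges"}
set_b = ComplementSet(set_a)

print("pears" in set_b)   # anything outside set A is in set B
print("apples" in set_b)  # members of set A are not
```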

 Sidney

Re: Matching infinite sets

Posted by Dianne Skoll <df...@roaringpenguin.com>.

On Sun, 21 Aug 2016 21:22:38 +0200
Damian <sp...@arcsin.de> wrote:

> > So we define set B as everything in the universe that is not in set
> > A. So set B is an infinite set, everything in the universe EXCEPT
> > apples and oranges.

> There is no such set B, as it would contain itself.

And... why can't a set contain itself?

Regards,

Dianne.

Re: Matching infinite sets

Posted by Antony Stone <An...@spamassassin.open.source.it>.

On Monday 22 August 2016 at 15:04:49, Marc Perkel wrote:

> I'm confused by the confusion here.
> 
> Set A - a  finite set - has some members,
> Set B - an infinite set - is everything that is NOT in Set A
> 
> So you match a test item to Set A and if it matches it's a member of A.
> If it doesn't match Set A it's a member of B.
> 
> How is this not really simple?

Because "everything that is NOT in Set A" means some surprisingly complicated 
things to some people, which I believe are irrelevant for the purposes of 
your spam identifier.

It might keep the pedants happier if you were to identify the sets as:

Set A contains some email tokens.

Set B contains all possible email tokens which are not in Set A.

This then precludes the possibility that Set B might contain itself, since a 
set is not a plausible email token.

Antony.

-- 
I just got a new mobile phone, and I called it Titanic.  It's already syncing.

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: Matching infinite sets

Posted by Christian Grunfeld <ch...@gmail.com>.

What you are trying to do is to identify a source of messages by its
entropy, supposing the entropy of a ham source is distinguishable from that
of a spam one.

2016-08-22 13:48 GMT-03:00 Antony Stone <
Antony.Stone@spamassassin.open.source.it>:

> On Monday 22 August 2016 at 18:00:35, Marc Perkel wrote:
>
> > On 08/22/16 07:37, Antony Stone wrote:
> > >
> > > So what makes "cheapest Viagra online" a token, such that "cheapest" and
> > > "online" are not tokens?
> >
> > They would all be tokens. Just pointing out one that would match spam
> > and not match ham. "cheapest" and "online" would likely be in both sets
> > and would be ignored.
>
> Hm, that doesn't tie up with your earlier reply:
>
> On Monday 22 August 2016 at 16:34:00, Marc Perkel wrote:
>
> > On 08/22/16 07:28, Dianne Skoll wrote:
> > > On Mon, 22 Aug 2016 07:16:41 -0700
> > >
> > > As far as I understand your algorithm, if an email contains at least one
> > > token in the "ham" set and zero tokens in the "spam" set, you classify it
> > > as ham.  And conversely, if it contains at least one spam token but zero
> > > ham tokens, you classify it as spam.
> >
> > YES! YES! YES!
>
> Er, really?  See below.
>
> > Although I look at some thousand "fingerprints" to get a more
> > significant result.
> >
> > > The other two possibilities (no tokens in either or some tokens in both)
> > > are undecidable.
> >
> > Exactly!
>
> So, it's not that "if an email contains at least one token in the 'ham' set
> and zero tokens in the 'spam' set, you classify it as ham".
>
> You in fact ignore any tokens in the email which are in both the 'ham' and
> 'spam' sets, and then - what - work out which set contains more of the
> left-over tokens?
>
>
> Antony.
>
> --
> Pavlov is in the pub enjoying a pint.
> The barman rings for last orders, and Pavlov jumps up exclaiming "Damn!  I
> forgot to feed the dog!"
>
>                                                    Please reply to the
> list;
>                                                          please *don't* CC
> me.
>

Re: Matching infinite sets

Posted by Antony Stone <An...@spamassassin.open.source.it>.

On Monday 22 August 2016 at 18:00:35, Marc Perkel wrote:

> On 08/22/16 07:37, Antony Stone wrote:
> > 
> > So what makes "cheapest Viagra online" a token, such that "cheapest" and
> > "online" are not tokens?
>
> They would all be tokens. Just pointing out one that would match spam
> and not match ham. "cheapest" and "online" would likely be in both sets
> and would be ignored.

Hm, that doesn't tie up with your earlier reply:

On Monday 22 August 2016 at 16:34:00, Marc Perkel wrote:

> On 08/22/16 07:28, Dianne Skoll wrote:
> > On Mon, 22 Aug 2016 07:16:41 -0700
> > 
> > As far as I understand your algorithm, if an email contains at least one
> > token in the "ham" set and zero tokens in the "spam" set, you classify it
> > as ham.  And conversely, if it contains at least one spam token but zero
> > ham tokens, you classify it as spam.
> 
> YES! YES! YES!

Er, really?  See below.

> Although I look at some thousand "fingerprints" to get a more
> significant result.
> 
> > The other two possibilities (no tokens in either or some tokens in both)
> > are undecidable.
> 
> Exactly!

So, it's not that "if an email contains at least one token in the 'ham' set 
and zero tokens in the 'spam' set, you classify it as ham".

You in fact ignore any tokens in the email which are in both the 'ham' and 
'spam' sets, and then - what - work out which set contains more of the left-
over tokens?
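
The two readings in question can be put side by side. This is a sketch
only: spam_hits and ham_hits are assumed to be the number of an email's
tokens found in the spam-only and ham-only lists, and both function names
are hypothetical.

```python
def strict_verdict(spam_hits, ham_hits):
    """Decide only when one side has hits and the other has none."""
    if ham_hits > 0 and spam_hits == 0:
        return "ham"
    if spam_hits > 0 and ham_hits == 0:
        return "spam"
    return "undecided"


def counting_verdict(spam_hits, ham_hits):
    """Decide by whichever side has more hits."""
    if spam_hits > ham_hits:
        return "spam"
    if ham_hits > spam_hits:
        return "ham"
    return "undecided"


# The readings only disagree when both sides have at least one hit.
for hits in [(0, 3), (5, 0), (5, 1), (0, 0)]:
    print(hits, strict_verdict(*hits), counting_verdict(*hits))
```

An email with five spam-only tokens and one ham-only token is undecided
under the strict reading but spam under the counting reading, which is
the gap the question above is probing.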


Antony.

-- 
Pavlov is in the pub enjoying a pint.
The barman rings for last orders, and Pavlov jumps up exclaiming "Damn!  I 
forgot to feed the dog!"

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: Matching infinite sets

Posted by Marc Perkel <su...@junkemailfilter.com>.


On 08/22/16 07:37, Antony Stone wrote:
> On Monday 22 August 2016 at 16:34:09, Marc Perkel wrote:
>
>> OK - Trying to make this really simple. Just talking about concept now.
>>
>> Let's say I get an email where the subject is "I have adenocarcinoma of
>> the lung".
>>
>> Right off you know it's ham because spammers never use the word
>> "adenocarcinoma" and normal people do. Spammers also never use:
>>
>> "of the lung"
>> "the lung"
>> "adenocarcinoma of"
> How do you create those boundaries to define the tokens?

Here's an example:

"the quick brown fox jumps over the lazy dog"

becomes ...

"the" "quick" "the quick" "brown" "quick brown" "the quick brown" "fox" "brown fox" "quick brown fox"
"the quick brown fox" "jumps" "fox jumps" "brown fox jumps" "quick brown fox jumps" "over" "jumps over"
"fox jumps over" "brown fox jumps over" "the" "over the" "jumps over the" "fox jumps over the"
"lazy" "the lazy" "over the lazy" "jumps over the lazy" "dog" "lazy dog" "the lazy dog" "over the lazy dog"
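
The expansion above can be reproduced with a short sketch: for each word,
emit every phrase of one to four consecutive words that ends at that
word. The function name and the max_words parameter are illustrative
assumptions, and the sketch covers only sequential phrases.

```python
def phrase_tokens(text, max_words=4):
    """Return all phrases of 1..max_words consecutive words, in reading order."""
    words = text.split()
    tokens = []
    for end in range(len(words)):
        for length in range(1, max_words + 1):
            start = end - length + 1
            if start < 0:
                break  # not enough preceding words for a longer phrase
            tokens.append(" ".join(words[start:end + 1]))
    return tokens


tokens = phrase_tokens("the quick brown fox jumps over the lazy dog")
print(len(tokens))
print(tokens[:6])
```

For the nine-word sentence this yields 30 phrases, in the same order as
the list above. Note that "the" appears twice, so a set-based store would
hold 29 distinct tokens.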






>
>> ....
>>
>> So - tell me you follow this so far. Spammers don't spam about
>> adenocarcinoma.
>>
>> In this case I'm identifying ham because in some previous email people
>> were talking about lung cancer and those phrases were learned as ham.
>> But what makes it really ham is not just that it matches previous ham,
>> but it doesn't match previous spam.
>>
>> A word like Viagra for example would produce no score because it is in
>> both sets. However "cheapest viagra online" would match spam and not
>> match ham indicating it's spam.
> So what makes "cheapest Viagra online" a token, such that "cheapest" and
> "online" are not tokens?
>
>

They would all be tokens. Just pointing out one that would match spam 
and not match ham. "cheapest" and "online" would likely be in both sets 
and would be ignored.

-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400

Re: Matching infinite sets

Posted by Antony Stone <An...@spamassassin.open.source.it>.

On Monday 22 August 2016 at 16:34:09, Marc Perkel wrote:

> OK - Trying to make this really simple. Just talking about concept now.
> 
> Let's say I get an email where the subject is "I have adenocarcinoma of
> the lung".
> 
> Right off you know it's ham because spammers never use the word
> "adenocarcinoma" and normal people do. Spammers also never use:
> 
> "of the lung"
> "the lung"
> "adenocarcinoma of"

How do you create those boundaries to define the tokens?

> ....
> 
> So - tell me you follow this so far. Spammers don't spam about
> adenocarcinoma.
> 
> In this case I'm identifying ham because in some previous email people
> were talking about lung cancer and those phrases were learned as ham.
> But what makes it really ham is not just that it matches previous ham,
> but it doesn't match previous spam.
> 
> A word like Viagra for example would produce no score because it is in
> both sets. However "cheapest viagra online" would match spam and not
> match ham indicating it's spam.

So what makes "cheapest Viagra online" a token, such that "cheapest" and 
"online" are not tokens?


Antony.

-- 
The words "e pluribus unum" on the Great Seal of the United States are from a 
poem by Virgil entitled "Moretum", which is about cheese and garlic salad 
dressing.

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: Matching infinite sets

Posted by Marc Perkel <su...@junkemailfilter.com>.

OK - Trying to make this really simple. Just talking about concept now.

Let's say I get an email where the subject is "I have adenocarcinoma of 
the lung".

Right off you know it's ham because spammers never use the word 
"adenocarcinoma" and normal people do. Spammers also never use:

"of the lung"
"the lung"
"adenocarcinoma of"
....

So - tell me you follow this so far. Spammers don't spam about 
adenocarcinoma.

In this case I'm identifying ham because in some previous email people 
were talking about lung cancer and those phrases were learned as ham. 
But what makes it really ham is not just that it matches previous ham, 
but it doesn't match previous spam.

A word like Viagra for example would produce no score because it is in 
both sets. However "cheapest viagra online" would match spam and not 
match ham indicating it's spam.

The magic here is that this detects both spam and ham. And it is 
especially good at detecting ham, which greatly reduces false positives.

Re: Matching infinite sets

Posted by Shawn Bakhtiar <sh...@hotmail.com>.

On Aug 22, 2016, at 10:44 AM, Marc Perkel <su...@junkemailfilter.com> wrote:

> On 08/22/16 09:06, Dianne Skoll wrote:
>> On Mon, 22 Aug 2016 09:03:38 -0700
>> Marc Perkel <su...@junkemailfilter.com> wrote:
>>
>>> The ones that are the same are of no interest. Only where it matches
>>> one side and not the other.
>> But... but... that's exactly like Bayes if you throw out tokens whose
>> observed probability is not 0 or 1.
>>
>> Also, in your list of tokens, they are all phrases ranging from 1 to 4 words,
>> and that's why you get good results. Multiword Bayes is just as good,
>> and I know that from experience.
>
> This is nothing like bayes. Bayes is creating a mental block. When I describe it to people who don't know bayes they immediately get it. If I describe it to people who know bayes - they confuse it. Bayes is a probability spectrum based on a frequency match on both sets. That's not even close to what I'm doing.

I think you've copied and pasted this same paragraph half a dozen times now, and the list has tried its best to accommodate your statement about "Bayes is creating a mental block", asking you pertinent questions that either remained unanswered or, when answered, produced conflicting statements, and that, when pressed, ended up showing that what you are doing is (at best) a slightly modified version of it.

However, I find the statement "When I describe it to people who don't know bayes they immediately get it" the most telling of them all. Of course people who don't know the probability theory will look at what you are doing and go "Wow!!! This is great!!" BECAUSE THEY DON'T KNOW.

People who know, obviously, recognize it for what it is, and you can claim as much as you like that it's NOT, but at the end of the day, if it looks like a rose and smells like a rose, then (no matter what you call it) 'tis still a rose!

All you have to do is READ the Process section of the following link to see exactly how similar your explanation is (save one factor, which is using phrases vs. words, and which has already been explained as a feature of SA using multi-word tokens):
https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering

> Also - some of what I'm doing is all combinations, not just sequential. So it's like a system that writes and scores its own rules. I just throw data at it and it classifies it.
>
> The real magic is the feedback learning. So as it identifies ham it learns new words and phrases that then match email from other people. So it learns how normal people speak, it learns how spammers speak, and it identifies the DIFFERENCES between the two. And it's completely automated.
>
> --
> Marc Perkel - Sales/Support
> support@junkemailfilter.com
> http://www.junkemailfilter.com
> Junk Email Filter dot com
> 415-992-3400

Re: Matching infinite sets

Posted by John Hardin <jh...@impsec.org>.

On Mon, 22 Aug 2016, Matus UHLAR - fantomas wrote:

>> > On Mon, 22 Aug 2016 09:03:38 -0700
>> > Marc Perkel <su...@junkemailfilter.com> wrote:
>
>> The real magic is the feedback learning. So as it identifies ham it learns 
>> new words and phrases that then match email from other people. So it learns 
>> how normal people speak, it learns how spammers speak, and it identifies 
>> the DIFFERENCES between the two. And it's completely automated.
>
> This is just the same as SA bayes with autolearning. However it will suffer
> the same issues and thus will require learning by other sources, either
> manual or other SA rules.

The restriction to probabilities 0 or 1 may mitigate the 
robot-off-the-rails syndrome to a degree.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Politicians never accuse you of "greed" for wanting other people's
   money, only for wanting to keep your own money.    -- Joseph Sobran
-----------------------------------------------------------------------
  2 days until the 1937th anniversary of the destruction of Pompeii

Re: Matching infinite sets

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 8/22/2016 11:40 AM, Matus UHLAR - fantomas wrote:
>>> On Mon, 22 Aug 2016 09:03:38 -0700
>>> Marc Perkel <su...@junkemailfilter.com> wrote:
>>>> The ones that are the same are of no interest. Only where it matches
>>>> one side and not the other.
>
>> On 08/22/16 09:06, Dianne Skoll wrote:
>>> But... but... that's exactly like Bayes if you throw out tokens whose
>>> observed probability is not 0 or 1.
>>>
>>> Also, in your list of tokens, they are all phrases ranging from 1 to
>>> 4 words,
>>> and that's why you get good results. Multiword Bayes is just as good,
>>> and I know that from experience.
>
> On 22.08.16 10:44, Marc Perkel wrote:
>> This is nothing like bayes. Bayes is creating a mental block.
>
> This is just like bayes.
> There are (only) a few differences between what you describe and bayes as
> implemented in SA, but it's still bayes-based.
>
>> When I describe it to people who don't know bayes they immediately get
>> it. If I describe it to people who know bayes - they confuse it. Bayes
>> is a probability spectrum based on a frequency match on both sets.
>> That's not even close to what I'm doing.
>
> Bayes uses probabilities between 0 and 1, while you only accept 0 and 1.
> You have just tweaked bayes, and I'm not even sure if towards better
> detection (I believe, towards worse)
>
>> Also - some of what I'm doing is all combinations, not just
>> sequential. So it's like a system that writes and scores its own
>> rules. I just throw data at it and it classifies it.
>
> The main difference from bayes as implemented in SA is that you make
> multiword tokens. This is good, but you aren't even the first one who proposed
> or did that. The second main difference is in the point above.
>
>> The real magic is the feedback learning. So as it identifies ham it
>> learns new words and phrases that then match email from other people.
>> So it learns how normal people speak, it learns how spammers speak,
>> and it identifies the DIFFERENCES between the two. And it's completely
>> automated.
>
> This is just the same as SA bayes with autolearning. However it will suffer
> the same issues and thus will require learning by other sources, either
> manual or other SA rules.
>

You see, Marc, this has circled around to exactly what I said last week.

The problem I have always had with SA and the Bayes learner is that for 
it to work, it requires sources.   In SA it requires a source of spam to 
build tokens and (I guess) requires a source of ham to remove them.  In 
your system it requires a source of ham to build tokens and (I guess)
requires a source of spam to remove them.

But the fundamental problem with all of these is in getting the sources.

Getting spam is simple.   I merely review my email logs looking for 
spammers sending to non-existent e-mail addresses that have NEVER been 
on my server.  When I see a lot of the same attempts I then create a
honeypot email address using that.   Within a couple months I have
some of the highest quality spam available as spammers communicate the
"discovered" email address to each other.   All automatically done.

But, getting ham is HARD.   You have to convince users to give it to 
you.  And you cannot really trust users to do it without contaminating
their ham stream with spam they were too lazy to delete.   So I end up 
wasting a lot of time cleaning the ham before inputting it into SA.

This is why I have said before - and I will repeat it again - that if 
you have found a good way to convince your users to offer up cleaned
ham in an automatic fashion, that would be revolutionary.

It is NOT the back end that matters!!!!!!   That is easy.   I can hire
some programmers and math majors who have doctorates in set theory to
build that part of it, and they can probably do it in an afternoon and
then go out for pizza.

It is the front end that is hard!!!!   And it's particularly hard when 
your interface is either IMAP or POP3.   Providing a webinterface that 
forces users to sort ham is somewhat easier but not all users want a
webinterface.   I personally don't use one myself, so why would I expect my
users to do it?

You have repeatedly put down whatever user interface you have built by
referring to it as crude programming and you don't want to show it. 
But what you don't seem to get is that every scrap of user interface 
code out there is some of the crudest ugliest most icky and disgusting
code out there.

Users are people and people DO NOT logically interact with computers. 
They use a combination of sort-of-logic, rubbish they learned from some
other interface, and God-knows-what else to operate software interfaces.
So you can design the most elegant and cleanest interface in the world
with the most elegant code behind it and release it to the world and
God-help-you within 5 years that interface code will be so fugly that
you can only force newbie greenhorn programmers who have no experience 
but are so desperate to work for you that they will do anything, to work
on it.  And eventually not even then, so you scrap it and release 
Windows 8 and start the cycle all over again.  ( If you think the 
Windows 10 user interface code is less ugly than 8 I have a bridge to 
sell you)

You should not be embarrassed about your ugly user interface code.   You
should be proud of the fact that you got it to work at all.  There's
plenty of commercial user interfaces that don't work at all (windows 8)

But, you don't want to show us your fugly user interface code that 
produces clean ham.   You just want to show us your elegant back-end 
code that digests clean ham.  Well, I already have a back end that eats
clean ham - maybe it don't work as good as yours - but if I replace my
back end with yours - I still have the same problem as before, I'm still
trying to find clean ham!!

So, congratulations Marc!   You are now no different than any other 
programmer out there!  You are an actual programmer now and have passed 
the litmus test because you just want to give us code we can't use and 
not the code we need! <eyeroll>

Ted


Re: Matching infinite sets

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

>>On Mon, 22 Aug 2016 09:03:38 -0700
>>Marc Perkel <su...@junkemailfilter.com> wrote:
>>>The ones that are the same are of no interest. Only where it matches
>>>one side and not the other.

>On 08/22/16 09:06, Dianne Skoll wrote:
>>But... but... that's exactly like Bayes if you throw out tokens whose
>>observed probability is not 0 or 1.
>>
>>Also, in your list of tokens, they are all phrases ranging from 1 to 4 words,
>>and that's why you get good results.  Multiword Bayes is just as good,
>>and I know that from experience.

On 22.08.16 10:44, Marc Perkel wrote:
>This is nothing like bayes. Bayes is creating a mental block.

This is just like bayes.
There are (only) a few differences between what you describe and bayes as
implemented in SA, but it's still bayes-based.

> When I 
>describe it to people who don't know bayes they immediately get it. 
>If I describe it to people who know bayes - they confuse it. Bayes is 
>a probability spectrum based on a frequency match on both sets. 
>That's not even close to what I'm doing.

Bayes uses probabilities between 0 and 1, while you only accept 0 and 1. 

You have just tweaked Bayes, and I'm not even sure it is towards better
detection (I believe it is towards worse).

>Also - some of what I'm doing is all combinations, not just 
>sequential. So it's like a system that writes and scores its own 
>rules. I just throw data at it and it classifies it.

The main difference from Bayes as implemented in SA is that you make
multiword tokens.  This is good, but you aren't even the first one who
proposed or did that.  The second main difference is in the point above.

>The real magic is the feedback learning. So as it identifies ham it 
>learns new words and phrases that then match email from other people. 
>So it learns how normal people speak, it learns how spammers speak, 
>and it identifies the DIFFERENCES between the two. And it's 
>completely automated.

This is just the same as SA Bayes with autolearning. However, it will suffer
the same issues and thus will require learning from other sources, either
manual or other SA rules.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
The 3 biggets disasters: Hiroshima 45, Tschernobyl 86, Windows 95

Re: Matching infinite sets

Posted by Dianne Skoll <df...@roaringpenguin.com>.

On Mon, 22 Aug 2016 10:44:42 -0700
Marc Perkel <su...@junkemailfilter.com> wrote:

> This is nothing like bayes.

It's exactly like Bayes.  You're stumbling across a hacked version of
Bayes.  You seem to lack the mathematical background to see what you're
doing, thinking it's somehow fundamentally different.  But it's not.

> The real magic is the feedback learning.

Which is how Bayes works.

> So as it identifies ham it learns new words and phrases that then
> match email from other people.

Which is what Bayes does.

> So it learns how normal people speak, it learns how spammers speak,
> and it identifies the DIFFERENCES between the two. And it's
> completely automated.

You've just described Bayes.  Paul Graham used almost that exact language
14 years ago in his classic paper, http://www.paulgraham.com/spam.html
Check out this paragraph:

    I'm more hopeful about Bayesian filters, because they evolve with the
    spam. So as spammers start using "c0ck" instead of "cock" to evade
    simple-minded spam filters based on individual words, Bayesian filters
    automatically notice. Indeed, "c0ck" is far more damning evidence than
    "cock", and Bayesian filters know precisely how much more.

Regards,

Dianne.

Re: Matching infinite sets

Posted by Marc Perkel <su...@junkemailfilter.com>.

On 08/22/16 09:06, Dianne Skoll wrote:
> On Mon, 22 Aug 2016 09:03:38 -0700
> Marc Perkel <su...@junkemailfilter.com> wrote:
>
>> The ones that are the same are of no interest. Only where it matches
>> one side and not the other.
> But... but... that's exactly like Bayes if you throw out tokens whose
> observed probability is not 0 or 1.
>
> Also, in your list of tokens, they are all phrases ranging from 1 to 4 words,
> and that's why you get good results.  Multiword Bayes is just as good,
> and I know that from experience.
>
>

This is nothing like bayes. Bayes is creating a mental block. When I 
describe it to people who don't know bayes they immediately get it. If I 
describe it to people who know bayes - they confuse it. Bayes is a 
probability spectrum based on a frequency match on both sets. That's not 
even close to what I'm doing.

Also - some of what I'm doing is all combinations, not just sequential. 
So it's like a system that writes and scores its own rules. I just 
throw data at it and it classifies it.

The real magic is the feedback learning. So as it identifies ham it 
learns new words and phrases that then match email from other people. So 
it learns how normal people speak, it learns how spammers speak, and it 
identifies the DIFFERENCES between the two. And it's completely automated.

-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400

Re: Matching infinite sets

Posted by Dianne Skoll <df...@roaringpenguin.com>.

On Mon, 22 Aug 2016 09:03:38 -0700
Marc Perkel <su...@junkemailfilter.com> wrote:

> The ones that are the same are of no interest. Only where it matches
> one side and not the other.

But... but... that's exactly like Bayes if you throw out tokens whose
observed probability is not 0 or 1.

Also, in your list of tokens, they are all phrases ranging from 1 to 4 words,
and that's why you get good results.  Multiword Bayes is just as good,
and I know that from experience.

Regards,

Dianne.

Re: Matching infinite sets

Posted by Marc Perkel <su...@junkemailfilter.com>.


On 08/22/16 07:40, Antony Stone wrote:
> On Monday 22 August 2016 at 16:34:00, Marc Perkel wrote:
>
>> On 08/22/16 07:28, Dianne Skoll wrote:
>>
>>> What percentage of emails using your algorithm are actually
>>> decidable?
>> Almost 100% if you look at a wide variety of tokens from multiple
>> attributes. Subject, body, content flags, header structure, combinations
>> of all domains reference, php scripts, name part of from addresses,
>> behavior flags.
> I would have said that a very large number of the words used in spam mails are
> the same as the words used in ham mails, so I suspect I'm confused about what
> constitutes a "token".

The ones that are the same are of no interest. Only where it matches one 
side and not the other.

>
> I fail to see how the "name part of from addresses" are unlikely to match ham,
> for example, since I see quite a lot of spam apparently from myself.
>
>
> Antony.
>

Some spammers have Viagra in the name part. The name part is very 
spammy. I also store to and from email addresses so that relationships 
between people corresponding create a ham result. (I filter outbound as 
well for some people)

-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400

Re: Matching infinite sets

Posted by Antony Stone <An...@spamassassin.open.source.it>.

On Monday 22 August 2016 at 16:34:00, Marc Perkel wrote:

> On 08/22/16 07:28, Dianne Skoll wrote:
> 
> > What percentage of emails using your algorithm are actually
> > decidable?
> 
> Almost 100% if you look at a wide variety of tokens from multiple
> attributes. Subject, body, content flags, header structure, combinations
> of all domains reference, php scripts, name part of from addresses,
> behavior flags.

I would have said that a very large number of the words used in spam mails are 
the same as the words used in ham mails, so I suspect I'm confused about what 
constitutes a "token".

I fail to see how the "name part of from addresses" are unlikely to match ham, 
for example, since I see quite a lot of spam apparently from myself.

Antony.

-- 
Never automate fully anything that does not have a manual override capability. 
Never design anything that cannot work under degraded conditions in emergency.

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: Matching infinite sets

Posted by Shawn Bakhtiar <sh...@hotmail.com>.

> On Aug 22, 2016, at 8:09 AM, John Hardin <jh...@impsec.org> wrote:
> 
> On Mon, 22 Aug 2016, Antony Stone wrote:
> 
>> On Monday 22 August 2016 at 16:45:09, Dianne Skoll wrote:
>> 
>>> On Mon, 22 Aug 2016 07:34:00 -0700 Marc Perkel wrote:
>>>>> So.  What percentage of emails using your algorithm are actually
>>>>> decidable?
>>>> 
>>>> Almost 100% if you look at a wide variety of tokens from multiple
>>>> attributes.
>>> 
>>> I can't believe that, or I'm missing something.  Almost every spam I see
>>> contains words that also appear in ham.  Things like "this" or "invoice"
>>> or "regards" or "dear".
>>> 
>>> What am I missing?
>> 
>> I believe you're missing Marc's definition of "token".
> 
> ...and it looks like we're venturing into the "SA Bayes multiple-word token support" realm (as a surrogate).
> 

Even with the multiple tokens combined into one fingerprint, you've changed little. No matter how you bound the token, the assumption that there are no SPAM emails that contain HAM content, and vice versa, is false.

Regardless, that is NOT what you claimed before; you seem to be flip-flopping between definitions to suit your argument.


> -- 
> John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
> jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
> key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
> -----------------------------------------------------------------------
>  USMC Rules of Gunfighting #6: If you can choose what to bring to a
>  gunfight, bring a long gun and a friend with a long gun.
> -----------------------------------------------------------------------
> 2 days until the 1937th anniversary of the destruction of Pompeii

Re: Matching infinite sets

Posted by John Hardin <jh...@impsec.org>.

On Mon, 22 Aug 2016, Antony Stone wrote:

> On Monday 22 August 2016 at 16:45:09, Dianne Skoll wrote:
>
>> On Mon, 22 Aug 2016 07:34:00 -0700 Marc Perkel wrote:
>>>> So.  What percentage of emails using your algorithm are actually
>>>> decidable?
>>>
>>> Almost 100% if you look at a wide variety of tokens from multiple
>>> attributes.
>>
>> I can't believe that, or I'm missing something.  Almost every spam I see
>> contains words that also appear in ham.  Things like "this" or "invoice"
>> or "regards" or "dear".
>>
>> What am I missing?
>
> I believe you're missing Marc's definition of "token".

...and it looks like we're venturing into the "SA Bayes multiple-word 
token support" realm (as a surrogate).

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   USMC Rules of Gunfighting #6: If you can choose what to bring to a
   gunfight, bring a long gun and a friend with a long gun.
-----------------------------------------------------------------------
  2 days until the 1937th anniversary of the destruction of Pompeii

Re: Matching infinite sets

Posted by Antony Stone <An...@spamassassin.open.source.it>.

On Monday 22 August 2016 at 16:45:09, Dianne Skoll wrote:

> On Mon, 22 Aug 2016 07:34:00 -0700 Marc Perkel wrote:
> > > So.  What percentage of emails using your algorithm are actually
> > > decidable?
> > 
> > Almost 100% if you look at a wide variety of tokens from multiple
> > attributes.
> 
> I can't believe that, or I'm missing something.  Almost every spam I see
> contains words that also appear in ham.  Things like "this" or "invoice"
> or "regards" or "dear".
> 
> What am I missing?

I believe you're missing Marc's definition of "token".


Antony.

-- 
Anyone that's normal doesn't really achieve much.

 - Mark Blair, Australian rocket engineer

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: Matching infinite sets

Posted by Dianne Skoll <df...@roaringpenguin.com>.

On Mon, 22 Aug 2016 09:06:08 -0700
Marc Perkel <su...@junkemailfilter.com> wrote:

> Hi Dianne, what you're missing are word combinations. Usually it's not
> a single word but a combination of words that triggers a result.

[snip]

So that's Bayes with multi-word tokens, throwing out tokens whose
probability is neither 0 nor 1.
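
The equivalence claimed here can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the per-token probability is a raw observed frequency, not SpamAssassin's actual Bayes computation, and the function name and the toy messages are made up for the example.

```python
from collections import Counter

def token_spam_probability(spam_msgs, ham_msgs):
    """Observed fraction of each token's occurrences that were in spam."""
    spam_counts = Counter(tok for msg in spam_msgs for tok in msg)
    ham_counts = Counter(tok for msg in ham_msgs for tok in msg)
    return {
        tok: spam_counts[tok] / (spam_counts[tok] + ham_counts[tok])
        for tok in spam_counts.keys() | ham_counts.keys()
    }

# Hypothetical training data: each message is a set of multi-word tokens.
spam_msgs = [{"meet hot", "russian brides"}, {"meet hot", "act now"}]
ham_msgs = [{"read an article", "russian brides"}]

prob = token_spam_probability(spam_msgs, ham_msgs)

# Keep only tokens whose observed probability is exactly 0 or 1.
extreme = {tok: p for tok, p in prob.items() if p in (0.0, 1.0)}

# The same tokens fall out of plain set differences.
spam_only = set().union(*spam_msgs) - set().union(*ham_msgs)
ham_only = set().union(*ham_msgs) - set().union(*spam_msgs)
```

In this toy data the tokens with probability 1 are exactly `spam_only`, the tokens with probability 0 are exactly `ham_only`, and "russian brides" (probability 0.5) is dropped by both approaches.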

Regards,

Dianne.

Re: Matching infinite sets

Posted by Marc Perkel <su...@junkemailfilter.com>.

On 08/22/16 07:45, Dianne Skoll wrote:
> On Mon, 22 Aug 2016 07:34:00 -0700
> Marc Perkel <su...@junkemailfilter.com> wrote:
>
>>> So.  What percentage of emails using your algorithm are actually
>>> decidable?
>> Almost 100% if you look at a wide variety of tokens from multiple
>> attributes.
> I can't believe that, or I'm missing something.  Almost every spam I see
> contains words that also appear in ham.  Things like "this" or "invoice"
> or "regards" or "dear".
>
> What am I missing?
>
>

Hi Dianne, what you're missing are word combinations. Usually it's not a 
single word but a combination of words that triggers a result.

      Example of how NOT matching works

Let's take 2 subject lines and see how this works.

Meet hot Russian Brides Online!
I read an article about Russian Brides in a magazine

A traditional spam filter using Bayesian or hard-coded rules about 
"Russian Brides" might determine that only 1 out of 500 emails 
mentioning the phrase "Russian Brides" is a good email. Thus the second 
line would have points assessed against it in the classification process 
using these traditional methods.

Using the Evolution Filter, the phrase "Russian Brides" is in both sets 
and therefore has no influence on the results. But the first subject 
matches these phrases in the Spam Only set.

Meet hot
Meet hot Russian
Meet hot Russian Brides
hot Russian Brides Online!
Russian Brides Online!
Brides Online!
Online!

The second subject matches these phrases in the Ham Only set that are 
never used in the spam set.

I read an article
read an article
read an article about
about Russian
an article about
in a magazine
Brides in a

So even though the phrase "Russian Brides" has no influence, each subject 
hits either ham or spam many times where the same phrase was never used 
in the subject line in the opposite set. And the number of hits is 
significant enough just from these subjects to cause the fingerprints to 
be learned, and that's just looking at the Subject attribute. When this 
is combined with testing all attributes, the messages usually come out 
strongly on one side or the other.

In rule-based systems one would not normally build a whitelist rule to 
allocate points based on seeing the phrase "read an article about". 
That's where the Evolution Filter is different. It didn't need to have 
that rule because, since it is comparing to the infinite set of what is 
not matched on the other side, it dynamically creates billions of rules 
automatically.
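
The phrase matching in this example can be sketched in Python. This is a minimal illustration, not the actual Evolution Filter code: the `fingerprints` and `hits` helpers, the 1-to-4-word window, the lower-casing, and the two-subject "training corpus" are all assumptions made for the example.

```python
def fingerprints(text, max_words=4):
    """Return every run of 1..max_words consecutive words in text."""
    words = text.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_words + 1)
        for i in range(len(words) - n + 1)
    }

# Toy corpus (hypothetical): one known spam subject and one known ham subject.
spam_seen = fingerprints("Meet hot Russian Brides Online!")
ham_seen = fingerprints("I read an article about Russian Brides in a magazine")

spam_only = spam_seen - ham_seen   # seen in spam, never in ham
ham_only = ham_seen - spam_seen    # seen in ham, never in spam
shared = spam_seen & ham_seen      # seen on both sides: no influence

def hits(subject):
    """Count (spam-only, ham-only) fingerprint matches for a new subject."""
    tokens = fingerprints(subject)
    return len(tokens & spam_only), len(tokens & ham_only)
```

With this toy corpus, "russian brides" ends up in `shared` and is ignored, while a new subject such as "I read an article about hot deals" matches far more ham-only phrases than spam-only ones.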


-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400

Re: Matching infinite sets

Posted by Dianne Skoll <df...@roaringpenguin.com>.

On Mon, 22 Aug 2016 07:34:00 -0700
Marc Perkel <su...@junkemailfilter.com> wrote:

> > So.  What percentage of emails using your algorithm are actually
> > decidable?

> Almost 100% if you look at a wide variety of tokens from multiple 
> attributes.

I can't believe that, or I'm missing something.  Almost every spam I see
contains words that also appear in ham.  Things like "this" or "invoice"
or "regards" or "dear".

What am I missing?

Regards,

Dianne.

Re: Matching infinite sets

Posted by Marc Perkel <su...@junkemailfilter.com>.


On 08/22/16 08:58, RW wrote:
> On Mon, 22 Aug 2016 07:34:00 -0700
> Marc Perkel wrote:
>
>> On 08/22/16 07:28, Dianne Skoll wrote:
>>> The other two possibilities (no tokens in either or some tokens in
>>> both) are undecidable.
>> Exactly!
> In the past you've said that when there are tokens in both you compare
> the counts.

I do a very little bit of that. I make additional sets I call nearly-ham 
and nearly-spam where the ratio is very high, and count it as a half score.
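
One way to read the nearly-ham / nearly-spam idea is as a per-token weight. The sketch below is an assumption-laden illustration: the post does not say what "very high" means, so the `ratio=20` threshold, the function name, and the use of raw counts are all hypothetical.

```python
def token_weight(spam_count, ham_count, ratio=20):
    """Weight for one token given how often it was seen on each side.

    +1.0 / -1.0 : seen only in spam / only in ham
    +0.5 / -0.5 : seen on both sides, but at least `ratio` times more on one
     0.0        : never seen, or too evenly split to count
    """
    if spam_count == 0 and ham_count == 0:
        return 0.0
    if ham_count == 0:
        return 1.0
    if spam_count == 0:
        return -1.0
    if spam_count >= ratio * ham_count:
        return 0.5   # nearly-spam
    if ham_count >= ratio * spam_count:
        return -0.5  # nearly-ham
    return 0.0
```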

-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400

Re: Matching infinite sets

Posted by RW <rw...@googlemail.com>.

On Mon, 22 Aug 2016 07:34:00 -0700
Marc Perkel wrote:

> On 08/22/16 07:28, Dianne Skoll wrote:

> > The other two possibilities (no tokens in either or some tokens in
> > both) are undecidable.  
> 
> Exactly!

In the past you've said that when there are tokens in both you compare
the counts.


On Wed, 17 Aug 2016 11:02:38 -0700
Marc Perkel wrote:

>  Here's the actual formula.
> 
> card(Test_message intersect Spam diff Ham) minus card(Test_message
> intersect Ham diff Spam)
> 


On Wed, 20 Jan 2016 08:52:05 -0800
Marc Perkel wrote:

> Then you do a set
> diff both ways (ham - spam) (spam - ham) and whichever side is bigger
> wins. Generally it will match on only one side or very predominately
> on one side.
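
The quoted formula translates almost directly into set operations. The original posts say the real system uses Redis set operations; the sketch below uses in-memory Python sets instead, and the function name and the toy token sets are assumptions for illustration only.

```python
def evolution_score(message_tokens, spam, ham):
    """card(T & (S - H)) - card(T & (H - S)), using plain Python sets."""
    spam_only = spam - ham
    ham_only = ham - spam
    return len(message_tokens & spam_only) - len(message_tokens & ham_only)

# Hypothetical learned token sets.
spam = {"meet hot", "brides online!", "russian brides"}
ham = {"read an article", "in a magazine", "russian brides"}

# Tokens extracted from a test message (also hypothetical).
message = {"meet hot", "russian brides", "in a magazine", "brides online!"}
result = evolution_score(message, spam, ham)
```

A positive result leans spam and a negative one leans ham. Here the message hits two spam-only tokens and one ham-only token, and "russian brides" contributes nothing because it is in both sets.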

Re: Matching infinite sets

Posted by Marc Perkel <su...@junkemailfilter.com>.


On 08/22/16 07:28, Dianne Skoll wrote:
> On Mon, 22 Aug 2016 07:16:41 -0700
> Marc Perkel <su...@junkemailfilter.com> wrote:
>
>> Antony, Yes - I don't store Set B. I store Set A. B is defined by
>> what's NOT in A. So I test A and if it's not matched it's set B. Set
>> B is just a negative match on A.
> Let me ask you a question.  As far as I understand your algorithm, if
> an email contains at least one token in the "ham" set and zero tokens in
> the "spam" set, you classify it as ham.  And conversely, if it contains
> at least one spam token but zero ham tokens, you classify it as spam.

YES! YES! YES!

Although I look at some thousand "fingerprints" to get a more 
significant result.

>
> The other two possibilities (no tokens in either or some tokens in both)
> are undecidable.

Exactly!

>
> So.  What percentage of emails using your algorithm are actually decidable?

Almost 100% if you look at a wide variety of tokens from multiple 
attributes. Subject, body, content flags, header structure, combinations 
of all domains reference, php scripts, name part of from addresses, 
behavior flags.

>
> Regards,
>
> Dianne.
>
>
>

-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400

Re: Matching infinite sets

Posted by Dianne Skoll <df...@roaringpenguin.com>.

On Mon, 22 Aug 2016 07:16:41 -0700
Marc Perkel <su...@junkemailfilter.com> wrote:

> Antony, Yes - I don't store Set B. I store Set A. B is defined by 
> what's NOT in A. So I test A and if it's not matched it's set B. Set
> B is just a negative match on A.

Let me ask you a question.  As far as I understand your algorithm, if
an email contains at least one token in the "ham" set and zero tokens in
the "spam" set, you classify it as ham.  And conversely, if it contains
at least one spam token but zero ham tokens, you classify it as spam.

The other two possibilities (no tokens in either or some tokens in both)
are undecidable.

So.  What percentage of emails using your algorithm are actually decidable?
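
The four cases in this question can be written out as a small Python function. It is a sketch of the strict reading described above, with hypothetical names; it ignores the counting and half-score refinements mentioned elsewhere in the thread.

```python
def classify(message_tokens, spam_only, ham_only):
    """Strict four-case decision on spam-only and ham-only token hits."""
    hits_spam = bool(message_tokens & spam_only)
    hits_ham = bool(message_tokens & ham_only)
    if hits_spam and not hits_ham:
        return "spam"
    if hits_ham and not hits_spam:
        return "ham"
    # Either no hits at all, or hits on both sides.
    return "undecidable"
```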

Regards,

Dianne.

Re: Matching infinite sets

Posted by Marc Perkel <su...@junkemailfilter.com>.


On 08/22/16 06:55, Antony Stone wrote:
> On Monday 22 August 2016 at 15:46:41, Dianne Skoll wrote:
>
>> On Mon, 22 Aug 2016 06:04:49 -0700
>>
>> Marc Perkel <su...@junkemailfilter.com> wrote:
>>> Set A - a  finite set - has some members,
>>> Set B - an infinite set - is everything that is NOT in Set A
>> Set B is a very special case of an infinite set.  We're talking about
>> infinite sets in general.
>>
>> Also, you have to realize that although set B is in principle infinite,
>> in practice it is not.  Computers have finite memory, and although the
>> number of email tokens representable in the memory of a computer is very,
>> very, very large, it's not infinite.
> I do not think that Marc is proposing to actually store set B in a computer
> (or anywhere else).
>
> Set B is simply a theoretical construct, defined as the inverse of Set A, and
> to discover whether something is a member of it, you do not search through the
> infinite set B for a match, you instead check all members of finite set A for a
> non-match.
>
> If nothing in Set A matches X, then X is a member of Set B.
>
>
> Antony.
>

Antony, Yes - I don't store Set B. I store Set A. B is defined by 
what's NOT in A. So I test A and if it's not matched it's set B. Set B 
is just a negative match on A.

-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400

Re: Matching infinite sets

Posted by Antony Stone <An...@spamassassin.open.source.it>.

On Monday 22 August 2016 at 15:46:41, Dianne Skoll wrote:

> On Mon, 22 Aug 2016 06:04:49 -0700
> 
> Marc Perkel <su...@junkemailfilter.com> wrote:
> > Set A - a  finite set - has some members,
> > Set B - an infinite set - is everything that is NOT in Set A
> 
> Set B is a very special case of an infinite set.  We're talking about
> infinite sets in general.
> 
> Also, you have to realize that although set B is in principle infinite,
> in practice it is not.  Computers have finite memory, and although the
> number of email tokens representable in the memory of a computer is very,
> very, very large, it's not infinite.

I do not think that Marc is proposing to actually store set B in a computer 
(or anywhere else).

Set B is simply a theoretical construct, defined as the inverse of Set A, and 
to discover whether something is a member of it, you do not search through the 
infinite set B for a match, you instead check all members of finite set A for a 
non-match.

If nothing in Set A matches X, then X is a member of Set B.

Antony.

-- 
I have an excellent memory.
I can't think of a single thing I've forgotten.

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: Matching infinite sets

Posted by Dianne Skoll <df...@roaringpenguin.com>.

On Mon, 22 Aug 2016 06:04:49 -0700
Marc Perkel <su...@junkemailfilter.com> wrote:

> Set A - a  finite set - has some members,
> Set B - an infinite set - is everything that is NOT in Set A

Set B is a very special case of an infinite set.  We're talking about
infinite sets in general.

Also, you have to realize that although set B is in principle infinite,
in practice it is not.  Computers have finite memory, and although the
number of email tokens representable in the memory of a computer is very,
very, very large, it's not infinite.

Regards,

Dianne.

Re: Matching infinite sets

Posted by Marc Perkel <su...@junkemailfilter.com>.

I'm confused by the confusion here.

Set A - a  finite set - has some members,
Set B - an infinite set - is everything that is NOT in Set A

So you match a test item to Set A and if it matches it's a member of A. 
If it doesn't match Set A it's a member of B.

How is this not really simple?
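
The "negative match" idea can be shown with a tiny Python class. This is only a sketch with a hypothetical `Complement` name: Set B is never stored, and membership is answered by testing against the finite Set A.

```python
class Complement:
    """The set of everything NOT in a given finite set."""

    def __init__(self, excluded):
        self.excluded = frozenset(excluded)

    def __contains__(self, item):
        # An item belongs to the complement exactly when A does not hold it.
        return item not in self.excluded

set_a = {"apple", "orange"}
set_b = Complement(set_a)   # never enumerated, only tested against
```

With this, `"cherry" in set_b` is true and `"orange" in set_b` is false, without ever materializing an infinite set.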

Re: Matching infinite sets

Posted by Michael Orlitzky <mi...@orlitzky.com>.

On 08/22/2016 09:02 AM, Joe Quinn wrote:
> On 8/22/2016 8:54 AM, Michael Orlitzky wrote:
>> On 08/21/2016 03:22 PM, Damian wrote:
>>> There is no such set B, as it would contain itself.
>> The empty set contains itself.
> That's an easy mistake to make. The empty set is {}, the set that
> contains only the empty set is {{}}. Sets are discrete elements that
> don't get "flattened".
> 
> In Perl, syntactic lists do get flattened though, which leads to some fun
> times. You can do silly things like @concatenated = (@listOne, @listTwo).

"Contains" in the context of sets means "is a superset of" =)

(I'm just being pedantic, I don't actually have a point.)

Re: Matching infinite sets

Posted by Joe Quinn <jq...@pccc.com>.

On 8/22/2016 8:54 AM, Michael Orlitzky wrote:
> On 08/21/2016 03:22 PM, Damian wrote:
>> There is no such set B, as it would contain itself.
> The empty set contains itself.
That's an easy mistake to make. The empty set is {}, the set that 
contains only the empty set is {{}}. Sets are discrete elements that 
don't get "flattened".

In Perl, syntactic lists do get flattened though, which leads to some fun 
times. You can do silly things like @concatenated = (@listOne, @listTwo).
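
The element-versus-subset distinction in this sub-thread is easy to check in Python (used here instead of Perl purely for illustration; the variable names are made up). Unlike Perl's list flattening, Python keeps nested containers nested.

```python
empty = frozenset()                   # {}
contains_empty = frozenset({empty})   # {{}}

is_member = empty in empty            # is {} an element of {}?
is_subset = empty <= empty            # is {} a subset of {}?

list_one, list_two = [1, 2], [3]
nested = [list_one, list_two]         # stays two elements, no flattening
flat = [*list_one, *list_two]         # explicit flattening, like Perl's (@a, @b)
```

Here `is_member` is False and `is_subset` is True, which matches both readings of "contains" argued in the thread.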

Re: Matching infinite sets

Posted by Dianne Skoll <df...@roaringpenguin.com>.

On Mon, 22 Aug 2016 08:54:48 -0400
Michael Orlitzky <mi...@orlitzky.com> wrote:

> The empty set contains itself.

No, it doesn't.  By definition.

Regards,

Dianne.

Re: Matching infinite sets

Posted by Michael Orlitzky <mi...@orlitzky.com>.

On 08/21/2016 03:22 PM, Damian wrote:
>>
> There is no such set B, as it would contain itself.

The empty set contains itself.

Re: Matching infinite sets

Posted by Damian <sp...@arcsin.de>.


Am 21.08.2016 um 18:47 schrieb Marc Perkel:
> Actually - you can match an infinite set. And maybe this is what's
> hard for some people to wrap their heads around.
>
> Suppose set A contains 2 items, apples and oranges.
> So we define set B as everything in the universe that is not in set A.
> So set B is an infinite set, everything in the universe EXCEPT apples
> and oranges.
>
There is no such set B, as it would contain itself.
> Our first test set contains an orange - so it matches set A and not set B.
> Our second test set contains a cherry - so it doesn't match set A but
> it does match set B.
>
> When you have a method that matches against infinite sets, it
> completely changes how you think about spam and ham detection.
>
> On 08/16/16 12:57, Shawn Bakhtiar wrote:
>>
>> By the way, you can't match an infinite set (well theoretically but
>> not actually).
>> https://en.wikipedia.org/wiki/Intersection_(set_theory)
>>
>
> -- 
> Marc Perkel - Sales/Support
> support@junkemailfilter.com
> http://www.junkemailfilter.com
> Junk Email Filter dot com
> 415-992-3400

Re: Matching infinite sets

Posted by Dianne Skoll <df...@roaringpenguin.com>.

On Sun, 21 Aug 2016 09:47:45 -0700
Marc Perkel <su...@junkemailfilter.com> wrote:

> So we define set B as everything in the universe that is not in set A.

That's a very specific kind of infinite set.  It's the complement of a finite set.

Try this one on for size:

Consider the set A of all positive integral powers of pi (pi, pi^2, pi^3, etc.)
That's clearly infinite.

Set B is every element x of A such that the googolth digit (that is,
the 10^100th digit) after the decimal point of the decimal expansion
of x is 7.

Good luck matching B.  It's not even clear to me whether B is infinite
or finite, though I suspect it's infinite.

There are also sets with an uncountable infinity of elements, such as
the real numbers, for which "matching" has little meaning.

Regards,

Dianne.

Matching infinite sets

Posted by Marc Perkel <su...@junkemailfilter.com>.

Actually - you can match an infinite set. And maybe this is what's 
hard for some people to wrap their heads around.

Suppose set A contains 2 items, apples and oranges.
So we define set B as everything in the universe that is not in set A.
So set B is an infinite set, everything in the universe EXCEPT apples 
and oranges.

Our first test set contains an orange - so it matches set A and not set B.
Our second test set contains a cherry - so it doesn't match set A but it 
does match set B.

When you have a method that matches against infinite sets, it completely 
changes how you think about spam and ham detection.

On 08/16/16 12:57, Shawn Bakhtiar wrote:
>
> By the way, you can't match an infinite set (well theoretically but 
> not actually).
> https://en.wikipedia.org/wiki/Intersection_(set_theory)
>

-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400

Re: I have some bad news

Posted by Shawn Bakhtiar <sh...@hotmail.com>.

Marc,

Let me first say I am truly sorry to hear about your cancer. I lost my father to cancer just over a decade ago, after a long battle with sarcoma of the throat and tongue. So I pray and wish you the best.

I sent this to you in January 2016 (don't recall if I ever got a reply to it) but based on your document:

Set theory is not my strongest suit, but your diagram looks incorrect:
http://www.junkemailfilter.com/patent/patent5.pdf

Let:

H be ham
S be spam
E be an email

Then you state that:
HE = (H u E)
SE = (S u E)

But then the next diagram shows that there is some solution in which (HE u SE), and thus there may be some set which is (HE / SE), even though in the first diagram S and H do not intersect.

This is not logical. Either there is an (H u S), in which there are tokens common to the ham and spam token sets, or there is not, so which is it? In other words, if a token is both ham and spam, how are you calculating its weight? Is it spam or ham?

Clearly it’s the latter (they do not intersect) as described in this:
http://www.junkemailfilter.com/patent/patent2.pdf

In which case you are simply looking to see if (H u E) > (S u E), which has nothing to do with what is not in the set, and there is indeed no (H u S) or the negation or NOT which is (H / S), so as everyone has been trying to explain, it has NOTHING to do with what is NOT matched.

By the way, you can’t match an infinite set (well theoretically but not actually).
https://en.wikipedia.org/wiki/Intersection_(set_theory)

Since the current Bayes learns both SPAM and HAM I imagine that it does a very similar thing, other than perhaps the larger multi word token sets, which seems a trivial thing to add, and available in other tool sets.

I'll only add this, if you believe that your SPAM has been greatly reduced. That's awesome! But have you really isolated it to this "new technique" or in playing around have you inadvertently changed something else that may have changed your results?

I am also not saying that you have not developed some "new technique", but that if you have, your description of it does not line up logically with the technique itself. Back in January you were looking to patent it; today you simply want it to live on. I suggest that if it is indeed the latter, then perhaps it's time to release the source code/scripts and let a few more eyes look at the logic to see exactly what it is doing that you believe is so different from what is out there.

Again, I pray and hope the best for you,
Shawn

On Aug 16, 2016, at 6:45 AM, Marc Perkel <su...@junkemailfilter.com> wrote:

Thanks for the encouragement Ted. Unfortunately I know way too much about mathematics and I have a deep understanding of probability spectrums. There's a curve and I'm going to be somewhere on it. If I'm lucky I might be here for some time. But my life is a casino right now. And yes - there is also a probability spectrum for any of us getting hit by a bus tomorrow as well. SpamAssassin is based on statistical probabilities.

I have to have a dual track strategy. On one hand I need to do what I can to move the curve into the future. But at the same time I need to accomplish things that are important within a limited time slot as well.

Spam filtering isn't just another job to me. I actually have a passion for it. On a philosophical basis I look at the internet as the new nervous system for humanity and is now core to who we are as a species. And email is a very key technology in that nervous system.

In that context spam is like poison where predators suck some of the life out of humanity, and my real life has always been about the progress of the human race.

I am somewhat of a spam fighting savant. I actually run very little of my email through SpamAssassin, truth be told. Over the years I've thrown some ideas into the mix and sometimes they have been adopted to make SA better. Sometimes I just get shouted down by trolls and the ideas go nowhere.

At this point however there's a deadline and I have ideas that could be implemented in SA very very easily. In fact it was through SA that I discovered Redis, and SA already talks to redis.

Although my innovation is excellent, as a programmer I'm mediocre. Never worked as a team. Easily frustrated. Probably somewhat autistic and somewhat arrogant. So mostly living in my own world doing my own development. I have my little online empire. I work from home. I make a great living. And I really like (most of) my customers and enjoy doing tech support. And it's allowed me a lot of free time to do things that I'm really interested in.

But my ideas are now my immortality, so I'm now releasing this to the world. And mostly this simple AI method that SA could easily implement.

This new spam filtering trick is not only extremely effective, it's extremely simple. I had it working in 2 days. The developers here could probably implement it in 1 day (at least the core functionality). And a team of better programmers could probably do a better job and get an even better result than I get. In fact you don't need or even want my sloppy code (not in Perl). All you need is to read the description of how it works and once you get it - coding it is trivial.

So - this is an opportunity to milk the mind of the dying spam savant. It works, it's easy, and I'm just handing it to you all. There is no reason I would be making this up. All you all need to do is accept this gift.

On 08/16/16 01:03, Ted Mittelstaedt wrote:
Hi Marc,

Back in 1994 I was diagnosed with testicular cancer, it was essentially "stage 4" as it had metastasized throughout my body.

But, it responded to chemo and here I am today. In fact ironically
my original oncologist died a few years ago - on a fishing trip he had
an accident and drowned.

The Universe has an interesting sense of humor and likes to throw
curve balls. Take what you have been told about your "probability
spectrum" and toss it in the trash - hakuna matata. You could accidentally step in front of a bus tomorrow and be dead. You could
live another 20 years. Statistics on people only have meaning on
large groups of people - they are irrelevant when it comes to the
individual.

I've met a number of people who had serious cancers. And I learned
one thing from that. The people who survived - every one of them,
fighters. And everyone fights differently. Some get on the food bandwagon and try overdosing on green tea and every alleged anti-cancer food out there. Others jump into yoga, and I knew one guy who went out and binge-watched Monty Python to spend as much time laughing as possible. Me, I fought on a more mental approach. I dropped everything in my life that I was not completely satisfied with - I turned my back on my job, my apartment, etc. - every burden or responsibility that I had which I didn't like and didn't really want - and dove into the treatment, and I never let myself believe I was in any danger of dying.

Of course, not all who fight, survive. But I will say with absolute
conviction that everyone I ever met who had a serious cancer and had
that "attitude of acceptance", later died. You are a fighter or you
wouldn't even be here. Now, fight to win.

Ted

--
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400

Re: I have some bad news

Posted by Marc Perkel <su...@junkemailfilter.com>.


On 08/16/16 15:22, Ted Mittelstaedt wrote:
>
> I read through the site, and here's why I probably couldn't implement it,
> at least not as it stands now.
>
> SpamAssassin basically depends on a diet of spam to feed the learner.
> The learner learns what is spam.  If you add some ham into the learner
> it works better - but the main thrust of it is feed me spam feed me spam.
>
> Your method depends on a diet of -ham- not spam because you are doing 
> the opposite of SA
>
> My problem as an admin is this.  I can guarantee that when a customer
> complains about a piece of junk, that what they give me is junk.
>
> But customers don't complain about ham.  So I'm not going to see it.
> And I cannot just iterate through all my customer mailboxes and
> assume they are all full of ham, because some of my customers are
> lazy and won't delete spam, or they don't read their mailbox for
> months at a time, etc. etc.  I cannot guarantee I'll get only ham
> by doing that - and so therefore I don't have a guaranteed source
> of ham.
>
> You said that your existing perl scripts are hacks and ugly.  But,
> I'm wagering that most of your ugly programming is user interface
> code that somehow coaxes your users to yield up a diet of ham.
>
> My problem is there is a tremendous dearth of user interface code
> out there to get EITHER spam or ham.
>
> The closest I have ever found is the mailwatch interface but that is
> god-awful complex.  I have it running on an ISP customer of mine's
> mailserver but God what a hack.
>
> Without that, all I can do is what I do now, which is make sure that
> all customers accessing my server with IMAP have a junk mail folder and
> know that if they drag spam into there that I'll suck it into the
> learner.  Of course, POP3 clients have nothing and I cannot tell
> some POP3 user "Oh if you really want to reduce your spam load then
> give up your POP3 email client and use this slick web interface I have
> set up for you to send and receive email."
>
> I'm actually not as interested in your engine as I am in how you get
> your customers to participate with it because if you have found a
> way to get 'em to do it, that is truly revolutionary.
>
> Mine would rather bitch and moan about spam and when they get it,
> just delete it - which while it puts it in a deleted folder that I
> can get at (if they are IMAP) it mixes it up with deleted ham, so
> I cannot take that mess of mixed unidentified spam and ham and use it 
> for anything.
>
> Ted

Hi Ted,

My system depends on a stream of both ham and spam creating a ham corpus 
and a spam corpus. I already had many rules in place (Not SA) to 
identify ham. Actually all you need is my RBL, 
hostkarma.junkemailfilter.com: when it returns 127.0.0.1 and the FCrDNS 
is good - there's your ham stream.

SA has a mindset of detecting spam. You have to change that to detecting 
spam and ham. Once you have streams going into the learner then you can 
not only increase spam detection, but you can positively identify good 
email as good and have almost no false positives. Then the output with 
strong scores are fed back into the learner where it learns how people 
who send ham speak and people who send spam speak. And it's very very 
effective, and I'm just giving it away.
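
For readers who want to see what that ham-stream test might look like in code, here is a minimal Python sketch. It assumes the list is queried like a conventional IPv4 DNSBL (reversed octets prepended to the zone) and that FCrDNS means the PTR name resolves back to the connecting IP. The function names and the injectable resolver parameters are illustrative only; the zone name and the 127.0.0.1 result are the only parts taken from the message above.

```python
import socket

RBL_ZONE = "hostkarma.junkemailfilter.com"  # zone named in the message above
HAM_RESULT = "127.0.0.1"                    # "good sender" answer, per the message


def rbl_query_name(ip, zone=RBL_ZONE):
    """Build a DNSBL-style query name: reversed IPv4 octets + zone (assumed convention)."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def rbl_says_ham(ip, resolve=socket.gethostbyname_ex):
    """True if any A record returned for the query name equals HAM_RESULT."""
    try:
        return HAM_RESULT in resolve(rbl_query_name(ip))[2]
    except OSError:  # lookup failure or NXDOMAIN: treat as "not listed"
        return False


def fcrdns_ok(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: the PTR hostname must resolve back to ip."""
    try:
        hostname = reverse(ip)[0]
        return ip in forward(hostname)[2]
    except OSError:
        return False


def is_ham_source(ip):
    """Hypothetical helper: both conditions from the message must hold."""
    return rbl_says_ham(ip) and fcrdns_ok(ip)
```

Mail arriving from an IP for which `is_ham_source` is true would go into the ham corpus; everything else would need some other rule before being learned from. The resolver arguments exist only so the logic can be exercised without live DNS.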

Thanks for looking at it though.



-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400

Re: I have some bad news

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 8/16/2016 6:45 AM, Marc Perkel wrote:
> Thanks for the encouragement Ted. Unfortunately I know way too much
> about mathematics and I have a deep understanding of probability
> spectrums. There's a curve and I'm going to be somewhere on it. If I'm
> lucky I might be here for some time. But my life is a casino right now.
> And yes - there is also a probability spectrum for any of us getting hit
> by a bus tomorrow as well. SpamAssassin is based on statistical
> probabilities.
>
> I have to have a dual track strategy. On one hand I need to do what I
> can to move the curve into the future. But at the same time I need to
> accomplish things that are important within a limited time slot as well.
>
> Spam filtering isn't just another job to me. I actually have a passion
> for it. On a philosophical basis I look at the internet as the new
> nervous system for humanity and is now core to who we are as a species.
> And email is a very key technology in that nervous system.
>
> In that context spam is like poison where predators suck some of the
> life out of humanity, and my real life has always been about the
> progress of the human race.
>

I think you already have found a way to fight your cancer. :-)

> I am somewhat of a spam fighting savant. I actually run very little of
> my email through SpamAssassin, truth be told. Over the years I've thrown
> some ideas into the mix and sometimes they have been adopted to make SA
> better. Sometimes I just get shouted down by trolls and the ideas go
> nowhere.
>
> At this point however there's a deadline and I have ideas that could be
> implemented in SA very very easily. In fact it was through SA that I
> discovered Redis, and SA already talks to redis.
>
> Although my innovation is excellent, as a programmer I'm mediocre. Never
> worked as a team. Easily frustrated. Probably somewhat autistic and
> somewhat arrogant. So mostly living in my own world doing my own
> development. I have my little online empire. I work from home. I make a
> great living. And I really like (most of) my customers and enjoy doing
> tech support. And it's allowed me a lot of free time to do things that
> I'm really interested in.
>
> But my ideas are now my immortality, so I'm now releasing this to the
> world. And mostly this simple AI method that SA could easily implement.
>
> This new spam filtering trick is not only extremely effective, it's
> extremely simple. I had it working in 2 days. The developers here could
> probably implement it in 1 day. (At least the core functionality) And
> with a team of better programmers probably do a better job and get an
> even better result than I get. In fact you don't need or even want my
> sloppy code (not in Perl). All you need is to read the description of
> how it works and once you get it - coding it is trivial.
>
> So - this is an opportunity to milk the mind of the dying spam savant.
> It works, it's easy, and I'm just handing it to you all. There is no
> reason I would be making this up. All you all need to do is accept this
> gift.
>

I read through the site, and here's why I probably couldn't implement it,
at least not as it stands now.

SpamAssassin basically depends on a diet of spam to feed the learner.
The learner learns what is spam.  If you add some ham into the learner
it works better - but the main thrust of it is feed me spam feed me spam.

Your method depends on a diet of -ham- not spam because you are doing 
the opposite of SA

My problem as an admin is this.  I can guarantee that when a customer
complains about a piece of junk, that what they give me is junk.

But customers don't complain about ham.  So I'm not going to see it.
And I cannot just iterate through all my customer mailboxes and
assume they are all full of ham, because some of my customers are
lazy and won't delete spam, or they don't read their mailbox for
months at a time, etc. etc.  I cannot guarantee I'll get only ham
by doing that - and so therefore I don't have a guaranteed source
of ham.

You said that your existing perl scripts are hacks and ugly.  But,
I'm wagering that most of your ugly programming is user interface
code that somehow coaxes your users to yield up a diet of ham.

My problem is there is a tremendous dearth of user interface code
out there to get EITHER spam or ham.

The closest I have ever found is the mailwatch interface but that is
god-awful complex.  I have it running on an ISP customer of mine's
mailserver but God what a hack.

Without that, all I can do is what I do now, which is make sure that
all customers accessing my server with IMAP have a junk mail folder and
know that if they drag spam into there that I'll suck it into the
learner.  Of course, POP3 clients have nothing and I cannot tell
some POP3 user "Oh if you really want to reduce your spam load then
give up your POP3 email client and use this slick web interface I have 
set up for you to send and receive email."
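
As a rough illustration of the junk-folder arrangement just described, the Python sketch below walks a mail spool and builds one `sa-learn --spam` command per user Junk folder. The directory layout (`<mail_root>/<user>/Maildir/.Junk/cur`) and the function names are assumptions made for the example, not a description of the actual server; `sa-learn --spam <dir>` is the standard SpamAssassin training command. Which Bayes database gets trained depends on the account that runs it, so treat this as a starting point only.

```python
import subprocess
from pathlib import Path


def junk_learn_commands(mail_root, junk_name=".Junk"):
    """Return one sa-learn command per user whose Junk folder exists.

    Assumes a hypothetical layout: <mail_root>/<user>/Maildir/<junk_name>/cur
    """
    commands = []
    for user_dir in sorted(Path(mail_root).iterdir()):
        junk = user_dir / "Maildir" / junk_name / "cur"
        if junk.is_dir():
            commands.append(["sa-learn", "--spam", str(junk)])
    return commands


def learn_all(mail_root, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in junk_learn_commands(mail_root):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Nothing here solves the ham side of the problem: there is no equivalent folder that users reliably fill with known-good mail, which is the gap the surrounding message is pointing at.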

I'm actually not as interested in your engine as I am in how you get
your customers to participate with it because if you have found a
way to get 'em to do it, that is truly revolutionary.

Mine would rather bitch and moan about spam and when they get it,
just delete it - which while it puts it in a deleted folder that I
can get at (if they are IMAP) it mixes it up with deleted ham, so
I cannot take that mess of mixed unidentified spam and ham and use it 
for anything.

Ted

>
> On 08/16/16 01:03, Ted Mittelstaedt wrote:
>> Hi Marc,
>>
>> Back in 1994 I was diagnosed with testicular cancer, it was
>> essentially "stage 4" as it had metastasized throughout my body.
>>
>> But, it responded to chemo and here I am today. In fact ironically
>> my original oncologist died a few years ago - on a fishing trip he had
>> an accident and drowned.
>>
>> The Universe has an interesting sense of humor and likes to throw
>> curve balls. Take what you have been told about your "probability
>> spectrum" and toss it in the trash - hakuna matata. You could
>> accidentally step in front of a bus tomorrow and be dead. You could
>> live another 20 years. Statistics on people only have meaning on
>> large groups of people - they are irrelevant when it comes to the
>> individual.
>>
>> I've met a number of people who had serious cancers. And I learned
>> one thing from that. The people who survived - every one of them,
>> fighters. And everyone fights differently. Some get on the food
>> bandwagon and try overdosing on green tea and every alleged
>> anti-cancer food out there. Others jump into yoga, and I knew one guy
>> who went out and binge-watched Monty Python to spend as much time
>> laughing as possible. Me, I fought on a more mental approach. I
>> dropped everything in my life that I was not completely satisfied with
>> - I turned my back on my job, my apartment, etc. - every burden or
>> responsibility that I had which I didn't like and didn't really want -
>> and dove into the treatment, and I never let myself believe I was in
>> any danger of dying.
>>
>> Of course, not all who fight, survive. But I will say with absolute
>> conviction that everyone I ever met who had a serious cancer and had
>> that "attitude of acceptance", later died. You are a fighter or you
>> wouldn't even be here. Now, fight to win.
>>
>> Ted
>>
>>
>

---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

Re: I have some bad news

Posted by Marc Perkel <su...@junkemailfilter.com>.

Thanks for the encouragement Ted. Unfortunately I know way too much 
about mathematics and I have a deep understanding of probability 
spectrums. There's a curve and I'm going to be somewhere on it. If I'm 
lucky I might be here for some time. But my life is a casino right now. 
And yes - there is also a probability spectrum for any of us getting hit 
by a bus tomorrow as well. SpamAssassin is based on statistical 
probabilities.

I have to have a dual track strategy. On one hand I need to do what I 
can to move the curve into the future. But at the same time I need to 
accomplish things that are important within a limited time slot as well.

Spam filtering isn't just another job to me. I actually have a passion 
for it. On a philosophical basis I look at the internet as the new 
nervous system for humanity and is now core to who we are as a species. 
And email is a very key technology in that nervous system.

In that context spam is like poison where predators suck some of the 
life out of humanity, and my real life has always been about the 
progress of the human race.

I am somewhat of a spam fighting savant. I actually run very little of 
my email through SpamAssassin, truth be told. Over the years I've thrown 
some ideas into the mix and sometimes they have been adopted to make SA 
better. Sometimes I just get shouted down by trolls and the ideas go 
nowhere.

At this point however there's a deadline and I have ideas that could be 
implemented in SA very very easily. In fact it was through SA that I 
discovered Redis, and SA already talks to redis.

Although my innovation is excellent, as a programmer I'm mediocre. Never 
worked as a team. Easily frustrated. Probably somewhat autistic and 
somewhat arrogant. So mostly living in my own world doing my own 
development. I have my little online empire. I work from home. I make a 
great living. And I really like (most of) my customers and enjoy doing 
tech support. And it's allowed me a lot of free time to do things that 
I'm really interested in.

But my ideas are now my immortality, so I'm now releasing this to the 
world. And mostly this simple AI method that SA could easily implement.

This new spam filtering trick is not only extremely effective, it's 
extremely simple. I had it working in 2 days. The developers here could 
probably implement it in 1 day. (At least the core functionality) And 
with a team of better programmers probably do a better job and get an 
even better result than I get. In fact you don't need or even want my 
sloppy code (not in Perl). All you need is to read the description of 
how it works and once you get it - coding it is trivial.

So - this is an opportunity to milk the mind of the dying spam savant. 
It works, it's easy, and I'm just handing it to you all. There is no 
reason I would be making this up. All you all need to do is accept this 
gift.

On 08/16/16 01:03, Ted Mittelstaedt wrote:
> Hi Marc,
>
>   Back in 1994 I was diagnosed with testicular cancer, it was 
> essentially "stage 4" as it had metastasized throughout my body.
>
>   But, it responded to chemo and here I am today.  In fact ironically
> my original oncologist died a few years ago - on a fishing trip he had
> an accident and drowned.
>
>   The Universe has an interesting sense of humor and likes to throw
> curve balls.  Take what you have been told about your "probability
> spectrum" and toss it in the trash - hakuna matata.   You could 
> accidentally step in front of a bus tomorrow and be dead.   You could
> live another 20 years.   Statistics on people only have meaning on
> large groups of people - they are irrelevant when it comes to the
> individual.
>
>   I've met a number of people who had serious cancers.  And I learned
> one thing from that.   The people who survived - every one of them,
> fighters.  And everyone fights differently.  Some get on the food 
> bandwagon and try overdosing on green tea and every alleged 
> anti-cancer food out there.  Others jump into yoga, and I knew one guy 
> who went out and binge-watched Monty Python to spend as much time 
> laughing as possible.  Me, I fought on a more mental approach.  I 
> dropped everything in my life that I was not completely satisfied with 
> - I turned my back on my job, my apartment, etc. - every burden or 
> responsibility that I had which I didn't like and didn't really want - 
> and dove into the treatment, and I never let myself believe I was in 
> any danger of dying.
>
>   Of course, not all who fight, survive.  But I will say with absolute
> conviction that everyone I ever met who had a serious cancer and had
> that "attitude of acceptance", later died.  You are a fighter or you
> wouldn't even be here.  Now, fight to win.
>
> Ted
>
>

-- 
Marc Perkel - Sales/Support
support@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400

Re: I have some bad news

Posted by Ted Mittelstaedt <te...@ipinc.net>.

Hi Marc,

   Back in 1994 I was diagnosed with testicular cancer, it was 
essentially "stage 4" as it had metastasized throughout my body.

   But, it responded to chemo and here I am today.  In fact ironically
my original oncologist died a few years ago - on a fishing trip he had
an accident and drowned.

   The Universe has an interesting sense of humor and likes to throw
curve balls.  Take what you have been told about your "probability
spectrum" and toss it in the trash - hakuna matata.   You could 
accidentally step in front of a bus tomorrow and be dead.   You could
live another 20 years.   Statistics on people only have meaning on
large groups of people - they are irrelevant when it comes to the
individual.

   I've met a number of people who had serious cancers.  And I learned
one thing from that.   The people who survived - every one of them,
fighters.  And everyone fights differently.  Some get on the food 
bandwagon and try overdosing on green tea and every alleged anti-cancer 
food out there.  Others jump into yoga, and I knew one guy who went out 
and binge-watched Monty Python to spend as much time laughing as 
possible.  Me, I fought on a more mental approach.  I dropped everything 
in my life that I was not completely satisfied with - I turned my back 
on my job, my apartment, etc. - every burden or responsibility that I 
had which I didn't like and didn't really want - and dove into the 
treatment, and I never let myself believe I was in any danger of dying.

   Of course, not all who fight, survive.  But I will say with absolute
conviction that everyone I ever met who had a serious cancer and had
that "attitude of acceptance", later died.  You are a fighter or you
wouldn't even be here.  Now, fight to win.

Ted

On 8/15/2016 10:22 PM, Marc Perkel wrote:
> Well, this is kind of hard to say so just going to say it. I have stage
> 4 lung cancer and the probability spectrum is not good. I've been fighting
> spam for the last 15 years and I'd like to keep fighting spam from the
> grave. So I'm willing to share my technology with anyone interested.
>
> Several months ago I talked about a new trick I came up with to fight
> spam and also positively identify good email as good. I've been running
> it now for 7 months and it is a breakthrough. At the time I had intended
> to patent it just to get enough protection to license it to the big
> boys, but now it is unlikely I'll be around long enough for that. I have
> however noticed that because of my condition people are paying attention
> to me more now that there's a deadline.
>
> Here's my spam filtering trick. It's something that can be easily
> integrated into SpamAssassin. Being that my programming is somewhat
> sloppy at times it can probably be done even better than what I did. The
> thing to keep in mind when reading this is that it's not bayesian
> filtering. Many people in the spam filtering community make that
> mistake. This is done with set operations using Redis. Here's the link.
>
> http://wiki.junkemailfilter.com/index.php/The_Evolution_Spam_Filter
>
> I'm still doing well for now and if not for this diagnosis I wouldn't
> know I was sick, And I want to get as much done in this window as
> possible. Since I live in Gilroy California I'm thinking I'd like to
> contact the spam filtering person at Google and let them continue to
> really develop what I started. So if someone could hook me up with the
> right person(s) there I would appreciate it. And I'm willing to work
> with anyone else that can make use of my work. (My way of cheating death.)
>
> Below is a letter I wrote to EFF staff where I used to work. It
> summarizes my situation. I'm still doing well considering.
>
>
> Hi Cindy,
>
> Hate to ruin your Monday morning but I have some bad news. I have stage
> 4 lung cancer and the odds are not with me. I'm slowly telling the world
> and realizing the problem with having so many friends is that I'm
> making a lot of people very sad. And that is very difficult for me to do.
>
> I'm dealing with it about as well as can be expected, maybe a little
> better than that. My needs are covered for now, but dealing with rolling
> out the information. Please pass this email on to the staff there. I'm
> somewhat concerned about getting too much response at once. There is no
> specific time frame for me yet but stage 4 lung is almost always fatal
> and it's more likely months and not years.
>
> I have a lot of friends who are offering to take care of me. I have a
> paid for house, some savings, and I'm still doing well off my spam
> filtering business. I am going to be looking for someone to take over my
> small techno empire in the hopes of keeping my web sites and the people
> who I host for online. While I plan to put up a good fight if I get 2
> years that would be considered a win. Taking over my empire would be a
> great opportunity for the right person and I need to find someone to do
> that. I am unfortunately really good at what I do and might be tricky
> getting someone to take that over.
>
> I have lived a good life. I have done more than most people have done in
> 100 lifetimes. At the age of 60 I was already down to my last 1/4 tank
> so if I don't get the last 20 years I really have little to complain
> about. At this point my goals are to upload what's left of me to the
> web, which is the afterlife in my world. I have to finish up certain
> philosophical projects with my Church of Reality, which, interestingly
> enough might lead to a solution for the control problem for Artificial
> Intelligence. (Something I need to finish writing up.)
>
> Oddly enough the idea of being dead doesn't worry me. And that might be
> the denial speaking. However the process of getting there is going to be
> overwhelming. And it's been just a week since I found out. And I'm
> exploring the idea that there might even be an upside to being terminal.
> Maybe new opportunities will open up.
>
> I do want to say that working at EFF was some of the best times of my
> life and I really appreciate having had that opportunity. The internet
> is the new nervous system of humanity and is therefore sacred space, not
> just in a religious sense, but in a Reality based sense. To protect it
> is to protect the essence of humanity itself. The Internet is our common
> mind and it is the core of who we are as a human species. (Note to legal
> team, I think there is a legal argument opportunity in this statement.)
>
> A person's story is everything they do from the moment they are born to
> the moment they die. And then your story is the effect you had on
> advancing the evolution of life from what we were, to what we are, to
> what we will become. So my story will become part of the story of
> humanity, which is part of the story of life on this planet, and part of
> the story of the universe. And with the internet the essence of who I am
> and what makes my existence have meaning will be preserved.
>
> I have always believed that if a person decides to "own their story" and
> choose to live a life worth living that when they are faced with the end
> of their personal existence it would be much easier. And now that I am
> there I can say it is definitely true. I have not lived a perfect life
> and looking back there are quite a few things where I could have made a
> better choice. But at this point I'm feeling unusually positive about my
> situation as my last adventures unfold.
>
> While I have spent much of my life writing software for cyberspace I
> have also written quite a bit of software for meat space. This email is
> an example of that. Meat space is coded in ideas and philosophies and
> I'm hoping in the time I have left to see what else I can accomplish.
> Facing death definitely sharpens the mind so I'm going to take advantage
> of that.
>
> I suppose I'll wrap this up here as I can ramble on forever. And forever
> isn't quite as long as it used to be.
>
> Marc Perkel
> /root
>
> --
> Marc Perkel - Sales/Support
> support@junkemailfilter.com
> http://www.junkemailfilter.com
> Junk Email Filter dot com
> 415-992-3400
>


Re: I have some bad news

Posted by Bill Cole <sa...@billmail.scconsult.com>.

On 6 Sep 2016, at 16:04, doark@mail.com wrote:

> On Mon, 05 Sep 2016 20:17:18 "Bill Cole" wrote:
>> On 4 Sep 2016, at 21:11, @lbutlr wrote:
>>
>>> On Sep 1, 2016, at 7:41 PM, David Niklas <doark@mail.com> wrote:
>>>>
>>>> Would you like to go out to lunch?
>>>
>>> Other than your message, that phrase does not appear in 7 years of my
>>> mail.
>>
>> It's in hash-buster/bayes-buster parts in 5 messages in my spam corpus
>> spread over 4 years without other obvious commonalities (other than
>> their use of such tactics.)
>
> It was just an example to make a point. You would need to look at your
> cool database for a non-spammy string and place it in with an equally spammy
> one to figure out if I have found a bug in your cool program.
>
> BTW: You never mentioned if anyone accepted your offer yet.

You seem to have me confused with Marc Perkel. I am not Marc Perkel. 
This should have been apparent from the attribution line you included in 
your message.

The point I was hoping others would infer is simply that different 
people get substantially different mail (ham and spam) which makes 
statistical approaches of all sorts increasingly ineffective as you 
increase the diversity of the recipient population. This latest FUSSP 
proposal is even more fragile to that sort of breakage because all it 
takes to completely burn a classifier token is a single appearance in 
both classes. As one grows a source corpus across a broad enough 
audience, the usable tokens trend inevitably towards zero while the 
remaining usable tokens are those which simply don't occur very often 
and so aren't operationally valuable.
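
The "burned token" failure mode can be shown with a toy set-based classifier in Python. This is a sketch of the objection as stated here, not of Mr. Perkel's actual code: the word-trigram tokenizer, the class name, and the scoring rule are all assumptions made for illustration. With Redis, the two set differences would presumably be computed with something like SDIFF.

```python
def tokens(text, n=3):
    """Split text into lowercase word n-grams (an assumed tokenization)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


class SetClassifier:
    """Toy classifier: a token counts only if it has been seen in one class alone."""

    def __init__(self):
        self.ham = set()
        self.spam = set()

    def learn(self, text, is_spam):
        (self.spam if is_spam else self.ham).update(tokens(text))

    def score(self, text):
        """Return (ham-only hits, spam-only hits) for the message."""
        seen = tokens(text)
        ham_only = self.ham - self.spam    # tokens never seen in spam
        spam_only = self.spam - self.ham   # tokens never seen in ham
        return len(seen & ham_only), len(seen & spam_only)


clf = SetClassifier()
clf.learn("would you like to go out to lunch", is_spam=False)
before = clf.score("would you like to go out to lunch")

# One spam message that embeds the same phrase burns every shared token.
clf.learn("cheap pills would you like to go out to lunch", is_spam=True)
after = clf.score("would you like to go out to lunch")
```

In this toy run `before` is `(6, 0)` and `after` is `(0, 0)`: a single spam message quoting the ham phrase leaves the classifier with no opinion at all about the original message.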

Despite Mr. Perkel's extensive insistence to the contrary, his proposal 
does logically reduce to a variation on Bayesian filtering which avoids 
FPs at the cost of not being able to make any judgment at all on the 
actually difficult cases.

Re: I have some bad news

Posted by do...@mail.com.

On Mon, 05 Sep 2016 20:17:18 "Bill Cole" wrote:
> On 4 Sep 2016, at 21:11, @lbutlr wrote:
> 
> > On Sep 1, 2016, at 7:41 PM, David Niklas <doark@mail.com> wrote:
> >>  
> >> Would you like to go out to lunch?  
> >
> > Other than your message, that phrase does not appear in 7 years of my
> > mail.  
> 
> It's in hash-buster/bayes-buster parts in 5 messages in my spam corpus
> spread over 4 years without other obvious commonalities (other than
> their use of such tactics.)

It was just an example to make a point. You would need to look at your
cool database for a non-spammy string and place it in with an equally spammy
one to figure out if I have found a bug in your cool program.

BTW: You never mentioned if anyone accepted your offer yet.

Sincerely,
David

Re: I have some bad news

Posted by Bill Cole <sa...@billmail.scconsult.com>.

On 4 Sep 2016, at 21:11, @lbutlr wrote:

> On Sep 1, 2016, at 7:41 PM, David Niklas <doark@mail.com> wrote:
>>
>> Would you like to go out to lunch?
>
> Other than your message, that phrase does not appear in 7 years of my 
> mail.


It's in hash-buster/bayes-buster parts in 5 messages in my spam corpus 
spread over 4 years without other obvious commonalities (other than 
their use of such tactics.)

Re: I have some bad news

Posted by Dave Warren <da...@hireahit.com>.

On Sun, Sep 4, 2016, at 18:11, @lbutlr wrote:
> On Sep 1, 2016, at 7:41 PM, David Niklas <do...@mail.com> wrote:
>>
>> Would you like to go out to lunch?
>
> Other than your message, that phrase does not appear in 7 years of
> my mail.

And? Replace the string with an example that does appear frequently in
ham. Or, a dozen examples that do, structured into a plausible
paragraph.

Re: I have some bad news

Posted by David Niklas <do...@mail.com>.

On Mon, 15 Aug 2016 22:22:47 -0700
Marc Perkel <su...@junkemailfilter.com> wrote:

> Well, this is kind of hard to say so just going to say it. I have stage
> 4 lung cancer and the probability spectrum is not good. I've been fighting
> spam for the last 15 years and I'd like to keep fighting spam from the
> grave. So I'm willing to share my technology with anyone interested.
> 
> Several months ago I talked about a new trick I came up with to fight
> spam and also positively identify good email as good. I've been running
> it now for 7 months and it is a breakthrough. At the time I had
> intended to patent it just to get enough protection to license it to
> the big boys, but now it is unlikely I'll be around long enough for
> that. I have however noticed that because of my condition people are
> paying attention to me more now that there's a deadline.
> 
> Here's my spam filtering trick. It's something that can be easily
> integrated into SpamAssassin. Being that my programming is somewhat
> sloppy at times it can probably be done even better than what I did.
> The thing to keep in mind when reading this is that it's not bayesian
> filtering. Many people in the spam filtering community make that
> mistake. This is done with set operations using Redis. Here's the link.
> 
> http://wiki.junkemailfilter.com/index.php/The_Evolution_Spam_Filter
> 
> I'm still doing well for now and if not for this diagnosis I wouldn't
> know I was sick, And I want to get as much done in this window as
> possible. Since I live in Gilroy California I'm thinking I'd like to
> contact the spam filtering person at Google and let them continue to
> really develop what I started. So if someone could hook me up with the
> right person(s) there I would appreciate it. And I'm willing to work
> with anyone else that can make use of my work. (My way of cheating
> death.)
> 
> Below is a letter I wrote to EFF staff where I used to work. It
> summarizes my situation. I'm still doing well considering.
> 
> 
> Hi Cindy,
> 
> Hate to ruin your Monday morning but I have some bad news. I have stage
> 4 lung cancer and the odds are not with me. I'm slowly telling the
> world and realizing the the problem with having so many friends is that
> I'm making a lot of people very sad. And that is very difficult for me
> to do.
> 
> I'm dealing with it about as well as can be expected, maybe a little
> better than that. My needs are covered for now, but dealing with
> rolling out the information. Please pass this email on to the staff
> there. I'm somewhat concerned about getting too much response at once.
> There is no specific time frame for me yet but stage 4 lung is almost
> always fatal and it's more likely months and not years.
> 
> I have a lot of friends who are offering to take care of me. I have a
> paid for house, some savings, and I'm still doing well off my spam
> filtering business. I am going to be looking for someone to take over
> my small techno empire in the hopes of keeping my web sites and the
> people who I host for online. While I plan to put up a good fight if I
> get 2 years that would be considered a win. Taking over my empire would
> be a great opportunity for the right person and I need to find someone
> to do that. I am unfortunately really good at what I do and might be
> tricky getting someone to take that over.
> 
> I have lived a good life. I have done more than most people have done
> in 100 lifetimes. At the age of 60 I was already down to my last 1/4
> tank so if I don't get the last 20 years I really have little to
> complain about. At this point my goals are to upload what's left of me
> to the web, which is the afterlife in my world. I have to finish up
> certain philosophical projects with my Church of Reality, which,
> interestingly enough might lead to a solution for the control problem
> for Artificial Intelligence. (Something I need to finish writing up.)
> 
> Oddly enough the idea of being dead doesn't worry me. And that might be
> the denial speaking. However the process of getting there is going to
> be overwhelming. And it's been just a week since I found out. And I'm
> exploring the idea that there might even be an upside to being
> terminal. Maybe new opportunities will open up.
> 
> I do want to say that working at EFF was some of the best times of my
> life and I really appreciate having had that opportunity. The internet
> is the new nervous system of humanity and is therefore sacred space,
> not just in a religious sense, but in a Reality based sense. To protect
> it is to protect the essence of humanity itself. The Internet is our
> common mind and it is the core of who we are as a human species. (Note
> to legal team, I think there is a legal argument opportunity in this
> statement.)
> 
> A person's story is everything they do from the moment they are born to
> the moment they die. And then your story is the effect you had on
> advancing the evolution of life from what we were, to what we are, to
> what we will become. So my story will become part of the story of
> humanity, which is part of the story of life on this planet, and part
> of the story of the universe. And with the internet the essence of who
> I am and what makes my existence have meaning will be preserved.
> 
> I have always believed that if a person decides to "own their story"
> and choose to live a life worth living that when they are faced with
> the end of their personal existence it would be much easier. And now
> that I am there I can say it is definitely true. I have not lived a
> perfect life and looking back there are quite a few things where I
> could have made a better choice. But at this point I'm feeling
> unusually positive about my situation as my last adventures unfold.
> 
> While I have spent much of my life writing software for cyberspace I
> have also written quite a bit of software for meat space. This email is
> an example of that. Meat space is coded in ideas and philosophies and
> I'm hoping in the time I have left to see what else I can accomplish.
> Facing death definitely sharpens the mind so I'm going to take
> advantage of that.
> 
> I suppose I'll wrap this up here as I can ramble on forever. And
> forever isn't as quite long as it used to be.
> 
> Marc Perkel
> /root
> 
It seems that we lost the conversation in an OT discussion of set theory.
Did anyone say that they would like to use this?

My thoughts on your idea are (and have been, since I remember your first
announcement of this):
1. It may take some time to catch on, and
2. If it were deployed in a widespread manner, then spammers may try to
hijack it.

Example spam:
Would you like to go out to lunch?
Don't forget some virga href="http://example.com/spam"

Score: +5 for lunch. -5 for virga. Total Score: 0. NOT SPAM
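
The arithmetic above can be written out as a small sketch. This is only an
illustration of the offsetting concern, not Marc's method and not how
SpamAssassin scores mail. The weights come from the example; the sign
convention (positive is ham-like, negative is spam-like), the threshold of
-5, and the names WEIGHTS, SPAM_THRESHOLD, score and is_spam are all
assumptions made for the example.

```python
# Hypothetical phrase weights taken from the example above.
# Assumed convention: positive = ham-like, negative = spam-like.
WEIGHTS = {
    "would you like to go out to lunch?": 5.0,
    "virga": -5.0,
}

# Assumed cutoff: a message is flagged when its total is at or below this.
SPAM_THRESHOLD = -5.0


def score(message: str) -> float:
    """Sum the weights of every known phrase found in the message."""
    text = message.lower()
    return sum(weight for phrase, weight in WEIGHTS.items() if phrase in text)


def is_spam(message: str) -> bool:
    """Flag the message only if the summed score reaches the threshold."""
    return score(message) <= SPAM_THRESHOLD


example = (
    "Would you like to go out to lunch?\n"
    'Don\'t forget some virga href="http://example.com/spam"\n'
)

print(score(example), is_spam(example))
```

With the ham-like phrase present the two weights cancel and the message is
not flagged; with only the second line, the same message would be flagged.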


Sincerely,
David

Re: I have some bad news

Posted by "@lbutlr" <kr...@kreme.com>.

On 15 Aug 2016, at 23:22, Marc Perkel <su...@junkemailfilter.com> wrote:
> Well, this is kind of hard to say so just going to say it. I have stage 4 lung cancer and the probably spectrum is not good. I've been fighting spam for the last 15 years and I'd like to keep fighting spam from the grave. So I'm willing to share my technology with anyone interested.

I encourage you to concentrate on fighting cancer right now, and while the prognosis for stage-4 anything is not good, neither is it certain. It appears that attitude does help, so pump yourself up to beat it.

Re: I have some bad news

Posted by Benny Pedersen <me...@junc.eu>.

On 2016-08-19 12:34, Ram wrote:

> And chill about spam. I know you have been great at contributions to
> anti-spam ( and we all remember your distinct hate of SPF :-) ).
> But antispam is just "commodity" technology.

sid-milter is at fault, I think more users now use pypolicyd-spf to get 
rid of sender-id :=)

> Links:
> ------
> [1]
> http://www.localhost.localdomain/foo/?utm_source=All-emp&utm_medium=Email-Disclaimer&utm_campaign=Weekly-Webinar-2

is this link better than squid.... ?

Re: I have some bad news

Posted by Ted Mittelstaedt <te...@ipinc.net>.

On 8/19/2016 3:34 AM, Ram wrote:
>
>
> Marc thats too bad. But stage 4 lung cancer does not mean you have to
> die of it.
> And chill about spam. I know you have been great at contributions to
> anti-spam ( and we all remember your distinct hate of SPF :-) ).
> But antispam is just "commodity" technology.
>
> Probably ML will take over antispam in the future and people would just
> subscribe to some good ML antispam service. Running your own antispam is
> too much of an attention grabbing task, and no one wants to put in so
> much time today
>

You must not have checked prices on antispam services or on mailboxes 
lately.  Just about everyone out there in the web hosting biz 
provides 10-20 free email boxes (they have to, otherwise small businesses 
would switch to a competitor) and every antispam service out
there charges at least a buck a month per box (they have to, otherwise 
they would go out of business).

In this environment people have no choice but to run their own antispam.

But you are right in that nobody (including the people running it) wants 
to put in the time to do it.  Do you -want- to clean your toilet?  Do 
you -have-the-money- to pay someone else to do it?

We are all always looking for better toilet-cleaning brushes.  If Marc
has invented a better one, people will want it!   But they won't go buy
a $200 toilet cleaning brush when they can go to the grocery store and
buy a plastic one for $5 that will last 20 years.

Ted


Re: I have some bad news

Posted by Ram <ra...@netcore.co.in>.


On Tuesday 16 August 2016 10:52 AM, Marc Perkel wrote:
> Well, this is kind of hard to say so just going to say it. I have 
> stage 4 lung cancer and the probably spectrum is not good. I've been 
> fighting spam for the last 15 years and I'd like to keep fighting spam 
> from the grave. So I'm willing to share my technology with anyone 
> interested.
Marc, that's too bad. But stage 4 lung cancer does not mean you have to 
die of it.
And chill about spam. I know you have made great contributions to 
anti-spam (and we all remember your distinct hate of SPF :-) ).
But antispam is just "commodity" technology.

Probably ML will take over antispam in the future and people will just 
subscribe to some good ML antispam service. Running your own antispam is 
too much of an attention-grabbing task, and no one wants to put in so 
much time today.






