Posted to commits@spamassassin.apache.org by Apache Wiki <wi...@apache.org> on 2008/03/03 11:05:24 UTC

[Spamassassin Wiki] Update of "ExperimentalTheoretical" by JustinMason

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/ExperimentalTheoretical

The comment on the change is:
move Marc's page to something

------------------------------------------------------------------------------
= Experimental and Theoretical ways to get rid of Spam =

- This page is for the development of ideas and projects to move SpamAssassin and related spam fighting projects forward. The idea here is to write about ideas and projects that represent the cutting edge of development in spam fighting.
+ This page is for the development of ideas and projects that are in development related to spam fighting, not necessarily related to or part of the SpamAssassin project.

- == New Black/White/Yellow List Technologies ==
+ * MarcPerkelsExperiments: details of some projects being worked on by Marc Perkel

- The following articles written by Marc Perkel describe experimental technologies used at [http://www.junkemailfilter.com Junk Email Filter]. I am writing this in the hopes that other people will pick up on these ideas and improve them. Although it is working very well for us so far, it can be improved and expanded. It is my hope that this will inspire others to work on this and make it far better than what we have developed. Information about our Host Karma lists can be found [http://wiki.junkemailfilter.com/index.php/Spam_DNS_Lists here]
-
- We are all familiar with black lists. Black lists are DNS lists where the IP addresses of spammers are looked up. You send a request to the list with the IP and if it is in the blacklist then you bounce the email. Sounds simple enough except for the false positives.
-
- White lists help reduce false positives. If an IP address is white listed then you can just pass it and not look at black lists.
-
- Yellow lists are lists of hosts that send a mix of spam and non-spam. Email services like Yahoo, Gmail, and Hotmail are examples of mixed source servers. These servers should never be either black or white listed. These are servers where the IP address yields no useful information as to whether the message is spam or ham.
-
- === Multi-Color List Processing Logic ===
-
- When looking at lists from a black/white/yellow perspective there is an order to the list processing. First the yellow lists are tested. If the message is yellow listed then checking black and white lists isn't necessary. Then the white lists are checked. If the message is white listed it can be passed without having to run it through SA, or SA can short circuit the tests and declare the message ham. Then the black lists are checked, and if the host is in a trusted blacklist or in several less trusted blacklists the message can be rejected.
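
The ordering described above can be sketched in a few lines of Python. This is only an illustration: the function name, the result strings, and the threshold of two hits on less trusted blacklists are assumptions made for the example, not part of HostKarma or SpamAssassin.

```python
def classify_by_lists(yellow: bool, white: bool, trusted_black: bool,
                      untrusted_black_hits: int = 0,
                      untrusted_threshold: int = 2) -> str:
    """Apply the yellow -> white -> black ordering to one message.

    Returns "content-filter", "accept" or "reject" (hypothetical labels).
    """
    # Yellow first: a mixed source tells us nothing, so skip the other
    # lists and let content testing decide.
    if yellow:
        return "content-filter"
    # White next: the message can be passed without further list checks.
    if white:
        return "accept"
    # Black last: one trusted list, or several less trusted ones.
    if trusted_black or untrusted_black_hits >= untrusted_threshold:
        return "reject"
    # Not listed anywhere: fall through to content testing.
    return "content-filter"
```

A host that is both yellow and black listed still goes to content testing, because the yellow check runs first.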
-
- === Reducing Lookups by using multiple return codes to indicate the result ===
-
- Many DNS lists return a code to indicate yes and nothing to indicate no. This was fine when only black lists were looked up, but if you are looking up multiple states you would have to make a separate DNS call for each one.
- {{{
- yellowlist.junkemailfilter.com
- whitelist.junkemailfilter.com
- blacklist.junkemailfilter.com
- }}}
- But why do three lookups when you can do one? That's the way the HostKarma list works. It returns a different value to indicate black/white/yellow.
- {{{
- Lookup format for IP 1.2.3.4
- dig 4.3.2.1.hostkarma.junkemailfilter.com
-
- 127.0.0.1 white
- 127.0.0.2 black
- 127.0.0.3 yellow
- }}}
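
A minimal Python sketch of the single-lookup idea, assuming only the zone name and the three return codes shown above. The function names are made up for this example, and the `lookup` helper uses the system resolver, so what it returns depends on whether the list is still being served.

```python
import ipaddress
import socket
from typing import Iterable, Optional

HOSTKARMA_ZONE = "hostkarma.junkemailfilter.com"
RETURN_CODES = {"127.0.0.1": "white", "127.0.0.2": "black", "127.0.0.3": "yellow"}

def query_name(ip: str, zone: str = HOSTKARMA_ZONE) -> str:
    """Build the DNS name to query: reversed octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a bad address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def interpret(answers: Iterable[str]) -> Optional[str]:
    """Map the A records returned by the list to a colour, or None."""
    for answer in answers:
        colour = RETURN_CODES.get(answer)
        if colour is not None:
            return colour
    return None

def lookup(ip: str) -> Optional[str]:
    """One DNS query instead of three; an unlisted IP yields None."""
    try:
        _, _, addresses = socket.gethostbyname_ex(query_name(ip))
    except OSError:  # NXDOMAIN and other resolver failures
        return None
    return interpret(addresses)
```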
-
- === Forward Confirmed rDNS (FCrDNS) ===
-
- Forward Confirmed reverse DNS is an important concept for the ideas I'm about to introduce below. One of the ways to separate spam from ham is to find things that spammers can't spoof. One of those things is forward confirmed rDNS.
-
- Reverse DNS is straightforward. An IP has a PTR record so that when you look up the IP it returns a name associated with the IP. Unfortunately a spammer can put any name they want in a PTR record. But when that faked name is looked up, it points somewhere else, not back to the spammer's IP.
-
- Forward Confirmed rDNS means that when the name returned by a PTR lookup is checked it will point back to the original IP address that was looked up.
- {{{
- 1.2.3.4 -> mail.mydomain.com
- mail.mydomain.com -> 1.2.3.4
- }}}
- Spammers can't spoof this because even though PTR can be faked the spammer can't create an A record to point back to the original IP because they don't have control over the faked domain.
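
As a rough sketch, the check can be written in Python with the standard `socket` module. The function names are invented for this example, and the resolver functions are passed in as parameters so the logic can be exercised without live DNS.

```python
import socket
from typing import Callable, List, Optional

def reverse_lookup(ip: str) -> Optional[str]:
    """PTR lookup: IP -> host name, or None if there is no PTR record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None

def forward_lookup(name: str) -> List[str]:
    """A lookup: host name -> list of IPv4 addresses (empty on failure)."""
    try:
        return socket.gethostbyname_ex(name)[2]
    except OSError:
        return []

def fcrdns_name(ip: str,
                reverse: Callable[[str], Optional[str]] = reverse_lookup,
                forward: Callable[[str], List[str]] = forward_lookup) -> Optional[str]:
    """Return the PTR name only if it resolves back to the same IP."""
    name = reverse(ip)
    if name is None:
        return None
    # The forward lookup must include the original IP, otherwise the
    # PTR record proves nothing about who controls the name.
    return name if ip in forward(name) else None
```

A faked PTR such as `1.2.3.4 -> mail.wellsfargo.com` fails here, because the forward lookup of that name returns the bank's own addresses rather than 1.2.3.4.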
-
- FCrDNS is very reliable for detecting white and yellow domains. It can also be used for black domains but generally spammers aren't that stable. So this is more for actively detecting ham and avoiding false positives than for detecting spam.
-
- Is your host name DNS set up correctly? Here's a [http://ipadmin.junkemailfilter.com/rdns.php FCrDNS checking tool] to test it.
-
- === Host Name based Lists based on FCrDNS host names ===
-
- In addition to IP based black/white/yellow lists, the [http://wiki.junkemailfilter.com/index.php/Spam_DNS_Lists HostKarma lists] also contain host names. A host name lookup would look like this:
- {{{
- dig mydomain.com.hostkarma.junkemailfilter.com
- }}}
- The FCrDNS of the sending host name (which can't be spoofed) is looked up in the same database that is used for IP based lookups. The Junk Email Filter HostKarma DNS list supports name based lookups as well as IP based lookups. If the lookup succeeds the IP based tests need not be done. The name based tests are actually more powerful than IP based tests for white and yellow listed servers, whereas IP based lookups are better for black listed hosts.
-
- For example, if the FCrDNS of the sending host resolves to yahoo.com then no other DNS tests need to be done. Yahoo is neither a certified spam domain nor a certified ham domain, and once the name returns yellow no other lookups need be done. The message then can go to content testing to figure out if it is spam or ham. If you get a message from your bank, like Wells Fargo Bank, and you see that the sending host is mail.wellsfargo.com, then it will look up as white listed and the message can be declared ham without any other testing.
-
- === Using name based lookups to build IP based lookup lists ===
-
- The IP entries in the white and yellow lists at HostKarma are driven by the name based lookups. A message is received and the host is looked up and verified. The host name is then looked up and if the name is found the IP address is added to the list. Thus if the name is whitelisted the IP address is sent to the DNS server and it is added as a white listed IP. That makes it available to the world for those admins who can only do IP based lookups. The advantage of this is that one does not need to know the IP addresses of all of Yahoo's servers. These lists can be created dynamically as Yahoo sends out email.
-
- === Expanding the Project ===
-
- If this system were implemented on a more massive scale then it would be both more accurate and more comprehensive. Although black lists are more dynamic because spammers are constantly on the move, white and yellow lists are very stable. It would be fairly easy to accurately list 95% of the world's servers that should be either white or yellow listed. If this were done then it would greatly reduce false positives, drastically reduce IP lookups to black lists, and cut processing time for good email. I am therefore inviting the smart people in the spam filtering community to pick up on the idea that works great for us and make it massively bigger than it is.
-
- Here's what needs to be done.
-
- 1. Need to massively expand the list of names for white, black, and yellow hosts. With enough people participating this could be automated. Hosts that send only spam have their host names black listed. Hosts that send only ham have their host names white listed. Hosts that send a mixture have their host names yellow listed.
-
- 2. Once we have a good name based list then through automation of a few big players once a name is found then the IP associated with that name is automatically added to the IP lists. So whenever the name Yahoo.com is confirmed as yellow listed the IP address is sent in to be yellow listed as well. Thus all IPs for Yahoo servers would be listed with enough participants and the list would follow IP changes Yahoo makes.
-
- == Other Kinds of DNS Lists ==
-
- There are a number of other DNS lists that would be useful for fighting spam besides just black/white/yellow lists. These DNS lists can be black lists for content strings that work the same way as URI blacklists. Or they can be lists that provide information about classifications of hosts that have a significant characteristic in detecting spam.
-
- === Registrar Barrier List ===
-
- We at Junk Email Filter host a [http://wiki.junkemailfilter.com/index.php/Registrar_Barrier_DNS_List Registrar Barrier List]. This list returns a code to indicate where the registrar barrier is so you can separate the domain part out of the host name.
- {{{
- dig example.com.rb.junkemailfilter.com - returns 127.0.0.1
- dig example.co.uk.rb.junkemailfilter.com - returns 127.0.0.2
- }}}
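
The two examples above suggest how a client might use the answer. The sketch below assumes, and this is only a guess from those two examples, that the last octet of the return code is the number of labels that sit behind the registrar barrier (1 for `com`, 2 for `co.uk`). The function name is hypothetical.

```python
from typing import Optional

def split_at_barrier(hostname: str, return_code: str) -> Optional[str]:
    """Return the registered domain part of a host name.

    Assumes the last octet of return_code counts the suffix labels,
    e.g. 127.0.0.2 means a two-label suffix such as co.uk.
    """
    suffix_labels = int(return_code.rsplit(".", 1)[1])
    labels = hostname.rstrip(".").lower().split(".")
    if len(labels) <= suffix_labels:
        return None  # the name is itself a suffix, there is no domain part
    # Keep the suffix plus one more label: the name bought from the registrar.
    return ".".join(labels[-(suffix_labels + 1):])
```

Under that assumption, `mail.example.co.uk` with a 127.0.0.2 answer reduces to `example.co.uk`.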
-
- === Freemail Domain List ===
-
- This list would be a lookup to determine if the sending host is a free email provider like Yahoo, Gmail, or Hotmail. This information is often useful in determining how to process messages. Often, for example, phishers will have a different from address than the reply to address, because the from address is often shut down but the reply to address still works. When the from address and the reply to address don't match and both addresses are from freemail providers it's almost always a phishing scam. I intend to post a freemail DNS list soon.
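
The From/Reply-To heuristic could look roughly like this in Python. The three-entry domain set stands in for the freemail list, which does not exist yet, and the function names are assumptions made for this example.

```python
from email.utils import parseaddr

# Hypothetical sample; a real deployment would query a freemail DNS list.
FREEMAIL_DOMAINS = {"yahoo.com", "gmail.com", "hotmail.com"}

def domain_of(address: str) -> str:
    """Return the lower-cased domain part of a bare email address."""
    return address.rpartition("@")[2].lower()

def looks_like_freemail_phish(from_header: str, reply_to_header: str) -> bool:
    """True when From and Reply-To differ and both are freemail addresses."""
    _, from_addr = parseaddr(from_header)
    _, reply_addr = parseaddr(reply_to_header)
    if not from_addr or not reply_addr:
        return False  # a missing header gives us nothing to compare
    if from_addr.lower() == reply_addr.lower():
        return False
    return (domain_of(from_addr) in FREEMAIL_DOMAINS
            and domain_of(reply_addr) in FREEMAIL_DOMAINS)
```

A filter would more likely add this to a score than reject on it alone.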
-
- === Phone Number and Email Address Lookup Lists ===
-
- We have all seen the success of URI blacklists, which mean spammers can't link to web sites without getting caught. But spammers also put phone numbers and email addresses within messages. These are necessary so that you can contact the spammer and they can get your money. If we also added email addresses and phone numbers to the URI blacklists then we could target them as well. In the case of email addresses these can also be reported to free mail vendors so that they can eliminate these accounts.
-
- == Isolating end users from port 25 and SMTP protocol ==
-
- The biggest source of spam is computer viruses and computer viruses spread primarily through email. If we could isolate computer viruses so that they can't spread then we can eliminate the main source of spam as well as bot armies that can be used for DDOS attacks, hacking, and other types of criminal activities.
-
- One of the problems with SMTP is that end users send email to servers the same way that servers send email to each other. Thus a server can't easily tell if it is getting email from another server or a virus infected spam bot. What I think we need to do is by default take port 25 away from the consumer and make port 25 only a server to server protocol.
-
- Taking port 25 away from consumers does have free speech implications and it needs to be done right. People should be allowed to run email servers from home. But the vast majority of computer ignorant end users need protection from predators. Therefore it would seem reasonable to have port 25 blocked by default but allow the user to open it up if they choose to do so. This could be done as an HTTP menu built into DSL/Cable modems. It would be blocked by default and can be unblocked by the user. The idea being that if you're smart enough to set up a home email server you are smart enough to unblock port 25.
-
- With port 25 blocked a virus can no longer communicate with SMTP servers and send email. Thus these kinds of viruses can't spread. In time these computers will either be fixed or replaced and the virus bot armies go away.
-
- === Switching to the Submission Port 587 ===
-
- Current technology allows email to be sent to servers on port 587 as well as port 25. Even without authentication this would be a huge improvement because there is an advantage to having consumers use a different port to send email out than the one servers use to talk to each other.
-
- Suppose for example ISPs had two SMTP servers. (Actually one server listening on two different IP addresses is how it would likely be implemented.) One is the outgoing server that listens only on port 587 and receives email from consumers to be sent. The other is the incoming mail server that listens only on port 25 for incoming email to be delivered to mailboxes for consumers to pick up with POP or IMAP.
-
- In this configuration consumers would only be able to send outgoing email to outgoing servers. These servers would be able to deliver it to the incoming servers, which make it available for pickup. Spam bots can't talk to incoming servers as they are blocked on port 25. Spam bots also can't talk to outgoing servers on port 25 because those servers aren't listening on port 25. And spam bots can't send through outgoing servers on port 587 because that would require a password and the virus wouldn't know the password.
- {{{
- Sender -- port 587 --> Outgoing SMTP Server -- port 25 --> Incoming SMTP Server -- port 143 IMAP --> Recipient
- }}}
-
- === Using IMAP as an outgoing transport to SMTP server ===
-
- The IMAP/POP protocols could be rewritten to include an outgoing email transport. This would allow outgoing email to be transported back over the same channel as incoming email. Because the outgoing email goes out over the same connection there is no need to configure outgoing email separately. Once IMAP is configured for incoming mail, it can be used for outgoing email. This cuts the setup time in half.
-
- On the server side the IMAP server receives the outgoing email and hands it off to the outgoing SMTP server which delivers the message normally.
-
- This method also has the advantage that the email sender has to have the ability to read email for the account that they are sending through. Policy restrictions can be used so that the sender can not be overridden by the consumer. So if you receive mail at me@mydomain.com then your outgoing mail would be the same.
-
- Adding an outgoing transport to IMAP would have the same function as using SMTP port 587 submission but could be easier to set up. This is something that would be in addition to existing methods and would eventually take over through evolution.
-
- === ISP Consumer Firewalls ===
-
- As stated above end users need protecting from predators. If ISPs provided firewalls / NAT by default to end users through the Cable/DSL modems it would eliminate incoming hacker traffic. So even if the end user were running a hot unpatched copy of Windows they would be protected from becoming a hazard to the Internet.
-
- === DNS based admin lookup ===
-
- There should be some way to look up the admin of any IP address so that you can get a message to them that there are spammers in their IP space. I think this should be done with some sort of DNS TXT lookup. But if people who know about problems can communicate with those who can fix the problem then the problem might get fixed sooner.
-
- === ISPs can use dns black lists to find virus infected computers owned by customers ===
-
- One thing ISPs could do, and it's a mystery that they don't, is download IP blacklists and look to see if any IP addresses in their IP space are listed. Then they would instantly know where trouble is and they could block off port 25 to the virus infected end users and let them know they need to get their computer fixed. I have a list of over a million IP addresses of computers that are currently infected or have been in the last 4 days. ISPs could use my list to shut it down at the source. Anyone interested in that should contact me at [mailto:support@junkemailfilter.com support@junkemailfilter.com].
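
Checking a downloaded blacklist against an ISP's own address space takes only a few lines of Python with the standard `ipaddress` module. The function name and the one-address-per-entry input format are assumptions made for this sketch.

```python
import ipaddress
from typing import Iterable, List

def listed_customers(blacklisted_ips: Iterable[str],
                     isp_networks: Iterable[str]) -> List[str]:
    """Return the blacklisted addresses that fall inside the ISP's networks."""
    networks = [ipaddress.ip_network(n) for n in isp_networks]
    hits = []
    for raw in blacklisted_ips:
        try:
            addr = ipaddress.ip_address(raw.strip())
        except ValueError:
            continue  # skip comments, blank lines and malformed entries
        if any(addr in net for net in networks):
            hits.append(str(addr))
    return hits
```

For a list of a million entries and many networks, a sorted or trie based lookup would be faster, but the linear scan shows the idea.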
-
-
- == Anti Spam Educational Video for End Users ==
-
- Spam is about money. It is driven by people taking advantage of people who for some reason are fooled by spam. People who understand that all spam is fraud never fall for it. When I worked for the [http://www.eff.org Electronic Frontier Foundation] I remember one staff meeting where we talked about a spammer who went to jail for fraud relating to selling penis enlargement products through spam. This guy had made millions by the time he went to jail. I joked that maybe I was on the wrong side of the spam wars. I also sometimes watch the TV show Judge Judy where someone gets a spam and gets a check in the mail and sends back the money before the check bounces.
-
- People really do fall for this stuff and if these spammers weren't making money they wouldn't do it. The problem is driven by consumer ignorance. So how do we fix that? We educate the consumer.
-
- I suggest the production of many YouTube grade videos in many languages that educate end users about the dangers of spam. One might start with a picture of Africa. The narrator says, "this is Africa. Nobody here is going to transfer 56 million dollars into your account." Then they put up a map of the UK and say, "This is the United Kingdom. There is no UK Lottery. And you have not won."
-
- The videos would be fairly short, under 10 minutes, and after these videos are produced then we can encourage ISPs to require the user to watch a video as part of the process of signing up for a new account. The more people are educated about spam scams the less profitable spam becomes. And when spam is less profitable there will be less spam.
-
- I'm looking for as many people as possible to just do it and post the videos. Maybe we can make a contest out of it and some of us in the spam filtering business can kick in some bucks (or euros) for a prize for the winners? I think a good set of educational videos about how to not get ripped off could lead to a measurable reduction in spam.
-
-
- End of Marc Perkel section. I'm putting my name in this because I tend to write in the first person and wanted to clarify it. Also if someone wants to work with me to develop these ideas feel free to [mailto:support@junkemailfilter.com contact me]. Some people might not agree with everything I posted here and I may be wrong and there may be better ideas out there. But this was meant to stimulate thought in the hopes that some part of this rant will lead to the future development of Spam Assassin and related spam fighting tools.
-
- ----
-
- Post your innovative ideas here.
-