Last time I explained what spam is, where it comes from, and where spammers go to collect email addresses. This time I'm going to show you how spam has evolved and what you can do to avoid being plagued by it.
As the saying goes, "Nothing is more constant than change." Spam, too, has changed over the years, or more accurately: it has evolved. As soon as a new technique from the anti-spam front starts to be effective, spam changes its face in order to escape the new methods.
In the very beginning, the internet was a small, hardly known network with only a handful of participants, most of whom knew each other. Spam was practically nonexistent. If you repeatedly got junk mail from somebody, you simply told them to stop and that was it. In the early nineties the internet slowly opened up to the general public. It became easier and more affordable to get access to this steadily growing, worldwide network. This growth led to anonymity, and with anonymity spam started.
The first incarnations of spam were easy to detect automatically. Very often friend@public.com was used as either the sender or the recipient address in the headers. Soon the first programs that automated sending spam became available. To make tracing spam more difficult and confusing, these programs faked headers. Evidently the programmers of these first tools checked their own inboxes to find out which headers exist in internet email and what they look like. Among others, you will find the X-UIDL: header there if you use a POP3 mail reader. However, this header is added by the POP3 reader when you collect your mail from your provider and thus never appears while mail is in transit. The same goes for the X-PMFlags: header, which is added by Pegasus Mail. Fortunately, some spamming programs made the mistake of adding one or even both of these headers to the mail they sent out. This made it easy to accurately flag spam and filter it out before it even reached the user's mailbox. Of course, later versions of such programs fixed this.
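The header checks described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a reconstruction of any filter actually used at the time; the function name and the choice to inspect only the From and To fields are assumptions made for the example.

```python
from email import message_from_string

# Headers that, according to the text above, are only added on the
# receiving side and should never appear on mail still in transit.
CLIENT_ONLY_HEADERS = ("X-UIDL", "X-PMFlags")

# Placeholder address that early spam often carried as sender or recipient.
TELLTALE_ADDRESS = "friend@public.com"


def looks_like_early_spam(raw_message: str) -> bool:
    """Return True if the message shows one of the telltale signs."""
    msg = message_from_string(raw_message)
    # Header lookup on a Message object is case-insensitive.
    if any(header in msg for header in CLIENT_ONLY_HEADERS):
        return True
    for field in ("From", "To"):
        if TELLTALE_ADDRESS in (msg.get(field) or "").lower():
            return True
    return False
```

A check like this only makes sense on the mail server, before delivery. Once a message has been fetched by a POP3 reader, a legitimate X-UIDL: header may well be present.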
Some people and organizations started to keep publicly accessible lists of sender addresses known to be used by spammers and lists of providers known to be used as spam relays. The problem with these lists, however, is that addresses and hosts are sometimes added by mistake or because of a one-time configuration error. Once you are on one of these lists, nobody using them can receive mail from you any more; once a provider is on such a list, nobody using these lists can receive any mail going through that provider. So even regular mail will be blocked. On top of that, providers of free email accounts, such as Yahoo, Hotmail and the like, are popping up all over the place. So a spammer does not need to send from his own account any more. He can simply register a new account every day and send his spam of the day from that account. These days a spammer can change his email address much faster than lists like MAPS-RBL can be updated. Furthermore, such lists are bound to grow too big sooner or later.
A successful move against spam was the change from open to closed mail relays. In earlier times access to the internet was slow and expensive. Not every site could afford to have its mailserver accessible from the internet all the time, not even commercial establishments. Email was sent and transported as close to the destination as possible. There it was spooled for later delivery, and when the mailserver that was the final destination established one of its periodic connections to the internet, the mail was delivered out of the spool. It was customary for most mailservers to allow relaying, that is, accepting and passing on email for other destinations. For spammers this had the huge advantage that one single copy of the message could be sent to any one mailserver together with a huge list of recipients. This could be done in a couple of seconds. It was then the responsibility of the contacted mailserver to deliver the message to all the recipients. So the mailserver, not the spammer, had to do all the work and use its own bandwidth. Mainly because of the spam problem, mailservers these days are usually configured to deny any relaying; they only accept mail for and from the domain or domains they are responsible for. According to the Internet Mail Consortium's UBE-RELAY survey, over 50 % of SMTP servers allowed relaying in January 1998; by August 2002 this number was down to 0.4 %. For a spammer this means much increased cost and time, as messages must now often be sent one by one, unless one of the few remaining open relays can be found. But with ever faster and cheaper internet access this hindrance, too, becomes less and less effective. Just take today's flat-rate ADSL accounts. With these, anti-relay settings are not a real show-stopper any more.
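The difference between an open and a closed relay comes down to one decision per recipient. The following Python sketch shows that decision in its simplest form. The domain `example.org` and the function names are hypothetical, and the rule is deliberately reduced to the "for and from our own domains" wording used above. Real mailservers typically base the "from" half on the connecting client's network address or on authentication, because a sender address is trivial to forge.

```python
# Hypothetical set of domains this mailserver is responsible for.
LOCAL_DOMAINS = {"example.org"}


def domain_of(address: str) -> str:
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def accept_recipient(sender: str, recipient: str, open_relay: bool = False) -> bool:
    """Decide whether the server takes on a message for this recipient."""
    if open_relay:
        # Old behaviour: accept and pass on mail for anybody.
        return True
    # Closed relay: one end of the transaction must be a local domain.
    return (domain_of(sender) in LOCAL_DOMAINS
            or domain_of(recipient) in LOCAL_DOMAINS)
```

With `open_relay=True` a spammer can hand over one message with thousands of foreign recipients; with the default setting every one of those recipients is refused.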
Various countries have started to make spamming illegal, usually reasoning that the recipient of spam has to bear most of the cost involved, not the sender. But it is not possible to really prohibit spamming. If a disclaimer is required to escape illegality, then a spammer will simply add that disclaimer, no big deal. Or he will just send the mail from a foreign account. After all, the internet is a worldwide network and you can reach just about any country within a split second, all from your home. Laws will probably turn out not to be very effective in fighting spam.
Filtering email is not so reliable either. A filter only knows two states: spam or not spam. It can be annoying if a filter makes a mistake and flags a spam email as non-spam. This is called a 'false negative'. But what if it makes a mistake at the other end of the scale and flags a non-spam message as spam, possibly even deleting it? Depending on the content and importance of such a message, the result can be anywhere between unimportant and disastrous. Setting filters perfectly is impossible, and even setting them acceptably can be a daunting task. For example, scanning for the character combination 'sex' would also mistakenly catch all messages that mention 'Swissexchange', the Swiss stock exchange. That is simply not acceptable if you handle mail for one of the largest Swiss banking groups. Similarly, filtering on 'anal' would ring alarm bells over here in Switzerland for 'Banalitaet' (banality) or 'Vertriebskanal' (sales channel). These are called 'false positives'. Depending on the amount and type of email you receive, 100 false negatives per false positive can be perfectly acceptable. For somebody else this might be entirely the wrong ratio, and for yet another person false positives are completely unacceptable.
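The 'Swissexchange' problem is easy to reproduce. The Python sketch below compares a naive substring scan with a whole-word match, using the two character combinations from the text. The function names are made up for this example, and the whole-word variant is just one possible refinement, not a recommendation.

```python
import re

BLOCKED_TERMS = ("sex", "anal")


def naive_match(text: str) -> bool:
    """Flag the text if a blocked term appears anywhere, even inside a word."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


# \b anchors the match at word boundaries, so 'Swissexchange' no longer hits.
WORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, BLOCKED_TERMS)) + r")\b",
    re.IGNORECASE,
)


def whole_word_match(text: str) -> bool:
    """Flag the text only if a blocked term appears as a complete word."""
    return WORD_PATTERN.search(text) is not None
```

The whole-word version avoids these particular false positives, but it pays for that with new false negatives: a spammer only has to write the word slightly differently to slip through.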
Another method for reducing spam, which was quite useful for some time, is hashing. With this method, a hash value is calculated from the content of an email. Exactly the same message will always yield the same hash value, and the chances that two different messages result in the same hash value are extremely slim. A central, publicly accessible database is maintained with the hash values of known spam messages. After the hash value for the mail you just received has been calculated, a check is done to see whether the value is in the database. If it is, the message is flagged as spam. One program which uses this method is Vipul's Razor. In the meantime, however, spammers have started to individualize their spam. Somewhere in the message they will add your name, your email address or some other unique text. As they are often forced to send every message separately anyway, due to the anti-relay settings of most mailservers, this additional individualizing of messages is no big deal, and if it helps to avoid being caught by hashing spam filters, it is well worth the additional processing.
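To see why individualized messages defeat this approach, here is a minimal Python sketch. It is not how Vipul's Razor computes its signatures; SHA-256 over the whitespace-normalized body is simply a stand-in chosen for this example, and the "database" is a local set rather than a shared server.

```python
import hashlib


def fingerprint(body: str) -> str:
    """Hash the message body after collapsing whitespace and case."""
    normalized = " ".join(body.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_known_spam(body: str, database: set) -> bool:
    """Look the message's fingerprint up in the set of known spam hashes."""
    return fingerprint(body) in database


# Stand-in for the central database: one reported spam message.
known_spam = {fingerprint("Buy cheap pills now! Visit our site.")}
```

An identical copy of the reported message is caught, even if line breaks differ. Put "Dear Kurt," in front of it and the hash changes completely, so the lookup fails.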
There are also some peculiar methods of spam prevention in use. A lawyer in California, for example, has copyrighted a small poem. If you use this poem without permission, you can be sued. Her filtering system passes as clean any message that contains a certain header with this poem. Should a spammer actually include the header with the poem in order to get through her filter, and should the source of the spam be traceable, then a lawsuit is very likely, and it most probably won't be cheap for the spammer.
Timo Salmi, a professor at the University of Vaasa in Finland and well known to many computer enthusiasts, has developed his own system. Email from unknown senders is not delivered to his mailbox. Instead the sender gets a return message explaining the system and containing a password which must be used in order to deliver mail to Timo's mailbox. His mailbox will most probably be really free of spam, as spammers usually don't care about return mails. But this system has a huge drawback. Any regular correspondent will, at first, be confused and maybe annoyed. And the requirement to use a password for delivering mail surely makes sending email a lot more troublesome.
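The general idea of such a password gate can be sketched as follows. This is not Timo Salmi's actual implementation, whose details are not described here. The class name, the per-sender random password, the convention of quoting it in the subject line and the automatic whitelisting afterwards are all assumptions made for illustration.

```python
import secrets


class ChallengeGate:
    """Toy model of a mailbox that challenges unknown senders."""

    def __init__(self):
        self.known_senders = set()   # senders who have passed the challenge
        self.pending = {}            # sender -> password sent in the reply

    def receive(self, sender: str, subject: str) -> str:
        """Return 'deliver' or 'challenge' for an incoming message."""
        sender = sender.lower()
        if sender in self.known_senders:
            return "deliver"
        password = self.pending.get(sender)
        if password is not None and password in subject:
            # Sender read the return message and used the password.
            self.known_senders.add(sender)
            del self.pending[sender]
            return "deliver"
        # Unknown sender: keep (or create) the password and send the explanation.
        self.pending.setdefault(sender, secrets.token_hex(4))
        return "challenge"
```

A spammer who never reads the return message stays in the 'challenge' state forever, which is exactly the point, and also exactly what a confused first-time correspondent experiences.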
A new and very promising method to automatically flag spam is based on statistics and has recently been described in a well-received paper by Paul Graham. This method makes a list of all the words in an email and counts how often they appear. The words with the top counts are rated. This approach takes advantage of the fact that spam mails usually want to sell something and thus contain only a very narrow set of words. Regular emails use a far wider and more varied range of words. From the number and type of the counted words in an email, the probability that it is spam is calculated. All this may sound a little strange at first, but it does in fact work surprisingly well. The drawback is that a separate database of good and bad words needs to be created for each user, and each user needs to train the system for his or her particular requirements. In the beginning no spam will be detected at all, and the user must manually pass each spam mail to the system to train it. But very soon the invested time will start to pay off. Like all the other systems, this method is not perfect and never will be, but it does yield really good results very quickly, and as it is trained by each person separately, the results are tailored to that person. A very promising utility using this method is SpamProbe.
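A much simplified version of such a statistical filter fits into a short Python class. It should be read as a sketch of the general idea, not as Paul Graham's exact formula or as the way SpamProbe works. The tokenizer, the smoothing and the class and method names are assumptions made for this example. Limiting the calculation to the 15 most telling words is borrowed from Graham's essay.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9$'-]+")


def tokenize(text: str) -> set:
    """Return the distinct lower-cased words of a message."""
    return set(TOKEN.findall(text.lower()))


class WordStatsFilter:
    def __init__(self):
        self.spam_counts = Counter()  # word -> number of spam mails containing it
        self.ham_counts = Counter()   # word -> number of good mails containing it
        self.spam_total = 0
        self.ham_total = 0

    def train(self, text: str, is_spam: bool) -> None:
        """Add one manually classified message to the user's database."""
        words = tokenize(text)
        if is_spam:
            self.spam_counts.update(words)
            self.spam_total += 1
        else:
            self.ham_counts.update(words)
            self.ham_total += 1

    def word_probability(self, word: str) -> float:
        """Estimate how strongly a single word points to spam (0..1)."""
        # Add-one smoothing keeps the value strictly between 0 and 1.
        spam_rate = (self.spam_counts[word] + 1) / (self.spam_total + 2)
        ham_rate = (self.ham_counts[word] + 1) / (self.ham_total + 2)
        return spam_rate / (spam_rate + ham_rate)

    def spam_probability(self, text: str, top_n: int = 15) -> float:
        """Combine the most telling words into one spam probability."""
        probs = sorted(
            (self.word_probability(w) for w in tokenize(text)),
            key=lambda p: abs(p - 0.5),
            reverse=True,
        )[:top_n]
        # Summing log-odds is the same as multiplying the odds.
        log_odds = sum(math.log(p) - math.log(1 - p) for p in probs)
        return 1 / (1 + math.exp(-log_odds))
```

An untrained filter returns 0.5 for everything, which matches the observation that no spam is detected in the beginning. After a handful of training messages of each kind, typical sales vocabulary already pushes the value close to 1.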
Still, by far the most reliable method for filtering out unwanted mail is to have a secretary who checks your mail first.
Your first line of defense should be to try to keep your email address out of spammers' address lists.
Should you receive spam despite all precautions, there are some recommended rules to follow.
I hope I was able to give you some useful information on how to fight or, even better, avoid spam. Not having to deal with spam saves a lot of time; whenever the spammers develop new methods to circumvent the latest anti-spam measures, the time I need to go through my own email starts increasing.
| UBE-RELAY survey | http://www.imc.org/ube-relay.html |
| Vipul's Razor | http://razor.sourceforge.net/ |
| Timo Salmi's system | http://www.uwasa.fi/~ts/info/spamfoil.html |
| Paul Graham - A Plan for Spam | http://www.paulgraham.com/spam.html |
| SpamProbe | http://spamprobe.sourceforge.net/ |
| SpamCon foundation | http://www.spamcon.org/ |
| Hoaxbusters | http://hoaxbusters.ciac.org/ |
| PINBOARD | http://www.pinboard.com/ |
| HighTechSamurai | http://kurt.www.pinboard.com/ |