© 2008-25 SmartData Collective. All Rights Reserved.

Dr Gates was right, or how I learned to stop worrying and love the spam

DavidMSmith

In 2004 Microsoft founder (and honorary doctorate recipient) Bill Gates confidently stated that "Spam will soon be a thing of the past." It's now five years later (Gates suggested the problem would be solved in two), and spam is now 95% of all emails sent. Nonetheless, I think Gates was mostly right in principle even if the timeline was optimistic.

A decade ago, when email spam was a real problem, I took care not to let my email address be displayed in public. Spammers had a habit of scraping email addresses from websites, with automated robots crawling the web looking for any text containing the @-symbol. Despite my efforts, I had to abandon a couple of email addresses after they got added to the mailing lists traded between spammers, and the noise overwhelmed the signal in my inbox.

That was before the advent of good spam filters, though, which have greatly improved in the last couple of years. I now use Google Mail for all my mail, which has excellent spam-filtering technology. Even my non-Google addresses are forwarded to a gmail account, which I can rely on to filter the crap so that I can see the emails I actually care about.

I started my current job about 9 months ago now, and I made a conscious decision to stop worrying about spam and let my email address — david@revolution-computing.com — be free. It's linked directly on every page of this blog and on the REvolution Computing website, and I don't hesitate to include it in other public venues. It's been out there long enough to be picked up by robots and web searches, so it's probably time to evaluate the results. I'd say it's a success, and I'm very glad I took the plunge. I maybe get 2 spam emails a week in my Google Mail account (faithfully tucked away in my Spam folder), and better yet I don't think I've lost any legitimate mail to the spam filter. (So if you've emailed me and I haven't replied, I have only myself to blame. My apologies – I do get a lot of legitimate email.) I don't use any other email services so I can't speak to the performance of their spam filters, but I'm happy with my results.

So what changed between 2004 and now? My guess is that it's mainly been the transition to web-based email services. Statisticians have attempted to solve the spam problem before with predictive models, but the results were never that great at the time. The problem was likely twofold. First, the costs are highly asymmetrical: a false positive (legitimate mail marked as spam) is much worse than a false negative (spam reaching the inbox), yet too many false negatives mean the filter isn't really useful in practice. Second, I think the corpus was simply too small: a few hundred thousand emails, or even all the emails for all the employees of a largish company with a central email server, simply isn't going to result in a filter that gives a clean inbox while not trashing any legitimate mail sent to a broad community of users.
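
To make the asymmetry concrete, here is a minimal Python sketch of picking a spam-score cutoff when a false positive costs far more than a false negative. Everything in it is an illustrative assumption rather than a description of any real filter: the function names, the idea that the model emits a score between 0 and 1, and the 50-to-1 cost ratio are all hypothetical.

```python
def total_cost(scores, labels, threshold, fp_cost=50.0, fn_cost=1.0):
    """Cost of flagging every message with score >= threshold as spam.

    scores:  spam scores from some model (assumed to lie in [0, 1])
    labels:  True if the message really is spam, False if legitimate
    fp_cost: assumed cost of losing one legitimate message
    fn_cost: assumed cost of one spam message reaching the inbox
    """
    cost = 0.0
    for score, is_spam in zip(scores, labels):
        flagged = score >= threshold
        if flagged and not is_spam:
            cost += fp_cost  # false positive: legitimate mail trashed
        elif not flagged and is_spam:
            cost += fn_cost  # false negative: spam gets through
    return cost


def best_threshold(scores, labels, fp_cost=50.0, fn_cost=1.0):
    """Return the candidate cutoff with the lowest total cost."""
    # Only the observed scores (plus "flag nothing") can change the outcome.
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, fp_cost, fn_cost),
    )
```

With equal costs the best cutoff tends to sit wherever the fewest messages are misclassified. Raising `fp_cost` pushes the cutoff upward, so the filter lets some spam through rather than risk a legitimate message, which is the trade-off described above.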

Web-based email certainly solves the corpus-size problem, but there's one additional detail that I expect makes it work. The defining feature of spam is that the same email is sent to lots and lots of people, and a web-based email service like Google Mail can easily see when a duplicate email arrives for many of its users at the same time. Spammers have attempted various tricks to make that detection more difficult (converting text to images, or adding random text to each mail to make it harder to detect duplicates), but Google seems to have largely overcome these hurdles.
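
I don't know how Google actually does it, but one textbook way to spot near-duplicate messages despite a bit of random padding is to compare overlapping word sequences ("shingles") using Jaccard similarity. The sketch below is only an illustration of that general idea; the function names, the shingle length of 3, and both thresholds are assumptions chosen for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in text (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_like_bulk_copy(message, others, threshold=0.5, min_matches=2):
    """True if message closely resembles at least min_matches other messages.

    threshold and min_matches are illustrative values, not tuned ones.
    """
    target = shingles(message)
    matches = sum(
        1 for other in others
        if jaccard(target, shingles(other)) >= threshold
    )
    return matches >= min_matches
```

Random words tacked onto the start or end of a message change only a few shingles, so the similarity to the other copies stays high. A service that sees mail for millions of users has plenty of copies to compare against, while a single company mail server may see only one.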

So then, is the spam problem solved? At a technical level, clearly not — spam still consumes a tremendous amount of bandwidth and costs billions of dollars to contain — but at the personal level it's hardly more than a minor irritant these days. (And if it's not for you, consider a new email service.) For individuals, the real spam problem these days lies in other venues: social networking spam, blog spam, link farms, and so on. Mr Gates, when can we expect solutions to those problems? 
