A Statistical Approach to the Spam Problem
To date, the software using this approach is based on one word per token. Other approaches are possible, such as building a hash table of phrases. It is expected that the math described here can be employed in those contexts as well, and there is reason to believe that phrase-based systems will have performance advantages, although there is controversy about that idea. Future Linux Journal articles can be expected to cover any developments in such directions. CRM114 (see Resources) is an example of a phrase-based system that has performed very well, but at the time of this writing it hasn't been directly tested against other approaches on the same corpus. (At the time of this writing, CRM114 is using the Bayesian chain rule to combine p(w)s.)
The techniques described here have been used in projects such as Spambayes and Bogofilter to improve performance of the spam-filtering task significantly. Future developments, which may include integrating these calculations with a phrase-based approach, can be expected to achieve even better performance.
Gary Robinson is CEO of Transpose, LLC (www.transpose.com), a company specializing in internet trust and reputation solutions. He has worked in the field of collaborative filtering since 1985. His personal weblog, which frequently covers spam-related developments, is radio.weblogs.com/0101454, and he can be contacted at firstname.lastname@example.org.
|December Daily Giveaways are Back!||Dec 01, 2015|
|December 2015 Video Preview||Nov 30, 2015|
|Take Control of Your PC with UEFI Secure Boot||Nov 30, 2015|
|Geek Hide-away in Guatemala - Stay for Free!||Nov 26, 2015|
|Microsoft and Linux: True Romance or Toxic Love?||Nov 25, 2015|
|Non-Linux FOSS: Install Windows? Yeah, Open Source Can Do That.||Nov 24, 2015|
- Take Control of Your PC with UEFI Secure Boot
- Cipher Security: How to harden TLS and SSH
- Perceptions of the Linux OS Among Undergraduate System Administrators
- Non-Linux FOSS: Install Windows? Yeah, Open Source Can Do That.
- December Daily Giveaways are Back!
- Simple Photo Editing, Linux Edition!
- Firefox's New Feature for Tighter Security
- Web Stores Held Hostage
- Microsoft and Linux: True Romance or Toxic Love?
- PuppetLabs Introduces Application Orchestration