Validate an E-Mail Address with PHP, the Right Way
The Internet Engineering Task Force (IETF) document, RFC 3696, “Application Techniques for Checking and Transformation of Names” by John Klensin, gives several valid e-mail addresses that are rejected by many PHP validation routines. The addresses Abc\@def@example.com, customer/department=shipping@example.com and !def!xyz%abc@example.com are all valid. One of the more popular regular expressions found in the literature rejects all of them:
"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$"
This regular expression allows only the underscore (_) and hyphen (-) characters, numbers and lowercase alphabetic characters. Even assuming a preprocessing step that converts uppercase alphabetic characters to lowercase, the expression rejects addresses with valid characters, such as the slash (/), equal sign (=), exclamation point (!) and percent (%). The expression also requires that the highest-level domain component has only two or three characters, thus rejecting valid domains, such as .museum.
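A quick way to see the problem is to run the expression against the RFC 3696 examples. The sketch below makes a few assumptions: it uses PCRE (`preg_match`) with `/` delimiters, it lowercases the input first as the paragraph above supposes, the helper name `passes_popular_regex` is hypothetical, and `curator@example.museum` is an illustrative address rather than one taken from the RFC.

```php
<?php
// Hypothetical helper: applies the "popular" expression quoted above.
// Assumption: PCRE syntax with "/" delimiters; input is lowercased first.
function passes_popular_regex(string $email): bool
{
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/';
    return preg_match($pattern, strtolower($email)) === 1;
}

$samples = [
    'Abc\@def@example.com',
    'customer/department=shipping@example.com',
    '!def!xyz%abc@example.com',
    'curator@example.museum',   // illustrative long top-level domain
    'user@example.com',         // plain address for comparison
];

foreach ($samples as $email) {
    printf("%-45s %s\n", $email, passes_popular_regex($email) ? 'accepted' : 'rejected');
}
```

Only the last, plain address should be reported as accepted; the three RFC examples and the .museum address are all turned away.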
Another favorite regular expression solution is the following:
"^[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+$"
This regular expression also rejects all the valid examples in the preceding paragraph. It does have the grace to allow uppercase alphabetic characters, and it doesn't make the error of assuming a high-level domain name has only two or three characters. However, it allows invalid domain names, such as example..com.
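The behavior described here can be checked directly. This is a minimal sketch, assuming PCRE (`preg_match`) with `/` delimiters and a hypothetical helper name, `passes_second_regex`. Note that the dot between the two domain character classes is unescaped, so it matches any character, and the final class itself contains a dot, which is how a doubled dot slips through.

```php
<?php
// Hypothetical helper: applies the second expression quoted above.
// Assumption: PCRE syntax with "/" delimiters.
function passes_second_regex(string $email): bool
{
    $pattern = '/^[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+$/';
    return preg_match($pattern, $email) === 1;
}

var_dump(passes_second_regex('User@Example.museum'));       // bool(true): uppercase and long TLD pass
var_dump(passes_second_regex('user@example..com'));         // bool(true): doubled dot also passes
var_dump(passes_second_regex('!def!xyz%abc@example.com'));  // bool(false): valid address fails
```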
Listing 1 shows an example from PHP Dev Shed (www.devshed.com/c/a/PHP/Email-Address-Verification-with-PHP/2). The code contains (at least) three errors. First, it fails to recognize many valid e-mail address characters, such as percent (%). Second, it splits the e-mail address into user name and domain parts at the at sign (@). E-mail addresses that contain a quoted at sign, such as Abc\@def@example.com, will break this code. Third, it fails to check for host address DNS records. Hosts with a type A DNS entry will accept e-mail and may not necessarily publish a type MX entry. I'm not picking on the author at PHP Dev Shed. More than 100 reviewers gave this a four-out-of-five-star rating.
Listing 1. An Incorrect E-mail Validation
function checkEmail($email) {
  if(preg_match("/^([a-zA-Z0-9])+([a-zA-Z0-9\._-])*@([a-zA-Z0-9_-])+([a-zA-Z0-9\._-]+)+$/",
                $email)){
    list($username,$domain)=split('@',$email);
    if(!checkdnsrr($domain,'MX')) {
      return false;
    }
    return true;
  }
  return false;
}
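The second and third errors in Listing 1 suggest two small corrections, sketched below under stated assumptions. Splitting at the last at sign keeps a quoted at sign inside the user name, and the DNS test accepts either an MX or an A record. The function names `split_address` and `domain_accepts_mail` are hypothetical, and the `$lookup` parameter is an invented injection point so the logic can be tried without network access; by default it falls back to PHP's `checkdnsrr()`.

```php
<?php
// Split at the LAST at sign so a quoted "@" in the local part survives.
// Returns [local, domain], or null when there is no usable at sign.
function split_address(string $email): ?array
{
    $at = strrpos($email, '@');
    if ($at === false || $at === 0 || $at === strlen($email) - 1) {
        return null;
    }
    return [substr($email, 0, $at), substr($email, $at + 1)];
}

// Accept the domain when it publishes an MX record or, failing that, an A record.
// $lookup is a hypothetical hook for testing; it defaults to checkdnsrr().
function domain_accepts_mail(string $domain, ?callable $lookup = null): bool
{
    $lookup = $lookup ?? 'checkdnsrr';
    return $lookup($domain, 'MX') || $lookup($domain, 'A');
}

[$local, $domain] = split_address('Abc\@def@example.com');
echo $local, "\n";   // Abc\@def
echo $domain, "\n";  // example.com
```

This sketch covers only the split and the DNS test; it does not validate the characters in either part.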
One of the better solutions comes from Dave Child's blog at ILoveJackDaniel's (ilovejackdaniels.com), shown in Listing 2 (www.ilovejackdaniels.com/php/email-address-validation). Not only does Dave love good-old American whiskey, he also did some homework, read RFC 2822 and recognized the true range of characters valid in an e-mail user name. About 50 people have commented on this solution at the site, including a few corrections that have been incorporated into the original solution. The only major flaw in the code collectively developed at ILoveJackDaniel's is that it fails to allow for quoted characters, such as \@, in the user name. It will reject an address with more than one at sign, so that it does not get tripped up splitting the user name and domain parts using explode("@", $email). A subjective criticism is that the code expends a lot of effort checking the length of each component of the domain portion—effort better spent simply trying a domain lookup. Others might appreciate the due diligence paid to checking the domain before executing a DNS lookup on the network.
Listing 2. A Better Example from ILoveJackDaniel's
function check_email_address($email) {
  // First, we check that there's one @ symbol,
  // and that the lengths are right.
  if (!ereg("^[^@]{1,64}@[^@]{1,255}$", $email)) {
    // Email invalid because wrong number of characters
    // in one section or wrong number of @ symbols.
    return false;
  }
  // Split it into sections to make life easier
  $email_array = explode("@", $email);
  $local_array = explode(".", $email_array[0]);
  for ($i = 0; $i < sizeof($local_array); $i++) {
    if (!ereg("^(([A-Za-z0-9!#$%&'*+/=?^_`{|}~-][A-Za-z0-9!#$%&'*+/=?^_`{|}~\.-]{0,63})|(\"[^(\\|\")]{0,62}\"))$",
              $local_array[$i])) {
      return false;
    }
  }
  // Check if domain is IP. If not,
  // it should be valid domain name
  if (!ereg("^\[?[0-9\.]+\]?$", $email_array[1])) {
    $domain_array = explode(".", $email_array[1]);
    if (sizeof($domain_array) < 2) {
      return false; // Not enough parts to domain
    }
    for ($i = 0; $i < sizeof($domain_array); $i++) {
      if (!ereg("^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])|([A-Za-z0-9]+))$",
                $domain_array[$i])) {
        return false;
      }
    }
  }
  return true;
}
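The limitation noted for Listing 2 comes from its very first test, which can be tried in isolation. The sketch below assumes `preg_match` in place of `ereg` (the `ereg` family was removed in PHP 7) and uses a hypothetical function name, `has_single_at_sign`. The last sample is the quoted-string form of an address with an at sign in the user name.

```php
<?php
// Hypothetical stand-alone version of Listing 2's first test, rewritten for
// preg_match(): exactly one "@", 1-64 characters before it, 1-255 after it.
function has_single_at_sign(string $email): bool
{
    return preg_match('/^[^@]{1,64}@[^@]{1,255}$/', $email) === 1;
}

var_dump(has_single_at_sign('customer/department=shipping@example.com')); // bool(true)
var_dump(has_single_at_sign('Abc\@def@example.com'));                     // bool(false)
var_dump(has_single_at_sign('"Abc@def"@example.com'));                    // bool(false)
```

Because any second at sign fails this test, the later `explode("@", $email)` never sees more than two parts, at the cost of rejecting addresses with a quoted at sign.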
Comments
Great article, just a slight fix
This is terrific.
There is some sort of typo in the part of the code in Listing 9 where you check the A and MX DNS records, which makes it break as written.
Changing:
if ($isValid && !(checkdnsrr($domain,"MX") ||
↪checkdnsrr($domain,"A")))
To:
if ($isValid && !((checkdnsrr($domain,"MX")) ||
(checkdnsrr($domain,"A"))))
seems to make it work.
your fix works for me too
Thanks for the awesome script!
I ran into the same error with that line, and your fix made it work for me too!
Your format validation code
Your format validation code will inappropriately permit an all numeric TLD.
“There is an additional rule that essentially requires that top-level domain names not be all-numeric.” - RFC 3696, section 2
http://SimonSlick.com/VEAF/ValidateEmailAddressFormat.html
Sure, your DNS lookup would fail, but what is the point of validating the format if you are just going to do a DNS lookup anyway for a domain name that the format validation code should already have deemed invalid?
Format validation and existence verification (DNS lookup) serve different purposes, and just because a domain name does not exist does not mean the format is not valid.
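As an illustration of a format-only test that needs no DNS lookup, the all-numeric rule quoted above could be sketched as follows. The function name `tld_is_all_numeric` is hypothetical, and the sketch assumes the input is a plain dotted domain name rather than a bracketed IP literal.

```php
<?php
// Hypothetical format-only check for the RFC 3696 rule quoted above:
// the top-level label of a domain name must not be all-numeric.
// Assumption: $domain is a plain dotted name, not a bracketed IP literal.
function tld_is_all_numeric(string $domain): bool
{
    $labels = explode('.', rtrim($domain, '.'));  // tolerate a trailing dot
    $tld = end($labels);                          // last label is the TLD
    return $tld !== '' && ctype_digit($tld);
}

var_dump(tld_is_all_numeric('example.123'));  // bool(true)
var_dump(tld_is_all_numeric('example.com'));  // bool(false)
```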
There are so many holes in your code, whoever paid you for this write-up is highly deserving of a total refund. If you are going to title such an article as "... the Right Way", you could at least do it the Right Way.
The code at http://SimonSlick.com/VEAF/ValidateEmailAddressFormat.html is actually better, and even includes code for verifying actual existence of an eMailbox.
simonslick.com/veaf is busted
The code at simonslick.com is wrong -- it does not seem to match the RFC at all. Just try the examples given in this article as well as more common cases like:
foo+bar@example.com
foo%bar@example.com
foo <bar@example.com>
(foo) bar@example.com
Working Code & Extensive Regular Expressions
Working Code with Extensive use of Regular Expressions for validating email address format.
Check it out and see if you can find any faults.
http://SimonSlick.com/VEAF/ValidateEmailAddressFormat.html
Email address validation head-to-head
Yes, there are some faults with the Simon Slick code. It's also worth pointing out that both Simon Slick's and Doug Lovell's code is copyrighted, All Rights Reserved. You can't use it in your project.
I've written about some public-domain validation functions here: http://www.dominicsayers.com/isemail/
The Simon Slick code fails on some of the examples in RFC3696.
As some of these comments have pointed out, there are a lot of RFCs that cover this ground. For what it's worth, I believe my function complies with RFCs 1123, 2396, 3696, 4291, 4343, 5321 & 5322.
RFC Compliance
Backslash is not an RFC-compliant component of a non-quoted e-mail address local-part. It may have been in the past, but not anymore, and it has not been since the publication of RFC 2822 (2001). Move on, folks.
This is also reinforced by RFC 3696 (2004).
http://tools.ietf.org/html/rfc3696
3. Restrictions on email addresses
Without quotes, local-parts may consist of any combination of
alphabetic characters, digits, or any of the special characters
! # $ % & ' * + - / = ? ^ _ ` . { | } ~
period (".") may also appear, but may not be used to start or end the
local part, nor may two or more consecutive periods appear. Stated
differently, any ASCII graphic (printing) character other than the
at-sign ("@"), backslash, double quote, comma, or square brackets may
appear without quoting. If any of that list of excluded characters
are to appear, they must be quoted.
Also see the RFC3696 Errata
http://www.rfc-editor.org/cgi-bin/errataSearch.pl?rfc=3696
These are not RFC compliant:
Fred\ Bloggs@example.com
Joe.\\Blow@example.com
And should have read as:
"Fred\ Bloggs"@example.com
"Joe.\\Blow"@example.com
Also, "the upper limit on address lengths (local-part@domain-part) should normally be considered to be 256."
And as someone already alluded to, the domain name is now, for quite some time I might add, allowed to begin with a digit.
You need to update your code, test data and this article.
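The unquoted local-part rule quoted from RFC 3696 in this comment can be sketched as a single pattern. This is only an illustration under stated assumptions: the function name `is_valid_unquoted_local_part` is hypothetical, PCRE (`preg_match`) is assumed, and the sketch checks neither quoted strings nor length limits.

```php
<?php
// Hypothetical check of an UNQUOTED local part against the RFC 3696 text
// quoted above: letters, digits and the listed specials, with periods
// allowed only singly and never at the start or end.
function is_valid_unquoted_local_part(string $local): bool
{
    // One run of permitted characters other than the period.
    $atom = '[A-Za-z0-9!#$%&\'*+\/=?^_`{|}~-]+';
    // Runs separated by single periods.
    $pattern = '/^' . $atom . '(\.' . $atom . ')*$/';
    return preg_match($pattern, $local) === 1;
}

var_dump(is_valid_unquoted_local_part('customer/department=shipping')); // bool(true)
var_dump(is_valid_unquoted_local_part('Fred\ Bloggs'));                 // bool(false)
```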
RFC Compliance
Also, the quoted-string check appears to allow null (x00). According to RFC 2822, sections 3.2.5 (Quoted strings) and 3.2.1 (Primitive Tokens), the permitted NO-WS-CTL characters are x01-x08, x0B, x0C, x0E-x1F, x7F. This does not include the null character x00.
RFC 2822
4.1. Miscellaneous obsolete tokens
The obs-char and obs-qp elements each add ASCII value 0.
Appendix B. Differences from earlier standards
Items marked with an asterisk (*) below are items which
appear in section 4 of this document and therefore can no longer be
generated.
12. ASCII 0 (null) removed.*
Challenge
So should the e-mail address someone@3com.com be accepted or not? It fails to satisfy requirement #7 above but I guess the code in Listing 9 would accept it.
Tom
It is working good
I tested it with many options and it's working fine :D