Metadata-Version: 2.1
Name: email-validator
Version: 1.0.4
Summary: A robust email syntax and deliverability validation library for Python 2.x/3.x.
Home-page: https://github.com/JoshData/python-email-validator
Author: Joshua Tauberer
Author-email: jt@occams.info
License: CC0 (copyright waived)
Keywords: email address validator
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Requires-Dist: idna (>=2.0.0)
Requires-Dist: dnspython (>=1.15.0)

email\_validator
================

A robust email address syntax and deliverability validation library
for Python 2.7/3.4 by `Joshua Tauberer <https://razor.occams.info>`__.

This library validates that addresses are of the form ``x@y.com``. This is
the sort of validation you would want for a login form on a website.

Key features:

* Good for validating email addresses used for logins/identity.
* Friendly error messages when validation fails (appropriate to show to end users).
* (optionally) Checks deliverability: Does the domain name resolve?
* Supports internationalized domain names and (optionally) internationalized local parts.
* Normalizes email addresses (super important for internationalized addresses! see below).

The library is NOT for validation of the To: line in an email message (e.g.
``My Name <my@address.com>``), for which `flanker <https://github.com/mailgun/flanker>`__
is more appropriate. And this library does NOT permit obsolete
forms of email addresses, so if you need strict validation against the
email specs exactly, use `pyIsEmail <https://github.com/michaelherold/pyIsEmail>`__.

The current version is 1.0.3 (Sept 12, 2017). The only changes since 1.0.0 (Sept 5, 2015)
have been small bug and packaging fixes.

Installation
------------

This package is on PyPI, so:

::

    pip install email_validator

``pip3`` also works.

Usage
-----

If you're validating a user's email address before creating a user
account, you might do this:

::

    from email_validator import validate_email, EmailNotValidError

    email = "my+address@mydomain.tld"

    try:
        v = validate_email(email)  # validate and get info
        email = v["email"]  # replace with normalized form
    except EmailNotValidError as e:
        # email is not valid, exception message is human-readable
        print(str(e))

This validates the address and gives you its normalized form. You should
put the normalized form in your database and always normalize before
checking if an address is in your database.

The validator will accept internationalized email addresses, but email
addresses with non-ASCII characters in the *local* part of the address
(before the @-sign) require the `SMTPUTF8 <https://tools.ietf.org/html/rfc6531>`__
extension, which may not be supported by your mail submission library or
your outbound mail server. If you know ahead of time that SMTPUTF8 is
not supported then **add the keyword argument allow_smtputf8=False
to fail validation for addresses that would require SMTPUTF8**:

::

    validate_email(email, allow_smtputf8=False)

Overview
--------

The module provides a function ``validate_email(email_address)`` which takes
an email address (either a ``str`` or ASCII ``bytes``) and:

- Raises an ``EmailNotValidError`` with a helpful, human-readable error
  message explaining why the email address is not valid, or
- Returns a dict with information about the deliverability of the email
  address.

When an email address is not valid, ``validate_email`` raises either an
``EmailSyntaxError`` if the form of the address is invalid or an
``EmailUndeliverableError`` if the domain name does not resolve. Both
exception classes are subclasses of ``EmailNotValidError``, which in
turn is a subclass of ``ValueError``.

But when an email address is valid, a dict is returned containing
information that might aid deliverability (see below).

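The exception hierarchy described above can be illustrated with a minimal sketch. The three classes below are stand-ins defined only so the example runs without the package installed; in real code you would import them from ``email_validator``. The helper name ``classify_failure`` is hypothetical and not part of the library.

```python
# Stand-in classes mirroring the hierarchy described above. In real code,
# import the actual classes from email_validator instead of defining these.
class EmailNotValidError(ValueError):
    """Base class for all validation failures."""


class EmailSyntaxError(EmailNotValidError):
    """The form of the address is invalid."""


class EmailUndeliverableError(EmailNotValidError):
    """The domain name does not resolve."""


def classify_failure(exc):
    """Return a short label for a validation failure (hypothetical helper)."""
    # Test the specific subclasses first, then fall back to the base class.
    if isinstance(exc, EmailSyntaxError):
        return "syntax"
    if isinstance(exc, EmailUndeliverableError):
        return "undeliverable"
    if isinstance(exc, EmailNotValidError):
        return "invalid"
    raise TypeError("not an email validation error")


try:
    raise EmailUndeliverableError("The domain name does not resolve.")
except ValueError as e:  # catching ValueError also catches every subclass
    label = classify_failure(e)
```

Catching ``EmailNotValidError`` is usually enough for a login form; distinguishing the subclasses is only useful if you want to treat syntax and deliverability failures differently.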
The validator doesn't permit obsoleted forms of email addresses that no one
uses anymore even though they are still valid and deliverable, since they
will probably give you grief if you're using email for login. (See later in the
document about that.)

The validator checks that the domain name in the email address resolves.
There is nothing to be gained by trying to actually contact an SMTP
server, so that's not done here. For privacy, security, and practicality
reasons servers are good at not giving away whether an address is
deliverable or not: email addresses that appear to accept mail at first
can bounce mail after a delay, and bounced mail may indicate a temporary
failure of a good email address (sometimes an intentional failure, like
greylisting).

The function also accepts the following keyword arguments (defaults as
shown):

``allow_smtputf8=True``
    Set to ``False`` to prohibit internationalized
    addresses that would require the `SMTPUTF8 <https://tools.ietf.org/html/rfc6531>`__
    extension.

``check_deliverability=True``
    Set to ``False`` to skip the domain name resolution check.

``allow_empty_local=False``
    Set to ``True`` to allow an empty local
    part (i.e. ``@example.com``), e.g. for validating Postfix aliases.

Internationalized email addresses
---------------------------------

The email protocol SMTP and the domain name system DNS have historically
only allowed ASCII characters in email addresses and domain names,
respectively. Each has adapted to internationalization in a separate
way, creating two separate aspects to email address
internationalization.

Internationalized domain names (IDN)
''''''''''''''''''''''''''''''''''''

The first is `internationalized domain names (RFC
5891) <https://tools.ietf.org/html/rfc5891>`__, a.k.a. IDNA 2008. The DNS has not
been updated with Unicode support. Instead, internationalized domain
names are converted into a special IDNA ASCII form starting with
``xn--``. When an email address has non-ASCII characters in its domain
part, the domain part is replaced with its IDNA ASCII equivalent form
in the process of mail transmission. Your mail submission library probably
does this for you transparently. Note that most web browsers are currently
in transition between IDNA 2003 (RFC 3490) and IDNA 2008 (RFC 5891) and
`compliance around the web is not very good <http://archives.miloush.net/michkap/archive/2012/02/27/10273315.html>`__
in any case, so be aware that edge cases are handled differently by different
applications and libraries. This library conforms to IDNA 2008 using the
`idna <https://github.com/kjd/idna>`__ module by Kim Davies.

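To get a feel for the ``xn--`` form, the sketch below converts the fictitious domain used later in this document with Python's built-in ``idna`` codec. Note the assumption: the built-in codec implements the older IDNA 2003 (RFC 3490), not the IDNA 2008 rules this library applies through the ``idna`` module, so results can differ on edge cases. It is used here only because it needs no extra packages.

```python
# Python's built-in "idna" codec follows IDNA 2003 (RFC 3490). It is used
# here only to illustrate the xn-- form, not to reproduce this library's
# IDNA 2008 behavior.
domain = "良好mail.中国"

# Each non-ASCII label is converted to an ASCII label starting with "xn--".
ascii_domain = domain.encode("idna")
labels = ascii_domain.split(b".")

# Decoding the ASCII form gives back the Unicode domain.
round_trip = ascii_domain.decode("idna")

print(ascii_domain.decode("ascii"))
print(round_trip)
```

For anything beyond illustration, prefer the ``domain`` and ``domain_i18n`` fields returned by this library, since they follow IDNA 2008.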
Internationalized local parts
'''''''''''''''''''''''''''''

The second sort of internationalization is internationalization in the
*local* part of the address (before the @-sign). These email addresses
require that your mail submission library and the mail servers along the
route to the destination, including your own outbound mail server, all
support the `SMTPUTF8 (RFC
6531) <https://tools.ietf.org/html/rfc6531>`__ extension. Support for
SMTPUTF8 varies.

How this module works
'''''''''''''''''''''

By default all internationalized forms are accepted by the validator.
But if you know ahead of time that SMTPUTF8 is not supported by your
mail submission stack, then you must filter out addresses that require
SMTPUTF8 using the ``allow_smtputf8=False`` keyword argument (see
above). This will cause the validation function to raise an
``EmailSyntaxError`` if delivery would require SMTPUTF8. That happens only
in those cases where non-ASCII characters appear before the @-sign.
If you do not set ``allow_smtputf8=False``, you can also check the
value of the ``smtputf8`` field in the returned dict.

If your mail submission library doesn't support Unicode at all, even
in the domain part of the address, then immediately prior to mail
submission you must replace the email address with its ASCII-ized
form. This library gives you back the ASCII-ized form in the
``email_ascii`` field in the returned dict, which you can get like this:

::

    v = validate_email(email, allow_smtputf8=False)
    email = v['email_ascii']

The local part is left alone (if it has internationalized characters,
``allow_smtputf8=False`` will force validation to fail) and the domain
part is converted to `IDNA
ASCII <https://tools.ietf.org/html/rfc5891>`__. (You probably should not
do this at account creation time so you don't change the user's login
information without telling them.)

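The choice between the normalized and the ASCII-ized form can be sketched as a small helper that works on the returned dict. The function name ``address_for_submission`` and the ``unicode_ok`` flag are hypothetical; only the ``email`` and ``email_ascii`` field names come from this document. The sample dicts are abbreviated copies of the examples shown later.

```python
def address_for_submission(result, unicode_ok):
    """Pick the address form to hand to a mail submission library.

    `result` is a dict with the fields described in this document.
    `unicode_ok` says whether the submission stack handles non-ASCII
    addresses (an assumption you have to establish yourself).
    """
    if unicode_ok:
        return result["email"]
    # email_ascii is absent when the local part is internationalized.
    if "email_ascii" not in result:
        raise ValueError("address requires SMTPUTF8; no ASCII form exists")
    return result["email_ascii"]


idn_result = {
    "email": "example@良好mail.中国",
    "email_ascii": "example@xn--mail-p86gl01s.xn--fiqs8s",
    "smtputf8": False,
}
i18n_local_result = {"email": "树大@occams.info", "smtputf8": True}

ascii_form = address_for_submission(idn_result, unicode_ok=False)
unicode_form = address_for_submission(idn_result, unicode_ok=True)
```

Passing ``i18n_local_result`` with ``unicode_ok=False`` raises ``ValueError``, which matches the advice above to reject such addresses up front with ``allow_smtputf8=False``.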
UCS-4 support required for Python 2.7
'''''''''''''''''''''''''''''''''''''

Note that when using Python 2.7, it must have been built with
UCS-4 support (see
`here <https://stackoverflow.com/questions/29109944/python-returns-length-of-2-for-single-unicode-character-string>`__);
otherwise email addresses with Unicode characters outside
of the BMP (Basic Multilingual Plane) will not validate correctly.

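One way to check which kind of build you are running is to look at ``sys.maxunicode``. This is a general Python check, not something this library provides; the variable name ``is_wide_build`` is just for illustration.

```python
import sys

# Narrow (UCS-2) builds report 0xFFFF; wide (UCS-4) builds report 0x10FFFF.
# Python 3.3 and later always report 0x10FFFF.
is_wide_build = sys.maxunicode > 0xFFFF

if not is_wide_build:
    print("Narrow build: characters outside the BMP will not validate correctly.")
```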
Normalization
-------------

The use of Unicode in email addresses introduced a normalization problem.
Different Unicode strings can look identical and have the same semantic
meaning to the user. The ``email`` field returned on successful validation
provides the correctly normalized form of the given email address:

::

    v = validate_email(email)
    email = v['email']

Because you may get an email address in a variety of forms, you ought to replace
it with its normalized form immediately prior to going into your database
(during account creation), querying your database (during login), or sending
outbound mail.

The normalizations include lowercasing the domain part of the email address
(domain names are case-insensitive), `Unicode "NFC" normalization <https://en.wikipedia.org/wiki/Unicode_equivalence>`__
of the whole address (which turns characters plus `combining characters <https://en.wikipedia.org/wiki/Combining_character>`__
into precomposed characters where possible and replaces certain Unicode characters
(such as angstrom and ohm) with other equivalent code points (a-with-ring and omega,
respectively)), replacement of `fullwidth and halfwidth characters <https://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms>`__
in the domain part, and possibly other `UTS46 <http://unicode.org/reports/tr46>`__ mappings
on the domain part.

(See `RFC 6532 (internationalized email) section 3.1 <https://tools.ietf.org/html/rfc6532#section-3.1>`__
and `RFC 5895 (IDNA 2008) section 2 <http://www.ietf.org/rfc/rfc5895.txt>`__.)

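The NFC step described above can be tried directly with the standard library's ``unicodedata`` module. This is only a sketch of that one step: it does not lowercase the domain or apply the UTS46 mappings, and the helper name ``nfc`` and the sample addresses are made up for illustration.

```python
import unicodedata


def nfc(s):
    """Apply Unicode NFC normalization (one of the steps described above)."""
    return unicodedata.normalize("NFC", s)


# The angstrom and ohm signs are replaced by equivalent code points.
angstrom = nfc("\u212b")  # ANGSTROM SIGN -> LATIN CAPITAL LETTER A WITH RING ABOVE
ohm = nfc("\u2126")       # OHM SIGN -> GREEK CAPITAL LETTER OMEGA

# A base character plus a combining character becomes one precomposed character.
composed = nfc("e\u0301")  # "e" + COMBINING ACUTE ACCENT

# NFC alone leaves fullwidth characters untouched, which is why the text
# lists their replacement in the domain part as a separate step.
fullwidth = nfc("\uff41")  # FULLWIDTH LATIN SMALL LETTER A

# Two spellings that look identical compare equal only after normalization.
same_address = nfc("jose\u0301@example.org") == nfc("jos\u00e9@example.org")
```

Comparing the raw strings ``"jose\u0301@example.org"`` and ``"jos\u00e9@example.org"`` without normalizing would report them as different, which is the login and lookup problem this section warns about.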
Examples
--------

For the email address ``test@example.org``, the returned dict is:

::

    {
      "email": "test@example.org",
      "email_ascii": "test@example.org",
      "local": "test",
      "domain": "example.org",
      "domain_i18n": "example.org",
      "smtputf8": false,
      "mx": [
        [
          0,
          "93.184.216.34"
        ]
      ],
      "mx-fallback": "A"
    }

For the fictitious address ``example@良好Mail.中国``, which has an
internationalized domain but ASCII local part, the returned dict is:

::

    {
      "email": "example@良好mail.中国",
      "email_ascii": "example@xn--mail-p86gl01s.xn--fiqs8s",
      "local": "example",
      "domain": "xn--mail-p86gl01s.xn--fiqs8s",
      "domain_i18n": "良好mail.中国",
      "smtputf8": false,
      "mx": [
        [
          0,
          "218.241.116.40"
        ]
      ],
      "mx-fallback": "A"
    }

Note that ``smtputf8`` is ``False`` even though the domain part is
internationalized because
`SMTPUTF8 <https://tools.ietf.org/html/rfc6531>`__ is only
needed if the local part of the address is internationalized (the domain
part can be converted to IDNA ASCII). Also note that the ``email`` and
``domain_i18n`` fields provide a normalized form of the email address
and domain name (casefolding and Unicode normalization as required by
IDNA 2008).

For the fictitious address ``树大@occams.info``, which has an
internationalized local part, the returned dict is:

::

    {
      "email": "树大@occams.info",
      "local": "树大",
      "domain": "occams.info",
      "domain_i18n": "occams.info",
      "smtputf8": true,
      "mx": [
        [
          10,
          "box.occams.info"
        ]
      ],
      "mx-fallback": false
    }

Now ``smtputf8`` is ``True`` and ``email_ascii`` is missing because the
local part of the address is internationalized. The ``local`` and ``email``
fields return the normalized form of the address: certain Unicode characters
(such as angstrom and ohm) may be replaced by other equivalent code points
(a-with-ring and omega).

Return value
------------

When an email address passes validation, the fields in the returned dict
are:

``email``
    The canonical form of the email address, mostly useful for
    display purposes. This merely combines the ``local`` and
    ``domain_i18n`` fields (see below).

``email_ascii``
    If present, an ASCII-only form of the email address, made
    by replacing the domain part with `IDNA
    ASCII <https://tools.ietf.org/html/rfc5891>`__. This field will be
    present when an ASCII-only form of the email address exists
    (including if the email address is already ASCII). If the local part
    of the email address contains internationalized characters,
    ``email_ascii`` will not be present.

``local``
    The local part of the given email address (before the
    @-sign) with Unicode NFC normalization applied.

``domain``
    The `IDNA ASCII <https://tools.ietf.org/html/rfc5891>`__-encoded form of the
    domain part of the given email address (after the @-sign), as it
    would be transmitted on the wire.

``domain_i18n``
    The canonical internationalized form of
    the domain part of the address, obtained by round-tripping through IDNA ASCII.
    If the returned string contains non-ASCII characters, either the
    `SMTPUTF8 <https://tools.ietf.org/html/rfc6531>`__ feature of MTAs
    will be required to transmit the message or else the email address('s
    domain part) must be converted to IDNA ASCII first (given in the
    returned ``domain`` field).

``smtputf8``
    A boolean indicating that the `SMTPUTF8 <https://tools.ietf.org/html/rfc6531>`__
    feature of MTAs will be required to transmit messages to this address because the
    local part of the address has non-ASCII characters (the local part
    cannot be IDNA-encoded). If ``allow_smtputf8=False`` is passed as an
    argument, this flag will always be false because an exception is raised
    if it would have been true.

``mx``
    A list of ``(priority, domain)`` tuples of MX records specified
    in the DNS for the domain (see `RFC 5321 section
    5 <https://tools.ietf.org/html/rfc5321#section-5>`__).

``mx-fallback``
    ``None`` if an ``MX`` record is found. If no MX
    records are actually specified in DNS and instead are inferred,
    through an obsolete mechanism, from A or AAAA records, the value is
    the type of DNS record used instead (``A`` or ``AAAA``).

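As a rough sketch of how the ``mx`` and ``mx-fallback`` fields might be read together, the helper below orders the hosts by priority (in MX records a lower number means a more preferred host) and reports whether the list came from real MX records or from the fallback. The function name ``describe_mail_hosts`` is hypothetical, and any falsy ``mx-fallback`` value is treated as "MX records were found", since the examples in this document show ``false`` where this section says ``None``.

```python
def describe_mail_hosts(result):
    """Return (hosts ordered by preference, description of their source)."""
    # Sort on the priority number only; lower values are preferred.
    records = sorted(result.get("mx", []), key=lambda record: record[0])
    hosts = [host for _priority, host in records]

    fallback = result.get("mx-fallback")
    if fallback:
        source = "inferred from %s records" % fallback
    else:
        source = "MX records"
    return hosts, source


# Abbreviated versions of the example dicts shown earlier in this document.
with_mx = {"mx": [[10, "box.occams.info"]], "mx-fallback": False}
with_fallback = {"mx": [[0, "93.184.216.34"]], "mx-fallback": "A"}

mx_hosts, mx_source = describe_mail_hosts(with_mx)
fb_hosts, fb_source = describe_mail_hosts(with_fallback)
```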
Assumptions
-----------

By design, this validator does not pass all email addresses that
strictly conform to the standards. Many email address forms are obsolete
or likely to cause trouble:

- The validator assumes the email address is intended to be deliverable
  on the public Internet using DNS, and so the domain part of the email
  address must be a resolvable domain name.
- The "quoted string" form of the local part of the email address (RFC
  5321 4.1.2) is not permitted; no one uses this anymore anyway.
  Quoted forms allow multiple @-signs, space characters, and other
  troublesome conditions.
- The "literal" form for the domain part of an email address (an IP
  address) is not accepted; no one uses this anymore anyway.

Testing
-------

A handful of valid email addresses are pasted in ``test_pass.txt``. Run
them through the validator (without deliverability checks) like so:

::

    python3 email_validator/__init__.py --tests < test_pass.txt

For Project Maintainers
-----------------------

The package is distributed as a universal wheel. The wheel is specified as
universal in the file ``setup.cfg`` by the ``universal = 1`` key in the
``[bdist_wheel]`` section. To publish a universal wheel to PyPI::

    pip3 install twine
    rm -rf dist
    python3 setup.py bdist_wheel
    twine upload dist/*
    git tag v1.0.XXX
    git push --tags
  339. git push --tags