且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

使用正则表达式python查找电子邮件

更新时间:2022-11-15 07:38:37

python re docs \w 匹配任何字母数字字符和下划线,相当于集合 [a-zA-Z0-9 _] 。因此 [\w\ .-] 将适当地匹配数字和字符。

From the python re docs, \w matches any alphanumeric character and underscores, equivalent to the set [a-zA-Z0-9_]. So [\w\.-] will appropriately match numbers as well as characters.

email = re.findall(r'[\w\.-]+@[\w\.-]+(\.[\w]+)+',line)

这篇文章更加广泛地讨论了匹配的电子邮件地址,您遇到了很多陷阱,导致您的代码无法捕获匹配的电子邮件地址。例如,电子邮件地址不能完全由标点符号组成( ... @ .... )。此外,地址通常有最大长度,具体取决于电子邮件服务器。此外,许多电子邮件服务器都匹配非英语字符。因此,根据您的需求,您可能需要一个更全面的模式。

This post discusses matching email addresses much more extensively, and there are a couple more pitfalls you run into matching email addresses that your code fails to catch. For example, email addresses cannot be made up entirely of punctuation (...@....). Additionally, there is often a maximum length on addresses, depending on the email server. Also, many email servers match non-english characters. So depending on your needs you may need a more comprehensive pattern.