更新时间:2022-11-15 07:38:37
从 python re docs , \w
匹配任何字母数字字符和下划线,相当于集合 [a-zA-Z0-9 _]
。因此 [\w\ .-]
将适当地匹配数字和字符。
From the python re docs, \w
matches any alphanumeric character and underscores, equivalent to the set [a-zA-Z0-9_]
. So [\w\.-]
will appropriately match numbers as well as characters.
email = re.findall(r'[\w\.-]+@[\w\.-]+(\.[\w]+)+',line)
这篇文章更加广泛地讨论了匹配的电子邮件地址,您遇到了很多陷阱,导致您的代码无法捕获匹配的电子邮件地址。例如,电子邮件地址不能完全由标点符号组成( ... @ ....
)。此外,地址通常有最大长度,具体取决于电子邮件服务器。此外,许多电子邮件服务器都匹配非英语字符。因此,根据您的需求,您可能需要一个更全面的模式。
This post discusses matching email addresses much more extensively, and there are a couple more pitfalls you run into matching email addresses that your code fails to catch. For example, email addresses cannot be made up entirely of punctuation (...@....
). Additionally, there is often a maximum length on addresses, depending on the email server. Also, many email servers match non-english characters. So depending on your needs you may need a more comprehensive pattern.