且构网


Finding links in an e-mail body with Python

Updated: 2022-11-07 17:30:17

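To check whether an e-mail has attachments, you can search its headers for Content-Type and see whether it says "multipart/*". E-mails with a multipart content type may contain attachments, though not always: "multipart/alternative", for example, usually just carries a plain-text and an HTML version of the same body.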
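The header check above can also be done with Python's standard email package instead of searching the raw text by hand. This is a minimal sketch, assuming the message is available as a string; the sample message RAW and the helper names has_multipart_type and attachment_names are hypothetical. Because a multipart message does not guarantee an attachment, the second helper also looks for parts marked "Content-Disposition: attachment".

```python
import email
from email import policy

# Hypothetical sample message: one text part plus one attached file.
RAW = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Report\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "See attached.\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    'Content-Disposition: attachment; filename="notes.txt"\n'
    "\n"
    "hello\n"
    "--XYZ--\n"
)


def has_multipart_type(raw: str) -> bool:
    """True if the top-level Content-Type is multipart/*."""
    msg = email.message_from_string(raw, policy=policy.default)
    return msg.get_content_maintype() == "multipart"


def attachment_names(raw: str) -> list:
    """Filenames of every part whose Content-Disposition is 'attachment'."""
    msg = email.message_from_string(raw, policy=policy.default)
    return [
        part.get_filename()
        for part in msg.walk()
        if part.get_content_disposition() == "attachment"
    ]


print(has_multipart_type(RAW))  # True
print(attachment_names(RAW))    # ['notes.txt']
```

A message without a Content-Type header defaults to text/plain, so has_multipart_type returns False for it.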

To inspect the text for links, images, etc., you can try regular expressions; in my opinion this is probably your best option. A regular expression (regex) lets you find strings that match a given pattern. For example, the pattern "<a[^>]+href=\"(.*?)\"[^>]*>(.*?)</a>" should match the HTML links (<a> tags with a double-quoted href) in your e-mail message, whether the link text is a single word or a full URL; use ' instead of \" if the href is single-quoted. I hope that helps! Here's an example of how you can implement this in Python:

```python
import re

# Sample e-mail body containing one HTML link
text = ("This is your e-mail body. It contains a link to "
        "<a href='http://www.google.com'>Google</a>.")

# Group 1 captures the href value, group 2 the visible link text
link_pattern = re.compile(r"<a[^>]+href='(.*?)'[^>]*>(.*?)</a>")

search = link_pattern.search(text)
if search is not None:
    print("Link found! -> " + search.group(1))
else:
    print("No links were found.")
```
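The example above stops at the first match and only accepts single-quoted href values. If you want every link in the body, a variation along these lines could work: it uses re.finditer, accepts either quote style through a backreference, and ignores case so that <A HREF=...> is matched too. The pattern and the find_links name are my own sketch, not a complete HTML parser; for messy real-world HTML, Python's html.parser module is a sturdier option.

```python
import re

# Group 1 is the opening quote, group 2 the URL, group 3 the link text.
# The backreference \1 requires the closing quote to match the opening one.
LINK_RE = re.compile(
    r"""<a\s[^>]*?href=(["'])(.*?)\1[^>]*>(.*?)</a>""",
    re.IGNORECASE | re.DOTALL,
)


def find_links(html: str) -> list:
    """Return (url, link_text) pairs for every <a href=...> in the string."""
    return [(m.group(2), m.group(3)) for m in LINK_RE.finditer(html)]


body = (
    'Visit <a href="https://example.com">Example</a> or '
    "<A class='x' HREF='http://www.google.com'>Google</A>."
)
for url, label in find_links(body):
    print(url, "->", label)
```

re.DOTALL lets the link text span line breaks, which is common in HTML e-mail bodies.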

To the end user, the link appears only as "Google", with no www, let alone http(s). The message source, however, still has the HTML markup around it, so by inspecting the raw body of the message you can find all the links.
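One caveat about inspecting the raw body: the HTML part of a message is often sent with a Content-Transfer-Encoding such as quoted-printable or base64, and then the markup is not readable as-is (in quoted-printable, for example, "=" is written as "=3D"). A minimal sketch, assuming a single-part HTML message and using the standard email package to decode it before applying a regex; the sample message RAW and the links_in_message helper are hypothetical:

```python
import email
import re
from email import policy

# Group 2 is the URL inside a single- or double-quoted href.
LINK_RE = re.compile(r"""<a\s[^>]*?href=(["'])(.*?)\1""", re.IGNORECASE)

# Hypothetical message whose HTML body is quoted-printable encoded.
RAW = (
    "Subject: Hello\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/html; charset=utf-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    '<p>Go to <a href=3D"http://www.google.com">Google</a></p>\n'
)


def links_in_message(raw: str) -> list:
    """Decode the HTML body of a message, then return the URLs it links to."""
    msg = email.message_from_string(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("html",))
    if body is None:  # no HTML part found
        return []
    html = body.get_content()  # str with the transfer encoding removed
    return [m.group(2) for m in LINK_RE.finditer(html)]


print(links_in_message(RAW))  # ['http://www.google.com']
print(LINK_RE.search(RAW))    # None: the raw source still says href=3D"..."
```

get_body also walks multipart messages, so the same helper should work when the HTML is one part of a multipart/alternative message.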

My code is not perfect, but I hope it points you in the right direction. You can search your e-mail body for several patterns, for example to find images, videos, etc. To learn regular expressions you'll need to do a little research; here's another link, to Wikipedia.
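As a rough illustration of looking up several patterns at once, the sketch below keeps one compiled regex per kind of content and runs each over the same body. The PATTERNS table and the scan_body helper are hypothetical, and the patterns only look at src/href attributes of <a>, <img>, <video> and <source> tags; adjust them for whatever you need to find.

```python
import re

# Hypothetical pattern table: group 2 of each regex is the URL.
PATTERNS = {
    "links": re.compile(r"""<a\s[^>]*?href=(["'])(.*?)\1""", re.IGNORECASE),
    "images": re.compile(r"""<img\s[^>]*?src=(["'])(.*?)\1""", re.IGNORECASE),
    "videos": re.compile(
        r"""<(?:video|source)\s[^>]*?src=(["'])(.*?)\1""", re.IGNORECASE
    ),
}


def scan_body(html: str) -> dict:
    """Return {pattern_name: [matched URLs]} for every pattern in PATTERNS."""
    return {
        name: [m.group(2) for m in rx.finditer(html)]
        for name, rx in PATTERNS.items()
    }


body = (
    '<a href="https://example.com/page">Page</a>'
    '<img src="https://example.com/logo.png" alt="logo">'
    "<video controls><source src='https://example.com/clip.mp4'></video>"
)
print(scan_body(body))
```

Keeping the patterns in a dictionary means adding a new kind of content is a one-line change, and each result list stays labeled by what it searched for.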