Updated: 2023-02-23 14:31:52
First off, rolling your own regexp to parse URLs is a terrible idea. You must imagine this is a common enough problem that someone has written, debugged and tested a library for it, according to the RFCs. URIs are complex - check out the code for URL parsing in Node.js and the Wikipedia page on URI schemes.
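As a minimal sketch of leaning on an existing parser, the built-in WHATWG `URL` class in Node.js (and browsers) can stand in for a hand-written regexp. `parseUrl` is a hypothetical helper name I made up for illustration, and the sample URL is arbitrary:

```javascript
// Sketch: let the built-in WHATWG URL class do the parsing.
// `parseUrl` is a hypothetical helper, not part of any library.
function parseUrl(text) {
  try {
    // new URL() throws a TypeError when the input is not an absolute URL.
    const url = new URL(text);
    return {
      protocol: url.protocol,
      hostname: url.hostname,
      port: url.port,
      pathname: url.pathname,
      search: url.search,
      hash: url.hash,
    };
  } catch (err) {
    return null; // not parseable as an absolute URL
  }
}

console.log(parseUrl("https://user@example.com:8080/a/b?x=1#top"));
console.log(parseUrl("not a url"));
```

Note that this only parses a string that is already known to be a URL candidate; finding URLs inside free text is a separate problem.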
There are a ton of edge cases when it comes to parsing URLs: international domain names, actual (.museum) vs. nonexistent (.etc) TLDs, weird punctuation including parentheses, punctuation at the end of the URL, IPv6 hostnames, etc.
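To make a couple of these edge cases concrete, here is a rough sketch. The first part shows the built-in `URL` class coping with an IPv6 literal and an internationalized name. The second part is a heuristic of my own (`trimTrailingPunctuation` and `countChar` are hypothetical helpers, not taken from any library) for dropping sentence punctuation and an unbalanced closing parenthesis from the end of a URL candidate pulled out of prose:

```javascript
// IPv6 literals and internationalized names are handled by the built-in parser.
const ipv6 = new URL("http://[::1]:3000/status");
console.log(ipv6.hostname, ipv6.port);

const idn = new URL("http://münchen.de/");
console.log(idn.hostname); // ASCII (punycode) form of the name

// Hypothetical helper: count occurrences of a single character.
function countChar(text, ch) {
  return text.split(ch).length - 1;
}

// Heuristic only: strip trailing sentence punctuation, and strip a trailing ")"
// only while the candidate has more ")" than "(".
function trimTrailingPunctuation(candidate) {
  let result = candidate;
  while (result.length > 0) {
    const last = result[result.length - 1];
    const unbalancedParen =
      last === ")" && countChar(result, ")") > countChar(result, "(");
    if (".,;:!?".includes(last) || unbalancedParen) {
      result = result.slice(0, -1);
    } else {
      break;
    }
  }
  return result;
}

// Candidate as it might be cut out of "(see https://.../URL_(disambiguation))."
console.log(
  trimTrailingPunctuation("https://en.wikipedia.org/wiki/URL_(disambiguation)).")
);
```

A heuristic like this will still guess wrong on some inputs (a URL that legitimately ends in a period, say), which is part of why a well-tested library is the safer choice.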
I've looked at a ton of libraries, and there are a few worth using despite some downsides:
... href attribute inside anchor (<a>) tags"). I'll throw some tests at it when a demo becomes available.
Libraries that I've disqualified quickly for this task:
If you insist on a regular expression, the most comprehensive is the URL regexp from Component, though, judging from a look at its source, it will falsely detect some non-existent two-letter TLDs.
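To illustrate the false-positive problem, here is a deliberately simplified pattern. It is NOT Component's actual regexp, just a hypothetical stand-in that accepts any two letters as a TLD. The allowlist `knownTlds` is a tiny made-up sample; a real check would need the full IANA list:

```javascript
// Simplified, hypothetical pattern (not Component's regexp):
// any two letters after the last dot are accepted as a TLD.
const looseDomain = /\b(?:[a-z0-9-]+\.)+[a-z]{2}\b/gi;

// Tiny illustrative allowlist; a real one would be built from the IANA TLD list.
const knownTlds = new Set(["de", "uk", "jp"]);

function findDomains(text) {
  return text.match(looseDomain) || [];
}

function hasKnownTld(domain) {
  const tld = domain.slice(domain.lastIndexOf(".") + 1).toLowerCase();
  return knownTlds.has(tld);
}

const sampleText = "Open config.js, then visit example.de for details.";
const candidates = findDomains(sampleText);
console.log(candidates);                     // includes the file name config.js
console.log(candidates.filter(hasKnownTld)); // only example.de survives
```

Filtering against a TLD list removes this class of false positive, but the list has to be kept up to date, which is one more thing a maintained library does for you.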