How do I replace plain URLs with links?

Updated: 2023-02-23 14:31:52

First off, rolling your own regexp to parse URLs is a terrible idea. You must imagine this is a common enough problem that someone has written, debugged and tested a library for it, according to the RFCs. URIs are complex - check out the code for URL parsing in Node.js and the Wikipedia page on URI schemes.
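To get a feel for how much a real parser does, here is a minimal sketch using the WHATWG `URL` class that ships with Node.js. The example hostnames are illustrative assumptions, and the IDN conversion assumes a Node.js build with ICU support (the default for official builds).

```javascript
// The built-in URL class is a global in modern Node.js; no import needed.

// An internationalized domain name is converted to its Punycode form.
const idn = new URL('http://münchen.de/path');
console.log(idn.hostname); // xn--mnchen-3ya.de

// An IPv6 host keeps its brackets, and the port is split out separately.
const ipv6 = new URL('http://[2001:db8::1]:8080/index.html');
console.log(ipv6.hostname); // [2001:db8::1]
console.log(ipv6.port);     // 8080

// Input without a scheme is rejected rather than guessed at.
let rejected = false;
try {
  new URL('www.example.com');
} catch (err) {
  rejected = true;
}
console.log(rejected); // true
```

Note that `URL` only parses a string that is already known to be a URL. Finding URLs inside free text is a separate problem, which is what the libraries discussed here try to solve.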

There are a ton of edge cases when it comes to parsing URLs: international domain names, actual (.museum) vs. nonexistent (.etc) TLDs, weird punctuation including parentheses, punctuation at the end of the URL, IPv6 hostnames, etc.
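A few of these edge cases are easy to reproduce. The sketch below uses a deliberately naive pattern (my own illustration, not taken from any library mentioned here) to show how trailing punctuation, parentheses, and scheme-less URLs trip it up.

```javascript
// Naive pattern: a scheme followed by any run of non-whitespace characters.
const naiveUrlPattern = /https?:\/\/\S+/g;

// Hypothetical helper: returns every substring the naive pattern matches.
function findUrlsNaively(text) {
  return text.match(naiveUrlPattern) || [];
}

// The sentence-ending period is swallowed into the URL.
const trailingPeriod = findUrlsNaively('See http://example.com/page.');
console.log(trailingPeriod); // [ 'http://example.com/page.' ]

// The closing parenthesis of the surrounding text is swallowed too.
const insideParens = findUrlsNaively('(docs at http://example.com/docs)');
console.log(insideParens); // [ 'http://example.com/docs)' ]

// A "casual" URL without a scheme is not found at all.
const noScheme = findUrlsNaively('Visit www.example.com today');
console.log(noScheme); // []
```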

I've looked at a ton of libraries, and there are a few worth using despite some downsides:

Soapbox's linkify has seen some serious effort put into it, and a major refactor in June 2015 removed the jQuery dependency. It still has issues with IDNs.
AnchorMe is a newcomer that claims to be faster and leaner. Some IDN issues as well.
Autolinker.js lists features very specifically (e.g. "Will properly handle HTML input. The utility will not change the href attribute inside anchor (<a>) tags"). I'll throw some tests at it when a demo becomes available.

Libraries that I've disqualified quickly for this task:

Django's urlize didn't handle certain TLDs properly (here is the official list of valid TLDs). No demo.
autolink-js wouldn't detect "www.google.com" without http://, so it's not quite suitable for autolinking "casual URLs" (without a scheme/protocol) found in plain text.
Ben Alman's linkify hasn't been maintained since 2009.

If you insist on a regular expression, the most comprehensive one is the URL regexp from Component, though from a look at it, it will falsely detect some non-existent two-letter TLDs.
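For illustration only, here is a rough sketch of what the regexp route looks like in practice. The pattern below is a simplified stand-in of my own, not the Component regexp, and the names `escapeHtml`, `isPlausibleUrl`, and `linkifyPlainText` are hypothetical. It assumes plain-text input (not HTML), trims trailing punctuation, and uses the built-in `URL` class as a sanity check.

```javascript
// Escape text so it is safe inside HTML content and double-quoted attributes.
function escapeHtml(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Simplified candidate pattern (an assumption, far from RFC-complete):
// an http(s) scheme or a leading "www.", then non-space, non-markup characters.
const candidatePattern = /\b(?:https?:\/\/|www\.)[^\s<>"]+/gi;

// Punctuation that more likely ends the sentence than the URL.
const trailingPunctuation = /[.,;:!?'")\]]+$/;

// Sanity check: the string must parse as a URL and have a dotted hostname.
function isPlausibleUrl(href) {
  try {
    return new URL(href).hostname.includes('.');
  } catch {
    return false;
  }
}

function linkifyPlainText(text) {
  let html = '';
  let cursor = 0;
  for (const match of text.matchAll(candidatePattern)) {
    const candidate = match[0].replace(trailingPunctuation, '');
    // "Casual" URLs without a scheme get http:// prepended for the href.
    const href = /^https?:\/\//i.test(candidate) ? candidate : 'http://' + candidate;
    if (!isPlausibleUrl(href)) continue;
    html += escapeHtml(text.slice(cursor, match.index));
    html += '<a href="' + escapeHtml(href) + '">' + escapeHtml(candidate) + '</a>';
    cursor = match.index + candidate.length;
  }
  // Emit whatever plain text follows the last link.
  return html + escapeHtml(text.slice(cursor));
}

const linked = linkifyPlainText('See www.example.com, or http://example.org/a?b=1&c=2.');
console.log(linked);
```

Even this sketch gets things wrong: trimming a trailing `)` breaks URLs that legitimately end in a parenthesis, it does no TLD validation, and it cannot be run on HTML input. That is the kind of gap the libraries above have spent years closing.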
