且构网

Remove all HTML tags

Updated: 2023-12-05 10:31:46

See http://regex101.com/r/gJ2yN2

The regex (\\.\d{3,}.*?\s|(\\r|\\n)+) removes the things you pointed out.

Result (replacing each match with a single space):

normal text here http://a_random_link_here.com Some more text here
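
The replacement can be sketched in Python roughly as follows. The original input is not shown in this answer, so the `sample` string is a hypothetical stand-in containing literal `\uXXXX`-style escapes and literal `\r\n` sequences. The name `strip_escapes` is likewise made up for illustration.

```python
import re

# The regex from the answer, unchanged. A raw string keeps the backslashes intact.
ESCAPE_RE = re.compile(r"(\\.\d{3,}.*?\s|(\\r|\\n)+)")


def strip_escapes(text: str) -> str:
    """Replace every match of the regex with a single space."""
    return ESCAPE_RE.sub(" ", text)


# Hypothetical input (an assumption): it holds literal backslash sequences,
# not real control characters, which is why the sample is also a raw string.
sample = r"normal text here\u003cb\u003e http://a_random_link_here.com\r\n\r\nSome more text here"

cleaned = strip_escapes(sample)
print(cleaned)
```

Note that the first alternative consumes the whitespace character that ends the match, so an escape that already has a space in front of it leaves two spaces behind after the replacement.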

If this is not the result you were looking for, please edit your question to include the expected result.

EDIT: Explanation of the regex:

()   - match everything inside the parentheses (later, the "match" gets replaced with a space)
\\   - an 'escaped' backslash (i.e. an actual backslash; the first one "protects" the second
       so it is not interpreted as a special character)
.    - any character (I saw 'u', but there might be others)
\d   - a digit
{3,} - "at least three"
.*?  - any characters, "lazy" (stop as soon as possible)
\s   - until you hit a white space
|    - or
()   - one of these things
\\r  - backslash - r (again, with escaped '\')
\\n  - backslash - n
+    - one or more times (applies to the preceding group)
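
The breakdown above can also be written into the pattern itself. This is a minimal sketch using Python's `re.VERBOSE` flag, which ignores whitespace inside the pattern and allows `#` comments. The names `COMPACT_RE` and `VERBOSE_RE` and the test string are assumptions made for illustration.

```python
import re

# The regex exactly as given in the answer.
COMPACT_RE = re.compile(r"(\\.\d{3,}.*?\s|(\\r|\\n)+)")

# The same regex spread over several lines, one comment per piece.
VERBOSE_RE = re.compile(
    r"""
    (                 # group the whole match
        \\ .          # an actual backslash, then any one character (e.g. 'u')
        \d{3,}        # at least three digits
        .*?           # any characters, lazily
        \s            # up to and including the next whitespace character
    |                 # or
        ( \\r | \\n )+  # one or more backslash-r or backslash-n sequences
    )
    """,
    re.VERBOSE,
)

# Hypothetical test string with literal backslash sequences (raw string).
text = r"a\u0041b c\r\nd"

compact_result = COMPACT_RE.sub(" ", text)
verbose_result = VERBOSE_RE.sub(" ", text)
print(verbose_result)
```

Both patterns should behave identically. The verbose form only changes how the regex is laid out in the source.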