且构网


Extract all substrings between two markers

Updated: 2023-02-22 12:48:02

How can this be solved? I would do:

import re
mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
found = re.findall(r"\&marker1\n(.*?)/\n", mystr)
print(found)

Output:

['The String that I want ', 'Another string that I want ']

Notes:

  • & is not a metacharacter in Python's re patterns outside character classes, so escaping it is not required here; \& is harmless and still matches a literal &.
  • . matches any character except a newline (unless the re.DOTALL flag is set).
  • findall is a better choice than search if you just want a list of the matched substrings.
  • *? is non-greedy. In this case .* would work too, because . does not match a newline, but in other cases a greedy match may run further than you intend.
  • I used a so-called raw string (r prefix) to make escaping easier.
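The difference between search and findall, and between greedy and non-greedy matching, can be sketched roughly as follows. The sample text and variable names are made up for illustration, and re.DOTALL is switched on only to show a case where the greedy form overshoots.

```python
import re

# Hypothetical sample input with two marked sections
text = "&marker1\nfirst /\n&marker1\nsecond /\n"

# search stops at the first match and returns a match object (or None)
first = re.search(r"&marker1\n(.*?)/\n", text)
first_group = first.group(1) if first else None

# findall returns the captured group of every match as a list of strings
all_groups = re.findall(r"&marker1\n(.*?)/\n", text)

# With re.DOTALL, "." also matches newlines, so the greedy .* runs on
# to the last "/\n" while the non-greedy .*? stops at the nearest one
greedy = re.findall(r"&marker1\n(.*)/\n", text, flags=re.DOTALL)
lazy = re.findall(r"&marker1\n(.*?)/\n", text, flags=re.DOTALL)

print(first_group)  # first
print(all_groups)   # ['first ', 'second ']
print(greedy)       # ['first /\n&marker1\nsecond ']
print(lazy)         # ['first ', 'second ']
```

The greedy result swallows both sections into a single match, which is the "matching more than you wish" case mentioned in the notes.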

Read the re module documentation for a discussion of raw-string usage and a list of the characters that have special meaning.
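As a small illustration of the raw-string point, and of one way to avoid guessing which characters need escaping, here is a minimal sketch. The names start_marker and end_marker are assumptions introduced for this example; re.escape is the standard-library helper that escapes any special characters in a literal string.

```python
import re

# In a normal string "\n" is one newline character;
# in a raw string it stays as two characters: backslash and "n"
normal_len = len("\n")
raw_len = len(r"\n")

# A raw string saves doubling the backslashes in a pattern
same_pattern = r"\d+" == "\\d+"

# Hypothetical marker names; re.escape makes them safe to use literally
start_marker = "&marker1\n"
end_marker = "/\n"
pattern = re.escape(start_marker) + r"(.*?)" + re.escape(end_marker)

mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
found = re.findall(pattern, mystr)

print(normal_len, raw_len)  # 1 2
print(same_pattern)         # True
print(found)                # ['The String that I want ', 'Another string that I want ']
```

Building the pattern with re.escape is useful when the markers come from user input or configuration rather than being written by hand.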