Updated: 2023-01-10 09:53:48
Use:
import re
sequence = "AGATCAGATCTTTTTTCTAATGTCTAGGATATATCAGATCAGATCAGATCAGATCAGATC"
matches = re.findall(r'(?:AGATC)+', sequence)
# Pick the longest run of consecutive AGATC repeats
longest = max(matches, key=len)
Explanation:
Non-capturing group (?:AGATC)+
AGATC matches the characters AGATC literally (case sensitive).
+ Quantifier: matches between one and unlimited times, as many times as possible.
Because the group is non-capturing, re.findall returns each whole match rather than the contents of the group.
Result:
# print(matches)
['AGATCAGATC', 'AGATCAGATCAGATCAGATCAGATC']
# print(longest)
'AGATCAGATCAGATCAGATCAGATC'
You can test the regex here.
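To illustrate why the group is written as (?:...) and not (...), here is a minimal sketch that runs both patterns on the same sequence. With a capturing group, re.findall returns only what the group captured on its last repetition, so the run length is lost. The helper longest_run is a hypothetical name introduced here for illustration, not part of the original answer; it assumes you want the number of consecutive repeats, not the matched string.

```python
import re

sequence = "AGATCAGATCTTTTTTCTAATGTCTAGGATATATCAGATCAGATCAGATCAGATCAGATC"

# Non-capturing group: findall returns each whole match.
non_capturing = re.findall(r'(?:AGATC)+', sequence)
print(non_capturing)  # ['AGATCAGATC', 'AGATCAGATCAGATCAGATCAGATC']

# Capturing group: findall returns only the group's last repetition.
capturing = re.findall(r'(AGATC)+', sequence)
print(capturing)  # ['AGATC', 'AGATC']


def longest_run(unit, text):
    """Hypothetical helper: count of consecutive repeats in the longest run of unit."""
    # re.escape keeps the unit literal even if it contains regex metacharacters.
    matches = re.findall(f'(?:{re.escape(unit)})+', text)
    # default=0 covers the case where the unit never occurs.
    return max((len(m) // len(unit) for m in matches), default=0)


print(longest_run("AGATC", sequence))  # 5
```

Dividing the match length by the unit length works because every match consists only of whole copies of the unit.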