RegEx: Get the string between two strings when there are newlines

Updated: 2023-11-13 22:54:28

I have the following text (formatted exactly as below):

<td scope="row" align="left">
      My Class: TEST DATA<br>
      Test Section: <br>
      MY SECTION<br>
      MY SECTION 2<br>
    </td>

I'm attempting to get the text between "Test Section:" and the </td> after the MY SECTION lines.

I've tried several different RegEx patterns and I'm not getting anywhere.

If I do:

(?<=Test)(.*?)(?=<br)

Then I get the correct response of:

' Section: '

But if I do

(?<=Test)(.*?)(?=</td>)

I get no results. The result should be:
MY SECTION
MY SECTION 2

I've tried using the RegEx multiline flag as well, with no results.

Any help would be appreciated.

If it matters, I'm coding in Python 2.7.

If something is not clear, or you need more info, please let me know.

Use the re.S flag (also spelled re.DOTALL), or prepend (?s) to the regular expression, to make . match every character, including newline.

Without the flag, . matches any character except a newline.
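A small check of that point also shows why the multiline flag mentioned in the question does not help. re.M only changes what ^ and $ match, while re.S changes what . matches. The two-line string below is a made-up sample, not taken from the question.

```python
import re

text = "start\nend"  # hypothetical two-line sample string

default = re.findall(r"start.end", text)                # '.' stops at the newline
multiline = re.findall(r"start.end", text, flags=re.M)  # re.M only affects ^ and $
dotall = re.findall(r"start.end", text, flags=re.S)     # re.S lets '.' match "\n"

print(default)    # []
print(multiline)  # []
print(dotall)     # ['start\nend']
```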

(?s)(?<=Test)(.*?)(?=</td>)
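If you would rather not rely on a flag, a common alternative is to replace . with the character class [\s\S]. That class means "whitespace or non-whitespace", so it matches any character, including newline. This technique is a swapped-in alternative and is not part of the original answer. The shortened sample string below is hypothetical.

```python
import re

s = "Test Section: <br>\n  MY SECTION<br>\n</td>"  # hypothetical shortened sample

# [\s\S] matches any character, newline included, so no re.S flag is needed.
no_flag = re.findall(r"(?<=Test)([\s\S]*?)(?=</td>)", s)
# The same match using the inline (?s) flag with a plain '.'.
inline_flag = re.findall(r"(?s)(?<=Test)(.*?)(?=</td>)", s)

print(no_flag == inline_flag)  # True
```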


Example:

>>> s = '''<td scope="row" align="left">
...       My Class: TEST DATA<br>
...       Test Section: <br>
...       MY SECTION<br>
...       MY SECTION 2<br>
...     </td>'''
>>>
>>> import re
>>> re.findall('(?<=Test)(.*?)(?=</td>)', s)  # without flags
[]
>>> re.findall('(?<=Test)(.*?)(?=</td>)', s, flags=re.S)
[' Section: <br>\n      MY SECTION<br>\n      MY SECTION 2<br>\n    ']
>>> re.findall('(?s)(?<=Test)(.*?)(?=</td>)', s)
[' Section: <br>\n      MY SECTION<br>\n      MY SECTION 2<br>\n    ']
