Updated: 2023-02-26 13:13:43
In Python, how to check if a string only contains certain characters?
I need to check that a string contains only a..z, 0..9, and . (period) and no other character.
I could iterate over each character and check whether it is a..z, 0..9, or ., but that would be slow.
I am not clear on how to do it with a regular expression.
Is this correct? Can you suggest a simpler regular expression or a more efficient approach?
#Valid chars . a-z 0-9
def check(test_str):
    import re
    #http://docs.python.org/library/re.html
    #re.search returns None if no position in the string matches the pattern
    #pattern to search for any character other than . a-z 0-9
    pattern = r'[^\.a-z0-9]'
    if re.search(pattern, test_str):
        #Character other than . a-z 0-9 was found
        print('Invalid : %r' % (test_str,))
    else:
        #No character other than . a-z 0-9 was found
        print('Valid : %r' % (test_str,))

check(test_str='abcde.1')
check(test_str='abcde.1#')
check(test_str='ABCDE.12')
check(test_str='_-/>"!@#12345abcde<')
'''
Output:
>>>
Valid : 'abcde.1'
Invalid : 'abcde.1#'
Invalid : 'ABCDE.12'
Invalid : '_-/>"!@#12345abcde<'
'''
Final(?) edit
Answer, wrapped up in a function, with annotated interactive session:
>>> import re
>>> def special_match(strg, search=re.compile(r'[^a-z0-9.]').search):
...     return not bool(search(strg))
...
>>> special_match("")
True
>>> special_match("az09.")
True
>>> special_match("az09.\n")
False
# The above test case is to catch out any attempt to use re.match()
# with a `$` instead of `\Z` -- see point (6) below.
>>> special_match("az09.#")
False
>>> special_match("az09.X")
False
>>>
Note: There is a comparison with using re.match() further down in this answer. Further timings show that match() would win with much longer strings; match() seems to have a much larger overhead than search() when the final answer is True; this is puzzling (perhaps it's the cost of returning a MatchObject instead of None) and may warrant further rummaging.
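For anyone who wants to repeat that comparison on a current Python, here is a minimal sketch using the timeit module. The names via_search and via_match, the string lengths, and the call counts are assumptions chosen for illustration; they are not taken from the timings in this answer. Results vary by machine and Python version, so no numbers are shown.

```python
import re
import timeit

# Negated class + search(): looks for the first disallowed character, if any.
_bad_char = re.compile(r'[^a-z0-9.]').search
# Positive class + match(): must cover the whole string. \Z (not $) is used
# so that a trailing newline is rejected; * keeps the empty string valid.
_all_good = re.compile(r'[a-z0-9.]*\Z').match


def via_search(strg):
    """True if no character outside a-z, 0-9 and '.' is found."""
    return not bool(_bad_char(strg))


def via_match(strg):
    """True if the whole string consists of a-z, 0-9 and '.'."""
    return bool(_all_good(strg))


if __name__ == "__main__":
    short = 'jsdlfjdsf12324..3432jsdflsdf'
    for label, text in (('short', short), ('long', short * 100)):
        for func in (via_search, via_match):
            secs = timeit.timeit(lambda: func(text), number=1000)
            print('%-5s %-10s %.6f s per 1000 calls' % (label, func.__name__, secs))
```

Both functions are meant to return the same answers; only their speed should differ.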
==== Earlier text ====
The [previously] accepted answer could use a few improvements:
(1) Presentation gives the appearance of being the result of an interactive Python session:
reg=re.compile('^[a-z0-9\.]+$')
>>>reg.match('jsdlfjdsf12324..3432jsdflsdf')
True
but match() doesn't return True; it returns a match object, or None when there is no match
(2) For use with match(), the ^ at the start of the pattern is redundant, and appears to be slightly slower than the same pattern without the ^
(3) Should encourage using a raw string automatically, without a second thought, for any re pattern
(4) The backslash in front of the dot/period is redundant
(5) Slower than the OP's code!
prompt>rem OP's version -- NOTE: OP used raw string!
prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9\.]')" "not bool(reg.search(t))"
1000000 loops, best of 3: 1.43 usec per loop
prompt>rem OP's version w/o backslash
prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9.]')" "not bool(reg.search(t))"
1000000 loops, best of 3: 1.44 usec per loop
prompt>rem cleaned-up version of accepted answer
prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[a-z0-9.]+\Z')" "bool(reg.match(t))"
100000 loops, best of 3: 2.07 usec per loop
prompt>rem accepted answer
prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile('^[a-z0-9\.]+$')" "bool(reg.match(t))"
100000 loops, best of 3: 2.08 usec per loop
(6) Can produce the wrong answer!!
>>> import re
>>> bool(re.compile('^[a-z0-9\.]+$').match('1234\n'))
True # uh-oh
>>> bool(re.compile('^[a-z0-9\.]+\Z').match('1234\n'))
False
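Points (1), (2), (4) and (6) can be checked directly. The sketch below is a quick self-check, not part of the original answer. It assumes Python 3.4 or later for re.fullmatch(), which did not exist in the Python 2.6 used for the timings above. The helper name only_allowed is made up for illustration.

```python
import re

# (1) match() returns a match object or None, never True/False.
m = re.compile(r'[a-z0-9.]+').match('abc.123')
assert m is not None and m is not True
assert re.compile(r'[a-z0-9.]+').match('#abc') is None

# (2) match() is already anchored at the start, so a leading ^ adds nothing.
assert re.match(r'[a-z0-9.]+', '#abc') is None
assert re.match(r'^[a-z0-9.]+', '#abc') is None

# (4) Inside a character class, '.' is literal with or without a backslash.
assert re.findall(r'[\.]', 'a.b') == re.findall(r'[.]', 'a.b') == ['.']

# (6) $ also matches just before a trailing newline; \Z matches only at the very end.
assert re.match(r'^[a-z0-9.]+$', '1234\n') is not None   # wrongly accepted
assert re.match(r'^[a-z0-9.]+\Z', '1234\n') is None      # correctly rejected


def only_allowed(strg):
    """Hypothetical helper: True if strg consists solely of a-z, 0-9 and '.'."""
    # fullmatch() requires the pattern to cover the entire string, so no
    # explicit \Z is needed. '*' keeps the empty string valid, in line with
    # special_match("") returning True.
    return re.fullmatch(r'[a-z0-9.]*', strg) is not None
```

If all the assertions pass silently, the points hold on the interpreter in use. Whether fullmatch() is faster or slower than the negated search() is not something this sketch measures.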