In Python, how to check if a string only contains certain characters?

Updated: 2023-02-26 12:47:12


I need to check that a string contains only a..z, 0..9, and . (period), and no other characters.

I could iterate over each character and check that it is a..z, 0..9, or ., but that would be slow.

It is not clear to me how to do this with a regular expression.

Is this correct? Can you suggest a simpler regular expression or a more efficient approach?

# Valid chars: . a-z 0-9
import re

def check(test_str):
    # http://docs.python.org/library/re.html
    # re.search returns None if no position in the string matches the pattern
    # pattern to search for any character other than . a-z 0-9
    pattern = r'[^\.a-z0-9]'
    if re.search(pattern, test_str):
        # A character other than . a-z 0-9 was found
        print('Invalid : %r' % (test_str,))
    else:
        # No character other than . a-z 0-9 was found
        print('Valid   : %r' % (test_str,))

check(test_str='abcde.1')
check(test_str='abcde.1#')
check(test_str='ABCDE.12')
check(test_str='_-/>"!@#12345abcde<')

'''
Output:
>>> 
Valid   : 'abcde.1'
Invalid : 'abcde.1#'
Invalid : 'ABCDE.12'
Invalid : '_-/>"!@#12345abcde<'
'''
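
For comparison, the character-by-character approach mentioned in the question could be sketched roughly as follows. The names `ALLOWED` and `only_allowed` are made up for this illustration, and nothing here measures whether it is actually slower than the regular expression.

```python
import string

# Hypothetical names; the allowed alphabet is the one stated in the question.
ALLOWED = frozenset(string.ascii_lowercase + string.digits + ".")

def only_allowed(text, allowed=ALLOWED):
    """Return True if every character of text is in allowed (True for '')."""
    return all(ch in allowed for ch in text)

for sample in ("abcde.1", "abcde.1#", "ABCDE.12", "abcde.1\n"):
    print("%-12r %s" % (sample, only_allowed(sample)))
```

Like the regex search for a disallowed character, this treats the empty string as valid and rejects a trailing newline.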

Final(?) edit

Answer, wrapped up in a function, with annotated interactive session:

>>> import re
>>> def special_match(strg, search=re.compile(r'[^a-z0-9.]').search):
...     return not bool(search(strg))
...
>>> special_match("")
True
>>> special_match("az09.")
True
>>> special_match("az09.\n")
False
# The above test case is to catch out any attempt to use re.match()
# with a `$` instead of `\Z` -- see point (6) below.
>>> special_match("az09.#")
False
>>> special_match("az09.X")
False
>>>

Note: This answer compares the approach with re.match() further down. Further timings show that match() would win with much longer strings. When the final answer is True, match() seems to have a much larger overhead than search(); this is puzzling (perhaps it's the cost of returning a MatchObject instead of None) and may warrant further rummaging.
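
Anyone wanting to repeat the comparison on longer strings could start from a sketch like the one below. The function names, the string lengths and the repetition count are assumptions chosen for illustration, and the match() pattern uses `*` rather than `+` so that both versions agree on the empty string. No particular result is implied; timings depend on the Python version and the machine.

```python
import re
import timeit

search_bad = re.compile(r'[^a-z0-9.]').search   # finds the first disallowed character
match_all = re.compile(r'[a-z0-9.]*\Z').match   # matches only if every character is allowed

def via_search(text):
    return not search_bad(text)

def via_match(text):
    return bool(match_all(text))

base = 'jsdlfjdsf12324..3432jsdflsdf'
for repeat in (1, 10, 100, 1000):   # assumed lengths, for illustration only
    text = base * repeat
    assert via_search(text) == via_match(text)
    t_search = timeit.timeit(lambda: via_search(text), number=1000)
    t_match = timeit.timeit(lambda: via_match(text), number=1000)
    print("len=%6d  search=%.4fs  match=%.4fs" % (len(text), t_search, t_match))
```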

==== Earlier text ====

The [previously] accepted answer could use a few improvements:

(1) Presentation gives the appearance of being the result of an interactive Python session:

reg=re.compile('^[a-z0-9\.]+$')
>>>reg.match('jsdlfjdsf12324..3432jsdflsdf')
True

but match() doesn't return True; it returns a match object, or None when there is no match

(2) For use with match(), the ^ at the start of the pattern is redundant, and appears to be slightly slower than the same pattern without the ^

(3) It should encourage using a raw string automatically, without thinking, for any re pattern

(4) The backslash in front of the dot/period is redundant inside a character class

(5) Slower than the OP's code!

prompt>rem OP's version -- NOTE: OP used raw string!

prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9\.]')" "not bool(reg.search(t))"
1000000 loops, best of 3: 1.43 usec per loop

prompt>rem OP's version w/o backslash

prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9.]')" "not bool(reg.search(t))"
1000000 loops, best of 3: 1.44 usec per loop

prompt>rem cleaned-up version of accepted answer

prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[a-z0-9.]+\Z')" "bool(reg.match(t))"
100000 loops, best of 3: 2.07 usec per loop

prompt>rem accepted answer

prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile('^[a-z0-9\.]+$')" "bool(reg.match(t))"
100000 loops, best of 3: 2.08 usec per loop

(6) Can produce the wrong answer!!

>>> import re
>>> bool(re.compile('^[a-z0-9\.]+$').match('1234\n'))
True # uh-oh
>>> bool(re.compile('^[a-z0-9\.]+\Z').match('1234\n'))
False
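
Points (1), (4) and (6) above can be checked with a short script along these lines. It is only a sketch: the variable names are made up for the illustration, and the printed type name of the match object differs between Python versions.

```python
import re

# Point (1): match() returns a match object or None, never True/False.
m = re.compile(r'[a-z0-9.]+\Z').match('jsdlfjdsf12324..3432jsdflsdf')
print(type(m).__name__, bool(m))
print(re.compile(r'[a-z0-9.]+\Z').match('ABC'))

# Point (4): inside a character class, '.' is literal with or without a backslash.
escaped = re.compile(r'[^a-z0-9\.]')
plain = re.compile(r'[^a-z0-9.]')
for sample in ('a.b', 'a#b', 'a\nb', ''):
    assert bool(escaped.search(sample)) == bool(plain.search(sample))

# Point (6): '$' also matches just before a trailing newline; '\Z' does not.
dollar_ok = bool(re.match(r'[a-z0-9.]+$', '1234\n'))
z_ok = bool(re.match(r'[a-z0-9.]+\Z', '1234\n'))
print(dollar_ok, z_ok)
```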
