Capturing emoticons using regular expressions in Python

Updated: 2023-02-22 13:44:53

I think it finally "clicked" exactly what you're asking about here. Take a look at the below:

import re

smiley_pattern = r'^(:\(|:\))+$'  # matches only strings made up entirely of ":)" and ":("

def test_match(s):
    # Report whether the whole string s matches the smiley pattern.
    print('Value: %s; Result: %s' % (
        s,
        "Matches!" if re.match(smiley_pattern, s) else "Doesn't match."
    ))

should_match = [
    ':)',   # Single smile
    ':(',   # Single frown
    ':):)', # Two smiles
    ':(:(', # Two frowns
    ':):(', # Mix of a smile and a frown
]
should_not_match = [
    '',         # Empty string
    ':(foo',    # Extraneous characters appended
    'foo:(',    # Extraneous characters prepended
    ':( :(',    # Space between frowns
    ':( (',     # Extraneous characters and space appended
    ':(('       # Extraneous duplicate of final character appended
]

print('The following should all match:')
for x in should_match:
    test_match(x)

print('')   # Newline for output clarity

print('The following should all not match:')
for x in should_not_match:
    test_match(x)

The problem with your original code is that your regex is wrong: (:\(). Let's break it down.

The outside parentheses are a "grouping". They're what you'd reference if you were going to do a string replacement, and are used to apply regex operators on groups of characters at once. So, you're really saying:

- ( begin a group
  - :\( ... do regex stuff ...
- ) end the group
The : isn't a regex reserved character, so it's just a colon. The \ is, and it means "the following character is literal, not a regex operator". This is called an "escape sequence". Fully parsed into English, your regex says:
- ( begin a group
  - : a colon character
  - \( a left parenthesis character
- ) end the group
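To make that breakdown concrete, here is a minimal sketch of how the regex quoted above, (:\(), behaves on its own. The sample strings and variable names are made up for illustration; only the pattern comes from the question.

```python
import re

original_pattern = r'(:\()'  # the regex from the question, as quoted above

# re.search finds the frown anywhere in the string; group(1) is the text
# captured by the parentheses.
m = re.search(original_pattern, 'foo:(bar')
captured = m.group(1)
print(captured)

# A smile is never matched, because the pattern only describes ":(".
smile_result = re.search(original_pattern, ':)')
print(smile_result)

# The group can be referenced in a replacement string as \1.
replaced = re.sub(original_pattern, r'[\1]', 'a :( b')
print(replaced)
```

So the pattern finds a frown wherever it occurs, but it knows nothing about smiles and says nothing about what may surround the match.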
The regex I used is slightly more complex, but not bad. Let's break it down: ^(:\(|:\))+$.
      
^ and $ mean "the beginning of the line" and "the end of the line" respectively. Now we have:
- ^ beginning of line
  - (:\(|:\))+ ... do regex stuff ...
- $ end of line
        
... so it only matches strings that consist entirely of smileys, rather than smileys that merely occur somewhere in the middle of the string.
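As a rough illustration of what the anchors buy you, the sketch below compares the same group with and without ^ and $. The names unanchored, anchored, and samples are hypothetical and chosen just for this example.

```python
import re

unanchored = r'(:\(|:\))+'    # smileys anywhere in the string
anchored = r'^(:\(|:\))+$'    # smileys must span the whole line

samples = [':)', 'foo:(', ':(foo']

# For each sample, record (found without anchors, found with anchors).
results = {
    s: (bool(re.search(unanchored, s)), bool(re.search(anchored, s)))
    for s in samples
}

for s, (loose, strict) in results.items():
    print('%r: unanchored=%s, anchored=%s' % (s, loose, strict))

# re.fullmatch (Python 3.4+) is another way to require a whole-string match
# without writing the anchors yourself.
full = re.fullmatch(unanchored, ':):(')
```

Only ':)' passes the anchored pattern; the other two contain a frown but also extra characters.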
        
We know that ( and ) denote a grouping. + means "one or more of these". Now we have:
        
- ^ beginning of line
  - ( start a group
    - :\(|:\) ... do regex stuff ...
  - ) end the group
  - + match one or more of the group
- $ end of line
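The effect of + on a group can be seen in isolation with a simplified pattern that only knows about smiles. This is a sketch for illustration; single and repeated are hypothetical names, not part of the answer's code.

```python
import re

single = r'^(:\))$'      # exactly one smile
repeated = r'^(:\))+$'   # one or more smiles

single_hit = re.match(single, ':):)')      # two smiles: too many for `single`
repeated_hit = re.match(repeated, ':):)')  # the + lets the group repeat
empty_hit = re.match(repeated, '')         # + requires at least one occurrence

print(single_hit)
print(repeated_hit.group(0))
print(empty_hit)
```

Without the +, the group has to appear exactly once; with it, the group may repeat but still cannot be absent.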
        
Finally, there's the | (pipe) operator. It means "or". So, applying what we know from above about escaping characters, we're ready to complete the translation:
        
- ^ beginning of line
  - ( start a group
    - : a colon character
    - \( a left parenthesis character
    - | or
    - : a colon character
    - \) a right parenthesis character
  - ) end the group
  - + match one or more of the group
- $ end of line
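If you want the pattern itself to carry this explanation, one option is the re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments. The sketch below is an equivalent way of writing the same regex, not the form used in the code above; smiley_verbose is a name chosen for this example.

```python
import re

smiley_verbose = re.compile(r"""
    ^           # beginning of line
    (           # start a group
        :\(     #   a colon, then a literal left parenthesis
        |       #   or
        :\)     #   a colon, then a literal right parenthesis
    )+          # end the group; match one or more of it
    $           # end of line
""", re.VERBOSE)

mixed = smiley_verbose.match(':):(')      # smile followed by frown
doubled = smiley_verbose.match(':((')     # extra "(" at the end
empty = smiley_verbose.match('')          # nothing to match

print(bool(mixed), bool(doubled), bool(empty))
```

It accepts and rejects the same strings as the one-line version; the only difference is readability.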
        
I hope this helps. If not, let me know and I'll be happy to edit my answer with a reply.
