且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

正则表达式以匹配文件中的特定功能及其参数

更新时间:2022-12-23 15:04:24

注意: 如果您不熟悉递归,请阅读此答案.

Note: Read this answer if you're not familiar with recursion.

第1部分:匹配特定功能

谁说正则表达式不能模块化? PCRE正则表达式可助您一臂之力!

Part 1: match specific functions

Who said that regex can't be modular? Well PCRE regex to the rescue!

~                      # Delimiter
(?(DEFINE)             # Start of definitions
   (?P<str_double_quotes>
      (?<!\\)          # Not escaped
      "                # Match a double quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (ie: an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      "                # Match the ending double quote
   )

   (?P<str_single_quotes>
      (?<!\\)          # Not escaped
      '                # Match a single quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (ie: an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      '                # Match the ending single quote
   )

   (?P<brackets>
      \(                          # Match an opening bracket
         (?:                      # A non capturing group
            (?&str_double_quotes) # Recurse/use the str_double_quotes pattern
            |                     # Or
            (?&str_single_quotes) # Recurse/use the str_single_quotes pattern
            |                     # Or
            [^()]                 # Anything not a bracket
            |                     # Or
            (?&brackets)          # Recurse the bracket pattern
         )*
      \)
   )
)                                 # End of definitions
# Let's start matching for real now:
_n?                               # Match _ or _n
\s*                               # Optional white spaces
(?P<results>(?&brackets))         # Recurse/use the brackets pattern and put it in the results group
~sx

s用于将换行符与.匹配,并且x修饰符用于此花哨的空格和对正则表达式的注释.

The s is for matching newlines with . and the x modifier is for this fancy spacing and commenting of our regex.

在线正则表达式演示 > 在线php演示

由于我们的正则表达式也将获得左括号和右括号(),因此我们可能需要对其进行过滤.我们将在结果上使用preg_replace()

Since our regex will also get the opening and closing brackets (), we might need to filter them. We will use preg_replace() on the results:

~           # Delimiter
^           # Assert begin of string
\(          # Match an opening bracket
\s*         # Match optional whitespaces
|           # Or
\s*         # Match optional whitespaces
\)          # Match a closing bracket
$           # Assert end of string
~x

> 在线php演示

所以这是另一个模块化正则表达式,您甚至可以添加自己的语法:

So here's another modular regex, you could even add your own grammar:

~                      # Delimiter
(?(DEFINE)             # Start of definitions
   (?P<str_double_quotes>
      (?<!\\)          # Not escaped
      "                # Match a double quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (ie: an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      "                # Match the ending double quote
   )

   (?P<str_single_quotes>
      (?<!\\)          # Not escaped
      '                # Match a single quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (ie: an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      '                # Match the ending single quote
   )

   (?P<array>
      Array\s*
      (?&brackets)
   )

   (?P<variable>
      [^\s,()]+        # I don't know the exact grammar for a variable in ECMAScript
   )

   (?P<brackets>
      \(                          # Match an opening bracket
         (?:                      # A non capturing group
            (?&str_double_quotes) # Recurse/use the str_double_quotes pattern
            |                     # Or
            (?&str_single_quotes) # Recurse/use the str_single_quotes pattern
            |                     # Or
            (?&array)             # Recurse/use the array pattern
            |                     # Or
            (?&variable)          # Recurse/use the array pattern
            |                     # Or
            [^()]                 # Anything not a bracket
            |                     # Or
            (?&brackets)          # Recurse the bracket pattern
         )*
      \)
   )
)                                 # End of definitions
# Let's start matching for real now:
(?&array)
|
(?&variable)
|
(?&str_double_quotes)
|
(?&str_single_quotes)
~xis

我们将循环并使用preg_match_all().最终代码如下所示:

We will loop and use preg_match_all(). The final code would look like this:

$functionPattern = <<<'regex'
~                      # Delimiter
(?(DEFINE)             # Start of definitions
   (?P<str_double_quotes>
      (?<!\\)          # Not escaped
      "                # Match a double quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (ie: an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      "                # Match the ending double quote
   )

   (?P<str_single_quotes>
      (?<!\\)          # Not escaped
      '                # Match a single quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (ie: an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      '                # Match the ending single quote
   )

   (?P<brackets>
      \(                          # Match an opening bracket
         (?:                      # A non capturing group
            (?&str_double_quotes) # Recurse/use the str_double_quotes pattern
            |                     # Or
            (?&str_single_quotes) # Recurse/use the str_single_quotes pattern
            |                     # Or
            [^()]                 # Anything not a bracket
            |                     # Or
            (?&brackets)          # Recurse the bracket pattern
         )*
      \)
   )
)                                 # End of definitions
# Let's start matching for real now:
_n?                               # Match _ or _n
\s*                               # Optional white spaces
(?P<results>(?&brackets))         # Recurse/use the brackets pattern and put it in the results group
~sx
regex;


$argumentsPattern = <<<'regex'
~                      # Delimiter
(?(DEFINE)             # Start of definitions
   (?P<str_double_quotes>
      (?<!\\)          # Not escaped
      "                # Match a double quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (ie: an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      "                # Match the ending double quote
   )

   (?P<str_single_quotes>
      (?<!\\)          # Not escaped
      '                # Match a single quote
      (?:              # Non-capturing group
         [^\\]         # Match anything not a backslash
         |             # Or
         \\.           # Match a backslash and a single character (ie: an escaped character)
      )*?              # Repeat the non-capturing group zero or more times, ungreedy/lazy
      '                # Match the ending single quote
   )

   (?P<array>
      Array\s*
      (?&brackets)
   )

   (?P<variable>
      [^\s,()]+        # I don't know the exact grammar for a variable in ECMAScript
   )

   (?P<brackets>
      \(                          # Match an opening bracket
         (?:                      # A non capturing group
            (?&str_double_quotes) # Recurse/use the str_double_quotes pattern
            |                     # Or
            (?&str_single_quotes) # Recurse/use the str_single_quotes pattern
            |                     # Or
            (?&array)             # Recurse/use the array pattern
            |                     # Or
            (?&variable)          # Recurse/use the array pattern
            |                     # Or
            [^()]                 # Anything not a bracket
            |                     # Or
            (?&brackets)          # Recurse the bracket pattern
         )*
      \)
   )
)                                 # End of definitions
# Let's start matching for real now:
(?&array)
|
(?&str_double_quotes)
|
(?&str_single_quotes)
|
(?&variable)
~six
regex;

$input = <<<'input'
_  ("foo") // want "foo"
_n("bar", "baz", 42); // want "bar", "baz", 42
_n(domain, "bux", var); // want domain, "bux", var
_( "one (optional)" ); // want "one (optional)"
apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples) // could have on the same line two calls..

// misleading cases
_n("foo (")
_n("foo (\)", 'foo)', aa)
_n( Array(1, 2, 3), Array(")",   '(')   );
_n(function(foo){return foo*2;}); // Is this even valid?
_n   ();   // Empty
_ (   
    "Foo",
    'Bar',
    Array(
        "wow",
        "much",
        'whitespaces'
    ),
    multiline
); // PCRE is awesome
input;

if(preg_match_all($functionPattern, $input, $m)){
    $filtered = preg_replace(
        '~          # Delimiter
        ^           # Assert begin of string
        \(          # Match an opening bracket
        \s*         # Match optional whitespaces
        |           # Or
        \s*         # Match optional whitespaces
        \)          # Match a closing bracket
        $           # Assert end of string
        ~x', // Regex
        '', // Replace with nothing
        $m['results'] // Subject
    ); // Getting rid of opening & closing brackets

    // Part 3: extract arguments:
    $parsedTree = array();
    foreach($filtered as $arguments){   // Loop
        if(preg_match_all($argumentsPattern, $arguments, $m)){ // If there's a match
            $parsedTree[] = array(
                'all_arguments' => $arguments,
                'branches' => $m[0]
            ); // Add an array to our tree and fill it
        }else{
            $parsedTree[] = array(
                'all_arguments' => $arguments,
                'branches' => array()
            ); // Add an array with empty branches
        }
    }

    print_r($parsedTree); // Let's see the results;
}else{
    echo 'no matches';
}

> 在线php演示

您可能想创建一个递归函数来生成完整的树. 查看此答案.

You might want to create a recursive function to generate a full tree. See this answer.

您可能会注意到function(){}部分没有正确解析.我将其作为读者的练习:)

You might notice that the function(){} part isn't parsed correctly. I will let that as an exercise for the readers :)