且构网

RegEx in PowerShell: matching a string between two strings

Updated: 2023-02-22 12:26:29

If -match is returning a whole line, the implication is that the LHS of your -match operation is an array, which in turn suggests that you used Get-Content without -Raw, which yields the input as an array of lines, in which case -match acts as a filter.
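
A minimal sketch of that filtering behavior, using a made-up array of lines as a stand-in for what Get-Content returns without -Raw (the line contents are assumptions for illustration only):

```powershell
# Hypothetical lines, standing in for the output of Get-Content without -Raw.
$lines = 'open sftp://user:pw@example.org:22', 'hostkey="abc"', 'get File*.txt'

# With an array as the LHS, -match acts as a filter: it returns the
# matching elements (whole lines) rather than a Boolean.
$filtered = $lines -match 'hostkey'

$filtered        # outputs the whole matching line
$filtered.Count  # number of lines that matched
```

Because the result is the matching line itself, the part between the quotes still has to be extracted in a separate step.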

Instead, read your file as a single, multi-line string with Get-Content -Raw; with a scalar LHS, -match then returns a [bool], and the results of the matching operation are reported in the automatic variable $Matches (a hashtable whose 0 entry contains the overall match, 1 what the 1st capture group matched, ...):

# Read file as a whole, into a single, multi-line string.
$doc = Get-Content -Raw file.txt 

if ($doc -match '(?<=hostkey=")(.*)(?=")') {
   # Output what the 1st capture group captured
   $Matches[1]
}

With your sample input, the above yields
ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
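
To try this without a file on disk, the following sketch uses a here-string in place of Get-Content -Raw. The here-string content is an assumption, reconstructed from the regexes and outputs in this answer; the original sample input is not reproduced here. It also uses the lazy quantifier .*? instead of .*, so that the match stops at the first closing quote even if more quotes follow on the same line.

```powershell
# Hypothetical stand-in for the file content (reconstructed, not the original sample).
$doc = @'
open sftp://username:password@host.name.net:22 hostkey="ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00"
get File*.txt \local\path\Client\File.txt
mv File*.txt /remote/archive/
'@

$hostKey = $null
# Scalar LHS: -match returns a Boolean and fills $Matches on success.
if ($doc -match '(?<=hostkey=")(.*?)(?=")') {
    $hostKey = $Matches[1]
}
$hostKey
```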

You can then extend the approach to capture multiple tokens, in which case I suggest using named capture groups ((?<name>...)); the following example uses such named capture groups to extract several of the tokens of interest:

if ($doc -match '(?<=sftp://)(?<username>[^:]+):(?<password>[^@]+)@(?<host>[^:]+)'){
  # Output the named capture-group values.
  # Note that index notation (['username']) and property
  # notation (.username) can be used interchangeably.
  $Matches.username
  $Matches.password
  $Matches.host
}

With your sample input, the above yields:

username
password
host.name.net
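
As a small follow-up sketch, the captured values can be copied into an object right away, which is useful because the next successful -match overwrites $Matches. The input line and the names $line, $hostName, and $conn are illustrative assumptions. Note that the value is stored in $hostName rather than in a variable named $host, because $Host is a built-in automatic variable.

```powershell
# Hypothetical input line, modeled on the values shown above.
$line = 'open sftp://username:password@host.name.net:22'

if ($line -match '(?<=sftp://)(?<username>[^:]+):(?<password>[^@]+)@(?<host>[^:]+)') {
    # Index notation and property notation return the same value.
    $sameValue = $Matches['username'] -eq $Matches.username

    # Copy the values before another -match replaces $Matches.
    $hostName = $Matches.host
    $conn = [pscustomobject]@{
        UserName = $Matches.username
        Password = $Matches.password
        HostName = $hostName
    }
}
$conn
```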

You can extend the above to capture all tokens of interest.
Note that . by default doesn't match \n (newline) characters.
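
A quick way to see the newline behavior is the sketch below, which uses a made-up two-line string. The inline option (?s) (SingleLine) is the standard .NET way to make . match newlines as well; whether you need it depends on whether a token can span lines.

```powershell
# Made-up two-line string for illustration.
$text = "first`nsecond"

# By default, . does not match the newline between the two words.
$withoutS = $text -match 'first.second'

# With the inline (?s) option, . also matches the newline.
$withS = $text -match '(?s)first.second'

"$withoutS / $withS"
```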

Extracting that many tokens can result in a complex regex that is hard to read, in which case the x (IgnorePatternWhitespace) regex option can help (as an inline option, (?x) at the start of the regex):

if ($doc -match '(?x)
    (?<=sftp://)(?<username>[^:]+)
    :(?<password>[^@]+)
    @(?<host>[^:]+)
    :(?<port>\d+)
    \s+hostkey="(?<sshkey>.+?)"
    \n+get\ File\*\.txt\ (?<localpath>.+)
    \nmv\ File\*\.txt\ (?<remotepath>.+)
  '){
    # Output the named capture-group values, skipping entry 0 (the overall match).
    $Matches.GetEnumerator() | Where-Object Key -ne 0
}

Note how the whitespace used for making the regex more readable (spreading it across multiple lines) is ignored while matching, whereas whitespace to be matched in the input must be escaped (e.g., use \ followed by a space, or [ ], to match a single space, or \s to match any whitespace character).
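
The whitespace rules under (?x) can be checked in isolation with a short sketch; the sample string and variable names are assumptions chosen for illustration:

```powershell
# Made-up sample string containing a single space.
$sample = 'get File.txt'

# Unescaped whitespace in the pattern is ignored, so this is effectively 'getFile'.
$unescaped = $sample -match '(?x) get File'

# Three ways to match the space explicitly while (?x) is in effect.
$backslash = $sample -match '(?x) get\ File'    # escaped space
$charClass = $sample -match '(?x) get[ ]File'   # space inside a character class
$anySpace  = $sample -match '(?x) get\sFile'    # any whitespace character

"$unescaped / $backslash / $charClass / $anySpace"
```

Only the first test fails to match, because the space between get and File is dropped from the pattern.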

With your sample input, the multi-line (?x) command above yields the following:

Name                           Value
----                           -----
host                           host.name.net
localpath                      \local\path\Client\File.txt
port                           22
sshkey                         ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
remotepath                     /remote/archive/
password                       password
username                       username

Note that the reason the capture groups are out of order is that $Matches is a hash table (of type [hashtable]), whose key enumeration order is an implementation artifact: no particular enumeration order is guaranteed.

However, random access to capture groups works just fine; e.g., $Matches.port will return 22.
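
If a predictable output order matters, one option is to look the groups up by name in the order you want and collect them in an ordered dictionary. This is only a sketch: the input string, the $names list, and the $ordered variable are assumptions, and [ordered] requires PowerShell 3 or later.

```powershell
# Hypothetical input, modeled on the values shown above.
$text  = 'sftp://username:password@host.name.net:22'

# The desired output order, spelled out explicitly.
$names = 'username', 'password', 'host', 'port'

$ordered = [ordered]@{}
if ($text -match '(?<=sftp://)(?<username>[^:]+):(?<password>[^@]+)@(?<host>[^:]+):(?<port>\d+)') {
    # Random access by name works regardless of $Matches' enumeration order.
    foreach ($name in $names) { $ordered[$name] = $Matches[$name] }
}
$ordered
```

Unlike $Matches, the ordered dictionary enumerates its entries in insertion order, so the output follows the $names list.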