Updated: 2023-02-03 09:30:07
So, what you're saying is that you want to replace any of 600 strings in each of 150,000 lines, and you want to run one replace operation per line?
Yes, there is a way to do it, but not in PowerShell, at least I can't think of one. It can be done in Perl.
Problem:
Frustratingly, PowerShell doesn't expose the match variables outside the regex replace call. It doesn't work with the -replace operator and it doesn't work with [regex]::replace.
In Perl, you can do this, for example:
$string =~ s/(1|2|3)/@{[$1 + 5]}/g;
This will add 5 to the digits 1, 2, and 3 throughout the string, so if the string is "1224526123 [2] [6]", it turns into "6774576678 [7] [6]".
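As a quick check of the substitution above, here is a minimal runnable sketch using the same example string. The variable names `$result` and `$result_e` are mine, and the `/e` variant is an alternative spelling I added for comparison, not something from the original answer.

```perl
use strict;
use warnings;

# Example input taken from the text above.
my $string = '1224526123 [2] [6]';

# @{[ ... ]} evaluates the expression and interpolates its value into
# the replacement, so every 1, 2 or 3 has 5 added to it.
(my $result = $string) =~ s/(1|2|3)/@{[$1 + 5]}/g;

# Alternative: the /e modifier treats the replacement as Perl code.
(my $result_e = $string) =~ s/(1|2|3)/$1 + 5/ge;

print "$result\n";    # prints: 6774576678 [7] [6]
print "$result_e\n";  # prints the same line
```

Both forms run the expression once per match, which is exactly the part PowerShell's replacement strings do not allow.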
However, in PowerShell, both of these fail:
$string -replace '(1|2|3)',"$($1 + 5)"
[regex]::replace($string,'(1|2|3)',"$($1 + 5)")
In both cases, PowerShell expands $1 as an ordinary, unset variable before the regex engine ever sees it, so $1 evaluates to null and the expression evaluates to plain old 5. The match variables in replacements are only meaningful in the resulting string, i.e. a single-quoted string or whatever the double-quoted string evaluates to. They're basically just backreferences that look like match variables. Sure, you can escape the $ before the number in a double-quoted string (with a backtick), so it will evaluate to the corresponding match group, but that defeats the purpose - it can't participate in an expression.
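Solution:
[This answer has been modified from the original. It has been formatted to fit match strings with regex metacharacters. And your TV screen, of course.]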
If using another language is acceptable to you, the following Perl script works like a charm:
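use strict;
use warnings;

my $filePath = $ARGV[0]; # Or hard-code it or whatever
open INPUT, '<', $filePath or die "Cannot open $filePath: $!";
open OUTPUT, '>', 'C:\log.txt' or die "Cannot open output file: $!";

# Search strings (left) and their replacements (right).
my %replacements = (
    'something0' => 'somethingelse0',
    'something1' => 'somethingelse1',
    'something2' => 'somethingelse2',
    'something3' => 'somethingelse3',
    'something4' => 'somethingelse4',
    'something5' => 'somethingelse5',
    'X:\Group_14\DACU' => '\\DACU$',
    '.*[^xyz]' => 'oO{xyz}',
    'moresomethings' => 'moresomethingelses'
);

my @strings;
foreach (keys %replacements) {
    push @strings, qr/\Q$_\E/;        # \Q...\E makes metacharacters in the key match literally
    $replacements{$_} =~ s/\\/\\\\/g; # double every backslash in the replacement value
}

# One alternation that matches any of the search strings.
my $pattern = join '|', @strings;

while (<INPUT>) {
    s/($pattern)/$replacements{$1}/g; # look up whichever key matched
    print OUTPUT;
}

close INPUT;
close OUTPUT;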
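It searches for the keys of the hash (left of the =>), and replaces them with the corresponding values. Here's what's happening: the foreach loop wraps each key in \Q...\E so that any regex metacharacters in it match literally and doubles the backslashes in each value, the quoted keys are then joined with | into a single alternation, and for every match the substitution looks up the matched text ($1) in the hash to get its replacement.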
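To see the lookup technique without touching any files, here is a minimal sketch that applies it to an in-memory string. The replacement table and the sample line are hypothetical, chosen so that the keys contain regex metacharacters. The backslash-doubling step from the script is left out because none of these values contain backslashes.

```perl
use strict;
use warnings;

# Hypothetical replacement table; the keys contain regex
# metacharacters on purpose.
my %replacements = (
    'a.b' => 'DOT',
    'x*'  => 'STAR',
    'foo' => 'bar',
);

# \Q...\E makes each key match literally; the alternatives are
# joined with | into one pattern.
my $pattern = join '|', map { qr/\Q$_\E/ } keys %replacements;

my $line = 'foo a.b aXb x* xx';

# A single pass over the line: whichever key matched ends up in $1
# and is used to look up its replacement.
(my $result = $line) =~ s/($pattern)/$replacements{$1}/g;

print "$result\n";   # prints: bar DOT aXb STAR xx
```

Note that `aXb` and `xx` are left alone: because the keys are quoted, `.` and `*` only match themselves.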
BTW, you might have noticed several other modifications from the original script. My Perl has collected some dust during my recent PowerShell kick, and on a second look I noticed several things that could be done better.
while (<INPUT>) reads the file one line at a time. That is a lot more sensible than reading the entire 150,000 lines into an array, especially when your goal is efficiency.
I simplified @{[$replacements{$1}]} to $replacements{$1}. Perl doesn't have a built-in way of interpolating expressions like PowerShell's $(), so @{[ ]} is used as a workaround - it creates a literal array of one element containing the expression. But I realized that it's not necessary if the expression is just a single scalar variable (I had it in there as a holdover from my initial testing, where I was applying calculations to the $1 match variable).
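The difference between the two forms can be illustrated with a small sketch. The `%price` hash and `$count` are hypothetical values made up for this example.

```perl
use strict;
use warnings;

my %price = (apple => 3);   # hypothetical data
my $count = 4;

# A plain scalar or hash element interpolates directly.
my $plain = "unit price: $price{apple}";

# An expression has to go through @{[ ... ]} to be evaluated
# inside the string.
my $computed = "total: @{[ $price{apple} * $count ]}";

print "$plain\n";      # prints: unit price: 3
print "$computed\n";   # prints: total: 12
```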