Remove text between delimiters in a string (using a regex?)

Updated: 2023-02-17 22:26:58

A simple regex would be:

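using System.Text.RegularExpressions;
string input = "Give [Me Some] Purple (And More) Elephants";
// Verbatim string (@"...") keeps the backslashes intact; "" is an escaped double quote.
// Note: .* is greedy, so each alternative runs to the last matching closing delimiter.
string regex = @"(\[.*\])|("".*"")|('.*')|(\(.*\))";
string output = Regex.Replace(input, regex, "");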

If you want to build the regex up in a custom way, you just need to write the individual parts, one per delimiter pair:

('.*')  // example of the single quote check

Then concatenate the individual regex parts with an OR (the | in regex), as in my original example. Once the regex string is built, run it just once per input. The key is to get everything into a single regex check: running many separate regex matches on one item, and then iterating over a lot of items, will probably cause a significant drop in performance.
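The approach above can be sketched roughly as follows. This is only an illustration under some assumptions: the class name `DelimitedTextRemover`, the `Parts` array, and the `StripAll` method are hypothetical names that do not appear in the original answer, and the parts use the lazy quantifier `.*?` (rather than the greedy `.*` of the first example) so that each match stops at the nearest closing delimiter.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: names are illustrative, not from the original answer.
public static class DelimitedTextRemover
{
    // One regex part per delimiter pair. Assumption: lazy .*? is used so a
    // match ends at the nearest closing delimiter instead of the last one.
    private static readonly string[] Parts =
    {
        @"\[.*?\]",   // square brackets
        @"\(.*?\)",   // parentheses
        @"'.*?'",     // single quotes
        @""".*?"""    // double quotes ("" is an escaped " in a verbatim string)
    };

    // Join the parts with | and construct the Regex a single time,
    // so every input is checked with one combined pattern.
    private static readonly Regex Combined =
        new Regex(string.Join("|", Parts), RegexOptions.Compiled);

    // Apply the one combined regex to each input in turn.
    public static List<string> StripAll(IEnumerable<string> inputs)
    {
        var results = new List<string>();
        foreach (var input in inputs)
        {
            results.Add(Combined.Replace(input, ""));
        }
        return results;
    }
}
```

Calling `DelimitedTextRemover.StripAll(new[] { "Give [Me Some] Purple (And More) Elephants" })` removes both delimited spans in one pass per string. The surrounding spaces are left in place, so the result contains double spaces where the spans used to be.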

In my first example, the built-up regex would take the place of the regex string on the second line:

string input = "Give [Me Some] Purple (And More) Elephants";
string regex = "Your built up regex here";
string sOutput = Regex.Replace(input, regex, "");

I am sure someone will post a cool LINQ expression that builds the regex from an array of delimiter objects to match, or something along those lines.
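For what it is worth, one way such a LINQ-based builder might look is sketched below. Everything here is an assumption made for illustration: `DelimiterRegexBuilder`, `BuildPattern`, and `Strip` are hypothetical names, the delimiters are modelled as simple (open, close) string tuples rather than "delimiter objects", and the lazy `.*?` is used between them. `Regex.Escape` is applied so that delimiters such as `[` or `(` are treated as literal characters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: names and the tuple shape are illustrative only.
public static class DelimiterRegexBuilder
{
    // Build one alternation such as (\[.*?])|(\(.*?\)) from the delimiter pairs.
    public static string BuildPattern(params (string Open, string Close)[] delimiters)
    {
        var parts = delimiters.Select(d =>
            // Escape each delimiter so regex metacharacters match literally;
            // lazy .*? stops at the nearest closing delimiter.
            "(" + Regex.Escape(d.Open) + ".*?" + Regex.Escape(d.Close) + ")");

        // Join the individual parts with the regex OR operator.
        return string.Join("|", parts);
    }

    // Remove every delimited span (delimiters included) in a single Replace call.
    public static string Strip(string input, params (string Open, string Close)[] delimiters)
    {
        return Regex.Replace(input, BuildPattern(delimiters), "");
    }
}
```

A call such as `DelimiterRegexBuilder.Strip(input, ("[", "]"), ("(", ")"))` would then replace the hard-coded regex string from the first example. If the same delimiter set is applied to many inputs, it is probably worth building the pattern once and reusing a single `Regex` instance instead of rebuilding it on every call.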