且构网

A better way to replace many strings - obfuscation in C#

Updated: 2023-02-12 18:44:20

Instead of doing replacements in a huge string (which means that you move a lot of data around on every pass), work through the string once and replace one token at a time.

Make a list containing the index of the next occurrence of each token. Locate the token that comes first, then copy the text up to that token to the result, followed by the token's replacement. Then find the next occurrence of that token in the string to keep the list up to date. Repeat until no more tokens are found, then copy the remaining text to the result.

I made a simple test, and this method did 125,000 replacements on a 1,000,000-character string in 208 milliseconds.
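
The original test code is not shown, so the following is only a rough sketch of how such a measurement could be set up. The input layout (one "abc" marker in every 8-character block, which gives 125,000 markers in 1,000,000 characters), the class name ReplaceBenchmark, and the helper names are all assumptions made for illustration. The sketch times a plain string.Replace call as a stand-in.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ReplaceBenchmark
{
    // Builds a string of exactly `length` characters in which the marker
    // "abc" appears once in every full 8-character block (assumed layout).
    public static string BuildInput(int length)
    {
        var sb = new StringBuilder(length);
        while (sb.Length + 8 <= length)
        {
            sb.Append("abc.....");
        }
        // Pad with filler if length is not a multiple of 8.
        sb.Append('.', length - sb.Length);
        return sb.ToString();
    }

    // Runs `replace` once on `input` and returns the elapsed milliseconds.
    public static long TimeMs(Func<string, string> replace, string input, out string output)
    {
        var watch = Stopwatch.StartNew();
        output = replace(input);
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        string input = BuildInput(1000000); // 125000 markers
        string output;
        // Stand-in replacement function; swap in the method you want to measure.
        long ms = TimeMs(s => s.Replace("abc", "X"), input, out output);
        Console.WriteLine("{0} chars in, {1} chars out, {2} ms",
            input.Length, output.Length, ms);
    }
}
```

To measure the TokenList class instead, pass its Replace method as the delegate, for example TimeMs(tokens.Replace, input, out output). Timings depend on the machine and runtime, so the 208 ms figure above should not be expected to reproduce exactly.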

Token and TokenList classes:

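using System;
using System.Collections.Generic;
using System.Text;

public class Token {

    public string Text { get; private set; }
    public string Replacement { get; private set; }
    // Index of the next occurrence of Text in the input, or -1 if there is none.
    public int Index { get; set; }

    public Token(string text, string replacement) {
        Text = text;
        Replacement = replacement;
    }

}

public class TokenList : List<Token>{

    public void Add(string text, string replacement) {
        // An empty token would match at every position and make Replace loop forever.
        if (string.IsNullOrEmpty(text)) {
            throw new ArgumentException("Token text must not be empty.", "text");
        }
        Add(new Token(text, replacement));
    }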

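    // Returns the token whose next occurrence comes earliest in the text,
    // or null when no token has any occurrence left.
    private Token GetFirstToken() {
        Token result = null;
        int index = int.MaxValue;
        foreach (Token token in this) {
            if (token.Index != -1 && token.Index < index) {
                index = token.Index;
                result = token;
            }
        }
        return result;
    }

    public string Replace(string text) {
        StringBuilder result = new StringBuilder();
        // Find the first occurrence of each token (-1 if there is none).
        foreach (Token token in this) {
            token.Index = text.IndexOf(token.Text);
        }
        // index is the position up to which the input has been consumed.
        int index = 0;
        Token next;
        while ((next = GetFirstToken()) != null) {
            // Copy the unchanged text between the previous token and this one.
            if (index < next.Index) {
                result.Append(text, index, next.Index - index);
                index = next.Index;
            }
            // Emit the replacement and skip past the token.
            result.Append(next.Replacement);
            index += next.Text.Length;
            // Look for the next occurrence of this token.
            next.Index = text.IndexOf(next.Text, index);
        }
        // Copy whatever follows the last token.
        if (index < text.Length) {
            result.Append(text, index, text.Length - index);
        }
        return result.ToString();
    }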

}

Example usage:

string text =
    "This is a text with some words that will be replaced by tokens.";

var tokens = new TokenList();
tokens.Add("text", "TXT");
tokens.Add("words", "WRD");
tokens.Add("replaced", "RPL");

string result = tokens.Replace(text);
Console.WriteLine(result);

Output:

This is a TXT with some WRD that will be RPL by tokens.

Note: This code does not handle overlapping tokens. If you have, for example, the tokens "pineapple" and "apple", the code doesn't work properly: after "pineapple" is replaced, the stored index for "apple" still points inside text that has already been consumed.

To make the code work with overlapping tokens, replace this line:

next.Index = text.IndexOf(next.Text, index);

with this code:

foreach (Token token in this) {
    if (token.Index != -1 && token.Index < index) {
        token.Index = text.IndexOf(token.Text, index);
    }
}
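
To see the overlap handling in one place, here is a compact, self-contained sketch of the same idea written as a single static method. It is not the code from the answer: the class name MultiReplace, the use of key/value pairs in place of a Token class, the rejection of empty tokens, and the choice of StringComparison.Ordinal are all assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class MultiReplace
{
    // Replaces every occurrence of pairs[i].Key with pairs[i].Value in one
    // left-to-right pass. When two tokens start at the same position, the
    // one listed first wins.
    public static string Replace(string text, IList<KeyValuePair<string, string>> pairs)
    {
        // next[i] is the index of the next occurrence of token i, or -1.
        var next = new int[pairs.Count];
        for (int i = 0; i < pairs.Count; i++)
        {
            if (string.IsNullOrEmpty(pairs[i].Key))
                throw new ArgumentException("Tokens must be non-empty.");
            next[i] = text.IndexOf(pairs[i].Key, StringComparison.Ordinal);
        }

        var result = new StringBuilder(text.Length);
        int index = 0; // position up to which the input has been consumed
        while (true)
        {
            // Pick the token whose next occurrence comes first.
            int first = -1;
            for (int i = 0; i < next.Length; i++)
            {
                if (next[i] != -1 && (first == -1 || next[i] < next[first]))
                    first = i;
            }
            if (first == -1) break;

            // Copy the text before the token, then the replacement.
            result.Append(text, index, next[first] - index);
            result.Append(pairs[first].Value);
            index = next[first] + pairs[first].Key.Length;

            // Re-search every token whose stored position now lies inside
            // the consumed text; this is the overlap fix.
            for (int i = 0; i < next.Length; i++)
            {
                if (next[i] != -1 && next[i] < index)
                    next[i] = text.IndexOf(pairs[i].Key, index, StringComparison.Ordinal);
            }
        }
        // Copy whatever follows the last token.
        result.Append(text, index, text.Length - index);
        return result.ToString();
    }

    public static void Main()
    {
        var pairs = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("pineapple", "PA"),
            new KeyValuePair<string, string>("apple", "A"),
        };
        Console.WriteLine(Replace("pineapple and apple", pairs));
        // prints: PA and A
    }
}
```

In this sketch the "apple" inside "pineapple" is not replaced separately, because its stored position is refreshed as soon as the longer token has been consumed.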