且构网


Escaping XML characters inside the XML nodes of a string in Java

Updated: 2022-06-10 04:32:05

You could use a regular expression to find every run of text that sits between an opening tag and its closing tag, then loop over the matches and escape each one. This example uses StringEscapeUtils from Apache Commons Lang to do the XML escaping.

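import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.commons.lang3.StringEscapeUtils;

public String sanitiseXml(String xml)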
{
    // Match the pattern <something>text</something>
    Pattern xmlCleanerPattern = Pattern.compile("(<[^/<>]*>)([^<>]*)(</[^<>]*>)");

    StringBuilder xmlStringBuilder = new StringBuilder();

    Matcher matcher = xmlCleanerPattern.matcher(xml);
    int lastEnd = 0;
    while (matcher.find())
    {
        // Include any non-matching text between this result and the previous result
        if (matcher.start() > lastEnd) {
            xmlStringBuilder.append(xml.substring(lastEnd, matcher.start()));
        }
        lastEnd = matcher.end();

        // Sanitise the characters inside the tags and append the sanitised version
        String cleanText = StringEscapeUtils.escapeXml10(matcher.group(2));
        xmlStringBuilder.append(matcher.group(1)).append(cleanText).append(matcher.group(3));
    }
    // Include any leftover text after the last result
    xmlStringBuilder.append(xml.substring(lastEnd));

    return xmlStringBuilder.toString();
}

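This looks for matches of the form <something>text</something>, captures the opening tag, the contained text, and the closing tag as three separate groups, escapes the contained text, and then puts the three pieces back together. Because the text group excludes < and >, an element whose text contains a stray angle bracket is not matched and is left unchanged, and text that is already escaped (for example &amp;) will be escaped a second time.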
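To see what the method above does to a concrete input, here is a minimal self-contained sketch of the same loop. It is an illustration under stated assumptions, not the original answer's code: the class name XmlTextEscaper and the helper escapeText are hypothetical, and escapeText is a simplified stand-in for StringEscapeUtils.escapeXml10 that only handles &, " and ' so the example runs without Apache Commons Lang.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmlTextEscaper {
    // Same pattern as in the answer: opening tag, text without angle brackets, closing tag.
    private static final Pattern ELEMENT =
            Pattern.compile("(<[^/<>]*>)([^<>]*)(</[^<>]*>)");

    // Hypothetical stand-in for StringEscapeUtils.escapeXml10 (assumption:
    // only &, " and ' need handling, since the pattern never lets < or > through).
    static String escapeText(String text) {
        return text.replace("&", "&amp;")   // must run first so later entities are not re-escaped
                   .replace("\"", "&quot;")
                   .replace("'", "&apos;");
    }

    static String sanitise(String xml) {
        Matcher matcher = ELEMENT.matcher(xml);
        StringBuilder out = new StringBuilder();
        int lastEnd = 0;
        while (matcher.find()) {
            // Copy any unmatched text between the previous match and this one.
            out.append(xml, lastEnd, matcher.start());
            // Keep both tags as they are and escape only the text between them.
            out.append(matcher.group(1))
               .append(escapeText(matcher.group(2)))
               .append(matcher.group(3));
            lastEnd = matcher.end();
        }
        // Copy whatever follows the last match.
        out.append(xml, lastEnd, xml.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "<order><shop>Marks & Spencer</shop><note>5\" nails</note></order>";
        // prints <order><shop>Marks &amp; Spencer</shop><note>5&quot; nails</note></order>
        System.out.println(sanitise(input));
    }
}
```

Only the innermost elements (shop and note) match the pattern. The outer order tags are copied through as unmatched text, which is why nested documents still come out intact.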