Substring text with HTML tags in JavaScript

Updated: 2023-11-07 10:43:46


Is there a way to take a substring of text that contains HTML tags in JavaScript, counting only the text and keeping the markup valid?

For example:

var str = 'Lorem ipsum <a href="#">dolor <strong>sit</strong> amet</a>, consectetur adipiscing elit.';

html_substr(str, 20);
// returns Lorem ipsum <a href="#">dolor <strong>si</strong></a>

html_substr(str, 30);
// returns Lorem ipsum <a href="#">dolor <strong>sit</strong> amet</a>, co
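
The lengths in the example count only visible text, not markup. As a quick sanity check, here is a minimal sketch that strips anything that looks like a tag and measures what is left. `visibleLength` is a hypothetical helper, not part of the question, and it treats entities such as `&amp;` as several characters.

```javascript
// Hypothetical helper: counts the characters outside of tags,
// using a lenient "anything between < and >" pattern.
function visibleLength(html) {
    return html.replace(/<[^>]*>/g, '').length;
}

var short = 'Lorem ipsum <a href="#">dolor <strong>si</strong></a>';
var long = 'Lorem ipsum <a href="#">dolor <strong>sit</strong> amet</a>, co';

console.log(visibleLength(short)); // 20
console.log(visibleLength(long));  // 30
```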

Bearing in mind that parsing HTML with regex is a bad idea, here is a solution that does just that :)

EDIT: Just to be clear: this is not a robust solution. It was meant as an exercise that makes very lenient assumptions about the input string, and as such should be taken with a grain of salt. Read the link above to see why parsing HTML with regex cannot work in general.
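
To make the "lenient assumptions" concrete, here is a small sketch of one input that breaks the tag pattern used in the function below: an attribute value containing `>`. The input string is made up for illustration.

```javascript
// The same lenient tag pattern used by htmlSubstring.
var tagPattern = /<([^>\s]*)[^>]*>/g;

// Illustrative input: the title attribute contains a '>' character.
var input = '<a title="a > b">link</a>';

// The pattern stops at the first '>', so the opening tag is cut short
// and the leftover ' b">' would be treated as text.
var tags = input.match(tagPattern);
console.log(tags); // [ '<a title="a >', '</a>' ]
```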

function htmlSubstring(s, n) {
    var m, r = /<([^>\s]*)[^>]*>/g,
        stack = [],
        lasti = 0,
        result = '';

    // for each tag, while we still need more characters
    while (n > 0 && (m = r.exec(s))) {
        // take the text between the previous tag and this one, capped at n characters
        var temp = s.substring(lasti, m.index).substring(0, n);
        // append it to the result and count the characters added
        result += temp;
        n -= temp.length;
        lasti = r.lastIndex;

        if (n > 0) {
            result += m[0];
            if (m[1].indexOf('/') === 0) {
                // closing tag: pop the stack (does not account for bad HTML)
                stack.pop();
            } else if (!/\/\s*>$/.test(m[0])) {
                // not a self-closing tag such as <br/> or <br />: push it on the stack
                stack.push(m[1]);
            }
        }
    }

    // add the remainder of the string, if needed (there are no more tags in here)
    if (n > 0) {
        result += s.substring(lasti, lasti + n);
    }

    // close any tags that are still open, innermost first
    while (stack.length) {
        result += '</' + stack.pop() + '>';
    }

    return result;
}

Example: http://jsfiddle.net/danmana/5mNNU/

Note: patrick dw's solution may be safer with bad HTML, but I'm not sure how well it handles whitespace.
