Updated: 2023-11-07 10:43:46
Does anyone have a solution for taking a substring of text that contains HTML tags in JavaScript, counting only the text characters and keeping the tags balanced?
For example:
var str = 'Lorem ipsum <a href="#">dolor <strong>sit</strong> amet</a>, consectetur adipiscing elit.';
html_substr(str, 20)
// returns Lorem ipsum <a href="#">dolor <strong>si</strong></a>
html_substr(str, 30)
// returns Lorem ipsum <a href="#">dolor <strong>sit</strong> amet</a>, co
Taking into consideration that parsing HTML with regex is a bad idea, here is a solution that does just that :)
EDIT: Just to be clear: this is not a valid solution. It was meant as an exercise that makes very lenient assumptions about the input string, and as such it should be taken with a grain of salt. Read up on why parsing HTML with regex can never be done reliably.
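To make those lenient assumptions concrete, here is a minimal sketch that runs the same tag pattern the function below uses, `/<([^>\s]*)[^>]*>/g`, over a hypothetical input in which a quoted attribute value contains a literal `>`. The pattern stops at the first `>`, so the tag is cut short.

```javascript
// Same tag-matching pattern as htmlSubstring below.
const tagRe = /<([^>\s]*)[^>]*>/g;

// Hypothetical input: the title attribute contains a literal ">".
const html = '<a title="1 > 0">link</a>';

// matchAll requires the global flag, which tagRe has.
const tags = [...html.matchAll(tagRe)].map((m) => m[0]);

console.log(tags); // → [ '<a title="1 >', '</a>' ]
```

The leftover ` 0">link` between the two matches would be treated as visible text and counted toward `n`.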
function htmlSubstring(s, n) {
    var m, r = /<([^>\s]*)[^>]*>/g,
        stack = [],
        lasti = 0,
        result = '';

    //for each tag, while we still need more characters
    while ((m = r.exec(s)) && n) {
        //get the text substring between the last tag and this one
        var temp = s.substring(lasti, m.index).substr(0, n);
        //append it to the result and count the number of characters added
        result += temp;
        n -= temp.length;
        lasti = r.lastIndex;

        if (n) {
            result += m[0];
            if (m[1].indexOf('/') === 0) {
                //if this is a closing tag, then pop the stack (does not account for bad html)
                stack.pop();
            } else if (m[1].lastIndexOf('/') !== m[1].length - 1) {
                //if this is not a self-closing tag, then push it onto the stack
                //(only "<br/>" is detected; "<br />" and void tags like "<br>" get pushed)
                stack.push(m[1]);
            }
        }
    }

    //add the remainder of the string, if needed (there are no more tags in here)
    result += s.substr(lasti, n);

    //close the tags that are still open, innermost first
    while (stack.length) {
        result += '</' + stack.pop() + '>';
    }

    return result;
}
Example: http://jsfiddle.net/danmana/5mNNU/
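One of the lenient assumptions above is that every opening tag without a trailing `/` will be closed later, so void elements such as `<br>` or `<img ...>` are pushed onto the stack and end up "closed" with spurious tags like `</br>`. Below is a minimal sketch of a variant that keeps the same regex-based approach but skips void elements and also recognizes `<br />` with a space. `htmlSubstringVoidAware` and `VOID_ELEMENTS` are illustrative names, not part of the original answer; the void-element list is taken from the HTML standard.

```javascript
// Hypothetical variant of htmlSubstring (not from the original answer).
// Still regex-based, so the same lenient assumptions apply: no comments,
// no CDATA, no ">" inside attribute values, and well-nested input.

// Void elements from the HTML standard: they never have a closing tag.
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr',
]);

function htmlSubstringVoidAware(s, n) {
  // Group 1: "/" for closing tags; group 2: the tag name.
  const tagRe = /<(\/?)([^>\s\/]+)[^>]*>/g;
  const stack = [];
  let lastIndex = 0;
  let result = '';
  let m;

  while (n > 0 && (m = tagRe.exec(s)) !== null) {
    // Copy at most n characters of text between the previous tag and this one.
    const text = s.slice(lastIndex, m.index).slice(0, n);
    result += text;
    n -= text.length;
    lastIndex = tagRe.lastIndex;
    if (n === 0) break;

    result += m[0];
    const name = m[2];
    if (m[1] === '/') {
      stack.pop(); // closing tag: assumes it matches the top of the stack
    } else if (!m[0].endsWith('/>') && !VOID_ELEMENTS.has(name.toLowerCase())) {
      stack.push(name); // opening tag that needs a matching close later
    }
  }

  // No more tags within reach: append up to n remaining text characters.
  result += s.slice(lastIndex, lastIndex + n);

  // Close any tags that were left open, innermost first.
  while (stack.length) {
    result += '</' + stack.pop() + '>';
  }
  return result;
}

const str = 'Lorem ipsum <a href="#">dolor <strong>sit</strong> amet</a>, consectetur adipiscing elit.';
console.log(htmlSubstringVoidAware(str, 20)); // Lorem ipsum <a href="#">dolor <strong>si</strong></a>
console.log(htmlSubstringVoidAware(str, 30)); // Lorem ipsum <a href="#">dolor <strong>sit</strong> amet</a>, co
console.log(htmlSubstringVoidAware('a<br>b<br />cd', 3)); // a<br>b<br />c
```

With the original function, the last call would append `</br>` closing tags for both line breaks; the variant leaves them alone. It still counts an entity such as `&amp;` as several characters, like the original.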
Note: patrick dw's solution may be safer with malformed HTML, but I'm not sure how well it handles whitespace.