且构网

Parsing a table with Nokogiri

Updated: 2023-12-05 08:59:52

Use:

td//text()[normalize-space()]

This selects every text node that is not whitespace-only and is a descendant of any td child of the current node (the tr already selected in your code).

Or, if you want to select all text-node descendants, regardless of whether they are whitespace-only or not:

td//text()

UPDATE:

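The OP has indicated in a comment that he is getting an unwanted td whose content is just a non-breaking space (&nbsp;, U+00A0).

To also exclude text nodes that consist only of (one or more) non-breaking space characters, use:

td//text()[translate(normalize-space(), '&#xA0;', '')]

Here '&#xA0;' stands for the literal U+00A0 character, not an ordinary space. normalize-space() only strips ordinary whitespace, so translate() is needed to delete the non-breaking spaces; the predicate is then false when nothing is left. XPath 1.0 has no escape syntax of its own, so in a Ruby double-quoted string the character can be written as \u00A0.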
