
Parsing div elements with Nokogiri

Updated: 2023-12-05 09:04:16

The following code successfully extracts tid and term data:

(answered generously by Uri Agassi)

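require 'nokogiri'
require 'open-uri' # URI.open; Kernel#open no longer fetches URLs in Ruby 3

(1..10).each do |i|
  doc = Nokogiri::HTML(URI.open("http://somewebsite.com/#{i}/"))
  # data-thing-id of every div whose class list includes the word "thing"
  tids = doc.xpath("//div[contains(concat(' ', @class, ' '), ' thing ')]").collect { |node| node['data-thing-id'] }
  # stripped text of every div whose class list includes "col_a"
  terms = doc.xpath("//div[contains(concat(' ', @class, ' '), ' col_a ')]").collect { |node| node.text.strip }

  # pair the two arrays by position and print one line per item
  tids.zip(terms).each do |tid, term|
    puts "#{tid} #{term}"
  end
end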

from the following sample HTML:

<div class="thing text-text" data-thing-id="29966403">
  <div class="thinguser"><i class="ico ico-water ico-blue"></i>
    <div class="status">in 7 days
    </div>
  </div>
  <div class="ignore-ui pull-right"><input type="checkbox">
  </div>
  <div class="col_a col text">
    <div class="text">foobar
    </div>
  </div>
  <div class="col_b col text">
    <div class="text">foobar desc
    </div>
  </div>
</div>

If I wanted to pull the status info (the "in 7 days" string) in the same fashion, what's the best way to do that? I can't seem to figure it out.

Would someone be kind enough to explain in detail what the tids and terms assignment lines are actually doing? I don't get it, and the Nokogiri documentation doesn't seem to cover this.

Big thanks in advance.

~Chris

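I'm all about using CSS selectors in Nokogiri. Something like this should work. Note the leading dot in '.status': status is a class name, while a bare 'status' would look for a <status> tag. Also, css returns a node set (one entry per matching div) rather than a single node, so map each node to its stripped text:

require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open("http://somewebsite.com/#{i}/"))
seven_days = doc.css('.status').map { |node| node.text.strip }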