Ruby on Rails - How can I scrape HTML with Nokogiri without tags?


I need to parse a local HTML file using Nokogiri, but the HTML doesn't have any <div>s or classes. It starts straight away with text.

This is the HTML:

high prices in <a href="example 1">example 1</a><br> low prices in <a href="example 2">example 2</a><br> 

In this case I need "high", "low", "example 1", and "example 2".

How can I get the text when there are no elements around it? The tutorials I saw all need something like <div class= ...> to get the text.

doc.xpath('//a/@href').each do |node|
  # get performance indicators
  link = node.text
  @test << Entry2.new(link)
end
@title = doc.xpath('//p').text.scan(/^(high|low)/)

My view:

<% @test.each do |entry| %>
  <p><%= entry.link %></p>
<% end %>
<% @title.each do |f| %>
  <p><%= f %></p>
<% end %>

And the output is this:

example 1example 2  [["high"], ["low"]] 

It's listing everything at the same time instead of one by one. How can I change the Nokogiri code to get this in the output?

high prices in example 1 low prices in example 2 

Well, Nokogiri will wrap the string in an implicit <html><body><p>..., so all of the text ends up in a single <p>.

So yes, you'll be able to get the links in structured form with:

doc.xpath "//a" 

But the "high" and "low" strings are in a single blob of text. You'll need to pull them out with a regex, which will depend a lot on your requirements and your data, but here's one for what you're showing and asking for:

doc.xpath('//p').text.scan(/^(high|low)/) 

I can't be sure how helpful this is for your actual requirements, but it gives you a direction to take.

