How can I extract paragraphs from an HTML file (the portion between <p> and </p>) that contain a specific word, using R?
For example, given this HTML:
<p>hello world</p> <p>the weather fine today</p> <p>it fine in lot of places in world</p>
for the keyword "world" the result should be:
hello world
it fine in lot of places in world
Oh, a code-writing service. Huh. Perhaps you can use XPath and not spin needless cycles in R:
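library(xml2)
library(rvest)

doc_txt <- "<p>hello world</p> <p>the weather fine today</p> <p>it fine in lot of places in world</p>"
doc <- read_html(doc_txt)

# Select every <p> that has a text node containing 'world', then pull out its text.
# html_nodes() is used here because xml_nodes() is deprecated in rvest >= 1.0.
xml_text(html_nodes(doc, xpath = "//p[text()[contains(.,'world')]]"))
## [1] "hello world"
## [2] "it fine in lot of places in world"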
A similar idiom works in the XML package if you can't level up to the Hadleyverse:
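library(XML)

# asText = TRUE tells the parser that doc_txt holds HTML content, not a file name
xdoc <- htmlParse(doc_txt, asText = TRUE)

# Same XPath as above; xmlValue extracts the text of each matching <p>
xpathSApply(xdoc, "//p[text()[contains(.,'world')]]", xmlValue)
## [1] "hello world"
## [2] "it fine in lot of places in world"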
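If the keyword changes from call to call, the same XPath idea could be wrapped in a small function. This is only a sketch: `extract_paragraphs` is a made-up name, not something from xml2 or rvest, and it assumes the keyword contains no single quote, since the keyword is pasted straight into the XPath string. Note that `contains()` in XPath 1.0 is case-sensitive, so "World" would not match "world".

```r
library(xml2)

# Hypothetical helper (not part of any package): returns the text of every <p>
# that has a text node containing `keyword` (case-sensitive match).
extract_paragraphs <- function(html, keyword) {
  doc <- read_html(html)
  # Assumption: `keyword` has no single quote, because it is spliced
  # directly into the XPath string literal below.
  xpath <- sprintf("//p[text()[contains(., '%s')]]", keyword)
  xml_text(xml_find_all(doc, xpath))
}

doc_txt <- "<p>hello world</p> <p>the weather fine today</p> <p>it fine in lot of places in world</p>"
extract_paragraphs(doc_txt, "world")
```

With the sample HTML from the question, this should return the same two paragraphs as the answers above.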