How can I extract paragraphs from an HTML file (the portion between <p> </p>) containing a specific word using R?

For example, given this HTML code:

<p>hello world</p>  <p>the weather fine today</p>  <p>it fine in lot of places in world</p>

For the keyword "world", the result should be:

hello world
it fine in lot of places in world

Oh, a code-writing service. Huh. Perhaps you can use XPath and not spin needless cycles in R:

library(xml2)
library(rvest)

doc_txt <- "<p>hello world</p> <p>the weather fine today</p> <p>it fine in lot of places in world</p>"

doc <- read_html(doc_txt)

# keep only the <p> nodes that have a text node containing 'world'
xml_text(html_nodes(doc, xpath = "//p[text()[contains(.,'world')]]"))

## [1] "hello world"
## [2] "it fine in lot of places in world"

A similar idiom will work in the XML package if you can't level up to the Hadleyverse:

library(XML)

# doc_txt is the same HTML string as above
xdoc <- htmlParse(doc_txt, asText = TRUE)
xpathSApply(xdoc, "//p[text()[contains(.,'world')]]", xmlValue)

## [1] "hello world"
## [2] "it fine in lot of places in world"
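
One caveat worth sketching: the XPath above tests only the text nodes sitting directly under each <p>, so a keyword wrapped in an inline tag such as <b> would be missed. The variant below is a rough sketch, not part of the original answer. It tests the paragraph's whole string value with contains(., ...) and wraps the query in a helper. The name paragraphs_with and the <b> in the sample markup are made up for illustration, and the helper assumes the keyword contains no single quote, since it is pasted straight into the XPath string.

```r
library(xml2)

# Hypothetical helper (not from the original answer): return the text of every
# <p> whose full text, including nested inline tags, contains `word`.
# Assumes `word` has no single quote, because it is pasted into the XPath.
paragraphs_with <- function(html, word) {
  doc <- read_html(html)  # accepts a file path, a URL, or a literal HTML string
  xpath <- sprintf("//p[contains(., '%s')]", word)
  xml_text(xml_find_all(doc, xpath))
}

# Sample markup; the <b> is an illustrative addition to the question's example
doc_txt <- "<p>hello <b>world</b></p> <p>the weather fine today</p> <p>it fine in lot of places in world</p>"

result <- paragraphs_with(doc_txt, "world")
# result holds "hello world" and "it fine in lot of places in world"
```

Passing a file name instead of doc_txt should work the same way, since read_html() takes a path as well. The match is case-sensitive, as XPath 1.0 contains() always is.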
