I want to extract the nodes of an HTML or XML file that are not commented out. Is the following regex the correct approach?
My regex:
/<span.*?>([\s\S]*?)<\/span>/gi
Here is an example XML:
<div>
    <p>
        <span style="font-size: 20px;">hello</span>
        <span style="font-size: 20px;">world</span>
    </p>
    <p>
        <!-- <span>hello</span> <span>world</span> -->
    </p>
    <p>
        <span>hello</span>
        <span>world</span>
    </p>
    <!-- <p> <span>hello</span> <span>world</span> </p> -->
</div>
I would appreciate any help.
Best regards, Michael
Well, you can remove the comments with a decent parser (DOMDocument
in this case) and analyze the remaining part afterwards. Consider the following code (mind that I changed the numbers in the hello world
strings to make clear what is being removed):
<?php
$html = '<div>
    <p>
        <span style="font-size: 20px;">hello</span>
        <span style="font-size: 20px;">world</span>
    </p>
    <p>
        <!-- <span>hello2</span> <span>world2</span> -->
    </p>
    <p>
        <span>hello3</span>
        <span>world3</span>
    </p>
    <!-- <p> <span>hello4</span> <span>world4</span> </p> -->
</div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Remove every comment node from the document.
foreach ($xpath->query('//comment()') as $comment) {
    $comment->parentNode->removeChild($comment);
}

$body = $xpath->query('//body')->item(0);
echo $dom->saveXML($body);
// yields "hello world" and "hello3 world3"
?>
Now the commented-out tags have been removed. Obviously, you can fiddle around with the XPath expression
to be more precise.
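As one possible way of being more precise, the sketch below queries the span elements directly and collects their text. It is only an illustration, not necessarily the approach intended above: the sample markup is a shortened copy of the example, and the `$texts` array is a name introduced here. It relies on the fact that markup inside `<!-- ... -->` is stored as the text of a comment node, so an element query such as `//span` does not see it.

```php
<?php
// Shortened copy of the sample markup (an assumption for this sketch).
$html = '<div>
  <p><span style="font-size: 20px;">hello</span> <span style="font-size: 20px;">world</span></p>
  <p><!-- <span>hello2</span> <span>world2</span> --></p>
  <p><span>hello3</span> <span>world3</span></p>
  <!-- <p><span>hello4</span> <span>world4</span></p> -->
</div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Commented-out markup is not parsed into element nodes,
// so //span only matches the spans that are actually live.
$texts = [];
foreach ($xpath->query('//span') as $span) {
    $texts[] = trim($span->textContent);
}

echo implode(', ', $texts), PHP_EOL;
// prints: hello, world, hello3, world3
```

If only the span contents are needed, this avoids the separate comment-removal step; removing the comments first is still useful when the cleaned markup itself has to be saved.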