I have the HTML snippet below. I want to scrape the page's topics and subtopics and store them in objects.
The desired result is something like:
{ 'topic': 'java basics', 'subtopics':['define scope of variables', 'define structure of java class', ...] }
I am trying to make this work with jsdom, Node.js, and jQuery:
    var jsdom = require('jsdom');
    var fs = require("fs");
    var topicos = fs.readFileSync("topic.html", "utf-8");

    jsdom.env(topicos, ["http://code.jquery.com/jquery.js"], function (error, window) {
        var $ = window.$;
        var length = $('div ~ ').each(function () { //???
            var topic = $(this);
            var text = topic.text();
            console.log(text.trim())
        });
    })
But due to my lack of experience with jQuery, I am not able to organize the hierarchy properly.
HTML snippet:
    <div> <strong>java basics </strong></div>
    <ul>
        <li> define scope of variables </li>
        <li> define structure of java class </li>
        <li> create executable java applications main method; run java program command line; including console output. </li>
        <li> import other java packages make them accessible in code </li>
        <li> compare , contrast features , components of java such as: platform independence, object orientation, encapsulation, etc. </li>
    </ul>
    <div> <strong>working java data types </strong></div>
    <ul>
        <li> declare , initialize variables (including casting of primitive data types) </li>
        <li> differentiate between object reference variables , primitive variables </li>
        <li> know how read or write object fields </li>
        <li> explain object's lifecycle (creation, "dereference reassignment" , garbage collection) </li>
        <li> develop code uses wrapper classes such boolean, double, , integer. </li>
    </ul>
    ...
Here is a working snippet (fiddle):
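    var topicos = [];
    // Each topic heading is a <div>; its subtopics are in the <ul> right after it.
    jQuery('div').each(function () {
        var data = {};
        var jthis = jQuery(this);
        data.topic = jthis.find('strong').text();
        data.subtopics = [];
        // .next('ul') selects the immediately following sibling only if it is a <ul>.
        jthis.next('ul').find('li').each(function () {
            var jthis = jQuery(this);
            data.subtopics.push(jthis.text());
        });
        topicos.push(data);
    });
    console.log(topicos);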
But I highly recommend adding classes to the markup and using them as selectors instead of tag names:
    <div class="js-topic-data">
        <div>
            <strong class="js-topic">java basics </strong>
        </div>
        <ul>
            <li class="js-sub-topic"> define scope of variables </li>
        </ul>
    </div>
Then something like:
    var topicos = [];
    jQuery('.js-topic-data').each(function () {
        var data = {};
        var jthis = jQuery(this);
        data.topic = jthis.find('.js-topic').text();
        data.subtopics = [];
        // The subtopics are descendants of the wrapper, so use .find() rather than .next().
        jthis.find('.js-sub-topic').each(function () {
            var jthis = jQuery(this);
            data.subtopics.push(jthis.text());
        });
        topicos.push(data);
    });
This is more robust against markup changes, etc.
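Since the question runs under Node.js, here is a minimal sketch of the same div/ul pairing for the original, class-free markup, using only standard DOM methods (`querySelectorAll`, `nextElementSibling`, `textContent`), so jQuery does not have to be loaded. It also trims the text so the output matches the desired result. `extractTopics` is a name made up for illustration, and the commented usage assumes a jsdom version that exports the `JSDOM` constructor and the `topic.html` file from the question.

```javascript
// Sketch: pair each topic <div> with the <ul> that immediately follows it.
// `root` is any DOM Document or Element (browser or jsdom).
function extractTopics(root) {
  const topics = [];
  root.querySelectorAll('div').forEach(function (div) {
    const heading = div.querySelector('strong');
    const list = div.nextElementSibling;
    // Skip divs that are not a topic heading directly followed by a list.
    if (!heading || !list || list.tagName !== 'UL') {
      return;
    }
    // Collect the trimmed text of every <li> in the paired list.
    const subtopics = Array.from(list.querySelectorAll('li'), function (li) {
      return li.textContent.trim();
    });
    topics.push({ topic: heading.textContent.trim(), subtopics: subtopics });
  });
  return topics;
}

// Assumed usage in Node.js with jsdom (not executed here):
//   const { JSDOM } = require('jsdom');
//   const fs = require('fs');
//   const html = fs.readFileSync('topic.html', 'utf-8');
//   console.log(extractTopics(new JSDOM(html).window.document));
```

Keeping the extraction in a function that only needs a DOM root means the same code can be tried in the browser console with `extractTopics(document)`.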