javascript - How to build hierarchical objects from sibling tags using jQuery selectors


I have the HTML snippet below. I want to scrape the page for topics and subtopics and store them in objects.

The desired result is something like:

{ 'topic': 'java basics',  'subtopics':['define scope of variables', 'define structure of java class', ...] } 

I am trying to make this work with jsdom, Node.js, and jQuery:

    var jsdom = require('jsdom');
    var fs = require("fs");

    var topicos = fs.readFileSync("topic.html", "utf-8");

    jsdom.env(topicos, ["http://code.jquery.com/jquery.js"], function (error, window) {
        var $ = window.$;
        var length = $('div ~ ').each(function () {
            //???
            var topic = $(this);
            var text = topic.text();
            console.log(text.trim())
        });
    })

But due to my lack of experience with jQuery, I am not able to organize the hierarchy properly.

HTML snippet:

    <div>
        <strong>java basics&nbsp;</strong></div>
    <ul>
        <li>
            define scope of variables&nbsp;</li>
        <li>
            define structure of java class
        </li>
        <li>
            create executable java applications main method; run java program command line; including
            console output.
        </li>
        <li>
            import other java packages make them accessible in code
        </li>
        <li>
            compare and contrast features and components of java such as:
            platform independence, object orientation, encapsulation, etc.
        </li>
    </ul>
    <div>
        <strong>working java data types&nbsp;</strong></div>
    <ul>
        <li>
            declare and initialize variables (including casting of primitive data types)
        </li>
        <li>
            differentiate between object reference variables and primitive variables
        </li>
        <li>
            know how read or write object fields
        </li>
        <li>
            explain object's lifecycle (creation, "dereference reassignment" and garbage collection)
        </li>
        <li>
            develop code uses wrapper classes such boolean, double, and integer. &nbsp;</li>
    </ul>
    ...

Here is a working snippet:

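    var topicos = [];

    // Each <div> holds a topic title; the <ul> right after it holds the subtopics.
    jQuery('div').each(function () {
        var data = {};
        var jthis = jQuery(this);
        data.topic = jthis.find('strong').text();
        data.subtopics = [];
        // next('ul') matches only if the immediately following sibling is a <ul>
        jthis.next('ul').find('li').each(function () {
            var jthis = jQuery(this);
            data.subtopics.push(jthis.text());
        });
        topicos.push(data);
    });

    console.log(topicos);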
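Note that `.text()` returns the raw text, so the strings collected above still contain the `&nbsp;` characters and the line breaks from the markup. The following is only a rough sketch of the same pairing logic with whitespace cleanup added, written without a DOM so it can be run on its own. The `siblings` array and the `cleanText` and `groupTopics` helpers are hypothetical stand-ins made up for illustration, not part of jQuery or jsdom.

```javascript
// Hypothetical stand-in for the DOM: a flat list of sibling nodes.
// Each <ul> entry carries the raw text of its <li> children in `items`.
var siblings = [
  { tag: 'div', text: '\n    java basics\u00a0' },
  { tag: 'ul', items: [
    '\n        define scope of variables\u00a0',
    '\n        define structure of java class\n    '
  ] },
  { tag: 'div', text: '\n    working java data types\u00a0' },
  { tag: 'ul', items: [
    '\n        know how read or write object fields\n    '
  ] }
];

// Collapse runs of whitespace into one space and trim the ends.
// In JavaScript, \s also matches the non-breaking space (\u00a0).
function cleanText(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Pair each <div> with the <ul> that immediately follows it,
// mirroring what .next('ul') does in the jQuery version.
function groupTopics(nodes) {
  var topics = [];
  nodes.forEach(function (node, i) {
    if (node.tag !== 'div') {
      return;
    }
    var next = nodes[i + 1];
    var items = next && next.tag === 'ul' ? next.items : [];
    topics.push({
      topic: cleanText(node.text),
      subtopics: items.map(cleanText)
    });
  });
  return topics;
}

var result = groupTopics(siblings);
console.log(JSON.stringify(result, null, 2));
```

In the jQuery version, the same cleanup could presumably be applied by passing each `.text()` result through a helper like `cleanText` before pushing it.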

However, I highly recommend adding classes to the markup and using them as selectors instead of tag names:

    <div class="js-topic-data">
      <div>
        <strong class="js-topic">java basics&nbsp;</strong>
      </div>
      <ul>
        <li class="js-sub-topic">define scope of variables&nbsp;</li>
      </ul>
    </div>

Then something like:

    var topicos = [];

    jQuery('.js-topic-data').each(function () {
        var data = {};
        var jthis = jQuery(this);
        data.topic = jthis.find('.js-topic').text();
        data.subtopics = [];
        // The .js-sub-topic items are descendants of the wrapper, so use find() rather than next()
        jthis.find('.js-sub-topic').each(function () {
            var jthis = jQuery(this);
            data.subtopics.push(jthis.text());
        });
        topicos.push(data);
    });

which is more robust against markup changes, etc.

