I am happily scraping property data from www.century21.com with Python requests and BeautifulSoup. The site paginates its results. I can scrape the results of the first page, but when I tried the same thing for the second page, I got the data of the first page as output.
Here is an example of the first page of results: http://www.century21.com/real-estate/ada-oh/lcohada/#t=0&s=0
And here are the results of the second page for the same search term: http://www.century21.com/real-estate/ada-oh/lcohada/#t=0&s=10
I noticed that when I open the second URL manually in a browser, the results of the first URL show for a few seconds. Then the page seems to finish loading and shows the results of the second page.
As you can imagine, Python requests is grabbing the results of that first load of the second page, which happen to be the same as the results of the first page. The same happens if I request the third page of results, the fourth, and so on.
Below is my code. If you run it, it prints the address of the first property on the first page twice.
Any idea how to grab the correct page of results?
    from bs4 import BeautifulSoup
    import requests

    page1=requests.get("http://www.century21.com/real-estate/ada-oh/lcohada/#t=0&s=0")
    c1=page1.content
    soup1=BeautifulSoup(c1,"html.parser").find_all("div",{"class":"propertyrow"})[0].find_all("span",{"class":"propaddresscollapse"})[0].text

    page2=requests.get("http://www.century21.com/real-estate/ada-oh/lcohada/#t=0&s=10")
    c2=page2.content
    soup2=BeautifulSoup(c2,"html.parser").find_all("div",{"class":"propertyrow"})[0].find_all("span",{"class":"propaddresscollapse"})[0].text

    print(soup1)
    print(soup2)
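The reason both requests return page one can be checked without touching the network. Everything after the "#" is a URL fragment, and HTTP clients such as requests do not send it to the server; only the browser keeps it, for the page's own scripts to read. This sketch uses only the standard library's urllib.parse.urldefrag to show that the two listing URLs from the question collapse to the same server-side URL. The helper name url_sent_to_server is hypothetical.

```python
from urllib.parse import urldefrag

# Listing URLs copied from the question; they differ only after the "#".
PAGE1 = "http://www.century21.com/real-estate/ada-oh/lcohada/#t=0&s=0"
PAGE2 = "http://www.century21.com/real-estate/ada-oh/lcohada/#t=0&s=10"


def url_sent_to_server(url: str) -> str:
    """Hypothetical helper: the URL without its fragment, i.e. what the server receives."""
    return urldefrag(url).url


for page_url in (PAGE1, PAGE2):
    # Both iterations print the same server URL; only the client-side fragment differs.
    print(url_sent_to_server(page_url), "fragment:", urldefrag(page_url).fragment)
```

Because the server sees an identical request for every "page", it can only ever answer with the first page of results.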
Instead of the listing URL, make requests to the "search.c21" endpoint, take the HTML string from the "list" key of the JSON response, and parse it:
    from bs4 import BeautifulSoup
    import requests

    page1 = requests.get("http://www.century21.com/search.c21?lid=cohada&t=0&s=0&subview=searchview.allsubview")
    c1 = page1.json()["list"]
    soup1 = BeautifulSoup(c1, "html.parser").find_all("div", {"class": "propertyrow"})[0].find_all("span", {"class": "propaddresscollapse"})[0].text

    page2 = requests.get("http://www.century21.com/search.c21?lid=cohada&t=0&s=10&subview=searchview.allsubview")
    c2 = page2.json()["list"]
    soup2 = BeautifulSoup(c2, "html.parser").find_all("div", {"class": "propertyrow"})[0].find_all("span", {"class": "propaddresscollapse"})[0].text

    print(soup1)
    print(soup2)
This prints:

    5489 sr 235
    202 w highland ave
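To fetch more than two pages, the same approach can be wrapped in a loop over the offset parameter s. This is a minimal sketch, assuming the endpoint and parameter names from the answer still behave as shown, that each page holds 10 results (inferred from the s=0 and s=10 example URLs), and that a page whose "list" HTML has no property rows marks the end of the results. The helper names search_params, parse_addresses and scrape_addresses, and the max_pages cap, are hypothetical; the CSS class names are copied exactly as they appear in the code above.

```python
from __future__ import annotations

import requests
from bs4 import BeautifulSoup

# Endpoint from the answer above. PAGE_SIZE is an assumption inferred from s=0 / s=10.
SEARCH_URL = "http://www.century21.com/search.c21"
PAGE_SIZE = 10


def search_params(lid: str, offset: int) -> dict:
    """Hypothetical helper: query parameters for one page, mirroring the answer's URL."""
    return {"lid": lid, "t": 0, "s": offset, "subview": "searchview.allsubview"}


def parse_addresses(list_html: str) -> list[str]:
    """Return the address text of every property row found in the 'list' HTML."""
    soup = BeautifulSoup(list_html, "html.parser")
    addresses = []
    for row in soup.find_all("div", class_="propertyrow"):
        span = row.find("span", class_="propaddresscollapse")
        if span is not None:
            addresses.append(span.get_text(strip=True))
    return addresses


def scrape_addresses(lid: str, max_pages: int = 5) -> list[str]:
    """Walk offsets 0, 10, 20, ... until a page has no rows or max_pages is reached."""
    results: list[str] = []
    with requests.Session() as session:  # one connection reused across pages
        for page in range(max_pages):
            resp = session.get(
                SEARCH_URL, params=search_params(lid, page * PAGE_SIZE), timeout=10
            )
            resp.raise_for_status()
            # Assumption: a missing "list" key is treated like an empty page.
            page_addresses = parse_addresses(resp.json().get("list", ""))
            if not page_addresses:
                break  # assumed to mean we ran past the last page
            results.extend(page_addresses)
    return results
```

Calling scrape_addresses("cohada", max_pages=2) would return every address on the first two pages, assuming the site still serves this endpoint. Reusing one requests.Session keeps the connection open between pages, and the max_pages cap stops the loop from requesting pages indefinitely if the end-of-results assumption does not hold.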