Nutch with Selenium to fetch JavaScript content

I am trying to enable the Selenium plugin in Nutch to crawl dynamically rendered JavaScript content on websites served over HTTPS (https://nseindia.com/live_market/dynaContent/live_analysis/top_gainers_losers.htm?cat=L, https://nseindia.com/live_market/dynaContent/live_analysis/top_gainers_losers.htm?cat=G).

I followed all the instructions at https://github.com/apache/nutch/tree/master/src/plugin/protocol-selenium. Because the sites being crawled use HTTPS, I kept the protocol-httpclient plugin enabled alongside protocol-selenium in nutch-site.xml (under NUTCH_HOME/conf). I also enabled the selenium.take.screenshot property, and Selenium is running.
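To make the configuration described here easier to check, below is a minimal Python sketch that reads a Hadoop-style configuration file (the `<configuration>/<property>/<name>/<value>` layout that nutch-site.xml uses) and reports what it finds. The sample XML, the function names `read_properties` and `summarize`, and the property values are hypothetical and only for illustration; the actual `plugin.includes` value in a given setup may differ. The check is a loose substring match, so it only shows what is listed, not which plugin Nutch ends up using for a URL.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} parsed from Hadoop-style <configuration> XML text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def summarize(props):
    """Report which of the settings mentioned in the question are present."""
    includes = props.get("plugin.includes", "")
    return {
        # Loose substring checks on the plugin.includes expression.
        "httpclient_listed": "httpclient" in includes,
        "selenium_listed": "selenium" in includes,
        # None means the property is not set in this file.
        "screenshot": props.get("selenium.take.screenshot"),
    }


# Hypothetical sample; replace with the contents of NUTCH_HOME/conf/nutch-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|protocol-selenium|urlfilter-regex|parse-html</value>
  </property>
  <property>
    <name>selenium.take.screenshot</name>
    <value>true</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(summarize(read_properties(SAMPLE)))
```

To run it against a real file, the text can be loaded with `open(path).read()` and passed to `read_properties` in place of `SAMPLE`.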

When I run the crawl, the JavaScript-rendered data is not fetched from the websites, and no Selenium screenshot is captured either.

Has anyone tried the same setup? If so, please let me know. Thanks!

Apache Nutch version: 1.12
Firefox version: 60.3.0
Selenium version: 3.4.0 (standalone)
