I was trying to enable the Selenium plugin so that Nutch could crawl JavaScript-rendered (dynamic) content on HTTPS websites such as https://nseindia.com/live_market/dynaContent/live_analysis/top_gainers_losers.htm?cat=L and https://nseindia.com/live_market/dynaContent/live_analysis/top_gainers_losers.htm?cat=G.
I followed all the instructions at https://github.com/apache/nutch/tree/master/src/plugin/protocol-selenium. Because the target sites use HTTPS, I kept protocol-httpclient enabled alongside protocol-selenium in the plugin.includes property of nutch-site.xml under $NUTCH_HOME/conf. I also enabled the selenium.take.screenshot property, and Selenium is running.
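One way to double-check the setup described above is to read nutch-site.xml and report which protocol plugins the plugin.includes regular expression actually activates, and whether selenium.take.screenshot is set to true. The following is a minimal sketch, not part of Nutch itself. It assumes that plugin.includes is matched as a full regular expression against each plugin id, and it only tests a hypothetical short list of protocol plugin ids. If more than one protocol plugin matches, it may be worth confirming which one Nutch actually uses for https URLs.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical list of protocol plugin ids to test against plugin.includes.
PROTOCOL_PLUGINS = ["protocol-http", "protocol-httpclient", "protocol-selenium"]

# Example nutch-site.xml content (illustrative values only).
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-(httpclient|selenium)|urlfilter-regex|parse-(html|tika)</value>
  </property>
  <property>
    <name>selenium.take.screenshot</name>
    <value>true</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


def check_selenium_config(xml_text):
    """Report which listed protocol plugins are active and whether screenshots are on."""
    props = read_properties(xml_text)
    includes = props.get("plugin.includes", "")
    # Assumption: plugin.includes is matched against the whole plugin id.
    active = [p for p in PROTOCOL_PLUGINS if includes and re.fullmatch(includes, p)]
    screenshot = props.get("selenium.take.screenshot", "false").lower() == "true"
    return {"active_protocols": active, "take_screenshot": screenshot}


if __name__ == "__main__":
    report = check_selenium_config(SAMPLE)
    print(report)
    if len(report["active_protocols"]) > 1:
        print("More than one protocol plugin is active; check which one handles https.")
```

To check a real installation, pass the contents of $NUTCH_HOME/conf/nutch-site.xml to check_selenium_config instead of SAMPLE.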
When I start the crawl, the JavaScript-rendered data is not fetched from the websites, and no Selenium screenshots are captured.
Has anyone tried the same? Please let me know. Thanks!
Versions: Apache Nutch 1.12, Firefox 60.3.0, Selenium 3.4.0 (standalone).