I'm working on a small PHP project, a simple crawler. Everything works, but some websites I try to crawl return an error, even though I follow the robots.txt rules, call sleep() between requests, and rotate IPs through a proxy list.

The error message is: 429 Too Many Requests.

This is my PHP code:
$proxies = array(); // Proxy list
$proxies[] = "xx.xxxx.xxx.xx";
$proxies[] = "xx.xxxx.xxx.xx";
$proxies[] = "xx.xxxx.xxx.xx";

if (!empty($proxies)) { // If the proxy list contains items
    $proxy = $proxies[array_rand($proxies)]; // Pick a random proxy
}

$curl = curl_init();
if (isset($proxy)) {
    curl_setopt($curl, CURLOPT_PROXY, $proxy); // Route the request through the proxy
}
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);

$str = curl_exec($curl);
sleep(20); // Wait between requests
curl_close($curl);

// str_get_html() comes from the Simple HTML DOM library
$content = str_get_html(str_replace(':///', '://', $str));

$endTime = microtime(true);
$elapsedTime = $endTime - $startTime;

foreach ($content->find('a') as $element) {
    $href = $element->href;
    if (strpos($href, 'http') !== 0) { // Relative link: fall back to the page URL
        $href = $url;
    }
    $this->crawler($robotsSrc, $href, $depth - 1);
}
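One thing I have been considering instead of a fixed sleep(20) is honoring the Retry-After header that servers often send with a 429 response. A minimal sketch of what I mean (the retryDelay() helper is hypothetical, not part of my crawler; the status code would come from curl_getinfo($curl, CURLINFO_RESPONSE_CODE)):

```php
<?php
// Hypothetical helper: decide how many seconds to wait before retrying,
// given the HTTP status code and the raw Retry-After header value
// (null when the server did not send one).
function retryDelay(int $statusCode, ?string $retryAfter): int {
    if ($statusCode !== 429) {
        return 0; // not throttled, no extra wait
    }
    if ($retryAfter !== null) {
        $value = trim($retryAfter);
        // Retry-After may be a plain number of seconds...
        if (ctype_digit($value)) {
            return (int) $value;
        }
        // ...or an HTTP-date; convert it to a relative delay.
        $ts = strtotime($value);
        if ($ts !== false) {
            return max(0, $ts - time());
        }
    }
    return 60; // conservative default when the server gives no hint
}
```

The idea would be to call sleep(retryDelay(...)) after curl_exec() and retry the same URL, rather than always sleeping 20 seconds and moving on.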