
PHP crawler: I get "429 Too Many Requests"

I'm working on a small PHP project, a simple crawler. Everything works, but I get an error with some of the websites I try to crawl, even though I follow the robots.txt rules, use the sleep function, and rotate IPs.

The error message is: 429 Too Many Requests.

This is my PHP code:

    $proxies = array(); // Proxy list used for IP rotation
    $proxies[] = "xx.xxxx.xxx.xx";
    $proxies[] = "xx.xxxx.xxx.xx";
    $proxies[] = "xx.xxxx.xxx.xx";

    if (!empty($proxies))   // If the proxy list actually contains entries,
        $proxy = $proxies[array_rand($proxies)];   // pick one at random

    $curl = curl_init();
    if (isset($proxy))      // Route this request through the chosen proxy
        curl_setopt($curl, CURLOPT_PROXY, $proxy);

    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
    $str = curl_exec($curl);
    curl_close($curl);
    sleep(20); // Throttle between requests

    $content = str_get_html(str_replace(':///', '://', $str));
    $endTime = microtime(true);
    $elapsedTime = $endTime - $startTime;
    foreach ($content->find('a') as $element) {
        $href = $element->href;
        if (strpos($href, 'http') !== 0)   // Resolve relative links against the page URL
            $href = rtrim($url, '/') . '/' . ltrim($href, '/');
        $this->crawler($robotsSrc, $href, $depth - 1);
    }
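A 429 response means the server itself is rate-limiting you, so a fixed 20-second sleep and proxy rotation will not always be enough. One option, sketched below rather than taken from the code above, is to check the HTTP status with curl_getinfo() and retry with exponential backoff, honoring the Retry-After header when the server sends one. The function names (fetchWithBackoff, backoffDelay) and the delay parameters are illustrative assumptions, not part of the original crawler.

```php
<?php
// Sketch: retry on HTTP 429 with exponential backoff, honoring Retry-After.
// Names and backoff constants here are illustrative assumptions.

function backoffDelay(int $attempt, int $base = 5): int {
    // 5s, 10s, 20s, 40s, ... capped at 5 minutes
    return min($base * (2 ** $attempt), 300);
}

function fetchWithBackoff(string $url, int $maxRetries = 4): ?string {
    for ($attempt = 0; $attempt <= $maxRetries; $attempt++) {
        $retryAfter = null;
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
        // Capture the Retry-After response header, if the server sends one
        curl_setopt($curl, CURLOPT_HEADERFUNCTION,
            function ($ch, $header) use (&$retryAfter) {
                if (stripos($header, 'Retry-After:') === 0)
                    $retryAfter = (int) trim(substr($header, 12));
                return strlen($header);
            });

        $body   = curl_exec($curl);
        $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
        curl_close($curl);

        if ($status !== 429)
            return $body === false ? null : $body;

        // Server asked us to slow down: wait as instructed, or back off exponentially
        sleep($retryAfter ?? backoffDelay($attempt));
    }
    return null; // gave up after $maxRetries retries
}
```

With this in place, the crawler can call fetchWithBackoff($url) instead of running curl_exec() directly, and the fixed sleep(20) can be dropped in favor of per-response pacing.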
