[php] reading web pages - curl

[php] reading web pages - curl, nothing but problems - incl. "CURLOPT_FOLLOWLOCATION cannot be activated

konrados, 14.02.2008, 17:16:00, Post #1
Hello again,

I want my script to read web pages, but with a time limit (surely this must be a common problem?). So I use file_get_contents and do this:

CODE
function WGetWebPage($url, $nTimeout = 25, $nMaxLen = 500000)
{
    $options = array(
        'http' => array(
            'user_agent' => 'test',
            'timeout'    => $nTimeout,
        ),
    );
    $context = stream_context_create($options);
    // declaration: string file_get_contents ( string $filename [, bool $use_include_path [, resource $context [, int $offset [, int $maxlen]]]] )
    $page = @file_get_contents($url, false, $context, 0, $nMaxLen);
    return $page;
}

The problem is that the timeout does not work: whatever value I pass, the function always takes as long as it needs to download the file. I pass 1 second and the function takes 6 seconds...

So I thought of curl. I do this:

CODE
$curl_session = curl_init();
curl_setopt($curl_session, CURLOPT_HEADER, false);
curl_setopt($curl_session, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl_session, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_session, CURLOPT_HTTPGET, true);
curl_setopt($curl_session, CURLOPT_URL, 'http://www.google.com');
curl_setopt($curl_session, CURLOPT_TIMEOUT, 2); // works, whole seconds only
$string = curl_exec($curl_session);

I no longer have a problem with the timeout, but I get the following error:

CURLOPT_FOLLOWLOCATION cannot be activated when in safe_mode or an open_basedir is set

So it seems I cannot use this flag, but turning it off gives unwanted results in most cases: the page answers with a 301 or 302 redirect instead of content. Do you know any solution? (Other than disabling safe_mode, because I cannot do that.)

Can anyone help?

////////////////////////////////////////////////////////////////////////////////////////////////////

Partial solution.
After two days of searching and googling I came across: http://pl2.php.net/manual/en/function.curl-setopt.php#79787

This function calls curl_exec recursively, each time fetching the headers first and checking whether the status code == 301 or 302.

Now I have a different problem: the function returns the page together with the headers. Do you know how to get rid of them? For now I did it crudely: I call curl_exec once more (unnecessarily), this time with curl_setopt($ch, CURLOPT_HEADER, false); that is, without headers, but I would rather not do that....

Alternative to curl_exec:

CODE
// taken from http://pl2.php.net/manual/en/function.curl-setopt.php#79787
function curl_redir_exec($ch)
{
    echo " calling curl_redir_exec ";
    static $curl_loops = 0;
    static $curl_max_loops = 5;
    if ($curl_loops++ >= $curl_max_loops) {
        $curl_loops = 0;
        return FALSE;
    }
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    $debbbb = $data;
    list($header, $data) = explode("\n\n", $data, 2);
    $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($http_code == 301 || $http_code == 302) {
        $matches = array();
        preg_match('/Location:(.*?)\n/', $header, $matches);
        $url = @parse_url(trim(array_pop($matches)));
        //print_r($url);
        if (!$url) {
            // couldn't process the url to redirect to
            $curl_loops = 0;
            return $data;
        }
        $last_url = parse_url(curl_getinfo($ch, CURLINFO_EFFECTIVE_URL));
        $new_url = $url['scheme'] . '://' . $url['host'] . $url['path'] . ($url['query'] ? '?' . $url['query'] : '');
        curl_setopt($ch, CURLOPT_URL, $new_url);
        //debug('Redirecting to', $new_url);
        return curl_redir_exec($ch);
    } else {
        $curl_loops = 0;
        //return $debbbb; // originally
        // my test: call curl_exec a second time, now with CURLOPT_HEADER set to false:
        curl_setopt($ch, CURLOPT_HEADER, false);
        return curl_exec($ch);
    }
}
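
On the question of dropping the headers without a second request: one possible approach, sketched here under assumptions and not tested on the poster's host, is to ask cURL how long the header block was (CURLINFO_HEADER_SIZE) and cut the response at that offset. The helper names split_response, find_location and fetch_following_redirects are made up for this illustration, and the sketch assumes the Location header holds an absolute URL (relative redirects are not handled).

```php
<?php
// Split a raw response fetched with CURLOPT_HEADER = true into
// array(headers, body), given the header size reported by cURL.
function split_response($raw, $headerSize)
{
    return array(substr($raw, 0, $headerSize), (string) substr($raw, $headerSize));
}

// Return the value of the Location header, or null if there is none.
function find_location($headers)
{
    if (preg_match('/^Location:\s*(.*?)\s*$/mi', $headers, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: follow 301/302 redirects by hand (no
// CURLOPT_FOLLOWLOCATION) and return only the body of the final response.
// Returns false on a transfer error or after too many redirects.
function fetch_following_redirects($url, $timeout = 2, $maxRedirects = 5)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, true);          // headers are needed to see Location
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);     // applies to each request separately
    for ($i = 0; $i <= $maxRedirects; $i++) {
        curl_setopt($ch, CURLOPT_URL, $url);
        $raw = curl_exec($ch);
        if ($raw === false) {
            break;
        }
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
        list($headers, $body) = split_response($raw, $size);
        $next = find_location($headers);
        if (($code != 301 && $code != 302) || $next === null) {
            curl_close($ch);
            return $body;                            // body only, headers cut off
        }
        $url = $next;                                // assumption: absolute URL
    }
    curl_close($ch);
    return false;
}
```

Usage would look like $page = fetch_following_redirects('http://www.google.com', 2); note that the timeout then counts per request, so a chain of redirects can take longer in total than the value passed in.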