function page_exists($url){
    $parts = parse_url($url);
    if(!$parts) return false; /* the URL was seriously wrong */

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);

    /* set the user agent - might help, doesn't hurt */
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    /* try to follow redirects */
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

    /* timeout after the specified number of seconds. assuming that this script
       runs on a server, 20 seconds should be plenty of time to verify a valid URL. */
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);

    /* don't download the page, just the header (much faster in this case) */
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_HEADER, true);

    /* handle HTTPS links. certificate checks are switched off because only
       reachability matters here; VERIFYHOST is set to 0 since current libcurl
       versions no longer accept the value 1 */
    if(isset($parts['scheme']) && strtolower($parts['scheme']) == 'https'){
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    }

    $response = curl_exec($ch);
    curl_close($ch);
    if($response === false) return false; /* the request itself failed */

    /* get the status code from HTTP headers (also matches "HTTP/2 200") */
    if(preg_match('/HTTP\/\d+(?:\.\d+)?\s+(\d+)/', $response, $matches)){
        $code = intval($matches[1]);
    } else {
        return false;
    }

    /* see if code indicates success */
    return (($code >= 200) && ($code < 400));
}
Notes on implementation
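I've used a somewhat liberal interpretation of "exists" here: this function will return TRUE even when the URL redirects to a different page. Since the status code is read from the first response in the returned headers, any redirect counts as success, wherever it ends up. I think that this is generally a good idea.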
Another thing to note is that this function expects a fully qualified and well-formed URL. Checking whether a random string represents a syntactically valid URL is not its purpose, and doing so here would be very inefficient and error-prone.
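If the input can't be trusted to be well-formed, one option is a cheap screen before page_exists() is ever called. The following is only a minimal sketch, assuming PHP's built-in filter_var() is good enough for your inputs; the helper name is_checkable_url() is made up for illustration and is not part of the function above. Keep in mind that FILTER_VALIDATE_URL only accepts ASCII URLs, so it may reject some addresses that a browser would handle.

```php
<?php
// Hypothetical pre-check, kept separate from page_exists().
// Returns true only for syntactically valid http:// or https:// URLs.
function is_checkable_url($url) {
    // filter_var() returns false when the string is not a valid URL.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // FILTER_VALIDATE_URL also accepts schemes such as ftp://,
    // so restrict the result to the two schemes page_exists() handles.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return $scheme === 'http' || $scheme === 'https';
}

var_dump(is_checkable_url('http://example.com/page.html')); // bool(true)
var_dump(is_checkable_url('example.com/page.html'));        // bool(false)
```

Keeping this step outside page_exists() means callers who already know their URLs are clean don't pay for it.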
If you're familiar with CURL, you might know about the CURLOPT_FAILONERROR option, which is supposed to make curl_exec() treat a non-existent page as an error. It might seem that with this option set, page_exists() could be simplified to checking whether $response equals FALSE (indicating an error). Well, that doesn't really work, at least not as expected. In my tests CURLOPT_FAILONERROR made curl_exec() fail when the returned HTTP status code was 302, a form of temporary redirect. Needless to say, the URL in question worked fine in my browser, so I decided to blame CURL and revise the function to explicitly check the status code, treating all codes in the 2XX-3XX range as success.
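To make the explicit status check easier to experiment with, the header parsing can be pulled out into small functions that work on a plain string, with no network access needed. This is a sketch only: status_codes_from_headers() and is_success_code() are hypothetical names, and the sample header text is invented for the demonstration.

```php
<?php
// Hypothetical helper (not part of page_exists()): collect every status
// code found in a raw block of HTTP response headers.
function status_codes_from_headers($rawHeaders) {
    // The /m flag lets ^ match at the start of each line, so the status
    // line of every response in a redirect chain is picked up.
    preg_match_all('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/m', $rawHeaders, $matches);
    return array_map('intval', $matches[1]);
}

// Same success rule as described above: anything in the 2XX-3XX range.
function is_success_code($code) {
    return $code >= 200 && $code < 400;
}

// Invented sample: a redirect whose target turns out to be missing.
$headers = "HTTP/1.1 302 Found\r\nLocation: /new\r\n\r\n"
         . "HTTP/1.1 404 Not Found\r\n\r\n";

$codes = status_codes_from_headers($headers);
$first = $codes[0];
$last  = $codes[count($codes) - 1];

var_dump(is_success_code($first)); // bool(true)
var_dump(is_success_code($last));  // bool(false)
```

Judging by the first code gives the liberal "redirects count" behaviour, while judging by the last code would be the stricter choice.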
Alternatives
If you can't or don't want to use CURL, there are other ways to see if a page exists.
- fopen() – try opening the URL as a file. This only works if the fopen() URL wrappers are enabled (the allow_url_fopen setting in php.ini). You can find lots of similar examples on Google.
    $handle = @fopen($url, 'r');
    if($handle !== false){
        fclose($handle);
        echo 'Page Exists';
    } else {
        echo 'Page Not Found';
    }
- fsockopen() – use sockets to connect to the target host, build the HTTP request by hand and analyze the server's response. See some page-checking examples in the comments for the fsockopen() function on php.net. IMHO this method is a bit of overkill: it's complex and may lead to strange bugs if you don't know exactly what you're doing.
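To give a feel for what the fsockopen() route involves, here is a rough sketch of a hand-built HEAD request. The function names build_head_request() and socket_page_exists() are hypothetical. The sketch assumes plain HTTP on port 80 and does not follow redirects or handle HTTPS, so it covers less than page_exists() does.

```php
<?php
// Hypothetical helper: builds a minimal HTTP/1.1 HEAD request by hand.
function build_head_request($host, $path = '/') {
    return "HEAD {$path} HTTP/1.1\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n\r\n";
}

// Hypothetical sketch: plain HTTP only, no redirect handling.
function socket_page_exists($host, $path = '/', $port = 80, $timeout = 15) {
    // @ suppresses the warning PHP emits when the connection fails.
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return false; // could not connect at all
    }
    stream_set_timeout($fp, $timeout);
    fwrite($fp, build_head_request($host, $path));

    // Only the status line (the first line of the response) is needed.
    $statusLine = fgets($fp, 1024);
    fclose($fp);

    if ($statusLine === false
        || !preg_match('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/', $statusLine, $m)) {
        return false;
    }
    $code = intval($m[1]);
    return $code >= 200 && $code < 400;
}
```

A call would look like socket_page_exists('example.com', '/index.html'). Even this stripped-down version has to deal with connection errors, timeouts and response parsing itself, which is the complexity mentioned above.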