How To Check If Page Exists With CURL

Here’s a relatively simple PHP function that checks whether a URL really leads to a valid page (as opposed to generating “404 Not Found” or some other kind of error). It uses the CURL library; if your server doesn’t have it installed, see “Alternatives” at the end of this post. This script may be useful for finding broken links and similar tasks.

function page_exists($url){
    $parts = parse_url($url);
    if(!$parts) return false; /* the URL was seriously wrong */

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);

    /* set the user agent - might help, doesn't hurt */
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    /* try to follow redirects */
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

    /* allow up to 15 seconds to connect and 20 seconds for the whole request.
       assuming that this script runs on a server, that should be plenty of
       time to verify a valid URL. */
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);

    /* don't download the page, just the header (much faster in this case) */
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_HEADER, true);

    /* handle HTTPS links. VERIFYHOST is set to 2 (check that the certificate
       matches the host name) because the old value 1 is deprecated and makes
       PHP emit a notice. the certificate chain itself is not verified. */
    if(isset($parts['scheme']) && strtolower($parts['scheme'])=='https'){
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    }

    $response = curl_exec($ch);

    /* get the status code of the final response (after any redirects).
       this has to happen before the handle is closed. */
    $code = intval(curl_getinfo($ch, CURLINFO_HTTP_CODE));
    curl_close($ch);

    /* no response at all (DNS failure, timeout, ...) */
    if($response === false) return false;

    /* see if code indicates success */
    return (($code>=200) && ($code<400));
}
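As a rough illustration of the broken-link use case, here is a minimal sketch. It assumes a hypothetical helper called find_broken_links() (not part of the original script) that takes the checking function as a parameter, so page_exists() can be passed in. The example itself uses a stand-in checker so that it runs without network access.

```php
<?php
/* Hypothetical helper: return the URLs for which $checker reports failure.
   $checker is any callable that takes a URL and returns true/false,
   e.g. 'page_exists'. */
function find_broken_links($urls, $checker){
    $broken = array();
    foreach($urls as $url){
        if(!call_user_func($checker, $url)){
            $broken[] = $url;
        }
    }
    return $broken;
}

/* stand-in checker so the example runs without network access;
   in real use pass 'page_exists' instead */
$fake_checker = function($url){
    return $url !== 'http://www.example.com/missing';
};

$urls = array(
    'http://www.example.com/',
    'http://www.example.com/missing',
);

print_r(find_broken_links($urls, $fake_checker));
```

In real use the call would probably look like find_broken_links($urls, 'page_exists'). Keep in mind that each check can take up to the full timeout, so long lists will be slow.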

 Notes on implementation
I've used a somewhat liberal interpretation of "exists" here: this function will return TRUE even when the URL redirects to a different page. I think that this is generally a good idea.
Another thing to note is that this function expects a fully qualified and well-formed URL. Checking whether a random string represents a syntactically valid URL is not its purpose, and doing so here would be very inefficient and error-prone.
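If you want a cheap guard in front of page_exists() that rejects obviously unusable input, something like the following might do. This is only a sketch: is_absolute_http_url() is a hypothetical helper name, and it merely checks for an http/https scheme and a host rather than attempting full URL validation.

```php
<?php
/* Hypothetical helper: accepts only absolute http/https URLs that have a host.
   This is a sanity check, not a full URL validator. */
function is_absolute_http_url($url){
    $parts = parse_url($url);
    if(!is_array($parts)) return false;
    if(empty($parts['scheme']) || empty($parts['host'])) return false;
    return in_array(strtolower($parts['scheme']), array('http', 'https'), true);
}

var_dump(is_absolute_http_url('http://www.example.com/page.html')); // bool(true)
var_dump(is_absolute_http_url('/page.html'));                       // bool(false)
var_dump(is_absolute_http_url('ftp://example.com/file.zip'));       // bool(false)
```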
If you're familiar with CURL you might know about the CURLOPT_FAILONERROR option, which is supposed to make curl_exec() treat a non-existent page as an error. It might seem that with this option set, page_exists() could be simplified to just check whether $response equals FALSE (indicating an error). Well, that doesn't really work, at least not as expected. In my tests CURLOPT_FAILONERROR made curl_exec() fail when the returned HTTP status code was 302, a form of temporary redirect. Needless to say, the URL in question worked fine in my browser, so I decided to blame CURL and revise the function to explicitly check the status code, treating all codes in the 2XX – 3XX range as success.
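One detail worth knowing when checking the status code explicitly: with CURLOPT_FOLLOWLOCATION and CURLOPT_HEADER both enabled, curl_exec() returns the headers of every response in the redirect chain, so the status line that matters is the last one. The sketch below assumes a hypothetical helper, last_status_code(), that pulls the final code out of a raw header dump. Calling curl_getinfo() with CURLINFO_HTTP_CODE is an alternative that avoids parsing altogether.

```php
<?php
/* Hypothetical helper: return the status code of the LAST status line found in
   a raw header dump, or 0 if there is none. Accepts "HTTP/1.1 200 OK" as well
   as "HTTP/2 200". */
function last_status_code($headers){
    if(!preg_match_all('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/m', $headers, $matches)){
        return 0;
    }
    return intval(end($matches[1]));
}

/* sample header dump: a 302 redirect followed by the final 200 response */
$headers = "HTTP/1.1 302 Found\r\nLocation: http://www.example.com/new\r\n\r\n"
         . "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";

$code = last_status_code($headers);
var_dump(($code >= 200) && ($code < 400)); // bool(true)
```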
 Alternatives
If you can't or don't want to use CURL there are other ways to see if a page exists.
   fopen() – try opening the URL as a file. This only works if the fopen() URL wrappers are enabled (the allow_url_fopen setting). You can find lots of similar examples on Google.
 
 
 $url = 'http://www.example.com';
 $handle = @fopen($url,'r');
 if($handle !== false){
    fclose($handle); /* we only needed to know that it opens */
    echo 'Page Exists';
 }  else {
    echo 'Page Not Found';
 }
    fsockopen() – use sockets to connect to the target host, build the HTTP request by hand and analyze the server's response. See some page-checking examples in the comments for the fsockopen() function on php.net. IMHO this method is a bit of overkill – it's complex and may lead to strange bugs if you don't know exactly what you're doing.
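To give an idea of what the fsockopen() route involves, here is a minimal sketch. All three function names are hypothetical, and it deliberately handles plain HTTP on port 80 only: no HTTPS, no redirect following, and no handling of unusual server responses.

```php
<?php
/* Hypothetical helper: build a minimal HEAD request by hand. */
function build_head_request($host, $path = '/'){
    return "HEAD $path HTTP/1.1\r\n"
         . "Host: $host\r\n"
         . "Connection: close\r\n\r\n";
}

/* Hypothetical helper: extract the status code from a status line such as
   "HTTP/1.1 200 OK"; returns 0 if the line does not look like one. */
function parse_status_line($line){
    return preg_match('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/', $line, $m) ? intval($m[1]) : 0;
}

/* Hypothetical helper: plain HTTP only, no redirects, no HTTPS. */
function page_exists_socket($host, $path = '/', $timeout = 15){
    $fp = @fsockopen($host, 80, $errno, $errstr, $timeout);
    if(!$fp) return false; /* could not connect */

    stream_set_timeout($fp, $timeout);
    fwrite($fp, build_head_request($host, $path));

    /* the status line is the first line of the response */
    $status_line = fgets($fp, 1024);
    fclose($fp);
    if($status_line === false) return false;

    $code = parse_status_line($status_line);
    return (($code>=200) && ($code<400));
}
```

Unlike the CURL version, this sketch counts a 3XX response as success without checking where the redirect leads, which is one example of the gaps you have to fill yourself with this approach.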
