Skip to Content

TheBlogReaders.com

Salesforce.com, PHP, MySQL, Javascript, Ajax, Htacces

How To Check If Page Exists With CURL

Be First!
by May 7, 2012 Uncategorized
Here’s a relatively simple PHP function that will check if an URL really leads to a valid page (as opposed to generating “404 Not Found” or some other kind of error). It uses the CURL library – if your server doesn’t have it installed, see “Alternatives” at the end of this post. This script may be useful for finding broken links and similar tasks.
 
function page_exists($url){
  $parts=parse_url($url);
  if(!$parts) return false; /* the URL was seriously wrong */
 
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
 
  /* set the user agent - might help, doesn't hurt */
  curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
  curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
 
  /* try to follow redirects */
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
 
  /* timeout after the specified number of seconds. assuming that this script runs
    on a server, 20 seconds should be plenty of time to verify a valid URL.  */
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
  curl_setopt($ch, CURLOPT_TIMEOUT, 20);
 
  /* don't download the page, just the header (much faster in this case) */
  curl_setopt($ch, CURLOPT_NOBODY, true);
  curl_setopt($ch, CURLOPT_HEADER, true);
 
  /* handle HTTPS links */
  if($parts['scheme']=='https'){
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST,  1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
  }
 
  $response = curl_exec($ch);
  curl_close($ch);
 
  /*  get the status code from HTTP headers */
  if(preg_match('/HTTP\/1\.\d+\s+(\d+)/', $response, $matches)){
    $code=intval($matches[1]);
  } else {
    return false;
  };
 
  /* see if code indicates success */
  return (($code>=200) && ($code<400));
}
 
 

Notes on implementation
I've used a somewhat liberal interpretation of "exists" here – this function will return TRUE even when URL redirects to a different page. I think that this is generally a good idea.

Another thing to note is that this function expects a fully qualified and well-formed URL. Checking if a random string represents a syntactically valid URL is not the it's purpose and would be very inefficient + error-prone.

If you're familiar with CURL you might know about the CURLOPT_FAILONERROR option which is supposed to make curl_exec() treat a non-existent page as an error. It might seem that with this option set page_exists() might be simplified by only checking if $response equals FALSE (indicating an error). Well, that doesn't really work, at least not as expected. In my tests CURLOPT_FAILONERROR made curl_exec() fail when the returned HTTP status code was 302 – a form of temporary redirect. Needless to say the URL in question worked fine in my browser so I decided to blame CURL and revise the function to explicitly check the status code, treating all codes in the 2XX – 3XX range as success.

Alternatives
If you can't or don't want to use CURL there are other ways to see if a page exists.

  • fopen() – try opening the URL as a file and hope the fopen() URL wrapper is enabled. You can find lots of similar examples on Google.
$handle = @fopen($url,'r');
if($handle !== false){
   echo 'Page Exists';
else {
   echo 'Page Not Found';
}
  • fsockopen() – use sockets to connect to the target host, build the HTTP request by hand and analyze the server's response. See some page-checking examples in the comments for the fsockopen() function on php.net. IMHO this method is a bit of overkill – it's complex and may lead to strange bugs if you don't know exactly what you're doing.

(510)

Previous
Next

Leave a Reply