function page_exists($url) {
    $parts = parse_url($url);
    if (!$parts) {
        return false; /* the URL was seriously wrong */
    }

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);

    /* set the user agent - might help, doesn't hurt */
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    /* try to follow redirects */
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

    /* timeout after the specified number of seconds. assuming that this script runs
       on a server, 20 seconds should be plenty of time to verify a valid URL. */
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);

    /* don't download the page, just the header (much faster in this case) */
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_HEADER, true);

    /* handle HTTPS links (isset() avoids a notice when the URL has no scheme) */
    if (isset($parts['scheme']) && $parts['scheme'] == 'https') {
        /* recent libcurl versions no longer accept the value 1 here, so use 2 */
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    }

    $response = curl_exec($ch);
    curl_close($ch);

    /* get the status code from HTTP headers; the pattern accepts both
       "HTTP/1.1 200 OK" and "HTTP/2 200" style status lines */
    if (preg_match('/HTTP\/\d+(?:\.\d+)?\s+(\d+)/', $response, $matches)) {
        $code = intval($matches[1]);
    } else {
        return false;
    }

    /* see if code indicates success */
    return (($code >= 200) && ($code < 400));
}
Notes on implementation
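I've used a somewhat liberal interpretation of "exists" here – this function will return TRUE even when the URL redirects to a different page. Because only the first status line in the response is checked, a redirect counts as success no matter where it eventually leads. I think that this is generally a good idea.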
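If that interpretation is too loose for a particular use, one option is to look at the last status line instead of the first. With CURLOPT_FOLLOWLOCATION and CURLOPT_HEADER both set, the response contains one header block per hop, so the final code is the last match. A minimal sketch; final_status_code() is a hypothetical helper, not part of the function above, and the sample headers are made up for illustration:

```php
<?php
/**
 * Hypothetical helper: return the status code of the last header block
 * in a raw header string, or null if no status line is found.
 */
function final_status_code(string $headers): ?int
{
    // Collect every status line, e.g. "HTTP/1.1 302 Found" or "HTTP/2 200".
    if (!preg_match_all('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/m', $headers, $matches)) {
        return null;
    }
    // The last match belongs to the final response in the redirect chain.
    return (int) end($matches[1]);
}

// Made-up headers for a redirect that ends on a missing page.
$sample = "HTTP/1.1 302 Found\r\nLocation: /gone\r\n\r\n"
        . "HTTP/1.1 404 Not Found\r\n\r\n";

var_dump(final_status_code($sample)); // int(404)
```

Feeding the result into the same 2XX – 3XX check would make a redirect to a dead page count as "not found".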
Another thing to note is that this function expects a fully qualified and well-formed URL. Checking whether a random string represents a syntactically valid URL is not its purpose, and doing that here would be inefficient and error-prone.
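Callers that can't guarantee a well-formed URL could run a cheap syntax check before calling page_exists(). A sketch, assuming PHP's built-in filter_var() with FILTER_VALIDATE_URL is strict enough for your input; looks_like_http_url() is a hypothetical name:

```php
<?php
/**
 * Hypothetical pre-check: true only for syntactically valid
 * http:// or https:// URLs. It does not contact the server.
 */
function looks_like_http_url(string $url): bool
{
    // filter_var() returns false when the string is not a valid URL.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Accept only the schemes that page_exists() is meant to handle.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}
```

This keeps the validation outside page_exists(), so the function itself stays as simple as described above.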
If you're familiar with CURL you might know about the CURLOPT_FAILONERROR option, which is supposed to make curl_exec() treat a non-existent page as an error. It might seem that with this option set, page_exists() could be simplified to only check whether $response equals FALSE (indicating an error). Well, that doesn't really work, at least not as expected. In my tests CURLOPT_FAILONERROR made curl_exec() fail when the returned HTTP status code was 302 – a form of temporary redirect. Needless to say, the URL in question worked fine in my browser, so I decided to blame CURL and revise the function to explicitly check the status code, treating all codes in the 2XX – 3XX range as success.
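The status code can also be read without parsing headers, using curl_getinfo() with CURLINFO_HTTP_CODE, which reports the code of the last response received. The sketch below applies the same 2XX – 3XX rule. Both function names are hypothetical, the timeouts are copied from page_exists(), and note the difference in behaviour: a redirect that ends on a 404 is reported as not found here.

```php
<?php
/** Hypothetical helper: the 2XX - 3XX rule from the article. */
function is_success_code(int $code): bool
{
    return $code >= 200 && $code < 400;
}

/** Hypothetical variant of page_exists() built on curl_getinfo(). */
function page_exists_via_getinfo(string $url): bool
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // don't echo the response
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);

    // curl_exec() returns false on transport errors (DNS failure, timeout, ...).
    $ok = curl_exec($ch) !== false;

    // Status code of the last response; 0 if no response was received.
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return $ok && is_success_code($code);
}
```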
Alternatives
If you can't or don't want to use CURL there are other ways to see if a page exists.
- fopen() – try opening the URL as a file and hope the fopen() URL wrapper is enabled. You can find lots of similar examples on Google.
    $handle = @fopen($url, 'r');
    if ($handle !== false) {
        echo 'Page Exists';
    } else {
        echo 'Page Not Found';
    }
- fsockopen() – use sockets to connect to the target host, build the HTTP request by hand and analyze the server's response. See some page-checking examples in the comments for the fsockopen() function on php.net. IMHO this method is a bit of overkill – it's complex and may lead to strange bugs if you don't know exactly what you're doing.
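To give an idea of what the fsockopen() route involves, here is a minimal sketch for plain HTTP on port 80. It sends a HEAD request and reads only the status line; HTTPS, redirects and anything beyond the first line of the response are deliberately ignored. All three function names are hypothetical.

```php
<?php
/** Hypothetical helper: build a bare-bones HEAD request by hand. */
function build_head_request(string $host, string $path): string
{
    return "HEAD {$path} HTTP/1.1\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n\r\n";
}

/** Hypothetical helper: extract the status code from a status line. */
function parse_status_line(string $line): ?int
{
    return preg_match('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/', $line, $m)
        ? (int) $m[1]
        : null;
}

/** Hypothetical socket-based check; plain HTTP only, no redirects. */
function page_exists_via_socket(string $host, string $path = '/'): bool
{
    // Connect with a 15 second timeout; @ hides the warning on failure.
    $fp = @fsockopen($host, 80, $errno, $errstr, 15);
    if ($fp === false) {
        return false;
    }

    fwrite($fp, build_head_request($host, $path));
    $statusLine = fgets($fp); // first line of the response
    fclose($fp);

    $code = is_string($statusLine) ? parse_status_line($statusLine) : null;
    return $code !== null && $code >= 200 && $code < 400;
}
```

Even this stripped-down version needs its own request builder and response parser, which illustrates why the CURL approach is usually less work.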