Old 07-29-2007, 12:17 PM   · #1
Barrucadu
Formally Mikor.
 
Name: Michael Walker
Location: East Yorkshire, England
Get all links from a page

This code gets all the links from a page. I developed it as part of a simple spider I'm working on.

This is what I'm using it for. Obviously it's not finished, but I think it's a pretty good (if strange) idea. It needs JavaScript and has only been tested in Opera.

PHP Code:
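<pre><?php

// Page to scan, passed in the query string as ?url=http://...
$url    = isset($_GET['url']) ? $_GET['url'] : '';
$parsed = parse_url($url);

// Only fetch http(s) URLs, so the script cannot be pointed at local files.
if (empty($parsed['scheme']) || empty($parsed['host']) || !in_array(strtolower($parsed['scheme']), array('http', 'https')))
    die('Please supply an http(s) URL in the "url" parameter.');

$html  = file_get_contents($url);
$preg  = array();
$base  = array();
$links = array();

// Match <a href="...">text</a> with double quotes, then with single quotes.
// Limitation: href has to be the first attribute of the tag.
preg_match_all("/\<a(\s+)href(\s*)=(\s*)\"(.*?)\"(.*?)\>(.*?)\<\/a\>/i", $html, $preg[0]);
preg_match_all("/\<a(\s+)href(\s*)=(\s*)'(.*?)'(.*?)\>(.*?)\<\/a\>/i", $html, $preg[1]);
// Optional <base href="..."> tag.
preg_match("/\<base(\s+)href(\s*)=(\s*)\"(.*?)\"(\s*)\/?\>/i", $html, $base);

// Group 6 is the link text, group 4 is the href value.
$title = array_merge($preg[0][6], $preg[1][6]);
$href  = array_merge($preg[0][4], $preg[1][4]);
$base  = isset($base[4]) ? $base[4] : '';

// No <base> tag: fall back to scheme://[user:pass@]host of the page itself.
if (empty($base))
    $base = (!empty($parsed['user'])) ? "{$parsed['scheme']}://{$parsed['user']}:{$parsed['pass']}@{$parsed['host']}" : "{$parsed['scheme']}://{$parsed['host']}";

for ($i = 0; $i < count($href); $i++) {
    // Root-relative links get the base prepended.
    if (substr($href[$i], 0, 1) == '/')
        $href[$i] = "{$base}{$href[$i]}";
    // Query-only and fragment-only links are appended to the page URL.
    elseif (substr($href[$i], 0, 1) == '?' || substr($href[$i], 0, 1) == '#')
        $href[$i] = "{$url}{$href[$i]}";

    $links[$i] = array("title" => htmlentities($title[$i]), "url" => htmlentities($href[$i]));
}

print_r($links);

?></pre>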
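
For comparison, here is a rough sketch of the same idea using PHP's DOM extension instead of regular expressions, which should also pick up links whose href is not the first attribute. This is not the code from the spider: extract_links() is a hypothetical helper name, the sample HTML and example.com URL are made up, and <base> handling is left out to keep it short. It applies the same two rewriting rules as the code above.

```php
<?php

// Hypothetical helper (not part of the original code): returns every
// <a href> in $html as an array of 'title' and 'url', resolved against $url.
function extract_links($html, $url)
{
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $parsed = parse_url($url);
    $root   = "{$parsed['scheme']}://{$parsed['host']}";
    $links  = array();

    foreach ($doc->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href'))
            continue;

        $href = $a->getAttribute('href');

        // Same rules as above: root-relative links get scheme://host,
        // query-only and fragment-only links are appended to the page URL.
        if (substr($href, 0, 1) == '/')
            $href = $root . $href;
        elseif (substr($href, 0, 1) == '?' || substr($href, 0, 1) == '#')
            $href = $url . $href;

        // textContent is the link text with any nested tags stripped.
        $links[] = array('title' => trim($a->textContent), 'url' => $href);
    }

    return $links;
}

// Assumed sample input, for illustration only.
$sample = '<html><body><a href="/about">About</a> '
        . '<a class="next" href=\'?page=2\'>Next <b>page</b></a></body></html>';

print_r(extract_links($sample, 'http://example.com/list'));
```

The trade-off is that the DOM parser needs the dom extension to be available, whereas the regular-expression version runs anywhere but misses anything that does not fit its pattern.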

