Extracting urls out of a document?


The_Inferno

Extracting urls from a document?
Thought I had it but I don't. :(
Any suggestions?
 
The views expressed on this page by users and staff are their own, not those of NamePros.
The following assumes you are getting the links from a web page.

PHP:
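class fetchlinks
{
	public $_isolated_links = array(); // raw <a ...>...</a> snippets
	public $_txt = '';                 // source of the fetched document
	public $href = array();            // absolute web links, in page order
	public $url;                       // address the document was loaded from
	public $urls = array();            // de-duplicated result of get_array()

	// Must be __construct: PHP 8 no longer treats a method named after the class as a constructor
	function __construct($_url)
	{
		$txt = file_get_contents($_url); // remote URLs need allow_url_fopen enabled
		$this->_txt = ($txt === false) ? '' : $txt;
		$this->url = $_url;
	}

	// Collect every <a ...>...</a> element; stripos() makes the match case-insensitive
	function _isolate()
	{
		$this->_isolated_links = array();
		$pos = 0;
		while (($st = stripos($this->_txt, '<a', $pos)) !== false)
		{
			$pos = $st + 2;
			// Skip <abbr>, <area>, <aside>, ...: "<a" must be followed by whitespace or ">"
			$next = substr($this->_txt, $pos, 1);
			if ($next !== '>' && trim($next) !== '') continue;
			$en = stripos($this->_txt, '</a>', $pos);
			if ($en === false) break; // unclosed tag: stop instead of looping forever
			$this->_isolated_links[] = substr($this->_txt, $st, $en + 4 - $st);
			$pos = $en + 4;
		}
	}

	// Helper added to resolve a link against $this->url (simplified: "../" segments are not collapsed)
	function _absolute($link)
	{
		if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $link)) return $link; // already has a scheme
		$p = parse_url($this->url);
		if (!isset($p['scheme'], $p['host'])) return $link; // base is a local file, not a web address
		if (substr($link, 0, 2) == '//') return $p['scheme'] . ':' . $link; // protocol-relative
		$root = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
		if (substr($link, 0, 1) == '/') return $root . $link; // site-relative
		$path = isset($p['path']) ? $p['path'] : '/';
		return $root . substr($path, 0, strrpos($path, '/') + 1) . $link; // relative to the page's directory
	}

	// Pull the href value out of each isolated element and make it absolute
	function get_links()
	{
		$this->_isolate();
		$this->href = array();
		foreach ($this->_isolated_links as $link)
		{
			// href may be double-quoted, single-quoted or unquoted
			if (!preg_match('/\shref\s*=\s*["\']?([^"\'\s>]+)/i', $link, $m)) continue;
			if ($m[1][0] == '#') continue; // same-page anchor
			$temp = $this->_absolute($m[1]);
			// Keep http/https links only (drops mailto:, javascript:, and unresolved local links)
			if (preg_match('#^https?://#i', $temp)) $this->href[] = $temp;
		}
	}

	// Return each link once, in the order it first appears
	function get_array()
	{
		$this->get_links();
		$this->urls = array_values(array_unique($this->href));
		return $this->urls;
	}
}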

Usage

PHP:
$spider=new fetchlinks("CHANGE ME");
foreach ($spider->get_array() as $url)
{
	// Escape each URL before writing it into the HTML output
	echo htmlspecialchars($url).'<br>';
}

I did not write this, so I can't really take credit for it; I can't remember where I found it (possibly the comments in the PHP manual). Change 'CHANGE ME' to the URL or the path of the document you want to scan. Note that file_get_contents() only opens remote URLs when allow_url_fopen is enabled.
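If you would rather not scan the markup character by character, PHP's built-in DOM extension can parse the page and hand you each `<a>` element's `href`. This is a different technique from the class above, sketched under the assumption that the DOM extension is enabled (it is in most PHP builds); `extract_links_dom` is a hypothetical helper name. It returns the raw `href` values, so relative links such as `/about` still need resolving against the page address.

```php
<?php
// Sketch (assumes ext-dom is available): list the href of every <a> element
// by parsing the HTML instead of searching the raw string.
function extract_links_dom(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href')); // '' when the attribute is missing
        if ($href !== '' && !in_array($href, $links, true)) {
            $links[] = $href; // keep the first occurrence, drop duplicates
        }
    }
    return $links;
}

$html = '<p><A href="http://example.com/">Home</A> <a href=\'/about\'>About</a>'
      . ' <a name="top">no href</a> <a href="http://example.com/">again</a></p>';
print_r(extract_links_dom($html));
```

The HTML parser lowercases tag names, so `<A>` and `<a>` are both found without the manual `str_replace` step.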

P.S. This assumes you wanted a PHP script; you never said. It is always best to state which language you want a solution in.
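The class above only looks inside `<a>` tags, so it finds nothing in a plain-text document. For that case, a regular expression over the raw text is one simple option. This is a sketch, not a complete URL matcher: `extract_urls_text` is a hypothetical name, and the pattern (`http://` or `https://` followed by characters that are not whitespace, quotes or angle brackets, with trailing punctuation trimmed) is an assumption that will clip URLs that legitimately end in `)` or `.`.

```php
<?php
// Sketch: pull http(s) URLs out of plain text. The pattern is a deliberately
// simple assumption, not a full RFC 3986 matcher.
function extract_urls_text(string $text): array
{
    preg_match_all('#https?://[^\s"\'<>]+#i', $text, $matches);
    $urls = [];
    foreach ($matches[0] as $url) {
        // Drop sentence punctuation that the greedy match swallowed, e.g. "...html," or "...1)."
        $url = rtrim($url, '.,;:!?)');
        if (!in_array($url, $urls, true)) {
            $urls[] = $url; // keep the first occurrence, drop duplicates
        }
    }
    return $urls;
}

$text = "See http://example.com/page.html, or (https://example.org/a?b=1). "
      . "Mirror: http://example.com/page.html";
print_r(extract_urls_text($text));
```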
 