#1 | 02-12-2006, 12:06 AM | The_Inferno (thread starter)

Extracting URLs from a document?

I thought I had it, but I don't.
Any suggestions?
Last edited by The_Inferno; 02-12-2006 at 03:12 AM.

#2 | 02-12-2006, 05:40 AM | Peter

The following assumes you are getting the links from a web page.

PHP Code:
class fetchlinks
{
    public $_isolated_links = array(); // raw "<a ...>...</a>" fragments
    public $_txt;                      // source of the fetched page
    public $href = array();            // href values in document order
    public $url;                       // page the links were read from
    public $urls = array();            // unique absolute URLs

    function __construct($_url)
    {
        $txt = file_get_contents($_url);
        $this->_txt = ($txt === false) ? '' : $txt;
        // Normalise upper-case anchor tags so the scan only looks for "<a".
        $this->_txt = str_replace("<A", '<a', $this->_txt);
        $this->_txt = str_replace("</A>", '</a>', $this->_txt);
        $this->url = $_url;
    }

    // Copy every "<a ... /a>" fragment of the page into $_isolated_links.
    function _isolate()
    {
        $j = 0;
        $len = strlen($this->_txt);
        for ($i = 0; $i < $len; $i++)
        {
            if (substr($this->_txt, $i, 2) == "<a")
            {
                $st = $i;
                $k = $i;
                // Stop at the end of the text so an unclosed tag cannot loop forever.
                while ($k < $len && substr($this->_txt, $k, 3) != "/a>")
                {
                    $k++;
                }
                $en = $k + 3;
                $j++;
                $this->_isolated_links[$j] = substr($this->_txt, $st, $en - $st);
            }
        }
    }

    // Pull the href value out of each isolated fragment into $href.
    function get_links()
    {
        $this->_isolate();
        $this->href = array(0 => "");
        $p = 0;
        for ($i = 1; $i <= count($this->_isolated_links); $i++)
        {
            $link = $this->_isolated_links[$i];
            $len = strlen($link);
            for ($j = 1; $j < $len; $j++)
            {
                if (substr($link, $j, 5) == 'href=')
                {
                    $st = $j + 5;
                    $m = $st;
                    while ($m < $len && substr($link, $m, 1) != '>')
                    {
                        $m++;
                    }
                    $temp = substr($link, $st, $m - $st);
                    // Drop any attributes that follow the href value.
                    if (strpos($temp, ' ') > 0) $temp = substr($temp, 0, strpos($temp, ' '));
                    $temp = str_replace('"', '', $temp);
                    $temp = str_replace("'", "", $temp);
                    // Prefix relative links with the page URL (plain concatenation,
                    // not full URL resolution).
                    if (substr($temp, 0, 1) == "/") $temp = $this->url . $temp;
                    if (!preg_match('#^https?://#i', $temp)) $temp = $this->url . "/" . $temp;
                    // Keep absolute links, skipping immediate repeats.
                    if (preg_match('#^https?://#i', $temp) && $temp != $this->href[$p])
                    {
                        $p++;
                        $this->href[$p] = $temp;
                    }
                }
            }
        }
    }

    // Return the list of unique URLs found on the page.
    function get_array()
    {
        $this->get_links();
        for ($i = 1; $i < count($this->href); $i++)
        {
            if (!in_array($this->href[$i], $this->urls))
            {
                array_push($this->urls, $this->href[$i]);
            }
        }
        return $this->urls;
    }
}

Usage

PHP Code:
$spider = new fetchlinks("CHANGE ME");
foreach ($spider->get_array() as $url)
{
    echo $url . '<br>';
}

I did not write this, so I can't take credit for it. I can't remember where I found it (possibly the PHP manual comments). Change 'CHANGE ME' to the URL or path of the document you want to open.

P.S. This assumes you wanted a PHP script, since you never said. It is always best to state which language you want a solution in.
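
For comparison, here is a rough sketch of the same idea using PHP's DOM extension instead of scanning the text by hand. This is not the code from the post above: extract_links() and the sample HTML are made up for illustration, and it assumes the DOM extension is enabled. It returns the href values as written in the page, so relative links are not resolved.

```php
<?php
// Hypothetical helper (not from the thread): collect the unique href
// values of all <a> elements in an HTML string.
function extract_links(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = array();
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        // Skip anchors without an href and values already collected.
        if ($href !== '' && !in_array($href, $urls, true)) {
            $urls[] = $href;
        }
    }
    return $urls;
}

// Sample input, invented for this example.
$html = '<p><A HREF="http://example.com/">one</A> '
      . '<a href="/about">two</a> '
      . '<a href="http://example.com/">again</a></p>';
print_r(extract_links($html));
```

To run it against a live page, you could pass the result of file_get_contents() for that page to extract_links(), checking first that the fetch did not fail.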

#3 | 02-13-2006, 08:14 PM | deadserious

I have a tool, Domain List Cleaner, at http://dnextractor.com/tools/dnlistcleaner.pl that extracts domains from text, so you can paste in the text of a document and get all the domain names out of it. If you want the full URLs, though, it might not be the tool for you.
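
If a PHP version of that idea is useful, here is a minimal sketch. It is only a guess at the general approach and has nothing to do with how the Perl tool above actually works. extract_domains() is a made-up name, and it only finds domains that appear inside full http:// or https:// URLs, not bare names like example.com.

```php
<?php
// Hypothetical helper: find http(s) URLs in plain text and return the
// unique host names, lower-cased, in order of first appearance.
function extract_domains(string $text): array
{
    // A URL is taken to run until whitespace, a quote or an angle bracket.
    preg_match_all('#https?://[^\s<>"\']+#i', $text, $matches);

    $domains = array();
    foreach ($matches[0] as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        if (is_string($host) && $host !== '') {
            $domains[] = strtolower($host);
        }
    }
    return array_values(array_unique($domains));
}

// Sample text, invented for this example.
$text = 'See http://Example.com/a and https://www.namepros.com/x, also http://example.com/b';
print_r(extract_domains($text));
```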