02-11-2006, 11:06 PM   #1
The_Inferno

Extracting urls from a document?

Thought I had it but I don't.
Any suggestions?
__________________
http://peewii.info - Nintendo Wii...
http://v3x.us - Hosting...

Last edited by The_Inferno; 02-12-2006 at 02:12 AM.

02-12-2006, 04:40 AM   #2
Peter

The following assumes you are getting the links from a web page.

PHP Code:
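class fetchlinks
{
    public $_isolated_links = array();
    public $_txt;
    public $href = array();
    public $url;
    public $urls = array();

    // PHP 4 style constructors (a method named after the class) no longer
    // run in PHP 8, so the constructor is declared as __construct.
    function __construct($_url)
    {
        // file_get_contents() returns false on failure; treat that as an empty page.
        $this->_txt = (string) file_get_contents($_url);
        $this->_txt = str_replace("<A", '<a', $this->_txt);
        $this->_txt = str_replace("</A>", '</a>', $this->_txt);
        // Drop a trailing slash so joined URLs do not contain "//".
        $this->url = rtrim($_url, '/');
    }

    // Collect every "<a ... </a>" fragment of the page into $_isolated_links.
    function _isolate()
    {
        $j = 0;
        $len = strlen($this->_txt);
        for ($i = 0; $i < $len; $i++)
        {
            if (substr($this->_txt, $i, 2) == "<a")
            {
                // Find the closing tag; stop if it is missing, which
                // previously caused an endless loop.
                $k = strpos($this->_txt, "/a>", $i);
                if ($k === false) break;
                $j++;
                $en = $k + 3;
                $this->_isolated_links[$j] = substr($this->_txt, $i, $en - $i);
            }
        }
    }

    // Pull the href value out of each isolated anchor and make it absolute.
    function get_links()
    {
        $this->_isolate();
        $this->href = array(0 => "");
        $p = 0;
        $absolute = '#^https?://#i';
        for ($i = 1; $i <= count($this->_isolated_links); $i++)
        {
            $link = $this->_isolated_links[$i];
            for ($j = 1; $j < strlen($link); $j++)
            {
                if (substr($link, $j, 5) == 'href=')
                {
                    $st = $j + 5;
                    // The value ends at the next ">"; skip tags that have none.
                    $en = strpos($link, '>', $st);
                    if ($en === false) break;
                    $temp = substr($link, $st, $en - $st);
                    if (strpos($temp, ' ') > 0) $temp = substr($temp, 0, strpos($temp, ' '));
                    $temp = str_replace('"', '', $temp);
                    $temp = str_replace("'", "", $temp);
                    // Naive resolution: relative links are simply appended to the page URL.
                    if (substr($temp, 0, 1) == "/") $temp = $this->url . $temp;
                    if (!preg_match($absolute, $temp)) $temp = $this->url . "/" . $temp;
                    // Keep http:// and https:// links, skipping immediate repeats.
                    if (preg_match($absolute, $temp) && $temp != $this->href[$p])
                    {
                        $p++;
                        $this->href[$p] = $temp;
                    }
                }
            }
        }
    }

    // Return the list of unique URLs found on the page.
    function get_array()
    {
        $this->get_links();
        for ($i = 1; $i < count($this->href); $i++)
        {
            if (!in_array($this->href[$i], $this->urls))
            {
                array_push($this->urls, $this->href[$i]);
            }
        }
        return $this->urls;
    }
}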
Usage

PHP Code:
$spider = new fetchlinks("CHANGE ME");
foreach ($spider->get_array() as $url)
{
    echo $url . '<br>';
}
I did not write this, so I can't take credit for it, and I can't remember where I found it (possibly the PHP manual comments). Change 'CHANGE ME' to the URL or path of the document you wish to open.

P.S. This assumes you wanted a PHP script; you never said. It is always best to state which language you want a solution in.
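
For comparison, here is a shorter sketch of the same idea that lets PHP's bundled DOM extension do the parsing instead of scanning the string by hand. It is not part of the class above: the function name extract_hrefs and the sample HTML are made up for illustration, and it returns href values exactly as written in the page, so relative links are not resolved.

```php
<?php
// Hypothetical helper (not from the class above): returns the unique
// href values found in an HTML string, in document order.
function extract_hrefs(string $html): array
{
    // loadHTML() rejects an empty string, so handle that case up front.
    if ($html === '') {
        return array();
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = array();
    // The HTML parser lower-cases tag and attribute names, so <A HREF=...> is found too.
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '' && !in_array($href, $hrefs, true)) {
            $hrefs[] = $href;
        }
    }
    return $hrefs;
}

// Sample input, invented for this example.
$html = '<p><a href="http://example.com/">One</a> '
      . '<A HREF="/two.html">Two</A> '
      . '<a href="http://example.com/">Again</a></p>';

foreach (extract_hrefs($html) as $url) {
    echo $url . "\n";
}
```

To use it on a real page you would pass in the result of file_get_contents(), as the class above does. The parser copes with unquoted attributes, mixed-case tags and unclosed anchors, which are the cases hand-written scanning tends to trip over.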

02-13-2006, 07:14 PM   #3
deadserious

I have a tool, Domain List Cleaner, at http://dnextractor.com/tools/dnlistcleaner.pl that extracts domains from text: paste in the text of a document and it gives you all the domain names in it. If you want the full URLs, though, it might not be the tool for you.
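
If the document is plain text rather than HTML, both the full URLs and their domains can be pulled out with a regular expression. The sketch below is only an illustration and says nothing about how the tool above works internally: the function name extract_urls_and_domains, the pattern and the sample text are all assumptions, and it only finds URLs that start with http:// or https://.

```php
<?php
// Hypothetical helper: finds http/https URLs in plain text and returns
// them together with the unique host names they point to.
function extract_urls_and_domains(string $text): array
{
    // Naive pattern: a scheme followed by anything that is not
    // whitespace, a quote or an angle bracket.
    preg_match_all('#https?://[^\s"\'<>]+#i', $text, $matches);

    $urls = array();
    $domains = array();
    foreach ($matches[0] as $url) {
        // Drop trailing punctuation that usually belongs to the sentence, not the URL.
        $url = rtrim($url, '.,;:!?)');
        $host = parse_url($url, PHP_URL_HOST);
        if (!is_string($host) || $host === '') {
            continue;
        }
        if (!in_array($url, $urls, true)) {
            $urls[] = $url;
        }
        // Host names are case-insensitive, so compare them in lower case.
        $host = strtolower($host);
        if (!in_array($host, $domains, true)) {
            $domains[] = $host;
        }
    }
    return array('urls' => $urls, 'domains' => $domains);
}

// Sample text, invented for this example.
$text = 'See http://example.com/page.html, then http://Example.com/other and https://example.org.';
$result = extract_urls_and_domains($text);

echo implode("\n", $result['urls']) . "\n";
echo implode("\n", $result['domains']) . "\n";
```

Stripping trailing punctuation is a guess about what the writer meant; a URL that really does end in one of those characters would be cut short.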