NamePros
RegEx Help
Post #1 by boomers (thread starter), 05-03-2006, 03:57 PM
Hey there...

I'm trying to write a regex that strips out all the irrelevant HTML and leaves just the hyperlink info.

An example of the hyperlink HTML is...
HTML Code:
<a href="http://www.url.com/blah.htm" class=underline><b>Visit My Page</b></a>
All I want left is the actual URL, http://www.url.com/blah.htm, and the link text, 'Visit My Page'.

Obviously the URL changes, since there are a lot of links in the page's HTML, and so does the link text, but the text is always between the bold tags.

So far I think I've got the URL with:
http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?

But I'm not 100% sure how to capture the link text along with it. Any help would be greatly appreciated.

If it makes any difference, I'm planning to use this with .NET.
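One way to get both parts in a single pass is a pattern with two capture groups, one for the `href` value and one for the bold text. The sketch below is in PHP to match the code elsewhere in this thread, and it assumes every link looks like the sample: the URL in double quotes in `href`, and the text wrapped in `<b>...</b>` directly inside the `<a>`. The group names `url` and `text`, the second sample link and the `$links` array are illustrative only. .NET also accepts the `(?<name>...)` named-group syntax, so the pattern should port with little change, but that has not been checked here.

```php
<?php
// Sample input: the link from the question plus a second, made-up one.
$html = '<a href="http://www.url.com/blah.htm" class=underline><b>Visit My Page</b></a>'
      . "\n"
      . '<a href="https://www.url.com/other.htm" class=underline><b>Another Page</b></a>';

// Assumes href is double-quoted and the link text sits inside <b>...</b>.
// (?<url>[^"]*) stops at the closing quote of href;
// (?<text>.*?) lazily stops at the first </b>.
// Flags: s lets . match newlines, i makes the tag names case-insensitive.
$pattern = '@<a\s[^>]*?href="(?<url>[^"]*)"[^>]*>\s*<b>(?<text>.*?)</b>\s*</a>@si';

$links = array();
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    // PREG_SET_ORDER gives one array per match, keyed by group name.
    foreach ($matches as $m) {
        $links[] = array('url' => $m['url'], 'text' => $m['text']);
    }
}

foreach ($links as $link) {
    echo $link['url'], ' => ', $link['text'], "\n";
}
```

If some links use single-quoted attributes or put the text outside `<b>`, this pattern will silently skip them, so it is worth checking the match count against the number of links expected on the page.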
Post #2 by tm, 05-07-2006, 08:00 AM

Perhaps this might be useful to you.

PHP Code:
<?php
// $document should contain an HTML document.
// This will remove HTML tags, JavaScript sections
// and white space. It will also convert some
// common HTML entities to their text equivalent.
$search = array('@<script[^>]*?>.*?</script>@si', // Strip out JavaScript
                '@<[\/\!]*?[^<>]*?>@si',          // Strip out HTML tags
                '@([\r\n])[\s]+@',                // Collapse white space after line breaks
                '@&(quot|#34);@i',                // Replace HTML entities
                '@&(amp|#38);@i',
                '@&(lt|#60);@i',
                '@&(gt|#62);@i',
                '@&(nbsp|#160);@i',
                '@&(iexcl|#161);@i',
                '@&(cent|#162);@i',
                '@&(pound|#163);@i',
                '@&(copy|#169);@i');

$replace = array('',
                 '',
                 '\1',
                 '"',
                 '&',
                 '<',
                 '>',
                 ' ',
                 chr(161),                        // chr() gives ISO-8859-1 bytes, not UTF-8
                 chr(162),
                 chr(163),
                 chr(169));

$text = preg_replace($search, $replace, $document);

// Any remaining numeric entities (&#NNN;) become the matching byte.
// The original did this with the /e modifier ('@&#(\d+);@e' => 'chr(\1)'),
// which was removed in PHP 7, so a callback is used instead.
$text = preg_replace_callback('@&#(\d+);@', function ($m) {
    return chr((int) $m[1]);
}, $text);
?>
Found on http://uk.php.net/manual/en/function.preg-replace.php
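The snippet above removes every tag, including the `<a>` and its `href`, so on its own it keeps the link text but throws away the URL. If both are wanted, an HTML parser is often sturdier than a regex. The sketch below swaps in PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`), which is a different technique from the preg_replace approach on the linked manual page; the sample markup is from the question and the `$links` array shape is just for illustration.

```php
<?php
// Sample markup from the question; a real page would be loaded from a file or URL.
$html = '<html><body>'
      . '<a href="http://www.url.com/blah.htm" class=underline><b>Visit My Page</b></a>'
      . '</body></html>';

$dom = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$links = array();
// Select every <a> element that has an href attribute.
foreach ($xpath->query('//a[@href]') as $a) {
    $links[] = array(
        'url'  => $a->getAttribute('href'),
        // textContent includes text inside child elements such as <b>.
        'text' => trim($a->textContent),
    );
}

foreach ($links as $link) {
    echo $link['url'], ' => ', $link['text'], "\n";
}
```

Unlike the regex, this does not depend on the text being inside `<b>` or on how the attributes are quoted.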
All times are GMT -7.