Hey there...
I'm trying to write a regex that will strip out all the non-relevant HTML and leave just the hyperlink info.
An example of the hyperlink HTML is...
HTML:
<a href="http://www.url.com/blah.htm" class=underline><b>Visit My Page</b></a>
And all I want to be left with is the actual URL, http://www.url.com/blah.htm, and the wording for this link, 'Visit My Page'.
Obviously the link changes, as there are a lot of them in the actual HTML of the page, as does the text for it, but the text is always in between the bold tags.
So far I think I've got the URL by using:
http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
But I'm not 100% sure how to get the text of the link along with it. Any help would be greatly appreciated.
If it makes any difference, I'm planning on using this with .NET.
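To make the goal concrete, here is a rough sketch of the kind of extraction described above: one pattern with two capturing groups, the first for the href value and the second for the text between the bold tags. It is written in Python purely for illustration. The names `LINK_RE` and `extract_links` are made up for this example, and it assumes the href is always double-quoted and the link text is always wrapped in a single `<b>...</b>` pair, as in the sample. The same group syntax should carry over to the .NET `Regex` class, where the captures would be read from `Match.Groups`.

```python
import re

# Hypothetical pattern (an assumption, not a tested .NET solution):
#   group 1 = the double-quoted href value
#   group 2 = the text between <b> and </b>
LINK_RE = re.compile(
    r'<a\s[^>]*?href="([^"]+)"[^>]*>\s*<b>(.*?)</b>\s*</a>',
    re.IGNORECASE | re.DOTALL,
)

def extract_links(html):
    """Return a list of (url, text) tuples, one per bold-wrapped anchor."""
    return [(m.group(1), m.group(2).strip()) for m in LINK_RE.finditer(html)]

sample = '<a href="http://www.url.com/blah.htm" class=underline><b>Visit My Page</b></a>'
print(extract_links(sample))
```

Capturing the URL and the text in one pass keeps each pair together, which matching URLs alone would not do. Anchors that do not fit the assumptions (single-quoted hrefs, no bold tags, nested markup) would be skipped by this sketch.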








