
Extract url [PHP]


w1ww

Ok,

Some time ago I got this code to extract URLs from the Google News website:


PHP:
<?php

$url = "http://news.google.com/news/url?sa=T&ct=pt/17-0&fd=R&url=http://www.estadao.com.br/ultimas/nacional/noticias/2006/dez/13/2.htm&cid=1103238007&ei=8mmARcKVMb7maOid1dIO";

preg_match('/&url=(.*?)&cid=/', $url, $result);

echo $result[1]; // http://www.estadao.com.br/ultimas/nacional/noticias/2006/dez/13/2.htm

?>

I've been trying to adapt it so that, for example, the URL "http://www.google.com" yields only "google".

I already tried:
PHP:
preg_match('/http://www.(.*?)./', $url, $result);

But I get
Warning: preg_match() [function.preg-match]: Unknown modifier '/' in D:\wamp\www\1.php on line 5

So, does anybody have a solution to this little problem?

Thank you,
Tiago
 
0
•••
If you are going to use /'s in your pattern with /'s as your delimiters, you need to escape them. Try making it http:\/\/ or using @'s as your delimiters.

(You could also just use str_replace to remove the http:// and www.)
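
To make that concrete, here is a minimal sketch, assuming the URL always starts with http://www. (the variable names are only for illustration). It shows the same pattern once with escaped slashes and once with # as the delimiter. The dots are escaped as well, since a bare . matches any character.

```php
<?php
$url = 'http://www.google.com';

// Same pattern, two delimiter styles.
$withSlashes = '/^http:\/\/www\.(.*?)\./'; // '/' delimiter, slashes escaped
$withHashes  = '#^http://www\.(.*?)\.#';   // '#' delimiter, nothing to escape

$a = preg_match($withSlashes, $url, $m1) ? $m1[1] : null;
$b = preg_match($withHashes, $url, $m2) ? $m2[1] : null;

echo $a, "\n"; // google
echo $b, "\n"; // google
```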
 
0
•••
This should work:

PHP:
preg_match('/\\b(?P<subdomain>(?:[-a-z0-9]+\\.)+)?(?P<domain>(?P<host>[-a-z0-9]+)\\.(?P<tld>[a-z]{2,6}))/', $url, $result));

This will extract multiple subdomains, the host, the TLD and the domain. Hope that helps.

Bax
 
0
•••
PHP:
preg_match('/http:\/\/www.(.*?)./', $url, $result);

returns nothing, just a blank (I tried with the @ too, and that returns blank as well).

and
PHP:
preg_match('/\\b(?P<subdomain>(?:[-a-z0-9]+\\.)+)?(?P<domain>(?P<host>[-a-z0-9]+)\\.(?P<tld>[a-z]{2,6}))/', $url, $result));

returns
Parse error: parse error, unexpected ')' in D:\wamp\www\1.php on line 5


Any suggestion?

Thanks
 
0
•••
PHP:
$var = str_replace(array('http://', 'www.'), '', $var);
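
On why the escaped pattern still came back blank: the last . in it is unescaped, so it matches any character. The lazy (.*?) is then allowed to match nothing at all, and the trailing . just eats the "g". A small sketch of the difference, using the same test URL:

```php
<?php
$url = 'http://www.google.com';

// Unescaped final '.': the lazy group matches zero characters and the
// trailing '.' consumes the "g", so group 1 is an empty string.
preg_match('/http:\/\/www.(.*?)./', $url, $loose);
var_dump($loose[1]); // string(0) ""

// Escaped final '\.': the lazy group has to grow until a literal dot.
preg_match('/http:\/\/www\.(.*?)\./', $url, $strict);
var_dump($strict[1]); // string(6) "google"
```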
 
0
•••
Alternatively sticking with the regex:

PHP:
$var = 'http://www.google.com';
$var = preg_replace("/(((http|ftp|https):\/\/)|(www\.?))/i", '', $var);

echo $var; // will output google.com

Just a bit more versatile :)
 
0
•••
Sorry, my mistake: I used it in an if statement and left the trailing bracket in.

PHP:
preg_match('/\\b(?P<subdomain>(?:[-a-z0-9]+\\.)+)?(?P<domain>(?P<host>[-a-z0-9]+)\\.(?P<tld>[a-z]{2,6}))/', $url, $result);
 
0
•••
Ok,

I didn't explain it correctly. For example, if the URL is 'http://www.google.com/blabalbla.php'

I want it to extract the google part.

Matt, your code does indeed output google.com, but if there is anything after it, that gets output too.

baxter, same error as before :\

Thanks for your help till now!
 
0
•••
w1ww, how about this:

PHP:
$url = 'http://www.google.com/images/';

preg_match('#^(?:http://)(www\.?)?([^/]+)#i', $url , $matches);
// echo $matches[2];
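
If only the google part is wanted, whatever the path, one alternative (not from the posts above) is to let parse_url() find the host and then take the label just before the TLD. A rough sketch: host_label() is a made-up helper name, and it naively assumes a single-label TLD, so a host like example.co.uk would come out as "co".

```php
<?php
// Hypothetical helper, not from the thread: returns the host label that
// sits just before the TLD, or null if the URL has no host.
function host_label($url)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }

    $labels = explode('.', $host);
    $count = count($labels);

    // Naive assumption: the TLD is a single label (com, net, org, ...).
    return $count >= 2 ? $labels[$count - 2] : $labels[0];
}

echo host_label('http://www.google.com/blabalbla.php'), "\n"; // google
echo host_label('http://google.com/images/'), "\n";           // google
```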
 
0
•••
Wow, it's working!

I'll test it in my code; I hope it works the way I want :)!

Thank you!!
 
0
•••
Sorry, I didn't update the first post, but the second one works; I just checked it.

example:

PHP:
<?php                   

$url = 'http://www.namepros.com/programming/282450-script-selling-script.html';

preg_match('/\\b(?P<subdomain>(?:[-a-z0-9]+\\.)+)?(?P<domain>(?P<host>[-a-z0-9]+)\\.(?P<tld>[a-z]{2,6}))/', $url, $result); 

print_r($result);
  
?>

Would output

Code:
Array ( 
[0] => www.namepros.com 
[subdomain] => www. 
[1] => www. 
[domain] => namepros.com 
[2] => namepros.com 
[host] => namepros 
[3] => namepros
[tld] => com 
[4] => com 
)
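
To get only the part w1ww asked for, the named group can be read straight out of the result array. A short sketch using the same pattern and URL as above (the $pattern and $host variable names are just for illustration):

```php
<?php
$url = 'http://www.namepros.com/programming/282450-script-selling-script.html';

$pattern = '/\\b(?P<subdomain>(?:[-a-z0-9]+\\.)+)?(?P<domain>(?P<host>[-a-z0-9]+)\\.(?P<tld>[a-z]{2,6}))/';

// preg_match returns 1 on a match, so guard before reading the groups.
$host = null;
if (preg_match($pattern, $url, $result)) {
    $host = $result['host'];
}

echo $host, "\n"; // namepros
```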

Cheers,

Baxter
 
0
•••