NamePros
06-20-2008, 11:25 AM  #1  Albino (thread starter)

[PHP] file_get_contents and regex


Hey,

I have this terrible feeling I'm missing something really simple here.

I'm trying to get my head around file_get_contents, combined with regex, to get specific bits of data from web pages.

Let's say I want the "about me" part of my Plurk page: http://www.plurk.com/user/Toddish

the source code is:

Code:
 <p id="about_me">
                    I'm a web designer from the UK who loves games, music, and films :)
<br>If you like any of my Plurks, feel free to add me!
                </p>
so, I tried this:

PHP Code:
<?php
$data = file_get_contents('http://www.plurk.com/user/Toddish');
$regex = '/<p id="about_me">[.*]<\/p>/';
preg_match($regex, $data, $match);
var_dump($match);
?>
but I get nothing.

any idea what I'm missing?

cheers, rep etc as usual

06-20-2008, 12:01 PM  #2  Bruce_KD

You're not escaping any characters in the regex match.
The line breaks also screw it up. If you really need them, there are workarounds, but (.*) does not match newlines by default.

Code:
$data = str_replace("\n", '', file_get_contents('http://www.plurk.com/user/Toddish'));
echo preg_match("/\<p id\=\"about_me\"\>(.*)<\/p\>/",$data,$match);
print_r($match);

Bruce

P.S. I think this is the forum for code snippets you're sharing with others; its parent forum (just "Programming") is for code help. Though I may be wrong, I'm fairly new.
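
For anyone following along, here is a minimal sketch of why the first pattern returns nothing. It runs against a hard-coded copy of the snippet from post #1 rather than the live page (the $html variable is an assumption for illustration). In PCRE, [.*] is a character class that matches exactly one literal "." or "*", and a "." outside a class does not match newlines by default. The backslashes in front of <, > and = are harmless here but not required, since those characters are not special in these positions.

```php
<?php
// Hard-coded stand-in for the fetched page (assumption: same markup as in post #1).
$html = "<p id=\"about_me\">\n"
      . "  I'm a web designer from the UK who loves games, music, and films :)\n"
      . "<br>If you like any of my Plurks, feel free to add me!\n"
      . "</p>";

// [.*] is a character class: one literal "." or "*", not "any text".
$class = preg_match('/<p id="about_me">[.*]<\/p>/', $html);

// (.*) is "any text", but "." stops at newlines by default.
$dot = preg_match('/<p id="about_me">(.*)<\/p>/', $html);

// Stripping the newlines first, as in the reply above, lets (.*) reach </p>.
$flat = str_replace("\n", '', $html);
$stripped = preg_match('/<p id="about_me">(.*)<\/p>/', $flat, $match);

var_dump($class, $dot, $stripped); // prints int(0), int(0), int(1)
echo $match[1], "\n";
```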

06-20-2008, 12:29 PM  #3  Albino (thread starter)

cheers, didn't realise you had to escape all of them as well.

Now I am getting all the text after it as well. I tried to limit it via:

PHP Code:
/\<p id\=\"about_me\"\>(.*){1,400}<\/p\>/ 
but I now get NULL.

cheers so far, rep added

oh, and it does look as though I misclicked into the wrong forum, if a mod could move it please

06-20-2008, 01:25 PM  #4  Bruce_KD

You missed escaping the first tag on both p's and the equals sign.
Try:
Code:
/\<p id\=\"about_me\"\>(.{1,400})<\/p\>/

Bruce
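
A short sketch of the difference between the two quantifier placements, using a made-up single-line string ($flat is hypothetical, not the real Plurk page). In (.*){1,400} the braces repeat the whole group, and each repetition can already swallow any number of characters, so nothing is capped. In (.{1,400}) the braces apply to the dot itself, so the capture is limited to 400 characters.

```php
<?php
// Hypothetical flattened page: the target paragraph, 500 filler characters, then another </p>.
$flat = '<p id="about_me">short bio</p>' . str_repeat('x', 500) . '</p>';

// Braces repeat the group; each pass of .* is unbounded, so the match runs to the last </p>.
preg_match('/<p id="about_me">(.*){1,400}<\/p>/', $flat, $loose);

// Braces on the dot: at most 400 characters between the tags.
preg_match('/<p id="about_me">(.{1,400})<\/p>/', $flat, $capped);

var_dump($loose[0] === $flat); // bool(true): the whole string was matched
echo $capped[1], "\n";         // short bio
```

The cap only helps when the wanted paragraph is shorter than the limit and the next </p> is further away than the limit; it does not by itself make the match stop at the first closing tag.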

06-21-2008, 03:22 AM  #5  Albino (thread starter)

I can't even copy and paste properly, oh dear!

my actual code is:

PHP Code:
preg_match("/\<p id\=\"about_me\"\>(.*){1,1000}<\/p\>/",$data,$match); 

06-21-2008, 04:06 AM  #6  Barrucadu

This will just match up until the first </p>, even across multiple lines:
PHP Code:
preg_match("/\<p id\=\"about_me\"\>(.*?)<\/p\>/s",$data,$match); 
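
A small sketch of what the s modifier in that pattern does, on a hard-coded stand-in for the page ($html is an assumption for illustration). With s (PCRE_DOTALL) the dot also matches newlines, so no str_replace is needed, and the lazy *? keeps the capture from running into the following paragraph.

```php
<?php
// Stand-in page: the target paragraph spans several lines and is followed by another paragraph.
$html = "<p id=\"about_me\">\n  I'm a web designer from the UK\n</p>\n<p>Something else</p>";

// Without /s the dot cannot cross the line breaks, so there is no match.
$without = preg_match('/<p id="about_me">(.*?)<\/p>/', $html);

// With /s the dot matches newlines; *? keeps the capture as short as possible.
$with = preg_match('/<p id="about_me">(.*?)<\/p>/s', $html, $match);

var_dump($without, $with);  // prints int(0), int(1)
echo trim($match[1]), "\n"; // I'm a web designer from the UK
```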

06-21-2008, 06:49 AM  #7  Albino (thread starter)

perfect, thanks

06-21-2008, 12:10 PM  #8  Albino (thread starter)

sorry, another few small things

I'm not sure about the (.*?). The ? means 0 or 1, so surely that would mean 0 or 1 of any character? But I'm guessing it references the </p> instead, just not sure how.

Also, I need a .htaccess file to change some URLs. So I have this:

Code:
RewriteEngine on
RewriteRule ^plurk/([a-z]+)/$ /plurk.php?user=$1
to turn /todd/plurk.php?user=Toddish into /todd/plurk/Toddish

but this doesn't seem to work, I get a 404

any ideas?
Last edited by Albino; 06-21-2008 at 01:11 PM.

06-22-2008, 03:23 PM  #9  Bruce_KD

I got this working as follows:
Code:
RewriteEngine on
RewriteRule plurk/(.*) plurk.php?user=$1
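
One possible reading of the original 404, sketched in PHP on the assumption that the server's mod_rewrite uses PCRE-style patterns (as Apache 2 does). The class [a-z]+ is case-sensitive, so the capital T in Toddish does not match, and the trailing /$ requires a slash that /todd/plurk/Toddish does not have. The helper name rewrite_capture is hypothetical, and the sketch assumes the .htaccess file sits in /todd/, so the rule is tested against plurk/Toddish.

```php
<?php
// Hypothetical helper: returns what $1 would capture, or null if the rule would not fire.
function rewrite_capture(string $pattern, string $path): ?string
{
    return preg_match($pattern, $path, $m) === 1 ? $m[1] : null;
}

$original = '#^plurk/([a-z]+)/$#'; // pattern from the question
$relaxed  = '#plurk/(.*)#';        // pattern from the reply above

var_dump(rewrite_capture($original, 'plurk/Toddish'));  // NULL: capital T, no trailing slash
var_dump(rewrite_capture($original, 'plurk/toddish/')); // string(7) "toddish"
var_dump(rewrite_capture($relaxed, 'plurk/Toddish'));   // string(7) "Toddish"
```

The leading slash in the original target, /plurk.php?user=$1, would also point at the site root rather than /todd/, which may matter depending on where plurk.php lives. If (.*) feels too loose, a stricter class combined with mod_rewrite's [NC] flag is another option.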
I also tested Mikor's regex, and it does match across multiple lines.
If you want to throw a number of characters limit in, you can use this:
Code:
$data = file_get_contents('http://www.plurk.com/user/Toddish');
preg_match("/\<p id\=\"about_me\"\>(.*?)<\/p\>/s",$data,$match); 
print_r($match);

Bruce
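
The snippet above uses (.*?) with no explicit cap. Here is a sketch of how a character limit could be combined with it, assuming the 400-character figure used earlier in the thread and a hard-coded $html stand-in for the page: .{1,400}? is the lazy form of a bounded repeat.

```php
<?php
// Stand-in for the fetched page (assumption for illustration).
$html = "<p id=\"about_me\">\nshort bio\n</p><p>other</p>";

// Lazy, bounded, dot matches newlines: stops at the first </p>, allows at most 400 characters.
$pattern = '/<p id="about_me">(.{1,400}?)<\/p>/s';

preg_match($pattern, $html, $match);
echo trim($match[1]), "\n"; // short bio

// A paragraph longer than the cap simply fails to match.
$long = '<p id="about_me">' . str_repeat('x', 401) . '</p>';
var_dump(preg_match($pattern, $long)); // int(0)
```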

06-23-2008, 05:45 AM  #10  qbert220

Originally Posted by Albino
im not sure on the (.*?). the ? means 0 or 1, so surely that would mean 0 or 1 of any character?
The ? has 2 different meanings, depending on context.

After a normal character or expression it means one or zero occurrences of it.

After a normally "greedy" operator it makes it non-greedy.

Greedy means that the operator will match as much as possible. * is normally greedy, so if you just use .* it will match everything until the last </p> on the page. By making it non-greedy, it matches as little as it can. So it matches up to the next </p> on the page, i.e. up to the </p> at the end of the <p id="about_me"> paragraph.
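
The two meanings of ? can be seen side by side in a short sketch (the sample strings are made up for illustration):

```php
<?php
$page = '<p id="about_me">bio</p><p>footer</p>';

// Greedy: .* runs to the last </p> on the page.
preg_match('/<p id="about_me">(.*)<\/p>/', $page, $greedy);

// Non-greedy: .*? stops at the first </p>.
preg_match('/<p id="about_me">(.*?)<\/p>/', $page, $lazy);

echo $greedy[1], "\n"; // bio</p><p>footer
echo $lazy[1], "\n";   // bio

// After an ordinary character, ? means "zero or one": the u is optional.
var_dump(preg_match('/^colou?r$/', 'color'));  // int(1)
var_dump(preg_match('/^colou?r$/', 'colour')); // int(1)
```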
Last edited by qbert220; 06-23-2008 at 05:48 AM.

06-24-2008, 05:40 AM  #11  Albino (thread starter)

cheers, Bruce, works fine

and thanks for the explanation qbert, that helped a lot!