| Programming PHP, Perl, Ruby on Rails, AJAX, HTML, XHTML, CSS, JavaScript, MySQL and any other coding topics. |
| | THREAD STARTER #1 (permalink) |
| Munky Designs Join Date: May 2005
Posts: 996
| [PHP] file_get_contents and regex
Hey, I have this terrible feeling I'm missing something really simple here. I'm trying to get my head around file_get_contents, combined with regex, to get specific bits of data from web pages. Let's say I want the "about me" part on my Plurk page: http://www.plurk.com/user/Toddish
The source code is:
Code:
<p id="about_me">
I'm a web designer from the UK who loves games, music, and films :)
<br>If you like any of my Plurks, feel free to add me!
</p>
NamePros.com http://www.namepros.com/programming/483965-php-file_get_contents-and-regex.html
PHP Code:
Any idea what I'm missing? Cheers, rep etc. as usual |
| |
| | #2 (permalink) |
| NamePros Member Join Date: Sep 2006
Posts: 99
| You're not escaping any characters in the regex match. The white space also screws it up. If you really need that, there are workarounds, but . does not match newline characters by default, so (.*) won't run across line breaks. This strips them first:
Code:
// Fetch the page and strip newlines so (.*) can span the whole paragraph
$data = str_replace("\n", '', file_get_contents('http://www.plurk.com/user/Toddish'));
// preg_match returns 1 on a match and 0 otherwise
echo preg_match("/\<p id\=\"about_me\"\>(.*)<\/p\>/",$data,$match);
// $match[0] holds the full match, $match[1] the captured text
print_r($match);
Bruce
P.S. I think this is the forum for code snippets you're sharing with others; its parent forum (just "Programming") is for code help. Though I may be wrong, I'm fairly new |
| |
| | THREAD STARTER #3 (permalink) |
| Munky Designs Join Date: May 2005
Posts: 996
| Cheers, didn't realise you had to escape all of them as well. Now I am getting all the text after it as well. I tried to limit it via:
PHP Code:
Cheers so far, rep added. Oh, and it does look as though I misclicked on the wrong forum, if a mod could move it please |
| |
| | THREAD STARTER #5 (permalink) |
| Munky Designs Join Date: May 2005
Posts: 996
| I can't even copy and paste properly, oh dear! My actual code is:
PHP Code: |
| |
| | #6 (permalink) |
| Senior Member Join Date: Aug 2005 Location: East Yorkshire, England
Posts: 2,689
| This will just match up until the first </p>, even across multiple lines:
PHP Code: |
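The PHP snippet from this post did not survive in the thread. As a rough sketch of the behaviour it describes (not the poster's original code), a non-greedy group combined with the s modifier matches across line breaks but stops at the first </p>. The $html sample string, the $about variable, and the choice of # as the delimiter are assumptions made for illustration.

```php
<?php
// Hypothetical sample markup standing in for the fetched Plurk page.
$html = "<p id=\"about_me\">\nI'm a web designer from the UK.\n<br>Feel free to add me!\n</p>\n<p>Another paragraph</p>";

$about = null;

// (.*?) is non-greedy, so it stops at the first </p>;
// the s modifier lets . match newline characters as well.
// Using # as the delimiter avoids having to escape the / in </p>.
if (preg_match('#<p id="about_me">(.*?)</p>#s', $html, $match)) {
    $about = trim($match[1]);
    echo $about, "\n";
}
```

With a real page, $html would come from file_get_contents() as in the earlier posts, and there is no need to strip newlines beforehand.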
| |
| | THREAD STARTER #8 (permalink) |
| Munky Designs Join Date: May 2005
Posts: 996
| Sorry, another few small things. I'm not sure on the (.*?). The ? means 0 or 1, so surely that would mean 0 or 1 of any character? But I'm guessing it references the </p> instead, just not sure how. Also, I need a .htaccess file to change some URLs. So I have this:
Code:
RewriteEngine on
RewriteRule ^plurk/([a-z]+)/$ /plurk.php?user=$1
but this doesn't seem to work, I get a 404. Any ideas?
Last edited by Albino; 06-21-2008 at 01:11 PM.
|
| |
| | #9 (permalink) |
| NamePros Member Join Date: Sep 2006
Posts: 99
| I got this working as follows:
Code:
RewriteEngine on
RewriteRule plurk/(.*) plurk.php?user=$1
If you want to throw a number of characters limit in, you can use this:
Code:
$data = file_get_contents('http://www.plurk.com/user/Toddish');
preg_match("/\<p id\=\"about_me\"\>(.*?)<\/p\>/s",$data,$match);
print_r($match);
Bruce |
| |
| | #10 (permalink) | ||||
| NamePros Regular Join Date: Jul 2007 Location: UK
Posts: 395
| After a normal character or expression, ? means one or zero occurrences of it. After a normally "greedy" operator, it makes it non-greedy. Greedy means that the operator will match as much as possible. * is normally greedy, so if you just use .* it will match everything until the last </p> on the page. By making it non-greedy, it matches as little as it can. So it matches up to the next </p> on the page, i.e. up to the </p> at the end of the <p id="about_me"> paragraph.
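A small sketch of the greedy versus non-greedy difference described above. The two-paragraph $html string and the colou?r pattern are made up for illustration and are not taken from the Plurk page.

```php
<?php
// Hypothetical two-paragraph page used only for illustration.
$html = '<p id="about_me">About me</p><p>Footer text</p>';

// Greedy: .* runs on to the LAST </p> in the string.
preg_match('#<p id="about_me">(.*)</p>#s', $html, $greedy);

// Non-greedy: .*? stops at the FIRST </p>.
preg_match('#<p id="about_me">(.*?)</p>#s', $html, $lazy);

// After an ordinary character, ? simply means "zero or one of it".
$optional = preg_match('/^colou?r$/', 'color');

echo $greedy[1], "\n"; // About me</p><p>Footer text
echo $lazy[1], "\n";   // About me
```

The same ? character plays both roles; which one applies depends only on whether it follows a quantifier such as * or an ordinary character.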
Last edited by qbert220; 06-23-2008 at 05:48 AM.
| ||||
| |